My experiences with Rcpp

May 18, 2012

For the seven days up to Tuesday, I worked on converting the code of my master's thesis from scripted R (statistics) to compiled C++ using the Rcpp package by Dirk Eddelbuettel. Despite the initial effort necessary to … Continue reading →

Read more »

R is to SAS as Java is to COBOL

May 18, 2012

An interview with Revolution Analytics CEO Dave Rich was published this week by BeyeNetwork. During the interview, Dave was asked how statistical modeling platforms have changed over the decades: People have been doing statistical modeling and predictive analytics for 50 years now, SAS and SPSS have been around since the early ’70s. What’s different now -- what’s...

Read more »

In Mexico, more marriages ending in divorce, and sooner

May 18, 2012

R user Diego Valle analyzed the divorce rate among Mexican marriages since 1993 (the earliest date for which data are available) and found that not only have more marriages ended in divorce over time, but marriages that do end are ending sooner: This chart is a bit complicated, but it bears close inspection. Each line you see is...

Read more »

Non transitivity of correlation for random vectors in dimension 3

May 18, 2012

Dependence in dimension 2 is difficult. But one has to admit that dimension 2 is far simpler than dimension 3! I recently rediscovered a nice paper, Langford, Schwertman & Owens (2001), on transitivity of the property of being positively c...

Read more »

Criticism 4 of NHST: No Mechanism for Producing Substantive Cumulative Knowledge

May 18, 2012

In this fourth part of my series of criticisms of NHST, I’m going to focus on broad

Read more »

Should I be nice?

May 18, 2012

I got the following email.

Subject: i have a question?
Date: May 18, 2012 7:57:56 AM CDT

how can i enter the data of QTL analysis.

That was the whole thing. I presume that the writer wishes to use my R/qtl software. I could probably respond helpfully (for example, “See the sample data files and

Read more »

A minimal network example in R

May 18, 2012

Network science is potentially useful for certain problems in data analysis, and I know close to nothing about it. In this short post I present my first attempt at network analysis: A minimal example to construct and visualize an artificial undirected network with community structure in R. No network libraries are loaded. Only basic R-functions

Read more »

Example Reproducible Report using R Markdown: Analysis of California Schools Test Data

May 18, 2012

This is a quick set of analyses of the California Test Score dataset. The post was produced using R Markdown in RStudio 0.96. The main purpose of this post is to provide a case study of using R Markdown to prepare a quick reproducible report. It provides examples of using plots, output, in-line R code, and...

Read more »

Proportion of marriages ending in divorce

May 17, 2012

Over the last two decades, families in Mexico have undergone rapid social changes. The proportion of marriages ending in divorce has risen for each cohort since data became available; this is independent of the recently approved express divorce law in...

Read more »

Visualizing the CRAN: Graphing Package Dependencies

May 17, 2012

I had been meaning to start toying with the igraph package for a while. So a few weeks ago (lay off, I'm busy), I decided to grab a bunch of CRAN data about package dependencies. The easiest way that I could think to get this information was to just grab the html files for all the package descriptions and...

Read more »

Bar Graph Colours That Work Well

May 17, 2012

Ever since I started using ggplot2 more often at work to make graphs, I’ve realized something about the use of colour in bar graphs vs. dot plots: When I’m looking at a graph displayed on the brilliant Viewsonic … Continue reading →

Read more »

Monitor: Removing zero values from the data set.

May 17, 2012

I continue developing the Monitor function. This time a video from “r twotorials”, “how to access different records within a data frame by using logical tests in r”, gave me the idea to remove the zero values from a data set. When somebody g...

Read more »

Where’s Waldo? Image Analysis in R

May 17, 2012

R user Arthur Charpentier attempts to use the raster library and R functions to find Waldo in a "Where's Waldo" image: Sadly, it turned out that Waldo was a bit too tricky to spot using these techniques. But Arthur did have more success identifying the US flag in a shot from the Apollo mission, and identifying answers in the...

Read more »

Emulating local static variables in R

May 17, 2012

Recently I was writing code to plot multiple ggplot2 plots on one page. I wanted to replicate the standard behavior of the plot function, which draws graphs in sequence according to the mfrow/mfcol option in par...

Read more »

Orbitz: R has become the data-mining tool of choice

May 17, 2012

Sameer Chopra, vice president of Advanced Analytics at Orbitz Worldwide, wrote recently in Analytics magazine about the changing landscape of processes, software and systems for statistical modelers. In a section on "Big Data and Open Source Analytics", Chopra lays out the reasons why the R language "has become the data-mining tool of choice for machine learners": R has very...

Read more »

Github Follower Graph with R

May 17, 2012

Graph a GitHub user's followers (and followers' followers). Each programming language tends to develop its own idiomatic set of data structures. In R, data frames are often the structure of choice. JSON (a subset of JavaScript) has emerged a...

Read more »

Excel Import into R without rJava

May 17, 2012

In my ongoing quest to webappify various R scripts I discovered that rApache cannot load any R packages that depend on rJava.  For several of the scripts that I've written that grab data out of MS Excel files, and therein use the xlsx package, thi...

Read more »

More Bixi Data Visualization

May 17, 2012

I mentioned in a previous post that our team at the recent Hack/Reduce hackathon had some fun with a data set which consisted of Bixi station states at minute level temporal resolution. In addition to pulling out and plotting the flux at each station on an hourly basis, we also plotted the system state (number

Read more »

Please Learn to Read

May 17, 2012

There has been a lot of chatter during the past week on HN generated by Jeff Atwood's "Please don't learn to code". Actual posts included: Please don't learn to code (www.codinghorror.com) Please Don't Become Anything, Especially Not A P...

Read more »

Reproducible research with markdown, knitr and pandoc

May 17, 2012

Over the last few weeks I have been trying to optimise my workflow using markdown in combination with knitr and pandoc. Knitr is a great new package by Yihui, expanding R’s capabilities for reproducible research. I will illustrate my workflow … Continue reading →

Read more »

R’s increasing popularity. Should we care?

May 17, 2012

Some people will say ‘you have to learn R if you want to get a job doing statistics/data science’. I say bullshit, you have to learn statistics and learn to work in a variety of languages if you want to … Continue reading →

Read more »

Exponential decay models

May 17, 2012

All models are wrong, some models are more wrong than others. The streetlight model Exponential decay models are quite common.  But why? One reason a model might be popular is that it contains a reasonable approximation to the mechanism that generates the data.  That is seriously unlikely in this case. When it is dark and … Continue reading...

Read more »

Sleep – Part I

May 16, 2012

Yes, that first night was incredibly rough, thanks for asking.

Read more »

Getting Started with R Markdown, knitr, and Rstudio 0.96

May 16, 2012

This post examines the features of R Markdown using knitr in RStudio 0.96. This combination of tools provides an exciting improvement in usability for reproducible analysis. Specifically, this post (1) discusses getting started with R Markdown and knitr in RStudio 0.96; (2) provides a basic example of producing console output and plots...

Read more »

Population of Iligan City from 1970 to 2010

May 16, 2012

R Codes
library(ggplot2)
library(grDevices)
IliganCity <- c(104493, 118778, 167358, 226568, 273004, 285061, 308046, 322821)
CensalYear <- c("1970", "1975", "1980", "1990", "1995", "2000", "2007", "2010")
qplot(CensalYear, IliganCity, xlab = expression...

Read more »

An Example of Social Network Analysis with R using Package igraph

May 16, 2012

by Yanchang Zhao, RDataMining.com. This post presents an example of social network analysis with R using package igraph. The data to analyze is Twitter text data of @RDataMining used in the example of Text Mining, and it can be downloaded … Continue reading →

Read more »

garch() uncertainty

May 16, 2012

As part of an ongoing paper with Kerrie Mengersen and Pierre Pudlo, we are using a GARCH(1,1) model as a target. Thus, the model is of the form which is a somewhat puzzling object: the latent (variance) part is deterministic and can be reconstructed exactly given the series and the parameters. However, estimation is not

Read more »

Update: Parameters as Population Quantities

May 16, 2012

Some time ago, I had an ineloquent and less-than-cordial online discussion with a commenter on this site, partially about how statisticians define the term "parameter". This post is just to quote a relevant passage from "Bootstrap Methods and Their Application", by Davison and Hinkley (1997), that better articulates a point I had made earlier. 2.1.1

Read more »

Global Homicide Rates by Government Type

May 16, 2012

Surprising results For purposes of this article, any mention of homicide rates refers to reported homicide rates. Open vs Closed In mostly open countries (full democracies), the homicide rates are rather low when compared to other types of...

Read more »