R Codes

library(ggplot2)
TawiTawiGrowthRate <- as.numeric(c(2.6, 3.3, 5.9, 1.6, 1.8, 5.5, 5))
CensalYear <- c("1948-1960", "1960-1970", "1970-1980", "1980-1990", "1990-1995", "1995-2000", "2000-2007")
qplot(CensalYear, TawiTawiGrowthRate, xlab ...
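The `qplot()` call in the R Codes snippet is cut off, so the labels it used are unknown. Below is a minimal sketch of one way to finish the chart. It assumes a line-and-point plot, and the axis labels are hypothetical. It uses `ggplot()` because `qplot()` is deprecated in recent ggplot2 releases.

```r
library(ggplot2)

# Growth rates per census period, as given in the snippet
growth <- data.frame(
  CensalYear = c("1948-1960", "1960-1970", "1970-1980", "1980-1990",
                 "1990-1995", "1995-2000", "2000-2007"),
  TawiTawiGrowthRate = c(2.6, 3.3, 5.9, 1.6, 1.8, 5.5, 5)
)

# group = 1 tells geom_line() to join the points across a discrete x axis
p <- ggplot(growth, aes(x = CensalYear, y = TawiTawiGrowthRate, group = 1)) +
  geom_line() +
  geom_point() +
  labs(x = "Census period", y = "Growth rate")  # hypothetical labels
print(p)
```

The period strings already sort chronologically, so no explicit factor ordering is needed for this data.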

An interview with Revolution Analytics CEO Dave Rich was published this week by BeyeNetwork. During the interview, Dave was asked how statistical modeling platforms have changed over the decades: People have been doing statistical modeling and predictive analytics for 50 years now, SAS and SPSS have been around since the early ’70s. What’s different now -- what’s...

R user Diego Valle analyzed divorce rates among Mexican marriages since 1993 (the earliest year for which data are available) and found that not only have more marriages ended in divorce over time, but the marriages that do end are ending sooner: This chart is a bit complicated, but it bears close inspection. Each line you see is...

In this fourth part of my series of criticisms of null hypothesis significance testing (NHST), I’m going to focus on broad...

Network science is potentially useful for certain problems in data analysis, and I know close to nothing about it. In this short post I present my first attempt at network analysis: a minimal example that constructs and visualizes an artificial undirected network with community structure in R. No network libraries are loaded. Only basic R functions...
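A minimal base-R sketch of the same idea, assuming a planted-partition (stochastic block) construction: pairs inside a community are linked with probability `p_in`, pairs across communities with `p_out`. The group sizes, probabilities and circular layout are hypothetical choices, not necessarily the construction used in the original post.

```r
set.seed(42)
n_per_group <- 10    # nodes per community (hypothetical)
n_groups    <- 3     # number of communities (hypothetical)
p_in        <- 0.6   # edge probability within a community
p_out       <- 0.05  # edge probability between communities
n     <- n_per_group * n_groups
group <- rep(seq_len(n_groups), each = n_per_group)

# Edge probability for every pair: p_in if same community, p_out otherwise
prob <- ifelse(outer(group, group, "=="), p_in, p_out)

# Draw edges in the upper triangle only, then mirror to make it undirected
adj   <- matrix(0L, n, n)
upper <- upper.tri(adj)
adj[upper] <- rbinom(sum(upper), 1, prob[upper])
adj <- adj + t(adj)

# Place nodes on a circle (ordered by community) and draw edges with base graphics
angle <- 2 * pi * (seq_len(n) - 1) / n
x <- cos(angle)
y <- sin(angle)
edges <- which(adj == 1 & upper.tri(adj), arr.ind = TRUE)
plot(x, y, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "")
segments(x[edges[, 1]], y[edges[, 1]], x[edges[, 2]], y[edges[, 2]], col = "grey70")
points(x, y, pch = 21, bg = group, cex = 2)  # fill colour marks the community
```

Sampling only the upper triangle and adding the transpose guarantees a symmetric adjacency matrix with an empty diagonal, which is what "undirected, no self-loops" means here.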

This is a quick set of analyses of the California Test Score dataset. The post was produced using R Markdown in RStudio 0.96. The main purpose of this post is to provide a case study of using R Markdown to prepare a quick reproducible report. It provides examples of using plots, output, in-line R code, and...

I had been meaning to start toying with the igraph package for a while. So a few weeks ago (lay off, I'm busy), I decided to grab a bunch of CRAN data about package dependencies. The easiest way I could think of to get this information was to just grab the HTML files for all the package descriptions and...
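As an alternative to scraping the HTML description pages (not the author's approach), `available.packages()` returns a matrix with `Package`, `Depends` and `Imports` columns that can be parsed into an edge list. The sketch below uses a small hypothetical matrix in that shape (packages `pkgA`, `pkgB`, `pkgC` are made up) so it runs offline, and only builds the igraph object if igraph is installed.

```r
# Split a Depends/Imports field into bare package names
parse_deps <- function(field) {
  if (is.na(field) || !nzchar(field)) return(character(0))
  deps <- trimws(strsplit(field, ",")[[1]])
  deps <- sub("\\s*\\(.*\\)$", "", deps)  # drop version constraints, e.g. "(>= 1.0)"
  deps[nzchar(deps) & deps != "R"]        # the R version requirement is not a package
}

# Hypothetical rows in the shape returned by available.packages()
pkgs <- matrix(
  c("pkgA", "R (>= 2.10), pkgB", "pkgC (>= 1.0)",
    "pkgB", NA,                  "methods",
    "pkgC", "R (>= 3.0)",        NA),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Package", "Depends", "Imports"))
)
# With network access, real data would be: pkgs <- available.packages()

# One data frame of (from, to) edges per package, then stack them
edge_list <- lapply(seq_len(nrow(pkgs)), function(i) {
  deps <- c(parse_deps(pkgs[i, "Depends"]), parse_deps(pkgs[i, "Imports"]))
  if (length(deps) == 0) return(NULL)
  data.frame(from = pkgs[i, "Package"], to = deps, stringsAsFactors = FALSE)
})
edges <- do.call(rbind, Filter(Negate(is.null), edge_list))
print(edges)

# Build the directed dependency graph only if igraph is available
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = TRUE)
  print(igraph::degree(g, mode = "in"))  # how many packages depend on each node
}
```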

R user Arthur Charpentier attempts to find Waldo in a "Where's Waldo" image using the raster package and other R functions. Sadly, it turned out that Waldo was a bit too tricky to spot using these techniques. But Arthur did have more success identifying the US flag in a shot from the Apollo mission, and identifying answers in the...
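For readers curious about how a small pattern can be located inside a larger image, here is a brute-force sketch of template matching by normalised cross-correlation in base R. This is a swapped-in, generic technique, not necessarily what the original post did. It does not use the raster package, and it runs on a synthetic matrix with a planted template rather than a real picture.

```r
# Slide the template over the image and keep the window with the highest
# Pearson correlation (normalised cross-correlation) with the template.
find_template <- function(img, tpl) {
  th <- nrow(tpl)
  tw <- ncol(tpl)
  best <- c(row = NA, col = NA)
  best_score <- -Inf
  for (i in seq_len(nrow(img) - th + 1)) {
    for (j in seq_len(ncol(img) - tw + 1)) {
      patch <- img[i:(i + th - 1), j:(j + tw - 1)]
      # cor() warns and returns NA on constant patches; skip those
      score <- suppressWarnings(cor(as.vector(patch), as.vector(tpl)))
      if (!is.na(score) && score > best_score) {
        best_score <- score
        best <- c(row = i, col = j)
      }
    }
  }
  list(position = best, score = best_score)
}

# Synthetic grayscale "image" with a 5 x 5 template planted at (12, 33)
set.seed(1)
img <- matrix(runif(40 * 60), 40, 60)
tpl <- matrix(runif(5 * 5), 5, 5)
img[12:16, 33:37] <- tpl
res <- find_template(img, tpl)
res$position  # should be row = 12, col = 33, where the template was planted
```

On a real photograph the template rarely appears pixel-for-pixel (scale, rotation, occlusion), which is one reason a target like Waldo is much harder to find than a planted pattern.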

Sameer Chopra, vice president of Advanced Analytics at Orbitz Worldwide, wrote recently in Analytics magazine about the changing landscape of processes, software and systems for statistical modelers. In a section on "Big Data and Open Source Analytics", Chopra lays out the reasons why the R language "has become the data-mining tool of choice for machine learners": R has very...

In my ongoing quest to webappify various R scripts, I discovered that rApache cannot load any R packages that depend on rJava. For several of the scripts I've written that grab data out of MS Excel files, and therefore use the xlsx package, thi...
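To see in advance which of a script's packages would hit this rApache limitation, one option is to check whether they pull in rJava directly or transitively. The sketch below uses `tools::package_dependencies()` against the local library's metadata from `installed.packages()`, so no network access is needed. The helper name `needs_rjava()` is hypothetical, and results are only meaningful for packages that are installed locally.

```r
# Hypothetical helper: TRUE for each package whose strong dependencies
# (Depends, Imports, LinkingTo), followed recursively, include rJava.
needs_rjava <- function(pkgs, db = installed.packages()) {
  deps <- tools::package_dependencies(pkgs, db = db,
                                      which = c("Depends", "Imports", "LinkingTo"),
                                      recursive = TRUE)
  vapply(deps, function(d) "rJava" %in% d, logical(1))
}

# A base package such as stats should not need rJava
print(needs_rjava("stats"))

# Check xlsx only if it is installed in the local library
if ("xlsx" %in% rownames(installed.packages())) {
  print(needs_rjava("xlsx"))
}
```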

I mentioned in a previous post that our team at the recent Hack/Reduce hackathon had some fun with a data set consisting of Bixi station states at minute-level temporal resolution. In addition to pulling out and plotting the flux at each station on an hourly basis, we also plotted the system state (number
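One way to get hourly flux from minute-level station states is to difference the number of docked bikes within each station and then sum positive changes (arrivals) and negative changes (departures) per hour. The sketch below assumes hypothetical column names (`station`, `time`, `bikes`) and uses small synthetic data, not the Bixi data set. The team's actual definition of flux may differ.

```r
# Hourly arrivals and departures per station from minute-level snapshots
hourly_flux <- function(states) {
  states <- states[order(states$station, states$time), ]
  # Minute-to-minute change in docked bikes, computed within each station
  states$delta <- ave(states$bikes, states$station, FUN = function(b) c(0, diff(b)))
  states$arrivals   <- pmax(states$delta, 0)
  states$departures <- pmax(-states$delta, 0)
  states$hour <- format(states$time, "%Y-%m-%d %H:00", tz = "UTC")
  aggregate(cbind(arrivals, departures) ~ station + hour, data = states, FUN = sum)
}

# Synthetic two-hour example for two stations
times <- seq(as.POSIXct("2012-05-01 08:00", tz = "UTC"), by = "min", length.out = 120)
states <- data.frame(
  station = rep(c("S1", "S2"), each = 120),
  time    = rep(times, 2),
  bikes   = c(rep(10, 30), rep(12, 30), rep(11, 60),  # S1: +2 at 08:30, -1 at 09:00
              rep(5, 120))                            # S2: no change
)
flux <- hourly_flux(states)
print(flux)
```

Because only one snapshot per minute is available, an arrival and a departure at the same station within the same minute cancel out, so these counts are lower bounds on the true traffic.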

There has been a lot of chatter on Hacker News (HN) during the past week, generated by Jeff Atwood's "Please don't learn to code". Posts included: Please don't learn to code (www.codinghorror.com), Please Don't Become Anything, Especially Not A P...

All models are wrong; some models are more wrong than others.

The streetlight model

Exponential decay models are quite common. But why? One reason a model might be popular is that it contains a reasonable approximation to the mechanism that generates the data. That is seriously unlikely in this case. When it is dark and ...
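For reference, the exponential decay model is $y = a e^{-kt}$. Part of its appeal is how easy it is to fit. The sketch below, on synthetic data with an assumed true $a = 5$ and $k = 0.3$, gets starting values from a log-linear `lm()` and then refines them with `nls()`. Being easy to fit says nothing about whether the model reflects the data-generating mechanism.

```r
set.seed(1)
t_obs <- 0:20
# Multiplicative noise keeps y positive so log(y) is defined
y <- 5 * exp(-0.3 * t_obs) * exp(rnorm(length(t_obs), sd = 0.05))

# Log-linear fit for starting values: log(y) = log(a) - k * t
start_fit <- lm(log(y) ~ t_obs)
start <- list(a = exp(unname(coef(start_fit)[1])),
              k = -unname(coef(start_fit)[2]))

# Nonlinear least squares on the original scale
fit <- nls(y ~ a * exp(-k * t_obs), start = start)
print(coef(fit))
```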

This post examines the features of R Markdown using knitr in RStudio 0.96. This combination of tools provides an exciting improvement in usability for reproducible analysis. Specifically, this post (1) discusses getting started with R Markdown and knitr in RStudio 0.96; (2) provides a basic example of producing console output and plots...
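The R Markdown and knitr workflow can also be driven from plain R. The sketch below passes a tiny hypothetical R Markdown document to `knitr::knit()` through its `text` argument, so no file is needed. It shows a code chunk with console output and an in-line R expression being replaced by its value. The triple-backtick fence is built with `strrep()` only to keep the example readable.

```r
fence <- strrep("`", 3)

# A tiny hypothetical R Markdown document: one chunk plus in-line R code
rmd <- c(
  "## Quick report",
  "",
  paste0(fence, "{r summary-chunk}"),
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence,
  "",
  "The mean of `x` is `r mean(x)`."
)

# With `text =`, knit() returns the rendered Markdown as a character string
md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md)
```

In RStudio the same conversion happens when knitting an `.Rmd` file from the editor. Calling `knit()` directly is mainly useful for scripting or batch reports.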

As part of an ongoing paper with Kerrie Mengersen and Pierre Pudlo, we are using a GARCH(1,1) model as a target. The model is thus of the form $y_t = \sigma_t \epsilon_t$, $\sigma_t^2 = \alpha_0 + \alpha_1 y_{t-1}^2 + \beta_1 \sigma_{t-1}^2$, which is a somewhat puzzling object: the latent (variance) part is deterministic and can be reconstructed exactly given the series and the parameters. However, estimation is not
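To illustrate why the latent variance is deterministic, the sketch below uses the standard GARCH(1,1) form $y_t = \sigma_t \epsilon_t$, $\sigma_t^2 = \alpha_0 + \alpha_1 y_{t-1}^2 + \beta_1 \sigma_{t-1}^2$, with Gaussian innovations. The notation, the Gaussian assumption, the parameter values, and treating the initial variance $\sigma_1^2$ as known are all choices made here and may not match the original post. Given the series and the parameters, the recursion rebuilds the whole variance path, and the likelihood follows in closed form.

```r
# Simulate a GARCH(1,1) series (Gaussian innovations assumed)
simulate_garch11 <- function(n, alpha0, alpha1, beta1, sigma2_1) {
  y <- sigma2 <- numeric(n)
  sigma2[1] <- sigma2_1
  y[1] <- sqrt(sigma2[1]) * rnorm(1)
  for (t in 2:n) {
    sigma2[t] <- alpha0 + alpha1 * y[t - 1]^2 + beta1 * sigma2[t - 1]
    y[t] <- sqrt(sigma2[t]) * rnorm(1)
  }
  list(y = y, sigma2 = sigma2)
}

# Given y, the parameters and sigma2_1, the variance path is a deterministic recursion
reconstruct_sigma2 <- function(y, alpha0, alpha1, beta1, sigma2_1) {
  n <- length(y)
  sigma2 <- numeric(n)
  sigma2[1] <- sigma2_1
  for (t in 2:n) sigma2[t] <- alpha0 + alpha1 * y[t - 1]^2 + beta1 * sigma2[t - 1]
  sigma2
}

# Gaussian log-likelihood, available in closed form because sigma2 is known
garch11_loglik <- function(y, alpha0, alpha1, beta1, sigma2_1) {
  s2 <- reconstruct_sigma2(y, alpha0, alpha1, beta1, sigma2_1)
  sum(dnorm(y, mean = 0, sd = sqrt(s2), log = TRUE))
}

set.seed(2012)
sim <- simulate_garch11(500, alpha0 = 0.1, alpha1 = 0.1, beta1 = 0.8, sigma2_1 = 1)
s2_hat <- reconstruct_sigma2(sim$y, 0.1, 0.1, 0.8, sigma2_1 = 1)
print(all.equal(s2_hat, sim$sigma2))  # the variance path is recovered exactly
print(garch11_loglik(sim$y, 0.1, 0.1, 0.8, sigma2_1 = 1))
```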