The main reason I have usually chosen Excel to make my plots at work is that I had difficulty feeding the summary stats from R into a plotting function. One thing I learned this week is how …

Things have been going wild since I started this blog. Tasks piled up while I was short on time. At present, I’m facing a major challenge in my life. However, I decided to spare some time for self-improvement. R …

A few weeks ago GitHub announced that its timeline data is available on BigQuery for analysis. Moreover, it is offering prizes for the best visualizations of the data. Despite my limited art skills and minimal chances of winning a beauty contest, I decided to crunch the GitHub data and run some analysis. After an initial trial of the BigQuery service, I found it hard

A very useful way of keeping up with blogs in a particular area is to subscribe to a blog aggregator. These will syndicate posts from a large number of blogs and provide links back to the original sources. So you only need to subscribe once to get all the good stuff in that area. There are now several blog...

We had a great Twitter conversation last Thursday on the use of big-data analytics, Revolution R Enterprise, and IBM Netezza in the search for a cure for MS. Many thanks to the other panelists: Murali Ramanathan (SUNY Buffalo), Tim Coetzee (National MS Society) and moderator Shawn Dolley (IBM) for fielding and answering questions from interested parties following #IBMDataChat. As...

Looking to learn R, or to expand your R skills for data visualization or package development? Here are some R courses presented by experts that you may be interested in: June 19-20: Visualization in R with ggplot2. This course, presented by Garrett Grolemund & Dr. Winston Chang of Rice University, is also a web-based course with live presentation. This...

A recent arXiv posting of the paper “On the Generalized Ratio of Uniforms as a Combination of Transformed Rejection and Extended Inverse of Density Sampling” by Martino, Luengo, and Míguez from Madrid rekindled my interest in this rather peculiar simulation method. The ratio of uniforms samples uniformly on the subgraph to produce simulations from p
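For readers who have not met the method, the classical ratio of uniforms (Kinderman and Monahan's original construction, not the generalized version the paper studies) fits in a few lines. As an illustrative assumption, take a standard normal target with unnormalized density f(x) = exp(-x²/2): a pair (u, v) drawn uniformly on C = {(u, v): 0 < u ≤ sqrt(f(v/u))} gives x = v/u distributed as N(0, 1), and the sketch below finds such pairs by rejection from a bounding rectangle. The function name `rou_standard_normal` and the choice of target are hypothetical.

```python
import math
import random


def rou_standard_normal(rng: random.Random) -> float:
    """Draw one N(0, 1) variate with the basic ratio-of-uniforms method.

    (u, v) is drawn uniformly on the bounding rectangle
    (0, 1] x [-sqrt(2/e), sqrt(2/e)] and accepted when it falls inside
    C = {(u, v): 0 < u <= sqrt(f(v/u))}, with f(x) = exp(-x^2 / 2).
    The ratio v/u of an accepted pair is then standard normal.
    """
    # sup of sqrt(f(x)) is 1 (at x = 0), which bounds u;
    # sup of |x| * sqrt(f(x)) is sqrt(2/e) (at |x| = sqrt(2)), which bounds |v|
    b = math.sqrt(2.0 / math.e)
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], so u is never 0
        v = rng.uniform(-b, b)
        x = v / u
        # Acceptance test u <= sqrt(f(x)), squared to avoid the square root
        if u * u <= math.exp(-0.5 * x * x):
            return x
```

The acceptance rate is the area of C, which equals half the integral of f, divided by the area of the rectangle: about 1.2533 / 1.7155, or roughly 73% for this target.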

Today a new version of RStudio (v0.96) is available for download from our website. The main focus of this release is improved tools for authoring, reproducible research, and web publishing. This means lots of new Sweave features as well as tight integration with the knitr package (including support for creating dynamic web reports with the

Continuing on with my series on the weaknesses of NHST, I’d like to focus on an issue that’s not specific to NHST, but rather one that’s relevant to all quantitative analysis: the destruction caused by an inappropriate reduction of dimensionality. In our case, we’ll be concerned with the loss of essential information caused by

In Example 9.30 we explored the effects of adjusting for multiple testing using the Bonferroni and Benjamini-Hochberg (or false discovery rate, FDR) procedures. At the time, we claimed that it would probably be inappropriate to take the adjusted p-values from the FDR method out of their context. In this entry we attempt to explain our misgivings about...
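As a reference point for the two procedures named above, here is a minimal Python sketch of Bonferroni and Benjamini-Hochberg adjusted p-values. It follows the textbook step-up definition, which is intended to match what R's `p.adjust(method = "BH")` returns; the function names are hypothetical, and missing values get no special treatment.

```python
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order.

    For the i-th smallest p-value p_(i) among m tests, the adjusted value is
    min over j >= i of min(1, m * p_(j) / j).
    """
    m = len(pvals)
    # Indices of the p-values sorted from largest to smallest
    order = sorted(range(m), key=lambda k: pvals[k], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also enforces the cap at 1
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, m * pvals[idx] / rank)
        adjusted[idx] = running_min
    return adjusted
```

For p-values [0.01, 0.04, 0.03, 0.005], Bonferroni gives [0.04, 0.16, 0.12, 0.02] and BH gives [0.02, 0.04, 0.04, 0.02], up to floating-point rounding. Unlike the Bonferroni value, which depends only on p and the number of tests, each BH-adjusted value also depends on the other p-values in the set.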

Bias in Federal Reserve Inflation Forecasts: Christopher Gandrud, a PhD student at the London School of Economics, uses ggplot2 to visualize potential partisan bias in US Federal Reserve inflation forecasts.

Finally, I got round to finding some time to work out all the problems in compiling the BCEA (Bayesian Cost-Effectiveness Analysis) package. I developed it as part of the work for the book. In a nutshell, what it does is the following: first, you need to s...

A Poisson process provides a good model for events that happen rarely. That's what von Bortkiewicz realized in 1898 when he modeled deaths by horse kick in the Prussian cavalry; since it would be ungentlemanly to actually kill my readers, I instead represent the events in a Poisson process using a horse's whinny.
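The usual way to simulate such a process relies on a standard fact: for a homogeneous Poisson process with rate λ, the gaps between consecutive events are independent exponential random variables with mean 1/λ. A minimal Python sketch under that assumption (constant rate; the function name is hypothetical, and this is not the post's own code):

```python
import random
from typing import List


def poisson_process_times(rate: float, horizon: float, rng: random.Random) -> List[float]:
    """Event times of a homogeneous Poisson process with the given rate on [0, horizon).

    Inter-arrival gaps are independent exponential variables with mean 1/rate,
    so we keep adding such gaps until the running time passes the horizon.
    """
    times = []
    t = rng.expovariate(rate)  # time of the first event
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)  # add the next exponential gap
    return times
```

The number of events in [0, T) then has a Poisson distribution with mean λT, so a rate of 2 events per unit time over 10,000 units should produce roughly 20,000 events.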

Autocorrelation of a time series can be useful for prediction because the most recent observation of the prediction target contains information about future values. At the same time, autocorrelation can play tricks on you, because many standard statistical methods implicitly assume independence of measurements at different times. The correlation coefficient between two variables and has
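To make the idea concrete, here is a minimal Python sketch of the standard sample autocorrelation at lag k, applied to a simulated AR(1) series x_t = φ x_(t-1) + e_t, whose theoretical lag-1 autocorrelation is φ. The AR(1) example, the value φ = 0.8 and the function names are illustrative assumptions, not taken from the post.

```python
import random
from typing import List


def sample_autocorrelation(x: List[float], lag: int) -> float:
    """Sample autocorrelation r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2.

    Both sums use the overall mean m; the denominator runs over all n points,
    the numerator over the n - k overlapping pairs.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def simulate_ar1(phi: float, n: int, rng: random.Random) -> List[float]:
    """AR(1) series x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x
```

With φ = 0.8 and 20,000 points, the lag-1 estimate should land close to 0.8, while an independent series (φ = 0) should give a value near 0. The first case is exactly the kind of dependence that methods assuming independent measurements ignore.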

Next week I’ll give a glimpse of R and ggplot2 graphics at VUW. This is a MESA seminar on ‘Data analysis and plotting with free and open source tools’, where we’ll present spreadsheet alternatives based on gnuplot, Python, an...

My own version of bubble plot (part 1): During one of my projects, I found myself needing to visualize more than 3 dimensions at once. Three-dimensional graphs are usually not a good solution; for a start, they need to be properly oriented, and that's tricky. So, I started looking at bubble plots. The size of the bubble can...

First Post: Welcome to this new blog!!! It's been almost a year since I started using R as my main programming/analysis tool. I like the fact that so many beautiful graphics can be produced directly within R. Although I often just use the basic func...

Neat demo reel of d3 (JS- and SVG-powered interactive graphics in the browser). Hopefully there will be ggplot2 integration one day!

Some of us learn easily from the written word, but for most of us some visualization speeds up the process and generally helps with retention as well. With that in mind, I was delighted to see this nice list of free videos that demonstrate the use of R, posted on Ethan Fosse's blog, "Culture, Statistics, and...

This is my second post in a series describing the weaknesses of the NHST paradigm. In the first post, I argued that NHST is a dangerous tool for a community of researchers because p-values cannot be interpreted properly without perfect knowledge of the research practices of other scientists, knowledge that we cannot hope

The Foreign Language of 'Mad Men': ggplot2 in The Atlantic

*** Call for Late-breaking Posters *** Abstracts may be submitted for posters presenting recent developments and late-breaking applications of R, on topics as indicated in the earlier call for abstracts: http://biostat.mc.vanderbilt.edu/UseR-2012#Call_for_Abstracts_and_Tutorial Late-breaking posters will be displayed during the poster session alongside regular posters, and they will appear in the electronically published book of abstracts for the conference. However, these...

As a freshly elected ASA Fellow (yay!), I just received the list of 2012 ASA Fellows. Among them, let me mention Sudipto Banerjee, University of Minnesota, Minneapolis, Minnesota, elected “For theoretical, methodological and applied research in spatiotemporal statistical modeling, especially as applied to problems in environmetrics, ecology, occupational health, agriculture and economics, for professional work at

R is a statistical programming language and can be a little scary at first. I learned it during my first statistics class. While others used Stata, I decided to see whether I could do the tasks in R. That was probably one of my best research choices. My main source of knowledge was Quick-R, which is an excellent resource. It...