cumplyr: Extending the plyr Package to Handle Cross-Dependencies

May 3, 2012

Introduction: For me, Hadley Wickham's reshape and plyr packages are invaluable because they encapsulate omnipresent design patterns in statistical computing: reshape handles switching between the different possible representations of the same underlying data, while plyr automates what Hadley calls the Split-Apply-Combine strategy, in which you split up your data into several subsets, perform some computation
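
As a rough illustration of the Split-Apply-Combine strategy named in this excerpt, here is a minimal sketch in base R. It does not use plyr or the cumplyr package from the post; the `scores` data frame and its column names are made up for the example.

```r
# Hypothetical toy data (not from the original post).
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Split: one data frame per level of `group`.
pieces <- split(scores, scores$group)

# Apply: compute a one-row summary for each piece.
summaries <- lapply(pieces, function(piece) {
  data.frame(group = piece$group[1], mean_value = mean(piece$value))
})

# Combine: stack the per-group summaries back into a single data frame.
result <- do.call(rbind, summaries)
print(result)
```

Packages such as plyr wrap these three steps into a single call; the base R version is shown only to make the individual steps visible.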

Google Translate for code, and an R help-list bot

May 3, 2012

What we did in our Stan meeting yesterday: Some discussion of revision of the Nuts paper, some conversations about parameterizations of categorical-data models, plans for the R interface, blah blah blah. But also, I had two exciting new ideas! Google Translate for code: Wouldn’t it be great if Google Translate could work on computer languages? The post Google...

How to plot three categorical variables and one continuous variable using ggplot2

May 3, 2012

This post shows how to produce a plot involving three categorical variables and one continuous variable using ggplot2 in R. The following code is also available as a gist on GitHub. 1. Create Data: First, let's load ggplot2 and create some data to work...

An ivreg2 function for R

May 3, 2012

The ivreg2 command is one of the most popular routines in Stata. The reason for this popularity is its simplicity. A one-line ivreg2 command generates not only the instrumental variable regression coefficients and their standard errors, but also a number of other statistics of interest. I have come across a number of functions in R

reshape (from base) Explained: Part I

May 2, 2012

This post will explain the basics of wide to long with base reshape (Part I). Often your data set is in wide format and some sort of analysis or visualization requires putting the data set into long format. Hadley Wickham …
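
To make the wide-to-long idea concrete, here is a minimal sketch using `reshape()` from base R. The `wide` data frame and its column names are hypothetical and are not taken from the original post.

```r
# Hypothetical wide data: one row per subject, one column per time point.
wide <- data.frame(
  id = 1:3,
  score.1 = c(5, 6, 7),
  score.2 = c(8, 9, 10)
)

# Stack the two score columns into a single column, recording which
# time point each value came from.
long <- reshape(
  wide,
  direction = "long",
  varying = c("score.1", "score.2"),  # wide columns to stack
  v.names = "score",                  # name of the stacked column
  timevar = "time",                   # new column holding the time point
  times = 1:2,                        # values written into `time`
  idvar = "id"                        # column identifying each subject
)
print(long)
```

The result has one row per subject and time point, which is the layout most plotting and modelling functions expect.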

Yes, you need more than just R for Big Data Analytics

May 2, 2012

Douglas Merrill, former CIO/VP of Engineering at Google, writes in Forbes about using the R language for data analysis: Most folks with math-oriented graduate degrees will have written something in R, a non-commercial option for your big data analysis. So, great graduates from great graduate schools know great tools. His post is titled 'R Is Not Enough For "Big...

Doodling With a Conversation, or Retweet, Data Sketch Around LAK12

May 2, 2012

How can we represent conversations between a small sample of users, such as the email or SMS conversations between James Murdoch’s political lobbyist and a Government minister’s special adviser (Leveson inquiry evidence), or the pattern of retweet activity around a couple of heavily retweeted individuals using a particular hashtag? I spent a bit of time

knitr-Example: Use World Bank Data to Generate Report for Threatened Bird Species

May 2, 2012

I'll use the script below, which retrieves data for threatened bird species from the World Bank via its API and does some processing, plotting and analysis. There is a package (WDI) that allows you to access the data easily. # world bank indicators for sp...

EU rules that computer languages cannot be copyrighted

May 2, 2012

The European Court of Justice has published its decision in SAS v WPL; the title of the press release says it all: “The functionality of a computer program and the programming language cannot be protected by copyright”. To summarise the background, World Programming Ltd developed a system that was capable of emulating the input/output behavior

Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 – Part III

May 2, 2012

Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays. For this exercise, I combined the following four separate blogs that I did on BigData, R and SAP HANA. Historical airlines and weather dat...

Function to Generate a Random Data Set

May 2, 2012

Often I find myself needing data sets to try functions and code out on or for teaching purposes. I have a few stand-bys such as the mtcars and CO2 data sets in the base packages of R but sometimes I …

Finding Earth II

May 2, 2012

By 2030, we will have found approximately 10,000 exoplanets. "If it is just us... seems like an awful waste of space." -- from the movie Contact (1997) based on the book Contact by Carl Sagan. By the year 2030, it's possible that over ten th...

Computational Journalism Server – The Way Forward

May 2, 2012

As I’ve noted here, the Computational Journalism Server “wants to be a Platform-as-a-Service (PaaS) when it grows up.” In plotting the way forward to that goal, I’ve looked at three options: Remain on openSUSE / SUSE Studio and ...

Speeding up R with Intel’s Math Kernel Library (MKL)

May 2, 2012

I did some comparisons of the generic BLAS with Intel's MKL (both sequential and parallel) on a Dell PowerEdge 610 server with dual hyperthreading 6-core 3.06GHz Xeon X5675 processors. Here are the results from an R benchmarking script (Normal R indicates the generic BLAS, sMKL is the sequential (single core) Intel MKL, and pMKL is the parallel Intel MKL using...

2nd round of call for chapter proposals for book Data Mining Applications with R: due by 31 May

May 2, 2012

2nd CALL FOR CHAPTERS: proposals due by 31 May 2012. Data Mining Applications with R, a book to be published by Elsevier. http://www.RDataMining.com/books/book2 Introduction: R is one of the most widely used data mining tools in scientific and business …

Measuring time series characteristics

May 2, 2012

A few years ago, I was working on a project where we measured various characteristics of a time series and used the information to determine what forecasting method to apply or how to cluster the time series into meaningful groups. The two main papers to come out of that project were: Wang, Smith and Hyndman (2006) Characteristic-based clustering for...

Next Kölner R User Meeting: 6 July 2012

May 1, 2012

The next Cologne R user group meeting is scheduled for 6 July 2012. All details are available on the new KölnRUG Meetup site. Please sign up if you would like to come along, and note that there is also a pub poll for the after "work" drinks. Notes fr...

A gallery view for Craigslist

May 1, 2012

As much as I love Craigslist, I sometimes find the interface a bit limited. My biggest wish? That there was an option for showing the search results as an image gallery, like eBay has. This could prove quite useful for browsing things like antiques,...

Mining for relations between nominal variables

May 1, 2012

The task today was to find what variables had significant relations with an important grouping variable in the big dataset I’ve been working with lately. The grouping variable has 3 levels, and represents different behaviours of interest. At first I …

Playing with knitr: Create Report with Dynamic List

May 1, 2012

Here is a little toy example using knitr, LaTeX/MiKTeX and Google Docs. Say you had a list on Google Docs (say a list of attendants) and you want to print a report with it. Then see this example using this Rnw-file and the output... make the tex-file wit...

Google BigQuery and the Github Data Challenge

May 1, 2012

GitHub has made data on its code repositories, developer updates, forks etc. from the public GitHub timeline available for analysis, and is offering prizes for the most interesting visualization of the data. Sounds like a great challenge for R programmers! The R language is currently the 26th most popular on GitHub (up from #29 in December), and it would...

New R User Group in Cologne, Germany

May 1, 2012

The latest local R user group to join the fold is the Köln R User Group, now the sixth R user group in Germany. Their first group meeting will be on July 6, with presentations on ANOVA, ggplot2 graphics in Deducer, and writing R code with Emacs's Org-mode. If you're in the Cologne area, this would be a great...

NSF BIGDATA webinar

May 1, 2012

If you're doing any kind of big data analysis - genomics, transcriptomics, proteomics, bioinformatics - then unless you've been on vacation the last few weeks you've no doubt heard about the NSF/NIH BIGDATA  Initiative (here's the NSF solicitation...

Quick Tip: Replace Values in Dataframe on Condition with Random Numbers

May 1, 2012

This one took me some time, though in fact it is plain simple:

> options(scipen=999)
> (my_df
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  0  0  1  0  1  1  1  1  0   1
2  0  0  1 ...
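
The post's own code is cut off in this excerpt, so the following is only a sketch of one way to do what the title describes in base R. It assumes the condition is "value equals 1" and that the replacements are uniform random numbers; both are assumptions, and the data frame here is generated for the example rather than copied from the post.

```r
set.seed(1)

# Hypothetical 0/1 data frame with columns X1..X10 (not the post's data).
my_df <- data.frame(matrix(rbinom(30, 1, 0.5), nrow = 3))
original <- my_df  # keep a copy for comparison

# Condition: a logical matrix marking every cell equal to 1.
is_one <- my_df == 1

# Replace the marked cells with one random draw each.
my_df[is_one] <- runif(sum(is_one))
print(my_df)
```

Indexing a data frame with a logical matrix of the same dimensions selects individual cells, so the cells that do not meet the condition are left untouched.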

What does this package look like?

May 1, 2012

In this post, I give a very simple trick to understand the way a package is organized, which functions are included and how these functions depend on each other. The idea came from one of my students, Soraya, who is currently working in a very hostile environment, surrounded by true geeks. However,

A Warning About warning()

May 1, 2012

Avoid R’s warning feature. This is particularly important if you use R in production, that is, when you regularly run R scripts as part of your business process. This is also important if you author R packages. Don’t issue warnings in your own co...

Monitoring some statistics with "R"

May 1, 2012

I've been practicing after reading a couple of tutorials ("R: A self-learn tutorial" and "Programming in R") to create a basic function to monitor some basic statistics such as RMSEP, Bias, SEP, Correlation and RSQ. I've been doing this with other so...

How to Make HTML5 Slides with knitr

May 1, 2012

One week ago I made an early announcement about the markdown support in the knitr package and RStudio, and now version 0.5 of knitr is on CRAN, so I'm back to show you how I made the HTML5 slides. For those who are not familiar with markdown, you m...
