Compiling R code, and speed up your computation

May 5, 2012
By

I just ran into this interesting post on the R-bloggers Planet. The described R functionality lets you compile R code to byte code. The code is then no longer interpreted by walking its parse tree; instead, R's faster byte-code engine executes it. That can give a real performance boost. I guess in due time we will see R use JIT technologies, so that the difference will...
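A minimal sketch of the idea, assuming only the compiler package that is part of the standard R distribution. `slow_sum` and `fast_sum` are hypothetical names for illustration, not code from the linked post.

```r
library(compiler)  # part of the standard R distribution

# A deliberately loop-heavy function: the kind of code byte compilation helps most
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# cmpfun() returns a byte-compiled version of the same closure
fast_sum <- cmpfun(slow_sum)

n <- 1e6
system.time(slow_sum(n))  # original closure
system.time(fast_sum(n))  # byte-compiled closure, same result
```

Note that on recent R versions the JIT compiler is enabled by default and byte-compiles `slow_sum` itself after a couple of calls, so the two timings may end up close.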

Read more »

Kaplan-Meier Survival plot – with at risk table, by sub groups

May 4, 2012
By

This is a follow-on from the previous post, with updated code. The ‘groups’ argument in the ggplot(…) line of the code used to work but no longer works with the updated version of R/ggplot2 (I …

Read more »

Correlations, dimension, and risk measure

May 4, 2012
By

Yesterday, while attending the IFM2 conference at HEC Montreal, I heard a nice talk about credit risk that compared contagion (or at least default correlation) for corporate and retail companies in the US. It was mentioned...

Read more »

R, now a major programming language, sees a 127% growth in book sales

May 4, 2012
By

O'Reilly Radar tracks technology adoption via its annual "State of the Computer Book Market" report. In the latest report, covering 2011 book sales across all publishers, books about R show a 127% increase in 2011 over 2010. As a language specifically for data analysis, R isn't in the same league in raw sales as general-purpose languages like C++ or...
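A quick check of what the headline number means: a 127% increase puts 2011 sales at 2.27 times the 2010 figure, not 1.27 times. The sales values below are a hypothetical index (2010 = 100), not figures from the report, and `pct_change` is an illustrative helper.

```r
# Percentage change between two periods: 100 * (new - old) / old
pct_change <- function(old, new) 100 * (new - old) / old

# Hypothetical index values: 2010 sales set to 100 units
sales_2010 <- 100
sales_2011 <- sales_2010 * (1 + 127 / 100)  # a 127% increase
sales_2011
pct_change(sales_2010, sales_2011)
```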

Read more »

Getting R2WinBUGS to talk to WinBUGS 1.4 on Ubuntu 12.04 LTS

May 4, 2012
By

Disclaimer 1: WinBUGS is old and no longer maintained. If you would like to take advantage of more modern developments in MCMC, there are other packages to use, such as PyMC, which transparently implements adaptive Metropolis-Hastings proposals (among other great features), or the LaplacesDemon R package, which …

Read more »

Practicing Script with “R”: Monitor

May 4, 2012
By

The data are samples analyzed by a reference method (column: Protein) and by an analytical method with a certain model (column: IFTpro). The idea is to create a Monitor Report with some basic statistics (RMSEP, Bias, SEP, R, RSQ) to see how ...
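A minimal sketch of those statistics, assuming the common chemometrics definitions with residual = predicted minus reference (sign conventions for Bias vary between sources). `monitor_stats` is a hypothetical helper, and the numbers are made up; only the column names Protein and IFTpro come from the post.

```r
# Hypothetical helper: agreement statistics between a reference and a predicted value.
# Assumed sign convention: residual = predicted - reference.
monitor_stats <- function(reference, predicted) {
  res <- predicted - reference
  n <- length(res)
  bias  <- mean(res)                               # average systematic error
  rmsep <- sqrt(mean(res^2))                       # root mean squared error of prediction
  sep   <- sqrt(sum((res - bias)^2) / (n - 1))     # bias-corrected standard error of prediction
  r     <- cor(reference, predicted)
  c(RMSEP = rmsep, Bias = bias, SEP = sep, R = r, RSQ = r^2)
}

# Made-up example data using the column names from the post
samples <- data.frame(Protein = c(10.2, 11.5, 12.1, 13.4, 14.0),
                      IFTpro  = c(10.5, 11.3, 12.4, 13.9, 13.8))
round(monitor_stats(samples$Protein, samples$IFTpro), 3)
```

With these definitions the three error measures are tied together: RMSEP² = Bias² + SEP²·(n − 1)/n, which is a handy sanity check on a report.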

Read more »

Zurich, May 2012 – ZurichR Meeting

May 3, 2012
By

(This article was first published on Rmetrics blogs, and kindly contributed to R-bloggers.)

Read more »

Ack! Duplicates in the Data!

May 3, 2012
By

As I mentioned in a previous post, I compiled the data set that I’m currently working on in PostgreSQL. To get this massive data set, I had to write a query that was massive by dint of the number of …
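Once a query result is in R, base `duplicated()` is one way to find and drop repeated rows. This is a sketch on a hypothetical toy data frame standing in for a query result, not the approach taken in the post (which may well have fixed the problem in SQL instead).

```r
# Hypothetical rows standing in for a query result
rows <- data.frame(id    = c(1, 2, 2, 3, 3, 3),
                   value = c("a", "b", "b", "c", "c", "d"))

# duplicated() flags every row that repeats an earlier row in full
dup <- duplicated(rows)
rows[dup, ]            # the exact duplicates

# Keep the first occurrence of each distinct row (same as unique(rows))
clean <- rows[!dup, ]

# Rows sharing a key (id) with any other row, regardless of the other columns;
# fromLast = TRUE also flags the first member of each repeated group
rows[duplicated(rows$id) | duplicated(rows$id, fromLast = TRUE), ]
```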

Read more »

R Tutorials and Learning Materials

May 3, 2012
By

We are getting ready to host an R bootcamp this summer at work, and I am looking at building on materials that already exist. I just wanted to list a few here while I figure out the best ways to incorporate them. Video Tutorials: This is a fairly ne...

Read more »

Big Data Analytics with R and Hadoop

May 3, 2012
By

The open-source RHadoop project makes it easier to extract data from Hadoop for analysis with R, and to run R within the nodes of the Hadoop cluster -- essentially, to transform Hadoop into a massively-parallel statistical computing cluster based on R. In yesterday's webinar (the replay of which is embedded below), Data scientist and RHadoop project lead Antonio Piccolboni...

Read more »

what’s wrong with package comment?!

May 3, 2012
By

I spent most of Sunday afternoon trying to understand why defining did not have the same effect as writing the line, until I found that there is a clash due to the comment package… The assuredly simple code produces an error message: This is quite an inconvenience, as I need to compile my solution manual

Read more »

RegEx: Named Capture in R

May 3, 2012
By

I consider myself a decent RegEx user. References to famous quotes about RegEx aside, I find it intuitive, I like its speed, and it makes my code simpler (more so than the alternative, anyhow). Thus, I use RegEx where I can in the growing grab bag of languages I consider myself proficient in: *nix command line/shell scripts, JavaScript, PHP, Matlab, Python and R. Now...
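In base R, `regexpr(..., perl = TRUE)` records named groups in the `capture.start`, `capture.length` and `capture.names` attributes of its result. The sketch below turns those into a matrix. `named_capture` and the date pattern are hypothetical illustrations, not code from the post.

```r
# Hypothetical helper: return a character matrix of named-group matches,
# one row per input string and one column per named group (NA when no match)
named_capture <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  starts <- attr(m, "capture.start")   # matrix: one row per string, one column per group
  lens   <- attr(m, "capture.length")
  # substring() is vectorised: the matrices are read column by column and x is recycled
  out <- substring(x, starts, starts + lens - 1)
  out <- matrix(out, nrow = length(x))
  colnames(out) <- attr(m, "capture.names")
  out[as.vector(m) == -1L, ] <- NA     # no match at all: blank the whole row
  out
}

date_pattern <- "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
dates <- c("Posted on 2012-05-03", "no date here", "2011-12-25")
res <- named_capture(dates, date_pattern)
res
```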

Read more »

Theme Elements in ggplot2

May 3, 2012
By

This website provides a simple summary of the theme elements that can be set within ggplot2. There should be sufficient information here to change the default settings for graphs within the ggplot2 package.

Read more »

cumplyr: Extending the plyr Package to Handle Cross-Dependencies

May 3, 2012
By

For me, Hadley Wickham’s reshape and plyr packages are invaluable because they encapsulate omnipresent design patterns in statistical computing: reshape handles switching between the different possible representations of the same underlying data, while plyr automates what Hadley calls the Split-Apply-Combine strategy, in which you split up your data into several subsets, perform some computation
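For readers new to the pattern, here is Split-Apply-Combine written out in base R on the built-in mtcars data. It only illustrates the independent-subsets case plyr automates. It is not plyr's or cumplyr's API, and it does not show the cross-dependencies cumplyr is designed to handle.

```r
# Split: one data frame per number of cylinders
pieces <- split(mtcars, mtcars$cyl)

# Apply: summarise each piece independently
summaries <- lapply(pieces, function(d) {
  data.frame(cyl = d$cyl[1], n = nrow(d), mean_mpg = mean(d$mpg))
})

# Combine: stack the per-group results back into one data frame
combined <- do.call(rbind, summaries)
combined
```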

Read more »

Google Translate for code, and an R help-list bot

May 3, 2012
By

What we did in our Stan meeting yesterday: some discussion of revising the NUTS paper, some conversations about parameterizations of categorical-data models, plans for the R interface, blah blah blah. But also, I had two exciting new ideas! Google Translate for code: wouldn’t it be great if Google Translate could work on computer languages?

Read more »

How to plot three categorical variables and one continuous variable using ggplot2

May 3, 2012
By

This post shows how to produce a plot involving three categorical variables and one continuous variable using ggplot2 in R. The following code is also available as a gist on GitHub. 1. Create data: first, let's load ggplot2 and create some data to work...

Read more »

An ivreg2 function for R

May 3, 2012
By

The ivreg2 command is one of the most popular routines in Stata. The reason for this popularity is its simplicity. A one-line ivreg2 command generates not only the instrumental variable regression coefficients and their standard errors, but also a number of other statistics of interest. I have come across a number of functions in R
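The core estimate behind an instrumental variable regression is two-stage least squares, β̂ = (X′P_Z X)⁻¹ X′P_Z y with P_Z = Z(Z′Z)⁻¹Z′. The base-R sketch below assumes homoskedastic errors and reproduces none of ivreg2's extra diagnostics or robust options. `tsls` is a hypothetical name, and the data are simulated.

```r
# Two-stage least squares: beta = (X' Pz X)^{-1} X' Pz y, with Pz = Z (Z'Z)^{-1} Z'
tsls <- function(y, X, Z) {
  # First stage: fitted values of every regressor from the instruments (Pz X)
  X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  # Second stage: Pz is idempotent, so X_hat'X_hat = X' Pz X and X_hat'y = X' Pz y
  XtPX <- crossprod(X_hat)
  beta <- solve(XtPX, crossprod(X_hat, y))
  # Residuals use the ORIGINAL regressors, not the fitted ones
  resid <- y - X %*% beta
  sigma2 <- sum(resid^2) / (nrow(X) - ncol(X))
  se <- sqrt(diag(sigma2 * solve(XtPX)))   # homoskedastic standard errors
  cbind(estimate = drop(beta), std.error = se)
}

# Simulated data: x is endogenous (shares the error u with y), z is a valid instrument
set.seed(1)
n <- 1000
z <- rnorm(n)
u <- rnorm(n)
x <- 0.8 * z + u + rnorm(n)
y <- 1 + 2 * x + u + rnorm(n)

X <- cbind(const = 1, x = x)
Z <- cbind(const = 1, z = z)
fit <- tsls(y, X, Z)
fit
```

The point estimates match running two `lm()` fits by hand, first `x ~ z` and then `y` on its fitted values. The standard errors differ, because the manual second stage computes residuals from the fitted rather than the original regressors.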

Read more »

reshape (from base) Explained: Part I

May 2, 2012
By

This post explains the basics of going from wide to long format with base reshape (part I). Often your data set is in wide format, and some sort of analysis or visualization requires putting it into long format. Hadley Wickham …
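A minimal wide-to-long example with base `reshape()`, on a hypothetical toy data frame (not the data from the post). It relies on the default `sep = "."`, which lets `reshape()` split names like `score.1` into a value column and a time value.

```r
# Wide format: one row per subject, one column per measurement occasion
wide <- data.frame(id = 1:3,
                   score.1 = c(5, 6, 7),
                   score.2 = c(8, 9, 10))

# Long format: one row per subject-occasion pair.
# With the default sep = ".", "score.1" is split into the stem "score"
# (the new value column) and the suffix 1 (the new time value).
long <- reshape(wide,
                direction = "long",
                varying   = c("score.1", "score.2"),
                idvar     = "id",
                timevar   = "time")
long
```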

Read more »

Yes, you need more than just R for Big Data Analytics

May 2, 2012
By

Douglas Merrill, former CIO/VP of Engineering at Google, writes in Forbes about using the R language for data analysis: Most folks with math-oriented graduate degrees will have written something in R, a non-commercial option for your big data analysis. So, great graduates from great graduate schools know great tools. His post is titled 'R Is Not Enough For "Big...

Read more »

Doodling With a Conversation, or Retweet, Data Sketch Around LAK12

May 2, 2012
By

How can we represent conversations between a small sample of users, such as the email or SMS conversations between James Murdoch’s political lobbyist and a Government minister’s special adviser (Leveson inquiry evidence), or the pattern of retweet activity around a couple of heavily retweeted individuals using a particular hashtag? I spent a bit of time

Read more »

knitr-Example: Use World Bank Data to Generate Report for Threatened Bird Species

May 2, 2012
By

I'll use the script below, which retrieves data for threatened bird species from the World Bank via its API and does some processing, plotting and analysis. The WDI package allows you to access the data easily. # world bank indicators for sp...

Read more »

EU rules that computer languages cannot be copyrighted

May 2, 2012
By

The European Court of Justice has published its decision in SAS v WPL; the title of the press release says it all “The functionality of a computer program and the programming language cannot be protected by copyright”. To summarise the background, World Programming Ltd developed a system that was capable of emulating the input/output behavior

Read more »

Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 – Part III

May 2, 2012
By

Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays. For this exercise, I combined the following four separate blog posts that I wrote on Big Data, R and SAP HANA. Historical airlines and weather dat...

Read more »

Function to Generate a Random Data Set

May 2, 2012
By

Often I find myself needing data sets to try functions and code out on, or for teaching purposes. I have a few standbys, such as the mtcars and CO2 data sets in the base packages of R, but sometimes I …
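A minimal sketch of such a generator, mixing the column types that usually matter when testing code (integer, factor, numeric, logical). `random_data` and all its column names are hypothetical, not the function from the post.

```r
# Hypothetical generator for throwaway test data; not the function from the post
random_data <- function(n = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # note: this resets the global RNG state
  data.frame(
    id     = seq_len(n),
    group  = factor(sample(c("control", "treatment"), n, replace = TRUE)),
    gender = factor(sample(c("female", "male"), n, replace = TRUE)),
    age    = sample(18:65, n, replace = TRUE),
    score  = round(rnorm(n, mean = 50, sd = 10), 1),
    passed = runif(n) > 0.5
  )
}

dat <- random_data(n = 8, seed = 2012)
str(dat)
```

Passing a seed makes the "random" data reproducible, which is useful when the same example has to be shown again in class.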

Read more »

Finding Earth II

May 2, 2012
By

By 2030, we will have found approximately 10,000 exoplanets. "If it is just us... seems like an awful waste of space." -- from the movie Contact (1997) based on the book Contact by Carl Sagan. By the year 2030, it's possible that over ten th...

Read more »

Computational Journalism Server – The Way Forward

May 2, 2012
By

As I’ve noted here, the Computational Journalism Server “wants to be a Platform-as-a-Service (PaaS) when it grows up.” In plotting the way forward to that goal, I’ve looked at three options: Remain on openSUSE / SUSE Studio and ...

Read more »

Speeding up R with Intel’s Math Kernel Library (MKL)

May 2, 2012
By

I did some comparisons of the generic BLAS with Intel's MKL (both sequential and parallel) on a Dell PowerEdge 610 server with dual hyperthreading 6-core 3.06GHz Xeon X5675 processors. Here are the results from an R benchmarking script (Normal R indicates the generic BLAS, sMKL is the sequential (single-core) Intel MKL, and pMKL is the parallel Intel MKL using...
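To see the effect on your own machine, time an operation that R typically hands to the BLAS, such as a dense matrix multiply. The sketch below is not the benchmarking script used in the post, and the matrix size is an arbitrary choice. Run the same code under each BLAS and compare the elapsed times.

```r
# Dense matrix products are typically delegated to whatever BLAS R is linked against,
# so swapping the generic BLAS for MKL changes this timing without changing the code
set.seed(42)
n <- 1000
A <- matrix(rnorm(n * n), nrow = n)
B <- matrix(rnorm(n * n), nrow = n)

timing <- system.time(C <- A %*% B)
timing[["elapsed"]]  # wall-clock seconds

# Sanity check one entry against a plain R dot product
all.equal(C[1, 1], sum(A[1, ] * B[, 1]))

# Recent R versions also report the BLAS/LAPACK libraries in use here
sessionInfo()
```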

Read more »

2nd round of call for chapter proposals for book Data Mining Applications with R: due by 31 May

May 2, 2012
By

2nd CALL FOR CHAPTERS: proposals due by 31 May 2012. Data Mining Applications with R, a book to be published by Elsevier (http://www.RDataMining.com/books/book2). Introduction: R is one of the most widely used data mining tools in scientific and business …

Read more »
