GPU Computing with R

August 16, 2010

Statistics is computationally intensive. Routine statistical tasks such as data extraction, graphical summary, and technical interpretation all require heavy use of modern computing machinery. Obviously, these tasks can benefit greatly from a paralle...

Read more »

ggplot2 plot builder is now on CRAN! (through Deducer 0.4 GUI for R)

August 16, 2010

Ian Fellows, a hard-working contributor to the R community (and a cool guy), announced today the release of Deducer (0.4) to CRAN (scheduled to update in the next day or so). This major update also includes the release of a new plug-in package (DeducerExtras), containing additional dialogs and functionality. Following is the e-mail he sent out with...

Read more »

Intraday volatility of OMX Baltic stocks

August 16, 2010

Usually, intraday volatility exhibits a “smile”: it is high at the open and close and lower during the trading day. [Figures: DJI index, 5-min intervals, CET; MOS stock, 5-s intervals, CET.] Because many readers of this blog trade Nasdaq OMX Baltic stocks, it is worth sharing my findings about volatility in

Read more »

Gone Guerrill_ R on Our Data

August 16, 2010

Here's a summary of some things we learnt about applying R to computer performance and capacity planning data in the GDAT Class last week. Neural nets: the nnet package applied to CPU performance data in the Ripley and Venables book (see Section 8.10). How to do stacked plots that Jim calls "spark plots." Jim told...

Read more »

Project Euler Problem #21

August 16, 2010

This is a solution for problem 21 on the Project Euler website. It consists of finding the sum of all the amicable numbers under 10000. This was pretty easy to solve, but the solution could probably be improved quite a bit. Solution #1 in R is as follo...

Read more »
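The amicable-numbers problem in the Project Euler #21 entry can be sketched in R as shown here. This is an illustrative version, not the author's Solution #1. It assumes the standard definition d(n) = sum of the proper divisors of n, and counts a below the limit as amicable when d(a) = b, d(b) = a and a ≠ b. The helper names `sum_proper_divisors()` and `amicable_sum()` are hypothetical.

```r
# Sum of the proper divisors of n (all divisors strictly smaller than n).
# Divisors come in pairs (d, n / d) with d <= sqrt(n).
sum_proper_divisors <- function(n) {
  if (n < 2) return(0)
  d <- seq_len(floor(sqrt(n)))
  d <- d[n %% d == 0]                 # small divisors
  co <- n / d                         # their cofactors
  total <- sum(d) + sum(co[co != d])  # avoid double-counting sqrt(n)
  total - n                           # drop n itself (the cofactor of 1)
}

# Sum of all amicable numbers below `limit` (hypothetical helper).
amicable_sum <- function(limit) {
  s <- vapply(seq_len(limit - 1), sum_proper_divisors, numeric(1))
  total <- 0
  for (a in 2:(limit - 1)) {
    b <- s[a]
    # The partner b may be at or above the limit, so compute it directly then.
    back <- if (b < limit) s[b] else sum_proper_divisors(b)
    if (b != a && back == a) total <- total + a
  }
  total
}

amicable_sum(10000)
```

Precomputing d(n) for every n below the limit with `vapply()` means each partner lookup is a simple vector index. `amicable_sum(10000)` returns 31626, the sum of the five amicable pairs below 10000 (220/284, 1184/1210, 2620/2924, 5020/5564, 6232/6368).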

Consultants’ Chart in ggplot2

August 16, 2010

Excel Charts Blog posted a video tutorial on how to create a circumplex, rose, or doughnut chart in Excel. Apparently this type of chart is very popular in the consulting industry, hence the “Consultants’ Chart”. It is very easy to make this chart in Excel 2010, but it involves countless clicks and

Read more »

A quick analysis of the trends in the number of weddings in France (1975–2010)

August 15, 2010

I’m currently planning my wedding, and my fiancée and I were discussing whether more or fewer couples were getting married over time. It turns out that this information is quite easy to get via INSEE, a French institute that (…)

Read more »

Downloading DNA sequences into R

August 15, 2010

A while ago, a friend of mine needed to download a number of different DNA sequences from GenBank, the online repository for the vast majority of DNA sequences read from all organisms by labs all over the world. This is not a problem. The "ape" package in R has a nifty function, read.GenBank(), that downloads the...

Read more »

Two Surprising Things about R

August 14, 2010

I see that it’s been over a year since my last post!  I have a backlog of blog post ideas, but something else always seems to have higher priority.   Today, though, I have some interesting (and useful) things to say about R, which I discovered in the last few days, and which shouldn’t take long

Read more »

Hard drive occupation prediction with R – The linear regression

In some environments, disk space usage can be pretty predictable. In this post, we will see how to do a linear regression to estimate when free space will reach zero, and how to assess the quality of such a regression, all using R - the statistical soft...

Read more »
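As a rough illustration of the idea in the hard-drive entry, this sketch fits `lm()` to made-up daily disk-usage figures. The data, the 200 GB capacity and the variable names are assumptions, not taken from the post. It reports R² and a confidence interval for the growth rate as measures of fit quality, then solves the fitted line for the day on which used space reaches capacity, i.e. free space hits zero.

```r
# Hypothetical data: disk usage (GB) measured once a day for 10 days.
days <- 0:9
noise <- c(0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.2, 0.0)
used_gb <- 100 + 2.5 * days + noise
capacity_gb <- 200  # assumed disk size

# Ordinary least-squares fit: used_gb = intercept + slope * days
fit <- lm(used_gb ~ days)
intercept <- coef(fit)[["(Intercept)"]]
slope <- coef(fit)[["days"]]

# Quality of the fit: R^2 and a 95% confidence interval for the growth rate
r_squared <- summary(fit)$r.squared
slope_ci <- confint(fit, "days", level = 0.95)

# Free space reaches zero when the fitted usage equals the capacity
day_full <- (capacity_gb - intercept) / slope

cat(sprintf("growth: %.2f GB/day, R^2 = %.4f, disk full around day %.1f\n",
            slope, r_squared, day_full))
```

Extrapolating a straight line assumes usage keeps growing at a constant rate. The width of the slope's confidence interval gives a feel for how uncertain the predicted "disk full" date is.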

Auto-completion in Notepad++ for R Script

August 14, 2010

Auto-completion is a handy feature in a text editor. Notepad++ does not support auto-completion for the R language, so I spent a couple of hours creating an XML file to support R. Put it under ‘plugins/APIs’ in the installation directory of Notepad++ (you can see several other XML files there supporting different languages such as

Read more »

Introducing visualVaR.com

August 13, 2010

After a month of on-again, off-again coding, I’ve finally completed a web site geared towards calculating the Value at Risk of the average investor’s portfolio. The site is visualvar.com. The big idea was to combine the statistical and visualization tools of R (especially ggplot2) with the web interface of Drupal. While I’m

Read more »

Presentations and video from useR! 2010 available

August 13, 2010

For anyone who missed the useR! 2010 conference in Gaithersburg last month (or just wants to revisit some of the amazing talks), you can now find slides from many of the contributed presentations available for download. (Look for the link to download slides.) The presentations from the Revolution team members are there, including: My presentation on Evolving R...

Read more »

Apologies and Style Guides

August 13, 2010

I have to say that it’s pretty exciting to watch your blog go from a few hits over its lifetime to getting almost 200 in a single day.  I am currently negotiating with Google over the purchase of this blog.  Or maybe not.  Again, thanks be to @revodavid for posting to the Revolution Analytics Blog.

Read more »

Rcpp svn revision 2000

August 13, 2010

I committed the 2000th revision of Rcpp svn today, so I wanted to repeat what I did previously for the 50 000th R commit. Here is the number of commits per day and per month:

[Figure: commits per month]

Read more »

Scrape Web data using R

August 13, 2010

Plenty of people have been scraping data from the web using R for a while now, but I just completed my first project and I wanted to share the code with you.  It was a little hard to work through some of the “issues”, but I had some great help from @DataJunkie on twitter. As

Read more »

Rcpp at LondonR, Oct 5th

August 12, 2010

I'll be presenting Rcpp at the next LondonR, which is currently scheduled for October 5th. Here is one picture I found on Flickr, searching for "london speed bus"... there are many other

Read more »

Fun with the proto package: building an MCMC sampler for Bayesian regression

August 12, 2010

The proto package is my latest favourite R goodie. It brings prototype-based programming to the R language - a style of programming that lets you do many of the things you can do with classes, but with a lot less up-front work. Louis Kates and Thomas Petzoldt provide an excellent introduction to using proto in the

Read more »

Tuning Notepad++

August 12, 2010

Here are some tricks I collected for making Notepad++ a more comfortable text editor for me, in general and for the R programming language in particular. Switch between tabs in Notepad++ with Ctrl-PageUp/Down. Notepad++'s default behaviour is to use Ctrl+(S...

Read more »

R’s role in the national response to the BP Oil Spill

August 12, 2010

In the early days of the Deepwater Horizon oil spill in the Gulf of Mexico, the rate of flow of oil from the spill was of great concern: estimating it accurately was key to coordinating the scale and scope of the response to the emergency. Unfortunately, estimates from independent sources varied widely, and BP's own estimates varied widely over...

Read more »

useR! 2010 conference videos

August 12, 2010

Videos of the invited talks of the useR! 2010 conference are as follows (courtesy of Kate Mullen and NIST). This site also aims to collect the materials (video, slides, R code) of local R user group (RUG) meetings and various other …

Read more »

Baseball games: getting longer?

August 11, 2010

ESPN's Bill Simmons (aka The Sports Guy) recently suggested that the primary cause of dwindling interest in Red Sox games by fans is that baseball games these days are too long. "It's not that fun to spend 30-45 minutes driving to a game, paying for parking, parking, waiting in line to get in, finding your seat ... and then,...

Read more »

What would a 25th, 50th, and 75th percentile soil profile look like?

August 11, 2010

I have mentioned the AQP package in previous entries. One of the functions in this package generates aggregate soil profile data, from a collection of soil profiles that are related by some factor: common lithology, common landscape position, and so on...

Read more »

Using R for Introductory Statistics 3.3

August 11, 2010

...continuing our way through John Verzani's Using R for Introductory Statistics. Previous installments: chapt1&2, chapt3.1, chapt3.2. Relationships in numeric data: if two data series have a natural pairing (x1,y1),...,(xn,yn), then we can ask, “...

Read more »
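For paired data (x1,y1),...,(xn,yn) like those in the Verzani entry, a common first question is how strongly the two series move together. This sketch uses made-up numbers, not data from the book. It computes the Pearson correlation coefficient from its definition, r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² · Σ(yi − ȳ)²), checks the result against R's built-in `cor()`, and shows the rank-based Spearman version alongside it.

```r
# Hypothetical paired observations (x_i, y_i)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Pearson correlation from its definition
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalents
r_pearson <- cor(x, y)                        # default method = "pearson"
r_spearman <- cor(x, y, method = "spearman")  # correlation of the ranks

print(c(manual = r_manual, pearson = r_pearson, spearman = r_spearman))
```

Spearman's coefficient is exactly 1 here because y increases strictly with x. Pearson's coefficient is slightly below 1 because the points do not fall exactly on a straight line.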
