A project spin-off, the jpgfader function (the jpgfader function in humorous use can be viewed HERE):

In case you missed them, here are some articles from October of particular interest to R users. The creator of the ggplot2 package, Hadley Wickham, shares details on some forthcoming big-data graphics functions (based on research sponsored by Revolution Analytics). A list of several dozen free data sources that can easily be imported into R. Bob Muenchen gave a...

Tony Breyal revived an old code optimization problem in this blog post, so I figured it was time for an Rcpp-based solution. This solution moves Henrik Bengtsson's idea (which was the basis of attempt 10) down to C++. The idea was to call sprintf fewer times than the other solutions do to generate the strings...

A Bernoulli process is a sequence of Bernoulli trials (the realization of n binary random variables), taking two values (0/1, Heads/Tails, Boy/Girl, etc.). It is often used in teaching introductory probability/statistics classes about the binomial distribution. When visualizing a Bernoulli process, it is common to use a binary tree diagram in order to show the...
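As a minimal sketch of the idea (not the post's own code), the R snippet below simulates a short Bernoulli process and enumerates every path of a 3-trial binary tree. The values of `n`, `p`, and the seed are arbitrary choices made for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 10                          # number of trials (illustrative)
p <- 0.5                         # success probability (illustrative)

# One realization of the process: n independent 0/1 draws
trials <- rbinom(n, size = 1, prob = p)
successes <- sum(trials)         # this count follows Binomial(n, p)

# Every leaf of a 3-level binary tree: all 2^3 sequences of 0/1 outcomes
paths <- expand.grid(rep(list(0:1), 3))

# Number of paths by count of successes; matches choose(3, 0:3)
path_counts <- table(rowSums(paths))
```

Grouping the leaves by their number of successes gives the binomial coefficients, which is the link between the tree diagram and the binomial distribution.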

I will be teaching a workshop on R and LaTeX at NEAIR in just under a month. One of the issues I will encounter is a lack of Internet access. I also work with restricted data from NCES which requires the computer to be secured including no network a...

The subset function is available in base R and can be used to return subsets of a vector, matrix, or data frame which meet a particular condition. In my three years of using R, I have repeatedly used the subset() function and believe that it is the most useful tool for selecting elements of a
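A minimal sketch of that kind of selection, using base R's subset() on a data frame and on a vector. The `scores` data frame and its column names are made up for this illustration.

```r
# Hypothetical data, made up for this illustration
scores <- data.frame(
  name  = c("Ann", "Bob", "Cy", "Dee"),
  group = c("a", "b", "a", "b"),
  score = c(90, 72, 85, 64),
  stringsAsFactors = FALSE
)

# Keep rows that meet a condition; `select` keeps only the named columns
high <- subset(scores, score > 80, select = c(name, score))

# subset() also works on a plain vector
x <- c(3, 8, 1, 12)
big <- subset(x, x > 5)
```

Inside subset(), column names such as `score` are evaluated within the data frame, so there is no need to write `scores$score`.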

What is important for an investor? The rate of return is at the top of the list. Does the expected rate of return shown on the mean-variance efficient frontier paint the full picture? If an investor's investment horizon is longer than one period, for example 5 years, then the true measure of portfolio performance is Geometric
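The teaser is cut off here; assuming it goes on to discuss the geometric mean return, the following sketch contrasts it with the arithmetic mean over a 5-period horizon. The return series `r` is made up for illustration.

```r
# Made-up per-period returns over a 5-period horizon
r <- c(0.10, -0.05, 0.20, 0.00, -0.10)

# Arithmetic mean: the simple average of the per-period returns
arithmetic_mean <- mean(r)

# Geometric mean: the constant per-period return that compounds
# to the same final wealth as the actual sequence
geometric_mean <- prod(1 + r)^(1 / length(r)) - 1

# Final wealth per unit invested, computed both ways
wealth_actual   <- prod(1 + r)
wealth_constant <- (1 + geometric_mean)^length(r)
```

For a series with any variation, the geometric mean is below the arithmetic mean, which is why the single-period expected return can overstate multi-period performance.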

Many high-schoolers are now using R in class, and to help even more students get exposure to R (while improving R itself), Virgilio Gómez-Rubio is seeking suggestions for projects for the next Google Code-in: An application has been put forward for R to participate in Google Code-in. This is a Google contest to introduce pre-university students (age 13-18) to...

CloudStat: Learn & Do R on the Cloud. CloudStat is a platform to learn and do R on the Cloud. With CloudStat, there is no more downloading, installation, updating, or maintenance. CloudStat lowers the R language learning curve and supports collaboration. And it...

The arithmetic sequence, 1487, 4817, 8147, in which each of the terms increases by 3330, is unusual in two ways: (i) each of the three terms is prime, and (ii) each of the 4-digit numbers is a permutation of the others. There are no arithmetic sequences made up of three 1-, 2-, or 3-digit primes,...
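The properties claimed for 1487, 4817, 8147 can be checked directly. The sketch below is one way to do so in R; the helper names `is_prime` and `digit_key` are illustrative, not taken from the post.

```r
# Trial-division primality test; adequate for 4-digit numbers
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  if (n %% 2 == 0) return(FALSE)
  d <- 3
  while (d * d <= n) {
    if (n %% d == 0) return(FALSE)
    d <- d + 2
  }
  TRUE
}

# Sorted digits of a number as one string; permutations share the same key
digit_key <- function(n) {
  paste(sort(strsplit(as.character(n), "")[[1]]), collapse = "")
}

terms <- c(1487, 4817, 8147)

all_prime   <- all(sapply(terms, is_prime))                    # (i) all terms prime
same_digits <- length(unique(sapply(terms, digit_key))) == 1   # (ii) digit permutations
common_diff <- unique(diff(terms))                             # constant step between terms
```

The same two helpers could be used to search all 4-digit primes for other sequences with these properties.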

Dataspora recently analyzed Lending Club's data in a geographical way using the data distributed by the site. Lending Club is an online financial community that brings together creditworthy borrowers and savvy investors so that both can benefit financially. We replace the high cost and complexity of bank lending with a faster, smarter way to borrow

This is a follow-up to a post I uploaded earlier today about web scraping data off Google Scholar. In that post I was frustrated because I’m not smart enough to use xpathSApply to get the kind of results I wanted. However, fast-forward to the evening: whilst having dinner with a friend, as a passing remark,

Avril Coghlan, a lecturer at University College Cork in Ireland, has written and made available for free three books ideal for students or practitioners new to R who want to use it for multivariate analysis, time series analysis or biomedical statistics. Each book begins with practical advice for installing and using R in general, before diving into its specialized...

I do reports for clients with LyX and Sweave. It took me an extremely long time to get them working, but now that they’re working I can do more in an hour and thus charge more per hour. (Which is, like, the point.) If you’re not familiar, here’s ...

I do reports for clients with LyX and Sweave. It took me an extremely long time to get them working, but now that they’re working I can do more in an hour and thus charge more per hour. If you’re not familiar, here’s a rundown: LaTeX is the stand...

I created an R package to read grads data. As far as I know, there is no dedicated package to read grads data. The package is still quite new; any remarks on the documentation or code are more than welcome.

Setting up an AWS Cluster. I wanted to set up an AWS cluster to take a shot at a Kaggle contest, the DunnHumby Challenge: http://www.kaggle.com/c/dunnhumbychallenge For this, I found StarCluster to be of great help. It allows you to set up AWS nodes in a few lines of code and does much more (choosing AMIs and cluster configurations)

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract the ‘title’, ‘url’, ‘publication’ and ‘description’. If any of these fields are not available, as in the case of a citation, the corresponding cell in the data

The erstwhile big 4 all blanked their opponents last Saturday, and a poster on the Guardian wondered when this had last happened. It’s a pretty simple procedure in SQL using a subquery, but in the spirit of learning R, I thought I would tackle the problem in that language, with the