CloudStat: Learn & Do R on the Cloud

November 8, 2011

CloudStat is a platform to learn and do R on the Cloud. With CloudStat, there is no more downloading, installing, updating or maintaining. CloudStat shortens the learning curve of the R language and also facilitates collaboration. And it...

project euler – problem 49

November 8, 2011

The arithmetic sequence 1487, 4817, 8147, in which each term increases by 3330, is unusual in two ways: (i) each of the three terms is prime, and (ii) the three 4-digit numbers are permutations of one another. There are no arithmetic sequences made up of three 1-, 2-, or 3-digit primes,...
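
Both properties can be checked mechanically. The following is a minimal Python sketch, not necessarily the approach taken in the post: it groups 4-digit primes by their sorted digits and, within each group, looks for a third term 2b − a that completes an arithmetic sequence. The helper names `is_prime` and `prime_permutation_sequences` are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def is_prime(n: int) -> bool:
    """Trial division; fast enough for 4-digit numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_permutation_sequences() -> list[tuple[int, int, int]]:
    """4-digit primes a < b < c in arithmetic progression whose digits permute each other."""
    groups = defaultdict(list)  # sorted-digit signature -> primes with those digits, ascending
    for p in range(1000, 10000):
        if is_prime(p):
            groups["".join(sorted(str(p)))].append(p)
    found = []
    for members in groups.values():
        member_set = set(members)
        for a, b in combinations(members, 2):  # members ascend, so a < b
            c = 2 * b - a  # the term after b with the same common difference
            if c in member_set:
                found.append((a, b, c))
    return sorted(found)


print(prime_permutation_sequences())
```

The 1487, 4817, 8147 sequence quoted above should appear in the printed list.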

Lending Club – naive data analysis

November 8, 2011

Dataspora recently analyzed Lending Club’s data geographically, using the data distributed by the site. Lending Club describes itself as "an online financial community that brings together creditworthy borrowers and savvy investors so that both can benefit financially. We replace the high cost and complexity of bank lending with a faster, smarter way to borrow..."

Web Scraping Google Scholar: Part 2 (Complete Success)

November 8, 2011

This is a follow-up to a post I uploaded earlier today about web scraping data off Google Scholar. In that post I was frustrated because I’m not smart enough to use xpathSApply to get the kind of results I wanted. However, fast-forward to the evening: whilst having dinner with a friend, as a passing remark,

What the frack? Does hydraulic fracturing lead to increased earthquakes?

November 8, 2011

Earthquakes are normal occurrences along major plate boundaries, such as the San Andreas fault system of California, and are less common within plate interiors. Try telling that, however, to the citizens of Oklahoma who...

Three free books on R for Statistics

November 8, 2011

Avril Coghlan, a lecturer at University College Cork in Ireland, has written and made available for free three books ideal for students or practitioners new to R who want to use it for multivariate analysis, time series analysis or biomedical statistics. Each book begins with practical advice for installing and using R in general, before diving into their specialized...

Error Handling in Lyx & Sweave: using Quantmod (and R, of course)

November 8, 2011

I do reports for clients with LyX and Sweave. It took me an extremely long time to get them working, but now that they’re working I can do more in an hour and thus charge more per hour. (Which is, like, the point.) If you’re not familiar, here’s ...

Using Text Mining to Find Out What @RDataMining Tweets are About

November 8, 2011

This post shows an example of text mining Twitter data with the R packages twitteR, tm and wordcloud. Package twitteR provides access to Twitter data, tm provides functions for text mining, and wordcloud visualizes the result with a word cloud.
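
The post itself works in R with twitteR, tm and wordcloud. As a rough illustration of the term-frequency step that a word cloud is drawn from, here is a small Python analogue. It is not the tm API, and the stopword list, the token regex and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; tm ships its own, much longer lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with", "is", "rt"}


def term_frequencies(tweets: list[str]) -> Counter:
    """Lower-case, strip URLs and @mentions, tokenize, drop stopwords, count terms."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        tokens = re.findall(r"[a-z][a-z0-9]*", text)  # keeps one-letter terms such as "r"
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


tweets = [  # hypothetical examples, not real @RDataMining tweets
    "Text mining with R and the tm package http://example.com",
    "RT @someone: data mining with R",
]
print(term_frequencies(tweets).most_common(3))
```

A word cloud then sizes each term by its count.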

readGrads – An R package to read and manipulate grads data

November 8, 2011

I created an R package to read grads data. As far as I know, there is no dedicated package to read grads data. The package is still quite new; any remarks on the documentation or code are more than welcome.

Setting up AWS Cluster to use snow in R

November 8, 2011

I wanted to set up an AWS cluster to take a shot at a Kaggle contest, the DunnHumby Challenge (http://www.kaggle.com/c/dunnhumbychallenge). For this, I found StarCluster to be of great help. It allows you to set up AWS nodes in a few lines of code and does much more (choosing AMIs and cluster configurations)

Web Scraping Google Scholar (Partial Success)

November 8, 2011

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract the ‘title’, ‘url’, ‘publication’ and ‘description’. If any of these fields are not available, as in the case of a citation, the corresponding cell in the data

Bridge and Torch problem in R

November 8, 2011

A couple of months ago I came across the bridge and torch problem at a careers fair in Oxford. A young tech company called QuBit used it as a brain-teaser challenge for would-be software engineers to solve before submitting...

Example 9.13: Negative binomial regression with proc mcmc

November 8, 2011

In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support f...
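
The entry itself is SAS-only and uses proc mcmc; the following Python sketch does not reproduce it. It only illustrates why the negative binomial is the more flexible choice for counts: under the mean/size parameterization (an assumption here, since proc mcmc and other tools may parameterize it differently), Var(Y) = mu + mu²/size, which exceeds the Poisson variance mu. The function names are illustrative.

```python
import math


def nb_pmf(y: int, mu: float, size: float) -> float:
    """Negative binomial P(Y = y) with mean `mu` and dispersion (size) parameter `size`."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mu))
             + y * math.log(mu / (size + mu)))
    return math.exp(log_p)


def moments(pmf, upper: int) -> tuple[float, float]:
    """Mean and variance of a pmf truncated at `upper` (tail beyond it assumed negligible)."""
    probs = [pmf(y) for y in range(upper + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mu, size = 4.0, 2.0
mean, var = moments(lambda y: nb_pmf(y, mu, size), upper=500)
# Expect mean close to 4 and variance close to 4 + 16/2 = 12; a Poisson(4) has variance 4.
print(f"mean={mean:.3f} variance={var:.3f}")
```

As size grows, mu²/size shrinks and the distribution approaches the Poisson.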

Blankety Blank

November 8, 2011

The erstwhile big 4 all blanked their opponents last Saturday, and a poster on the Guardian wondered when the previous such occasion had been. It’s a pretty simple procedure in SQL using a subquery, but in the spirit of learning R, I thought I would tackle the problem in that language, with the

project euler – problem 47

November 8, 2011

The first two consecutive numbers to have two distinct prime factors are: 14 = 2 × 7 and 15 = 3 × 5...
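
One way to search for such runs is a sieve that counts distinct prime factors for every number below a bound. This Python sketch is an illustration, not necessarily the post's method; the search bound of 200,000 and the function names are assumptions.

```python
def distinct_prime_factor_counts(limit: int) -> list[int]:
    """counts[n] = number of distinct primes dividing n, for 0 <= n < limit (sieve)."""
    counts = [0] * limit
    for p in range(2, limit):
        if counts[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, limit, p):
                counts[multiple] += 1
    return counts


def first_consecutive(k: int, limit: int = 200_000):
    """First of k consecutive integers that each have exactly k distinct prime factors.

    Returns None if no such run starts below `limit` (an assumed search bound).
    """
    counts = distinct_prime_factor_counts(limit)
    run = 0
    for n in range(2, limit):
        run = run + 1 if counts[n] == k else 0
        if run == k:
            return n - k + 1
    return None


print(first_consecutive(2))  # 14 (the pair 14, 15)
```

The sieve trades memory for speed: one pass marks every multiple of each prime instead of factorizing each number separately.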

project euler – Problem 44

November 8, 2011

Pentagonal numbers are generated by the formula P_n = n(3n − 1)/2. The first ten pentagonal numbers are: 1, 5, 12, 22, 35, 51, 70, 92, 117, 145, ...
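
The formula can be inverted to test membership: solving 3n² − n − 2P = 0 gives n = (1 + √(1 + 24P))/6, so P is pentagonal exactly when 1 + 24P is a perfect square whose root r satisfies (1 + r) mod 6 = 0. A minimal Python sketch; the function names are illustrative.

```python
from math import isqrt


def pentagonal(n: int) -> int:
    """P_n = n(3n - 1)/2; n(3n - 1) is always even, so integer division is exact."""
    return n * (3 * n - 1) // 2


def is_pentagonal(x: int) -> bool:
    """True if x = P_n for some positive integer n, via n = (1 + sqrt(1 + 24x)) / 6."""
    if x < 1:
        return False
    d = 1 + 24 * x
    r = isqrt(d)
    return r * r == d and (1 + r) % 6 == 0


print([pentagonal(n) for n in range(1, 11)])  # [1, 5, 12, 22, 35, 51, 70, 92, 117, 145]
```

The constant-time membership test avoids building a large lookup table of pentagonal numbers.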

Drawing polar centered spatial maps using ggplot2

November 8, 2011

Drawing maps of the polar regions can be done using square spatial maps. A small example says more than a thousand words:

    xlim = c(-180,180)
    ylim = c(60,90)
    # Some fake grid data
    dat_grid = expand.grid(x = xlim[1]:xlim[2], y…

The mystery of volatility estimates from daily versus monthly returns

November 8, 2011

What drives the estimates apart? Previously: a post by Investment Performance Guy prompted “Variability of volatility estimates from daily data”. In my comments to the original post I suggested that using daily data to estimate volatility would be equivalent to using monthly data, except with less variability. Dave, the Investment Performance Guy, proposed the exquisitely...
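
The suggested equivalence can be made concrete under idealized assumptions: i.i.d. normal, zero-mean daily log returns and 21 trading days per month (252 per year). Under those assumptions, the annualized daily-based estimate (sd × √252) and the annualized monthly-based estimate (sd × √12) target the same number. The post's question is what drives real-data estimates apart from this benchmark. This Python simulation is a hypothetical sketch, not the post's code.

```python
import math
import random
import statistics


def annualized_vols(years: int, annual_vol: float, seed: int = 1) -> tuple[float, float]:
    """Annualized volatility estimated from simulated daily vs. monthly log returns."""
    days_per_month, months_per_year = 21, 12  # assumption: 252 trading days per year
    days_per_year = days_per_month * months_per_year
    rng = random.Random(seed)
    daily_sd = annual_vol / math.sqrt(days_per_year)
    # Assumption: i.i.d. normal daily log returns with zero mean.
    daily = [rng.gauss(0.0, daily_sd) for _ in range(years * days_per_year)]
    # Log returns add up, so each monthly return is the sum of its 21 daily returns.
    monthly = [sum(daily[i:i + days_per_month]) for i in range(0, len(daily), days_per_month)]
    from_daily = statistics.stdev(daily) * math.sqrt(days_per_year)
    from_monthly = statistics.stdev(monthly) * math.sqrt(months_per_year)
    return from_daily, from_monthly


print(annualized_vols(years=20, annual_vol=0.20))
```

Re-running with different seeds should show the monthly-based estimate scattering more widely around 0.20 than the daily-based one, because it rests on 21 times fewer observations.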

Doing away with “unknown timezone” warnings

November 8, 2011

Timezone stuff can really drive you NUTS, at least if you’re sitting in front of a German Windows box. This is what I used to do to set my tz: … And I always wondered why R would throw “unknown timezone” warnings: … Someday I found out that setting tz via `options()` was not enough, as the...

project euler – Problem 32

November 8, 2011

We shall say that an n-digit number is pandigital if it makes use of all the digits 1 to n exactly once; for example, the 5-digit number, 15234, is 1 through 5 pandigital. The product 7254 is unusual, as the identity, 39 × 186 = 7254, containing multiplicand, multiplier, and product is 1 through...
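
The pandigital identity test is a simple digit check. A digit-count argument also bounds the search: if a, b and a·b together have 9 digits, the product must have 4 digits and the factors 5 digits between them, so it is enough to try 1-digit × 4-digit and 2-digit × 3-digit pairs. This Python sketch is an illustration with hypothetical function names, not the post's code. A set keeps each product once even if it arises from more than one factor pair.

```python
def is_pandigital_identity(a: int, b: int) -> bool:
    """True if the digits of a, b and a*b together use 1 through 9 exactly once."""
    digits = f"{a}{b}{a * b}"
    return len(digits) == 9 and set(digits) == set("123456789")


def pandigital_products() -> set[int]:
    """All products c for which some a * b = c is a 1-9 pandigital identity."""
    products = set()
    for a in range(1, 100):
        # 1-digit a pairs with a 4-digit b; 2-digit a pairs with a 3-digit b.
        b_range = range(1000, 10000) if a < 10 else range(100, 1000)
        for b in b_range:
            if is_pandigital_identity(a, b):
                products.add(a * b)
    return products


print(is_pandigital_identity(39, 186))  # True: 39 × 186 = 7254
```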

project euler – Problem 31

November 8, 2011

In England the currency is made up of pound, £, and pence, p, and there are eight coins in general circulation: 1p, 2p, 5p, 10p, 20p, 50p, £1 (100p) and £2 (200p).
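
Counting the ways to make an amount from these coins is the classic coin-change dynamic program. This Python sketch is an illustration, not necessarily the post's approach, and `count_ways` is a hypothetical name. Iterating over coins in the outer loop counts each combination once, regardless of the order in which its coins are used.

```python
def count_ways(target: int, coins: tuple[int, ...] = (1, 2, 5, 10, 20, 50, 100, 200)) -> int:
    """Number of ways to make `target` pence from unlimited coins, ignoring order."""
    ways = [1] + [0] * target  # ways[0] = 1: the empty combination
    for coin in coins:  # coins in the outer loop -> combinations, not orderings
        for amount in range(coin, target + 1):
            ways[amount] += ways[amount - coin]
    return ways[target]


print(count_ways(5, (1, 2, 5)))  # 4: 5, 2+2+1, 2+1+1+1, 1+1+1+1+1
```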

project euler – Problem 15

November 8, 2011

Starting in the top left corner of a 2×2 grid, there are 6 routes (without backtracking) to the bottom right corner. How many routes are there through a 20×20 grid?
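
"Without backtracking" means every route uses only right and down moves. A route through an r×c grid is therefore a sequence of r + c moves, r of which are "down", so the count is the binomial coefficient C(r + c, r). This Python sketch, with hypothetical function names, shows that closed form alongside a small grid dynamic program as a cross-check.

```python
from math import comb


def lattice_routes(rows: int, cols: int) -> int:
    """Right/down routes through a rows x cols grid: choose which of the moves go down."""
    return comb(rows + cols, rows)


def lattice_routes_dp(rows: int, cols: int) -> int:
    """Same count by dynamic programming: routes to a point = routes from above + from the left."""
    grid = [[1] * (cols + 1) for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            grid[r][c] = grid[r - 1][c] + grid[r][c - 1]
    return grid[rows][cols]


print(lattice_routes(2, 2), lattice_routes_dp(2, 2))  # 6 6, matching the 2×2 example
```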

project euler-Problem 43

November 7, 2011

The number, 1406357289, is a 0 to 9 pandigital number because it is made up of each of the digits 0 to 9 in some order, but it also has a rather interesting sub-string divisibility property. Let d1 be the 1st digit, d2 be the 2nd digit, and so on. In this way, we note the...
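
The definitions in this excerpt translate directly into code: a 0 to 9 pandigital check, plus a helper that reads a run of digits starting at d_i using the 1-based indexing above. This Python sketch uses hypothetical helper names. Its default three-digit window width is an assumption, because the excerpt stops before stating the divisibility property.

```python
def is_zero_to_nine_pandigital(n: int) -> bool:
    """True if n uses each of the digits 0-9 exactly once."""
    s = str(n)
    return len(s) == 10 and set(s) == set("0123456789")


def digit_window(n: int, i: int, width: int = 3) -> int:
    """Number formed by d_i, d_(i+1), ... (1-based; width 3 is an assumed default)."""
    s = str(n)
    if i < 1 or i - 1 + width > len(s):
        raise ValueError("window runs past the end of the number")
    return int(s[i - 1:i - 1 + width])


print(is_zero_to_nine_pandigital(1406357289), digit_window(1406357289, 2))  # True 406
```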

ABC on wordpress

November 7, 2011

Erkan Buzbas sent me an email about his webpage (operated as a WordPress blog) on ABC. It contains various items of information on ABC research and a hopefully growing list of references. After Scott Sisson’s tweet on ABC_research (latest news: two ABC sessions at ISBA 2012, Kyoto), here comes another way to keep posted about

Webinar Nov 17: What’s new in Revolution R Enterprise 5.0

November 7, 2011

Revolution R Enterprise 5.0 will be released soon, and Sue Ranney, VP of Development at Revolution Analytics, will host a webinar on Thursday, November 17 to get you up to speed on the latest features. Revolution R Enterprise 5.0 is Revolution Analytics’ scalable analytics platform. At its core is Revolution Analytics’ enhanced distribution of R, the world’s most widely-used...

Coming out of the (Bayesian) closet: multivariate version

November 7, 2011

This week I’m facing my (and many other lecturers’) least favorite part of teaching: grading exams. In a supreme act of procrastination I will continue the previous post, and the antepenultimate one, showing the code for a bivariate analysis of a randomized...

Web Scraping Google URLs

November 7, 2011

Google slightly changed the HTML code it uses for hyperlinks on search pages last Thursday, causing one of my scripts to stop working. Thankfully, this is easily solved in R with the XML package and the power and simplicity of XPath expressions: … Lovely jubbly! P.S. I know that there is an API of

Code Optimization: One R Problem, Eleven Solutions – Now Thirteen!

November 7, 2011

Following up on my previous post “Code Optimisation: One R Problem, Ten Solutions – Now Eleven!”, I figured out a twelfth solution after writing that blog post. Furthermore, halfway through writing this blog post I figured out a thirteenth solution too. As a recap, the problem is taken from rwiki where the goal is to find
