## Web Scraping Yahoo Search Page via XPath

November 10, 2011
Seeing as I’m on a bit of an XPath kick as of late, I figured I’d continue scraping search results, this time from Yahoo.com. Rolling my own version of `xpathSApply` to handle NULL elements seems to have done the trick, and so far the scraping has been relatively easy. I’ve created

## In case you missed it: October Roundup

November 10, 2011
In case you missed them, here are some articles from October of particular interest to R users. The creator of the ggplot2 package, Hadley Wickham, shares details on some forthcoming big-data graphics functions (based on research sponsored by Revolution Analytics). A list of several dozen free data sources that can easily be imported into R. Bob Muenchen gave a...

## Facebook Graph API Explorer with R

November 10, 2011
I wanted to play around with the Facebook Graph API using the Graph API Explorer page as a coding exercise. This facility lets you use the API with a temporary authorisation token. Now, I don’t know how to make an R package for the proper API where you have to register for an API key and

## Diagram for a Bernoulli process (using R)

November 10, 2011
A Bernoulli process is a sequence of Bernoulli trials (the realization of n binary random variables), each taking one of two values (0/1, Heads/Tails, Boy/Girl, etc.). It is often used in introductory probability and statistics classes to teach the binomial distribution. When visualizing a Bernoulli process, it is common to use a binary tree diagram in order to show the
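To make the link between the binary tree and the binomial distribution concrete, here is a minimal sketch, written in Python rather than R and not taken from the original post. It enumerates every root-to-leaf path of the tree for n trials and sums the leaf probabilities by number of successes. The function names `bernoulli_tree_leaves` and `successes_distribution`, and the values n = 3 and p = 0.5, are assumptions chosen purely for illustration.

```python
from itertools import product


def bernoulli_tree_leaves(n, p):
    """Return every length-n path of 0/1 outcomes with its probability.

    Each path is one root-to-leaf walk in the binary tree; a 1 is a
    success (probability p) and a 0 is a failure (probability 1 - p).
    """
    leaves = []
    for path in product((0, 1), repeat=n):
        successes = sum(path)
        prob = p ** successes * (1 - p) ** (n - successes)
        leaves.append((path, prob))
    return leaves


def successes_distribution(n, p):
    """Sum leaf probabilities by number of successes (0..n)."""
    dist = [0.0] * (n + 1)
    for path, prob in bernoulli_tree_leaves(n, p):
        dist[sum(path)] += prob
    return dist


if __name__ == "__main__":
    # Hypothetical example values: three fair coin flips.
    for k, prob in enumerate(successes_distribution(3, 0.5)):
        print(k, prob)
```

Grouping the 2^n leaves by their number of successes is exactly what collapses the tree into the binomial distribution, which is why the tree diagram is a handy teaching aid.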

## Geometric Efficient Frontier

November 9, 2011
What is important for an investor? The rate of return is at the top of the list. Does the expected rate of return shown on the mean-variance efficient frontier paint the full picture? If an investor’s investment horizon is longer than one period, for example 5 years, then the true measure of portfolio performance is Geometric

## Web Scraping Google Scholar: Part 2 (Complete Success)

November 8, 2011
This is a follow-up to a post I uploaded earlier today about web scraping data off Google Scholar. In that post I was frustrated because I’m not smart enough to use `xpathSApply` to get the kind of results I wanted. However, fast-forward to the evening whilst having dinner with a friend, as a passing remark,

## Using Text Mining to Find Out What @RDataMining Tweets are About

November 8, 2011
This post shows an example of text mining Twitter data with the R packages twitteR, tm and wordcloud. Package twitteR provides access to Twitter data, tm provides functions for text mining, and wordcloud visualizes the result as a word cloud.

## Bridge and Torch problem in R

November 8, 2011
A couple of months ago I came across the bridge and torch problem at a careers fair in Oxford. A young tech company called QuBit used it as a brain-teaser challenge for would-be software engineers to solve before submitting

## Doing away with “unknown timezone” warnings

November 8, 2011
Timezone stuff can really drive you NUTS, at least if you're sitting in front of a German Windows box. This is what I used to do to set my tz: And I always wondered why R would throw "unknown timezone" warnings: One day I found out that setting tz via `options()` was not enough as the

## ABC on wordpress

November 7, 2011
Erkan Buzbas sent me an email about his webpage (operated as a wordpress blog) on ABC. It contains various items of information on ABC research and a hopefully growing list of references. After Scott Sisson’s tweet on ABC_research (latest news: two ABC sessions in ISBA 2012, Kyoto), here comes another way to keep posted about