# Monthly Archives: November 2011

## Web Scraping Google Scholar (Partial Success)

November 8, 2011

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract the ‘title’, ‘url’, ‘publication’ and ‘description’. If any of these fields are not available, as in the case of a citation, the corresponding cell in the data …

## Bridge and Torch problem in R

November 8, 2011

A couple of months ago I came across the bridge and torch problem at a careers fair in Oxford. A young tech company called QuBit used it as a brain-teaser challenge for would-be software engineers to solve before submitting …

## Example 9.13: Negative binomial regression with proc mcmc

November 8, 2011

In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support f...

## Blankety Blank

November 8, 2011

The erstwhile big 4 all blanked their opponents last Saturday, and a poster on the Guardian wondered when this had last happened. It’s a pretty simple procedure in SQL using a subquery, but in the spirit of learning R, I thought I would tackle the problem in that language, with the …

## project euler – problem 47

November 8, 2011

The first two consecutive numbers to have two distinct prime factors are: 14 = 2 × 7 …
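The property in this teaser can be checked with a few lines of base R. The sketch below is only an illustration, not the post's own solution: the function names `count_distinct_prime_factors` and `first_run_start` are hypothetical, and it assumes the task is to find the first run of `k` consecutive integers that each have exactly `k` distinct prime factors.

```r
# Count the distinct prime factors of a positive integer by trial division.
count_distinct_prime_factors <- function(n) {
  count <- 0
  p <- 2
  while (p * p <= n) {
    if (n %% p == 0) {
      count <- count + 1
      # Strip every power of p so it is counted only once.
      while (n %% p == 0) n <- n %/% p
    }
    p <- p + 1
  }
  # Whatever remains above 1 is a single prime factor larger than sqrt(n).
  if (n > 1) count <- count + 1
  count
}

# Hypothetical helper: first integer that starts a run of k consecutive
# integers, each with exactly k distinct prime factors.
first_run_start <- function(k) {
  n <- 2
  run <- 0
  repeat {
    if (count_distinct_prime_factors(n) == k) run <- run + 1 else run <- 0
    if (run == k) return(n - k + 1)
    n <- n + 1
  }
}

count_distinct_prime_factors(14)  # 2, since 14 = 2 x 7
first_run_start(2)                # 14
```

Trial division is slow for large inputs; a sieve would likely be the better choice for longer runs.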

## project euler – Problem 44

November 8, 2011

Pentagonal numbers are generated by the formula Pn = n(3n − 1)/2. The first ten pentagonal numbers are: 1, 5, 12, 22, 35, 51, 70, 92, 117, 145, ...
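As a quick check of the formula, here is a minimal base R sketch. The names `pentagonal` and `is_pentagonal` are made up for illustration; the membership test inverts the formula, so x is pentagonal exactly when 1 + 24x is a perfect square whose root s satisfies (1 + s) mod 6 = 0.

```r
# n-th pentagonal number, vectorised over n.
pentagonal <- function(n) n * (3 * n - 1) / 2

# Membership test obtained by solving 3n^2 - n - 2x = 0 for n.
is_pentagonal <- function(x) {
  d <- 1 + 24 * x
  s <- round(sqrt(d))
  s * s == d & (1 + s) %% 6 == 0
}

pentagonal(1:10)
# 1 5 12 22 35 51 70 92 117 145

is_pentagonal(c(22, 23))
# TRUE FALSE
```

The integer-style check avoids comparing floating-point quotients directly, which should keep it reliable for moderately sized inputs.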

## Drawing polar centered spatial maps using ggplot2

November 8, 2011

Drawing maps of the polar regions can be done using square spatial maps. A small example says more than a thousand words: `xlim = c(-180,180)`, `ylim = c(60,90)`, `# Some fake grid data`, `dat_grid = expand.grid(x = xlim[1]:xlim[2], y…`

## The mystery of volatility estimates from daily versus monthly returns

November 8, 2011

What drives the estimates apart? Previously: a post by Investment Performance Guy prompted “Variability of volatility estimates from daily data”. In my comments to the original post I suggested that using daily data to estimate volatility would be equivalent to using monthly data except with less variability. Dave, the Investment Performance Guy, proposed the exquisitely …
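The claim about daily versus monthly data can be illustrated with a small simulation. This is a sketch under strong assumptions that are not from the post: returns are independent normal log returns, there are 21 trading days per month and 252 per year, and the seed, sample sizes, and the name `one_trial` are arbitrary. Real return series need not behave this way, which is presumably where the "mystery" comes from.

```r
set.seed(42)            # arbitrary seed, for reproducibility only
days_per_month <- 21    # assumed trading days per month
n_months <- 120         # assumed sample length: ten years
daily_sd <- 0.01        # assumed true daily volatility

# One simulated history: annualised volatility estimated two ways.
one_trial <- function() {
  daily <- rnorm(days_per_month * n_months, mean = 0, sd = daily_sd)
  # Log returns add up, so each column sum is one monthly return.
  monthly <- colSums(matrix(daily, nrow = days_per_month))
  c(daily   = sd(daily)   * sqrt(252),
    monthly = sd(monthly) * sqrt(12))
}

estimates <- replicate(1000, one_trial())  # 2 x 1000 matrix

rowMeans(estimates)      # both centre near daily_sd * sqrt(252)
apply(estimates, 1, sd)  # the daily-based estimate is less variable
```

Under these idealised assumptions both estimators target the same annualised value, and the daily one is simply more precise because it uses more observations.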

## Doing away with “unknown timezone” warnings

November 8, 2011

Timezone stuff can really drive you NUTS, at least if you’re sitting in front of a German Windows box. This is what I used to do to set my tz: … And I always wondered why R would throw “unknown timezone” warnings: … One day I found out that setting tz via `options()` was not enough as the …
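The excerpt is cut off before the fix, so the following is not necessarily what the post goes on to do. One common way to give an R session a time zone is the `TZ` environment variable, set with `Sys.setenv()`. The zone name `"Europe/Berlin"` is an assumption chosen to match the German setting mentioned above.

```r
# Assumption: "Europe/Berlin" stands in for whichever zone applies.
Sys.setenv(TZ = "Europe/Berlin")

# Confirm what the session now sees.
Sys.getenv("TZ")

# Render a fixed UTC instant in the chosen zone.
stamp <- as.POSIXct("2011-11-08 12:00:00", tz = "UTC")
format(stamp, tz = "Europe/Berlin", usetz = TRUE)
```

Placing the `Sys.setenv()` call in a startup file such as `.Rprofile` would apply it to every session, though whether that removes the warning on a given Windows setup is untested here.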

## project euler – Problem 32

November 8, 2011

We shall say that an n-digit number is pandigital if it makes use of all the digits 1 to n exactly once; for example, the 5-digit number, 15234, is 1 through 5 pandigital. The product 7254 is unusual, as the identity, 39 × 186 = 7254, containing multiplicand, multiplier, and product is 1 through...
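The pandigital definition translates into a short base R check. This is a minimal sketch with a hypothetical function name, `is_pandigital`; passing several numbers concatenates their digits, which matches the way the identity is described above.

```r
# TRUE when the digits of x (concatenated if x has several elements)
# use each of 1..n exactly once, where n is the total digit count.
is_pandigital <- function(x) {
  # format() avoids scientific notation such as "1e+05".
  chars <- format(x, scientific = FALSE, trim = TRUE)
  digits <- strsplit(paste(chars, collapse = ""), "")[[1]]
  n <- length(digits)
  n <= 9 && identical(sort(digits), as.character(seq_len(n)))
}

is_pandigital(15234)             # TRUE: 1 through 5 pandigital
is_pandigital(c(39, 186, 7254))  # TRUE: 39 x 186 = 7254 uses 1..9 once
is_pandigital(1024)              # FALSE: contains 0, lacks 3
```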