## Web Scraping Google Scholar (Partial Success)

November 8, 2011

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract the ‘title’, ‘url’, ‘publication’ and ‘description’. If any of these fields are not available, as in the case of a citation, the corresponding cell in the data

## Bridge and Torch problem in R

November 8, 2011

A couple of months ago I came across the bridge and torch problem at a careers fair in Oxford. A young tech company called QuBit used it as a brain-teaser challenge for would-be software engineers to solve before submitting … Continue reading →

## Example 9.13: Negative binomial regression with proc mcmc

November 8, 2011

In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support f...
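
The entry itself is SAS-only, but the overdispersion that motivates the negative binomial can be illustrated in a few lines of R. This is a minimal sketch on simulated data; the parameter values (`mu = 5`, `size = 2`) and all variable names are assumptions chosen for illustration and are not taken from the original entry.

```r
# Simulated counts: Poisson vs. negative binomial with the same mean.
# All parameter values here are illustrative assumptions.
set.seed(42)
n  <- 10000
mu <- 5

y_pois <- rpois(n, lambda = mu)
y_nb   <- rnbinom(n, size = 2, mu = mu)  # theoretical variance = mu + mu^2 / size

# Variance-to-mean ratio: close to 1 for Poisson data,
# clearly larger than 1 for negative binomial data.
dispersion <- c(poisson = var(y_pois) / mean(y_pois),
                negbin  = var(y_nb) / mean(y_nb))
print(round(dispersion, 2))
```

A ratio well above 1 in real count data is the usual hint that a plain Poisson model will fit poorly.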

## Blankety Blank

November 8, 2011

The erstwhile big 4 all blanked their opponents last Saturday and a poster on the Guardian wondered when was the previous occasion of such an occurrence. It’s a pretty simple procedure in SQL using a subquery, but in the spirit of learning R, I thought I would tackle the problem in that language, with the

## project euler – problem 47

November 8, 2011

The first two consecutive numbers to have two distinct prime factors are: 14 = 2 × 7
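
Counting distinct prime factors is the building block here. The following is a minimal sketch using trial division; the function name `distinct_prime_factors` is hypothetical and this is not necessarily the approach taken in the linked post.

```r
# Return the distinct prime factors of a positive integer n (trial division).
distinct_prime_factors <- function(n) {
  factors <- numeric(0)
  p <- 2
  while (p * p <= n) {
    if (n %% p == 0) {
      factors <- c(factors, p)
      # Strip every power of p so it is counted only once
      while (n %% p == 0) n <- n %/% p
    }
    p <- p + 1
  }
  # Whatever remains above 1 is itself a prime factor
  if (n > 1) factors <- c(factors, n)
  factors
}

distinct_prime_factors(14)  # 2 7
distinct_prime_factors(15)  # 3 5
```

Both 14 and 15 have exactly two distinct prime factors, which is what makes them a consecutive pair.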

## project euler – Problem 44

November 8, 2011

Pentagonal numbers are generated by the formula P_n = n(3n−1)/2. The first ten pentagonal numbers are: 1, 5, 12, 22, 35, 51, 70, 92, 117, 145, ...
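
The formula translates directly into vectorised R. The sketch below also adds a membership test obtained by solving n(3n−1)/2 = x for n; the function names `pentagonal` and `is_pentagonal` are hypothetical and not taken from the linked post.

```r
# n-th pentagonal number, vectorised over n
pentagonal <- function(n) n * (3 * n - 1) / 2

pentagonal(1:10)
# 1 5 12 22 35 51 70 92 117 145

# x is pentagonal when n = (1 + sqrt(1 + 24x)) / 6 is a whole number
is_pentagonal <- function(x) {
  n <- (1 + sqrt(1 + 24 * x)) / 6
  n == round(n)
}

is_pentagonal(c(22, 23))  # TRUE FALSE
```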

## Drawing polar centered spatial maps using ggplot2

November 8, 2011

Drawing maps of the polar regions can be done using square spatial maps. A small example says more than a thousand words: `xlim = c(-180,180)`, `ylim = c(60,90)`, `# Some fake grid data`, `dat_grid = expand.grid(x = xlim[1]:xlim[2], y…` See more ›

## The mystery of volatility estimates from daily versus monthly returns

November 8, 2011

What drives the estimates apart? Previously: a post by Investment Performance Guy prompted “Variability of volatility estimates from daily data”. In my comments to the original post I suggested that using daily data to estimate volatility would be equivalent to using monthly data except with less variability. Dave, the Investment Performance Guy, proposed the exquisitely … Continue reading...
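
The claim about variability can be checked with a small simulation. This is a sketch under strong assumptions that are mine, not the post's: independent normal daily log returns, 21 trading days per month, 252 per year, and five years of data per sample.

```r
# Compare annualised volatility estimates from daily vs. monthly returns.
# Assumptions (illustrative only): i.i.d. normal daily log returns,
# 21 trading days per month, 252 per year, 60 months per sample.
set.seed(1)
days_per_month <- 21
n_months       <- 60
n_sims         <- 200
daily_sd       <- 0.01

est <- replicate(n_sims, {
  daily   <- rnorm(days_per_month * n_months, mean = 0, sd = daily_sd)
  # Log returns add up, so each column sum is one monthly return
  monthly <- colSums(matrix(daily, nrow = days_per_month))
  c(daily   = sd(daily) * sqrt(252),
    monthly = sd(monthly) * sqrt(12))
})

rowMeans(est)                 # both centre near daily_sd * sqrt(252)
spread <- apply(est, 1, sd)   # variability of each estimator across samples
print(spread)
```

Under these idealised assumptions the two estimators agree on average and the daily one is much less variable; real returns violate the independence assumption, which is presumably where the mystery in the title comes from.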

## Doing away with “unknown timezone” warnings

November 8, 2011

Timezone stuff can really drive you NUTS, at least if you’re sitting in front of a German Windows box. This is what I used to do to set my tz: And I always wondered why R would throw “unknown timezone” warnings: One day I found out that setting tz via options() was not enough as the … Continue reading...

## project euler – Problem 32

November 8, 2011

We shall say that an n-digit number is pandigital if it makes use of all the digits 1 to n exactly once; for example, the 5-digit number, 15234, is 1 through 5 pandigital. The product 7254 is unusual, as the identity, 39 × 186 = 7254, containing multiplicand, multiplier, and product is 1 through...
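
The pandigital definition above is easy to express as a check on sorted digits. A minimal sketch, with the hypothetical helper name `is_pandigital` (valid for up to nine digits):

```r
# TRUE if the digits of x are exactly 1..n, each used once,
# where n is the number of digits (n <= 9).
is_pandigital <- function(x) {
  digits <- strsplit(as.character(x), "")[[1]]
  identical(sort(digits), as.character(seq_along(digits)))
}

is_pandigital(15234)  # TRUE: 1 through 5 pandigital

# The identity 39 x 186 = 7254: concatenate multiplicand, multiplier and product
identity_digits <- paste0(39, 186, 39 * 186)
is_pandigital(identity_digits)  # TRUE: "391867254" uses 1 through 9 once each
```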

## project euler – Problem 31

November 8, 2011

In England the currency is made up of pound, £, and pence, p, and there are eight coins in general circulation: 1p, 2p, 5p, 10p, 20p, 50p, £1 (100p) and £2 (200p).
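
Problems built on a coin list like this one typically ask in how many ways an amount can be made. The sketch below counts combinations with a standard dynamic-programming table; the function name `count_ways` and the small target used in the example are assumptions for illustration, since the teaser does not state the target amount.

```r
# Count the ways to make `target` pence from the given coin values
# (order of coins does not matter).
count_ways <- function(target, coins) {
  ways <- numeric(target + 1)  # ways[a + 1] = number of ways to make amount a
  ways[1] <- 1                 # one way to make 0p: use no coins
  for (coin in coins) {
    if (coin > target) next    # coin:target would count downwards otherwise
    for (amount in coin:target) {
      ways[amount + 1] <- ways[amount + 1] + ways[amount - coin + 1]
    }
  }
  ways[target + 1]
}

coins <- c(1, 2, 5, 10, 20, 50, 100, 200)
count_ways(5, coins)  # 4: 5, 2+2+1, 2+1+1+1, 1+1+1+1+1
```

Processing one coin at a time in the outer loop is what prevents the same combination from being counted in several orders.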

## project euler – Problem 15

November 8, 2011

Starting in the top left corner of a 2x2 grid, there are 6 routes (without backtracking) to the bottom right corner. How many routes are there through a 20x20 grid?
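
One way to see the count: every route through an n×n grid is a sequence of 2n moves, n to the right and n down, so the number of routes is the binomial coefficient C(2n, n). A minimal sketch, with the hypothetical function name `lattice_routes` (not necessarily how the linked post solves it):

```r
# Number of monotone routes through an n x n grid: choose which n of the
# 2n moves go right; the rest go down.
lattice_routes <- function(n) choose(2 * n, n)

lattice_routes(2)   # 6, matching the 2x2 grid above
lattice_routes(20)  # the 20x20 case
```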

## project euler-Problem 43

November 7, 2011

The number, 1406357289, is a 0 to 9 pandigital number because it is made up of each of the digits 0 to 9 in some order, but it also has a rather interesting sub-string divisibility property. Let d1 be the 1st digit, d2 be the 2nd digit, and so on. In this way, we note the...

## ABC on wordpress

November 7, 2011

Erkan Buzbas sent me an email about his webpage (operated as a wordpress blog) on ABC. It contains different items of information on ABC research and a hopefully growing list of references. After Scott Sisson’s tweet on ABC_research (latest news: two ABC sessions in ISBA 2012, Kyoto), here comes another way to keep posted about

## Webinar Nov 17: What’s new in Revolution R Enterprise 5.0

November 7, 2011

Revolution R Enterprise 5.0 will be released soon, and Sue Ranney, VP of Development at Revolution Analytics, will host a webinar on Thursday November 17 to get you up to speed on the latest features: Revolution R Enterprise 5.0 is Revolution Analytics’ scalable analytics platform. At its core is Revolution Analytics’ enhanced Distribution of R, the world’s most widely-used...

## Coming out of the (Bayesian) closet: multivariate version

November 7, 2011

This week I’m facing my—and many other lecturers’—least favorite part of teaching: grading exams. In a supreme act of procrastination I will continue the previous post, and the antepenultimate one, showing the code for a bivariate analysis of a randomized … Continue reading →

November 7, 2011

Google slightly changed the html code it uses for hyperlinks on search pages last Thursday, thus causing one of my scripts to stop working. Thankfully, this is easily solved in R thanks to the XML package and the power and simplicity of XPath expressions: Lovely jubbly! P.S. I know that there is an API of

## Code Optimization: One R Problem, Eleven Solutions – Now Thirteen!

November 7, 2011

Following up from my previous post “Code Optimisation: One R Problem, Ten Solutions – Now Eleven!” I figured out a twelfth solution after writing that blog post. Furthermore, halfway through writing this blog post I figured out a thirteenth solution too. As a recap, the problem is taken from rwiki where the goal is to find

## project euler-Problem 41

November 7, 2011

We shall say that an n-digit number is pandigital if it makes use of all the digits 1 to n exactly once. For example, 2143 is a 4-digit pandigital and is also prime. What is the largest n-digit pandigital prime that exists?

## Bayesian modeling using WinBUGS

November 6, 2011

Yes, yet another Bayesian textbook: Ioannis Ntzoufras’ Bayesian modeling using WinBUGS was published in 2009 and it got an honourable mention at the 2009 PROSE Award. (Nice acronym for a book award! All the mathematics books awarded that year were actually statistics books.) Bayesian modeling using WinBUGS is rather similar to the more recent Bayesian

## Rcpp talk at Seattle RUG next month

November 6, 2011

The Seattle R User Group was kind enough to invite me to give a talk about R, C++ and Rcpp. So if you can make it to the Thomas building of the Fred Hutchinson Cancer Research Center in Seattle, WA, on December 7, I would love to see you there. I ha...

## More colour wheels

November 5, 2011

In response to my post about colour wheels, I received a suggested enhancement from Drew. The idea is to first match colours based on the text provided and then add nearby colours. This can be done by ordering colours in terms of hue, saturation, and value. The result is a significant improvement and it will capture all of
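
The idea of matching by name and then adding neighbours can be sketched with base R's colour utilities. This is an illustration only, not Drew's or the author's code: the function name `nearby_colours`, the neighbourhood size `k`, and the exact sort order are assumptions.

```r
# Order R's named colours by hue, then saturation, then value.
all_cols <- grDevices::colors()
hsv_vals <- grDevices::rgb2hsv(grDevices::col2rgb(all_cols))  # rows "h", "s", "v"
wheel    <- all_cols[order(hsv_vals["h", ], hsv_vals["s", ], hsv_vals["v", ])]

# Match colours by name, then add the k neighbours on either side
# of each match in the HSV-ordered list.
nearby_colours <- function(pattern, k = 2) {
  hits <- grep(pattern, wheel)
  idx  <- unique(unlist(lapply(hits, function(i) (i - k):(i + k))))
  idx  <- idx[idx >= 1 & idx <= length(wheel)]
  wheel[sort(idx)]
}

head(nearby_colours("orange"))
```

The result contains every colour whose name matches plus any similarly coloured names that the text match alone would miss.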

## Kaplan-Meier Survival Plot – with at risk table

November 5, 2011

Credit for the bulk of this code goes to Abhijit Dasgupta and the commenters on the original post here from earlier this year. I have made a few changes to the functionality of this which I think warrant sharing. A brief … Continue reading →

## Next Level Web Scraping

November 5, 2011

The outcome presented above will not be very useful to most of you. Still, this could be a good example of what can be done via web scraping in R. Background: TIRIS is the federal geo-statistical service of North-Tyrol, Austria. One of many g...

## Vectors (CloudStat)

November 5, 2011

The simplest type of data object in R is a vector, which is simply an ordered set of values. Some further examples of creating vectors are shown below: Input: 1:20 Output: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 This creates...

## #2 Data Classes (CloudStat)

November 5, 2011

As stated in CloudStat Intro, CloudStat is based on the R language, an object-oriented language in which everything is an object. Each object has a class. The simplest data objects are one-dimensional arrays called vectors, consisting of any nu...

## The Joy of R: A Feline Guide

November 5, 2011

Just because it’s caturday. Images by Mario Pineda-Krch (CC BY-NC-SA 3.0). This is from the “Mario’s Entangled Bank” blog (http://pineda-krch.com) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta. Filed under: cats, computing, humour, R, Sweave

## Colour wheels in R

November 5, 2011

Regular readers will know I use the R package to produce most of the charts that appear here on the blog. Being more quantitative than artistic, I find choosing colours for the charts to be one of the trickiest tasks when designing a chart, particularly as R has so many colours to choose from. In