Geometric Efficient Frontier

November 9, 2011

What is important for an investor? The rate of return is at the top of the list. Does the expected rate of return shown on the mean-variance efficient frontier paint the full picture? If an investor’s investment horizon is longer than one period, for example 5 years, then the true measure of portfolio performance is Geometric

Read more »
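The gap between a one-period expected return and a multi-period geometric return can be illustrated with a small R sketch. The yearly returns below are made-up numbers chosen for illustration, not data from the post, and the variable names are hypothetical.

```r
# Hypothetical yearly returns over a 5-year horizon (illustrative values only)
returns <- c(0.10, -0.05, 0.20, 0.15, -0.10)

# Arithmetic mean: the average one-period return
arithmetic_mean <- mean(returns)

# Geometric mean: the constant yearly rate that compounds to the same final wealth
geometric_mean <- prod(1 + returns)^(1 / length(returns)) - 1

cat("Arithmetic mean:", arithmetic_mean, "\n")
cat("Geometric mean: ", geometric_mean, "\n")
```

Whenever the returns vary from year to year, the geometric mean comes out below the arithmetic mean, which is why the two measures can rank portfolios differently over longer horizons.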

Suggest some R tasks for high-schoolers

November 9, 2011

Many high-schoolers are now using R in class, and to help even more students get exposure to R (while improving R itself), Virgilio Gómez-Rubio is seeking suggestions for projects for the next Google Code-in: An application has been put forward for R to participate in Google Code-in. This is a Google contest to introduce pre-university students (age 13-18) to...

Read more »

Add Transparency to JPEG – Yes, We Can!

November 9, 2011

...Just read your JPEG and add an alpha channel manually, then assign values for transparency. Of course, for printing you need to use a device that accepts alpha. See how it's done HERE.

Read more »
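A minimal sketch of the idea in R, under stated assumptions: a random array stands in for an image that would normally come from a JPEG reader such as `jpeg::readJPEG()`, which returns a height x width x channels array with values in [0, 1]. The variable names and the choice of making the left half transparent are hypothetical, and this is not the linked post's code.

```r
# Stand-in for a decoded RGB image: height x width x 3, values in [0, 1]
set.seed(1)
h <- 4
w <- 6
rgb_img <- array(runif(h * w * 3), dim = c(h, w, 3))

# Alpha channel: 1 = fully opaque, 0 = fully transparent
alpha <- matrix(1, nrow = h, ncol = w)
alpha[, 1:(w / 2)] <- 0  # make the left half transparent

# Append the alpha matrix as a fourth channel
rgba_img <- array(c(rgb_img, alpha), dim = c(h, w, 4))

dim(rgba_img)
```

Because R stores arrays in column-major order, concatenating the RGB values with the alpha matrix places alpha in the fourth plane without touching the colour planes.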

Getting Started With Twitter Analysis in R

November 9, 2011

Earlier today, via the aggregating R-Bloggers service, I saw a post on Using Text Mining to Find Out What @RDataMining Tweets are About. The post provides a walkthrough of how to grab tweets into an R session using the twitteR library, and then do some text mining on them. I’ve been meaning to

Read more »

R-Function GScholarScraper to Webscrape Google Scholar Search Result

November 9, 2011

Based on my previous post on Web Scraping, I coded and uploaded the function "GScholarScraper" HERE for testing! The function will pull all (!) results, processing pages in chunks of 100 results/titles, and return a file with all titles, links, etc. It w...

Read more »

CloudStat: Learn & Do R on the Cloud CloudStat is a platform…

November 8, 2011

CloudStat: Learn & Do R on the Cloud. CloudStat is a platform to learn and do R on the Cloud. With CloudStat, there is no need to download, install, update, or maintain anything. CloudStat eases the R learning curve and also supports collaboration. And it...

Read more »

project euler – problem 49

November 8, 2011

The arithmetic sequence, 1487, 4817, 8147, in which each of the terms increases by 3330, is unusual in two ways: (i) each of the three terms are prime, and, (ii) each of the 4-digit numbers are permutations of one another. There are no arithmetic sequences made up of three 1-, 2-, or 3-digit primes,...

Read more »
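The three properties quoted for 1487, 4817, 8147 can be checked directly in R. This is only a verification sketch of the example given in the excerpt, not a solution to the problem; `is_prime` and `digit_key` are hypothetical helper names.

```r
# Trial-division primality test, adequate for 4-digit numbers
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  if (n %% 2 == 0) return(FALSE)
  d <- 3
  while (d * d <= n) {
    if (n %% d == 0) return(FALSE)
    d <- d + 2
  }
  TRUE
}

# Sorted digits of a number; permutations of one another share the same key
digit_key <- function(n) {
  paste(sort(strsplit(as.character(n), "")[[1]]), collapse = "")
}

terms <- c(1487, 4817, 8147)

all_prime       <- all(sapply(terms, is_prime))
same_digits     <- length(unique(sapply(terms, digit_key))) == 1
constant_step   <- all(diff(terms) == 3330)

c(all_prime, same_digits, constant_step)
```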

Lending Club – naive data analysis

November 8, 2011

Dataspora recently analyzed Lending Club’s data in a geographical way using the data distributed by the site. Lending Club is an online financial community that brings together creditworthy borrowers and savvy investors so that both can benefit financially. We replace the high cost and complexity of bank lending with a faster, smarter way to borrow

Read more »

Web Scraping Google Scholar: Part 2 (Complete Success)

November 8, 2011

This is a followup to a post I uploaded earlier today about web scraping data off Google Scholar. In that post I was frustrated because I’m not smart enough to use xpathSApply to get the kind of results I wanted. However fast-forward to the evening whilst having dinner with a friend, as a passing remark,

Read more »

What the frack? Does hydraulic fracturing lead to increased earthquakes?

November 8, 2011

Earthquakes are normal occurrences along the boundaries of major plate margins, such as along the San Andreas fault system of California,  and are less common within plate interiors.  Try telling that, however, to the citizens of Oklahoma who...

Read more »

Three free books on R for Statistics

November 8, 2011

Avril Coghlan, a lecturer at University College Cork in Ireland, has written and made available for free three books ideal for students or practitioners new to R who want to use it for multivariate analysis, time series analysis or biomedical statistics. Each book begins with practical advice for installing and using R in general, before diving into their specialized...

Read more »

Error Handling in Lyx & Sweave: using Quantmod (and R, of course)

November 8, 2011

I do reports for clients with LyX and Sweave. It took me an extremely long time to get them working, but now that they’re working I can do more in an hour and thus charge more per hour. (Which is, like, the point.) If you’re not familiar, here’s ...

Read more »

Error Handling in Lyx & Sweave: using Quantmod (and R, of course)

November 8, 2011

I do reports for clients with LyX and Sweave. It took me an extremely long time to get them working, but now that they’re working I can do more in an hour and thus charge more per hour. If you’re not familiar, here’s a rundown: LaTeX is the stand...

Read more »

Using Text Mining to Find Out What @RDataMining Tweets are About

November 8, 2011

This post shows an example on text mining of Twitter data with R packages twitteR, tm and wordcloud. Package twitteR provides access to Twitter data, tm provides functions for text mining, and wordcloud visualizes the result with a word cloud.

Read more »

readGrads – An R package to read and manipulate grads data

November 8, 2011

I created an R package to read grads data. As far as I know, there is no dedicated package to read grads data. The package is still quite new, so any remarks on the documentation or code are more than welcome.

Read more »

Setting up AWS Cluster to use snow in R

November 8, 2011

I wanted to set up an AWS cluster to take a shot at a Kaggle contest, the DunnHumby Challenge (http://www.kaggle.com/c/dunnhumbychallenge). For this, I found StarCluster to be of great help. It allows you to set up AWS nodes in a few lines of code and does much more (choosing AMIs and cluster configurations)

Read more »

Web Scraping Google Scholar (Partial Success)

November 8, 2011

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract  the ‘title’, ‘url’ , ‘publication’ and ‘description’.  If any of these fields are not available, as in the case of a citation, the corresponding cell in the data

Read more »

Bridge and Torch problem in R

November 8, 2011

A couple of months ago I came across the bridge and torch problem at a careers fair in Oxford. A young tech company called QuBit used it as a brain teaser challenge for would-be software engineers to solve before submitting …

Read more »

Example 9.13: Negative binomial regression with proc mcmc

November 8, 2011

In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support f...

Read more »

Blankety Blank

November 8, 2011

The erstwhile big 4 all blanked their opponents last Saturday and a poster on the Guardian wondered when was the previous occasion of such an occurrence. It’s a pretty simple procedure in SQL using a subquery, but in the spirit of learning R, I thought I would tackle the problem in that language, with the

Read more »

project euler – problem 47

November 8, 2011

The first two consecutive numbers to have two distinct prime factors are: 14 = 2 × 7

Read more »
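The statement in the excerpt can be reproduced with a short R sketch that lists distinct prime factors by trial division and then scans for the first pair of consecutive numbers with exactly two each. The function name `distinct_prime_factors` is hypothetical, and the sketch covers only the two-factor example, not the full problem.

```r
# Distinct prime factors of n by trial division
distinct_prime_factors <- function(n) {
  factors <- c()
  d <- 2
  while (d * d <= n) {
    if (n %% d == 0) {
      factors <- c(factors, d)
      while (n %% d == 0) n <- n / d  # strip repeated copies of d
    }
    d <- d + 1
  }
  if (n > 1) factors <- c(factors, n)  # whatever remains is itself prime
  factors
}

# Scan upward for the first n where n and n + 1 both have two distinct prime factors
n <- 2
while (!(length(distinct_prime_factors(n)) == 2 &&
         length(distinct_prime_factors(n + 1)) == 2)) {
  n <- n + 1
}

n                              # first number of the pair
distinct_prime_factors(n)      # its prime factors
distinct_prime_factors(n + 1)  # prime factors of its successor
```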

project euler – Problem 44

November 8, 2011

Pentagonal numbers are generated by the formula, Pn=n(3n−1)/2. The first ten pentagonal numbers are: 1, 5, 12, 22, 35, 51, 70, 92, 117, 145, ...

Read more »
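The formula translates directly into vectorized R. The sketch below regenerates the ten values listed in the excerpt; the function name `pentagonal` is a hypothetical choice.

```r
# Pentagonal numbers from the formula P_n = n(3n - 1) / 2
pentagonal <- function(n) n * (3 * n - 1) / 2

first_ten <- pentagonal(1:10)
first_ten
```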

Drawing polar centered spatial maps using ggplot2

November 8, 2011

Drawing maps of the polar regions can be done using square spatial maps. A small example says more than a thousand words:
xlim = c(-180,180)
ylim = c(60,90)
# Some fake grid data
dat_grid = expand.grid(x = xlim[1]:xlim[2], y…

Read more »

The mystery of volatility estimates from daily versus monthly returns

November 8, 2011

What drives the estimates apart? Previously, a post by Investment Performance Guy prompted “Variability of volatility estimates from daily data”. In my comments to the original post I suggested that using daily data to estimate volatility would be equivalent to using monthly data except with less variability. Dave, the Investment Performance Guy, proposed the exquisitely …

Read more »

Doing away with “unknown timezone” warnings

November 8, 2011

Timezone stuff can really drive you NUTS, at least if you’re sitting in front of a German Windows box. This is what I used to do to set my tz: And I always wondered why R would throw “unknown timezone” warnings: Someday I found out that setting tz via `options()` was not enough as the …

Read more »

project euler – Problem 32

November 8, 2011

We shall say that an n-digit number is pandigital if it makes use of all the digits 1 to n exactly once; for example, the 5-digit number, 15234, is 1 through 5 pandigital. The product 7254 is unusual, as the identity, 39 × 186 = 7254, containing multiplicand, multiplier, and product is 1 through...

Read more »
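The identity in the excerpt can be checked with a small R sketch: concatenate multiplicand, multiplier, and product, then test whether the digits are exactly 1 through 9. The nine-digit reading is an assumption drawn from the example itself (2 + 3 + 4 digits), since the excerpt is cut off, and `is_pandigital_1_to_9` is a hypothetical helper name.

```r
# TRUE when a, b and a * b together use each digit 1..9 exactly once
is_pandigital_1_to_9 <- function(a, b) {
  digits <- strsplit(paste0(a, b, a * b), "")[[1]]
  identical(sort(digits), as.character(1:9))
}

example_ok      <- is_pandigital_1_to_9(39, 186)  # the identity from the excerpt
counter_example <- is_pandigital_1_to_9(12, 34)   # 12 * 34 = 408 repeats digits and has a 0

c(example_ok, counter_example)
```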

project euler – Problem 31

November 8, 2011

In England the currency is made up of pound, £, and pence, p, and there are eight coins in general circulation: 1p, 2p, 5p, 10p, 20p, 50p, £1 (100p) and £2 (200p).

Read more »

project euler – Problem 15

November 8, 2011

Starting in the top left corner of a 2x2 grid, there are 6 routes (without backtracking) to the bottom right corner. How many routes are there through a 20x20 grid?

Read more »
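One way to count the routes, assuming "without backtracking" means every move goes either right or down: a route through an n x n grid is a sequence of 2n moves, of which exactly n go right, so the count is the binomial coefficient C(2n, n). The R sketch below uses base R's `choose()` and checks the 2x2 case from the excerpt; `grid_routes` is a hypothetical name, and this may not be how the linked post solves it.

```r
# Number of right/down routes through an n x n grid: choose which n of the 2n moves go right
grid_routes <- function(n) choose(2 * n, n)

routes_2  <- grid_routes(2)   # matches the 6 routes quoted for the 2x2 grid
routes_20 <- grid_routes(20)  # the 20x20 case asked about in the problem

routes_2
format(routes_20, scientific = FALSE)
```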

project euler-Problem 43

November 7, 2011

The number, 1406357289, is a 0 to 9 pandigital number because it is made up of each of the digits 0 to 9 in some order, but it also has a rather interesting sub-string divisibility property. Let d1 be the 1st digit, d2 be the 2nd digit, and so on. In this way, we note the...

Read more »