by Earl F. Glynn, Kansas Watchdog. The goal of this article is to describe how to “reshape” screen-scraped data so that it is easier to analyze with existing tools. Background: The Kansas Secretary of State published Nov. 2010 election results ...

by Earl F. Glynn, Kansas Watchdog. The goal of this article is to show how to visit 105 web pages programmatically and “scrape” data from them to form a statewide summary of election data in Kansas. An earlier article gave details of ...

Here's some R code that can be used to download archived tide height data from NOAA's CO-OPS OPeNDAP server. The code makes use of RCurl to send a URL query to the server, and then splits apart the resulting data into a data frame.

by Earl F. Glynn, Kansas Watchdog. The goal of this exercise is to show how to “screen scrape” data from a web page using R. Additional articles will extend this example to scrape data from 105 Kansas county pages to form a statewide...

Packages for R are being added and updated so frequently now that it's tough to keep up with them all (the @CRANberriesFeed Twitter feed helps, though). But here are a couple of recent package updates that caught my eye: The Rcpp package for seamless integration between R and C++ has been updated. While most of the changes are under...

In my explorations with R, Mathematica, FreeMat, MATLAB, and RapidMiner (now with R support! Yay!), I’m finding R integration quite useful for building a trading app, as technical analysis is one of R’s fortes. For the sake of brevity, I’m including comments in the code instead of using paragraphs… use the source, Luke. Note that I'm not using...

OK, R is very well thought out in certain respects, but there are also some things that annoy me... This time it's scoping...

This post is the introduction to a series that will illustrate how to backtest the same strategy in Excel and R. The impetus for this series started with this tweet by Jared Woodard at Condor Options. After Soren Macbeth introduced us, Jare...

A simple challenge in Le Monde this week: find the group of four primes such that any sum of three terms in the group is prime and the overall sum is minimised. Here is a quick exploration by simulation, using the schoolmath package (with its imperfections):

    A=primes(start=1,end=53)
    lengthA=length(A)
    res=4*53
    for (t in 1:10^4){
    B=sample(A,4,prob=1/(1:lengthA))
    sto=is.prim(sum(B))
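The excerpt's simulation is cut off, so the following is not a completion of it. It is a minimal sketch of a different technique, an exhaustive search in base R, under two assumptions that are not in the original: the four primes are distinct, and candidates are limited to primes up to 53 (the bound the excerpt uses). The helper `primes_upto` is a hypothetical name introduced here, standing in for the schoolmath functions.

```r
# Sieve of Eratosthenes: all primes up to n (base R, no packages).
# primes_upto is a hypothetical helper, not part of schoolmath.
primes_upto <- function(n) {
  is_p <- rep(TRUE, n)
  is_p[1] <- FALSE
  for (i in 2:floor(sqrt(n))) {
    if (is_p[i]) is_p[seq(i * i, n, by = i)] <- FALSE
  }
  which(is_p)
}

limit <- 53                          # assumed search bound, as in the excerpt
A <- primes_upto(limit)

# Any sum of three candidates is below 3 * limit, so a lookup table suffices.
prime_lookup <- rep(FALSE, 3 * limit)
prime_lookup[primes_upto(3 * limit)] <- TRUE

groups <- combn(A, 4)                # one candidate group of four per column
ok <- apply(groups, 2, function(g) {
  all(prime_lookup[sum(g) - g])      # sum(g) - g gives the four three-term sums
})
valid <- groups[, ok, drop = FALSE]

# Keep the valid group with the smallest overall sum.
best <- valid[, which.min(colSums(valid))]
print(best)
print(sum(best))
```

With only 16 candidate primes there are 1820 groups to check, so enumeration is cheap and, unlike random sampling, cannot miss the minimum within the chosen bound.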

For students planning to attend the annual worldwide R user conference, useR! 2011, travel grants are available to help defray the cost of attending the conference in the UK. CRISM is offering bursaries for accommodation and conference fees, and Revolu...

In his detailed research on the RSI(2) indicator, MarketSci emphasized several times that contrarian strategies based on RSI(2) didn’t start working until the 1980s. I remembered this observation recently when I noticed another interesting anomaly … In statistics, an important initial step in studying time series data is to consider the autocorrelation
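As a rough illustration of that initial step, here is a minimal sketch of estimating autocorrelation with `acf()` from base R's stats package. The series is simulated (an AR(1) process with an assumed coefficient of 0.5), not market data, and the variable names are hypothetical.

```r
set.seed(42)                          # reproducible simulated data

# Simulated AR(1) series: x[t] = 0.5 * x[t-1] + noise (assumed toy model).
noise <- rnorm(500)
x <- as.numeric(stats::filter(noise, 0.5, method = "recursive"))

# Sample autocorrelation at lags 0..5, without drawing the plot.
ac <- acf(x, lag.max = 5, plot = FALSE)
lag_table <- data.frame(lag = 0:5, autocorrelation = as.numeric(ac$acf))
print(lag_table)
```

Lag 0 is always 1 by definition; the lag-1 value is the one usually inspected first when asking whether a series tends to continue or reverse its previous move.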

Had a mental block today trying to figure out how to get the indices of columns in a data frame given their names. Simple task but difficult to search Google for an answer. Thanks to jashapiro, Matt, and Vince for giving me a heads up on the which() fu...
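The excerpt is cut off before showing the solution, so this is only a sketch of two common base R ways to do it, using a made-up data frame. `which()` with `%in%` returns positions in column order, while `match()` returns them in the order the names were requested.

```r
# Hypothetical example data frame.
df <- data.frame(id = 1:3,
                 height = c(1.6, 1.7, 1.8),
                 weight = c(60, 70, 80))

wanted <- c("weight", "id")

# Positions of the wanted columns, in the data frame's own column order.
idx_which <- which(names(df) %in% wanted)

# Positions in the order the names were asked for (NA if a name is absent).
idx_match <- match(wanted, names(df))

print(idx_which)   # 1 3
print(idx_match)   # 3 1
```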

As we dig deeper into the Stata or R debate, a few questions have come up. Question 1: One of the things Stata does well is the way it constructs new variables (see example below). How to do this in R? We can rewrite it as-is using for loops in R...

In part a, I presented a series of barplots which showed that the plurality of police

My friend Michael Bommarito has been doing the data community quite a service, capturing and sharing all of the traffic on Twitter related to the Iranian protests. Specifically, he has all of the tweets containing the #25bahman hashtag, and has made them available for anyone to download. I am unable to resist the temptation to explore a

Something like this probably already exists in an R package somewhere out there, but I needed a function to summarize how much missing data I have in each variable of a data frame in R. Pass a data frame to this function and for each variable it'll give you the number of missing values, the total N, and the...
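The author's function is not shown in the excerpt, so the following is a minimal sketch of what such a summary could look like, not the original code. The function name `summarize_missing` and the column names are hypothetical, and the proportion column is this sketch's own choice, since the excerpt is truncated before naming the third statistic.

```r
# Summarize missing data per variable of a data frame.
# summarize_missing is a hypothetical name, not the author's function.
summarize_missing <- function(df) {
  n_missing <- vapply(df, function(col) sum(is.na(col)), integer(1))
  data.frame(
    variable     = names(df),
    n_missing    = n_missing,              # count of NA values per column
    n_total      = nrow(df),               # total number of rows
    prop_missing = n_missing / nrow(df),   # assumed third statistic
    row.names    = NULL
  )
}

# Usage on a small made-up data frame.
example_df <- data.frame(a = c(1, NA, 3),
                         b = c(NA, NA, "x"),
                         c = 1:3)
missing_summary <- summarize_missing(example_df)
print(missing_summary)
```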

