Revisiting text processing with R and Python

May 25, 2013
Back in 2011, I covered the relative performance of the most popular libraries for text processing in R and Python. In case you can’t guess the answer, Python and NLTK won by a significant margin over R and...

HOWTO: X11 Forwarding for Oracle R Enterprise

May 25, 2013
Sentiment analysis finds trouble in the Enron emails

May 24, 2013
The Enron email dataset, collected during the FERC investigation of the Enron financial scandal, represents the largest publicly available set of emails. This makes it an ideal testbed for sentiment analysis algorithms. Ikanow's Andrew Strite used the open-source Infinit.e framework and a Hadoop cluster to generate sentiment scores for all of the Enron emails, and then used R to manipulate...

Down and Dirty Forecasting: Part 2

May 24, 2013
This is the second part of the forecasting exercise, where I am looking at a multiple regression. To keep it simple, I chose the states that border WI and the US unemployment information for the regression. Again, this is a down-and-dirty analysis; I wo...

What is probabilistic truth? Part 2 – Everything is conditional

May 24, 2013
Read Part 1. When making a statement of the form “1/2 is the correct probability that this coin will land tails”, there are a few things that are left unsaid but are typically implied. The statement is about the probability of an unknown event occurring, and it would seem reasonable to write this...

Down and Dirty Forecasting: Part 1

May 24, 2013
I wanted to see what I could do in a hurry using the commands found at Forecasting: Principles and Practice. I chose a simple enough data set of Wisconsin unemployment from 1976 to the present (April 2013). I kept the last 12 months' worth of...

Shiny + Concerto = YES !!!

May 23, 2013
So I have finally gotten beta access to the two most powerful R-controlled web application makers in existence and produced some very exciting experimental products. A few posts ago I posted a Visual Reasoning Test that I had made by hand and powered wit...

Robert Hijmans on Spatial Data Analysis

May 23, 2013
Last week at the Davis R Users’ Group, Robert Hijmans gave a talk about spatial data analysis in R. Robert is a professor of biogeography at UC Davis and the author of the raster (analysis of gridded data), dismo (species distribution modeling), and geosphere (spherical trigonometry) packages.

Robert’s presentation spanned topics including basic...

7th R/Rmetrics workshop in Switzerland, June 30-July 4

May 23, 2013
The 7th annual R/Rmetrics Workshop on Computational Finance and Financial Engineering will take place June 30-July 4 in the beautiful alpine setting of Lake Thune, Switzerland. This is an intimate workshop limited to around 50 participants, and features tutorials from leading practitioners in finance with R, with a special focus on the Rmetrics suite of R packages. This year's...

Highlights of the Milwaukee Workshop on R and Bioinformatics

May 23, 2013
By Joseph Rickert. On May 10th and 11th, in honor of this being the International Year of Statistics, the Milwaukee Chapter of the American Statistical Association (MILWASA) held a workshop on cutting-edge uses of R in bioinformatics. One objective of the workshop was to show the "nuts and bolts" details of how R with C++ integration and the...

Veterinary Epidemiologic Research: Modelling Survival Data – Non-Parametric Analyses

May 23, 2013
Next topic from Veterinary Epidemiologic Research: chapter 19, modelling survival data. We start with non-parametric analyses, where we make no assumptions about either the distribution of survival times or the functional form of the relationship between a predictor and survival. There are three non-parametric methods to describe time-to-event data: actuarial life tables, the Kaplan-Meier method, and...
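To make the Kaplan-Meier idea concrete, here is a minimal pure-Python sketch of the product-limit estimator. It is not the post's R code (which the excerpt doesn't show); the function name `kaplan_meier` and the toy data are hypothetical. At each distinct event time, survival is multiplied by (1 − deaths / number still at risk), and censored subjects simply drop out of the risk set after their last observed time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return (time, survival) pairs at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    # Number of observed events (not censorings) at each time
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t
        at_risk = sum(1 for x in times if x >= t)
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve
```

For example, `[(t, round(s, 3)) for t, s in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])]` gives `[(1, 0.833), (2, 0.667), (3, 0.444), (5, 0.0)]`; the censored subjects at times 2 and 4 lower the risk set without producing a step.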

Generating a Markov chain vs. computing the transition matrix

May 23, 2013
A couple of days ago, we had a quick chat on Karl Broman’s blog about snakes and ladders (see http://kbroman.wordpress.com/…) with Karl and Corey (see http://bayesianbiologist.com/….), and the use of Markov chains. I do believe that this application is truly awesome: the example is understandable by anyone, and computations (almost any kind, from what we’ve tried) are easy to perform....
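The contrast in the title can be sketched on a toy board: either generate many Markov chain paths and count how often the game ends, or build the transition matrix once and push the starting distribution through it. This is a Python sketch, not the R code from the original posts; the 20-square board, the `JUMPS` snakes and ladders, and the "overshooting counts as finishing" rule are all made-up assumptions.

```python
import random

# Hypothetical toy board: squares 0..N, start on 0, game ends on N.
N = 20
JUMPS = {3: 11, 6: 17, 14: 4, 19: 8}  # ladders go up, snakes go down (made up)


def move(square, roll):
    """Apply one die roll; overshooting the last square counts as finishing."""
    target = min(square + roll, N)
    return JUMPS.get(target, target)


def transition_matrix():
    """Exact one-turn transition probabilities P[i][j] for a fair die."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N):
        for roll in range(1, 7):
            P[i][move(i, roll)] += 1 / 6
    P[N][N] = 1.0  # the final square is absorbing
    return P


def prob_finished_exact(turns):
    """P(game over within `turns`) by propagating the start distribution."""
    P = transition_matrix()
    dist = [1.0] + [0.0] * N
    for _ in range(turns):
        dist = [sum(dist[i] * P[i][j] for i in range(N + 1)) for j in range(N + 1)]
    return dist[N]


def prob_finished_simulated(turns, n_games=100_000, seed=1):
    """Estimate the same probability by generating Markov chain paths."""
    rng = random.Random(seed)
    finished = 0
    for _ in range(n_games):
        square = 0
        for _ in range(turns):
            square = move(square, rng.randint(1, 6))
            if square == N:
                break
        finished += square == N
    return finished / n_games
```

The simulated estimate should agree with the matrix-based value up to Monte Carlo error; the matrix route gives the exact number and is cheap to rerun for any number of turns.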

The R-Podcast Episode 13: Interview with Yihui Xie

May 23, 2013
It’s an episode of firsts on the R-Podcast! In this episode, recorded on location, I had the honor and privilege of interviewing Yihui Xie, author of many innovative packages such as knitr and animation. Some of the topics we discussed include: Yihui’s motivation for creating knitr and some key new features; how markdown plays a...

Vote in the KDnuggets poll on Analytics Software

May 22, 2013
The 14th annual KDnuggets poll measuring use of analytics software is open for voting. The poll asks, "What Predictive Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project?" and allows up to 20 choices from commercial software, open source software, and "big data" software. R was the leading choice...

Big Data Analytics in R – the tORCH has been lit!

May 22, 2013
How Important is Variable Selection?

May 22, 2013
Very. If you have 10 possible independent regressors, none of which matter, there is a good chance that at least one will look important. A good chance being 40%: prob(one or more looks important) = 1 – prob(none looks …
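The 40% figure is consistent with testing each irrelevant regressor at the 5% level, assuming the tests are independent; neither assumption is stated in the excerpt. Under those assumptions each p-value is Uniform(0, 1) under the null, so the chance of at least one false "hit" is 1 − 0.95^10. A small Python sketch checks the formula against a simulation:

```python
import random


def prob_any_false_positive(k, alpha=0.05):
    """Exact: 1 - P(none of k independent null tests is significant)."""
    return 1 - (1 - alpha) ** k


def simulate(k, alpha=0.05, n_trials=100_000, seed=42):
    """Monte Carlo check: under the null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k)) for _ in range(n_trials)
    )
    return hits / n_trials


print(round(prob_any_false_positive(10), 3))  # 0.401
```

With correlated regressors or a different significance level the number changes, but the 1 − P(none) structure stays the same.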

Operating on files with R: copy and rename

Nowadays, routine operations on files, such as renaming or copying, are performed with a few mouse clicks. Sometimes, it is useful to perform these operations in batch. Linux users perform these operations through the shell. Windows users can also use the shell, …
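The excerpt doesn't show the post's R code (R's base `file.copy()` and `file.rename()` cover this ground). As a rough Python analogue, here is a sketch of batch copy-and-rename using `pathlib` and `shutil`; the function names, the `backup_` prefix, and the suffix-swapping rule are hypothetical.

```python
import shutil
from pathlib import Path


def copy_and_rename(src_dir, dst_dir, pattern="*.csv", prefix="backup_"):
    """Copy every file matching `pattern` from src_dir into dst_dir,
    adding `prefix` to each copy's name. Returns the new paths."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.glob(pattern)):
        target = dst / f"{prefix}{path.name}"
        shutil.copy2(path, target)  # copy2 also keeps metadata such as timestamps
        copied.append(target)
    return copied


def rename_in_place(directory, old_suffix=".txt", new_suffix=".csv"):
    """Batch-rename files in `directory` by swapping their suffix."""
    # sorted() materializes the list before any file is renamed
    for path in sorted(Path(directory).glob(f"*{old_suffix}")):
        path.rename(path.with_suffix(new_suffix))
```

Listing the matches before renaming avoids iterating over a directory while it is being modified.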

What happened to six million voters?

May 22, 2013
The recent elections in Pakistan on May 11 were a great success by all means. In spite of threats of violence by Al-Qaeda and its local franchises in Pakistan against those who would vote, millions of Pakistanis indeed stepped out to vote for an elected government. The Election Commission of Pakistan (ECP) claimed a voter turnout of 60%....

My Prime Sieve – Homage to Yitang Zhang

May 22, 2013
As a homage to Yitang Zhang, who has proven a mind-bending property of prime pairs, I have written a prime sieve to detect all of the prime numbers from 1 to N. There might very well be a function in the base package that already does this. No...
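The author's R sieve isn't reproduced in the excerpt. For reference, a textbook Sieve of Eratosthenes looks like this in Python; it is a standard sketch, not the post's implementation, and `prime_sieve` is a hypothetical name.

```python
def prime_sieve(n):
    """Return all primes from 2 to n (inclusive) via the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross off multiples of p from p*p; smaller multiples were
            # already crossed off by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(prime_sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Stopping at p*p <= n is enough because any composite number up to n has a prime factor no larger than its square root.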

Video: R, ProjectTemplate, RStudio and GitHub: Automate the boring bits and get on with the fun stuff

May 22, 2013
This post shares the video from the talk presented on 15th May 2013 by Dr Kendra Vant on ProjectTemplate, GitHub and RStudio at Melbourne R Users. Overview: Want to minimise the drudge work of data prep? Get started with test …

May 21, 2013
The OpenData StackExchange site has just launched in beta, and looks to be a great resource for open data sources. Like StackOverflow for programming and CrossValidated for statistics, OpenData is a question and answer site for developers and researchers interested in open data. There's no R tag yet (though that would be nice for data sources specifically compatible...

Getting to the point – an alternative to the bezier arrow

May 21, 2013
(This article was first published on G-Forge » R, and kindly contributed to R-bloggers) An alternative bezier arrow to the regular grid-bezier. Apart from a cool gradient it has the advantages of: exact width, exact start/end points and axis alignment. About two weeks ago I got frustrated with the bezierGrob function in the grid package. The lwd parameter is...

Spatial correlograms in R: a mini overview

May 21, 2013
Spatial correlograms are great for examining patterns of spatial autocorrelation in your data or model residuals. They show how correlated pairs of spatial observations are as you increase the distance (lag) between them; they are plots of some index…
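One common index for such plots is Moran's I computed separately for each distance band. The pure-Python sketch below assumes binary weights (a pair counts if its distance falls in the band); the post's own R tools aren't shown in the excerpt, and `morans_i`, `correlogram`, and the band breaks are hypothetical.

```python
import math


def morans_i(coords, values, d_lo, d_hi):
    """Moran's I with binary weights: w_ij = 1 if d_lo < dist(i, j) <= d_hi."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_sum = 0
    for i in range(n):
        for j in range(n):
            if i != j and d_lo < math.dist(coords[i], coords[j]) <= d_hi:
                num += dev[i] * dev[j]
                w_sum += 1
    if w_sum == 0 or denom == 0:
        return None  # no pairs in this band, or all values identical
    return (n / w_sum) * num / denom


def correlogram(coords, values, breaks):
    """Moran's I for each distance band (breaks[k], breaks[k + 1]]."""
    return [
        ((lo, hi), morans_i(coords, values, lo, hi))
        for lo, hi in zip(breaks, breaks[1:])
    ]
```

For points along a line whose values increase with position, I is strongly positive in the shortest band and turns negative in the longest one, which is the kind of decay a correlogram is meant to reveal.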

Slide: one function for lag/lead variables in data frames, including time-series cross-sectional data

May 21, 2013
I often want to quickly create a lag or lead variable in an R data frame. Sometimes I also want to create the lag or lead variable for different groups in a data frame, for example, if I want to lag GDP for each country in a data frame.

I've found the various R methods for doing this hard...
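The post's `slide` function is R and its signature isn't shown here, so the following is only a language-neutral sketch of the idea in Python: sort each group by time, then take the value from the neighbouring row. The name `slide_by_group` and the convention that a negative `shift` means lag are assumptions for illustration.

```python
from collections import defaultdict


def slide_by_group(rows, var, group, time, shift=-1):
    """Add a lag (shift < 0) or lead (shift > 0) copy of `var` to each row.

    rows -- list of dicts, e.g. {"country": "A", "year": 2000, "gdp": 1.0}
    Values are shifted within each group after sorting by `time`; rows with
    no source value get None. Output is ordered group by group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row)
    new_name = f"{var}_{'lag' if shift < 0 else 'lead'}{abs(shift)}"
    out = []
    for members in groups.values():
        members = sorted(members, key=lambda r: r[time])
        for i, row in enumerate(members):
            j = i + shift  # index of the row the new value comes from
            value = members[j][var] if 0 <= j < len(members) else None
            out.append({**row, new_name: value})
    return out
```

Grouping before shifting is what keeps one country's last GDP value from leaking into the next country's first row.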

An R debugging example

May 21, 2013
The steps taken to fix an R problem. Task: To prepare for the Portfolio Probe blog post called “Implied alpha and minimum variance”, I tried to update a matrix of daily stock prices using a function I had written for the purpose. Error: When I tried to do what I wanted, I got: > univclose130518

R programming challenge: Escape the zombie horde

May 20, 2013
So when the world is taken over by a zombie horde, you're going to want to figure out a way to get the human population to safety. This R script by econometrician Francis Smart won't help you do that exactly, but given a list of waypoints to navigate through zombie-infested lands to a safe house, it will tell you...

Solving Multiple Supplier Selection Problem using R and LP Solve

May 20, 2013
(This article was first published on Enterprise Software Doesn't Have to Suck, and kindly contributed to R-bloggers)

Non-Verbal Reasoning Test – Concerto

May 20, 2013
I have just released my first complete test of non-verbal problem-solving skills. It is run on Concerto (an R-based application development platform targeted primarily at test developers). Try it out by following the link below. Non-Verbal Re...