Revisiting text processing with R and Python

May 25, 2013
By

Back in 2011, I covered the relative performance of the most popular libraries for text processing in R and Python. In case you can’t guess the answer, Python and NLTK won by a significant margin over R and

HOWTO: X11 Forwarding for Oracle R Enterprise

May 25, 2013
By


Sentiment analysis finds trouble in the Enron emails

May 24, 2013
By

The Enron email dataset, collected during the FERC investigation of the Enron financial scandal, represents the largest publicly available set of emails. This makes it an ideal testbed for sentiment analysis algorithms. Ikanow's Andrew Strite used the open-source Infinit.e framework and a Hadoop cluster to generate sentiment scores for all of the Enron emails, and then used R to manipulate...

Down and Dirty Forecasting: Part 2

May 24, 2013
By

This is the second part of the forecasting exercise, where I am looking at a multiple regression. To keep it simple, I chose the states that border WI and the US unemployment information for the regression. Again this is a down and dirty analysis, I wo...

What is probabilistic truth? Part 2 – Everything is conditional

May 24, 2013
By

Read Part 1 When making a statement of the form “1/2 is the correct probability that this coin will land tails”, there are a few things which are left unsaid, but which are typically implied. The statement is one about the probability of an unknown event occurring, and it would seem reasonable to write this

Down and Dirty Forecasting: Part 1

May 24, 2013
By

I wanted to see what I could do in a hurry using the commands found at Forecasting: Principles and Practice. I chose a simple enough data set of Wisconsin Unemployment from 1976 to the present (April 2013). I kept the last 12 months' worth of...

Shiny + Concerto = YES !!!

May 23, 2013
By

So I have finally gotten beta access to the two most powerful R-controlled web application makers in existence and produced very exciting experimental products. A few posts ago I posted a Visual Reasoning Test that I had made by hand and powered wit...

Robert Hijmans on Spatial Data Analysis

May 23, 2013
By

Last week at the Davis R Users’ Group, Robert Hijmans gave a talk about spatial data analysis in R. Robert is a professor of biogeography at UC Davis and the author of the raster (analysis of gridded data), dismo (species distribution modeling), and geosphere (spherical trigonometry) packages.

Robert’s presentation spanned topics including basic...

7th R/Rmetrics workshop in Switzerland, June 30-July 4

May 23, 2013
By

The 7th annual R/Rmetrics Workshop on Computational Finance and Financial Engineering will take place June 30-July 4 in the beautiful alpine setting of Lake Thune, Switzerland. This is an intimate workshop limited to around 50 participants, and features tutorials from leading practitioners in finance with R, with a special focus on the Rmetrics suite of R packages. This year's...

Highlights of the Milwaukee Workshop on R and Bioinformatics

May 23, 2013
By

by Joseph Rickert On May 10th and 11th, in honor of this being the International Year of Statistics, the Milwaukee Chapter of the American Statistical Association (MILWASA) held a workshop on cutting edge uses of R in Bioinformatics. One objective of the workshop was to show the "nuts and bolts" details of how R with C++ integration and the...

Veterinary Epidemiologic Research: Modelling Survival Data – Non-Parametric Analyses

May 23, 2013
By

Next topic from Veterinary Epidemiologic Research: chapter 19, modelling survival data. We start with non-parametric analyses where we make no assumptions about either the distribution of survival times or the functional form of the relationship between a predictor and survival. There are 3 non-parametric methods to describe time-to-event data: actuarial life tables, Kaplan-Meier method, and
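As a rough illustration of the Kaplan-Meier method named in the excerpt above, here is a minimal Python sketch of the product-limit estimate. It is not the post's own R code; the function name `kaplan_meier` and the five follow-up times are hypothetical, made up only to show the arithmetic.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Handle every subject tied at time t in one pass.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical data: five subjects, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])
```

No distribution is assumed for the survival times: the estimate only changes at observed event times, and censored subjects simply shrink the risk set.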

Generating a Markov chain vs. computing the transition matrix

May 23, 2013
By

A couple of days ago, we had a quick chat on Karl Broman’s blog, about snakes and ladders (see http://kbroman.wordpress.com/…) with Karl and Corey (see http://bayesianbiologist.com/….), and the use of Markov chains. I do believe that this application is truly awesome: the example is understandable by anyone, and computations (almost any kind, from what we’ve tried) are easy to perform....
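The contrast in the title, generating a chain versus computing with the transition matrix, can be sketched in Python on a toy example. The 3-state matrix `P` below is hypothetical and is not the snakes and ladders board from the post; state 2 plays the role of an absorbing finish square.

```python
import random

# Hypothetical 3-state chain; each row holds the transition
# probabilities out of one state. State 2 is absorbing.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.25, 0.5],
    [0.0, 0.0, 1.0],
]


def step_distribution(dist, P):
    """One step of the chain: row vector times transition matrix."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def exact_distribution(start, P, n_steps):
    """Distribution over states after n_steps, computed from the matrix."""
    dist = [1.0 if s == start else 0.0 for s in range(len(P))]
    for _ in range(n_steps):
        dist = step_distribution(dist, P)
    return dist


def simulate(start, P, n_steps, rng):
    """Generate one trajectory and return the state it ends in."""
    state = start
    for _ in range(n_steps):
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return state


rng = random.Random(2013)  # fixed seed so the run is repeatable
n_steps, n_runs = 5, 20000
exact = exact_distribution(0, P, n_steps)
counts = [0] * len(P)
for _ in range(n_runs):
    counts[simulate(0, P, n_steps, rng)] += 1
empirical = [c / n_runs for c in counts]
print("matrix:    ", [round(x, 3) for x in exact])
print("simulated: ", [round(x, 3) for x in empirical])
```

The two rows should agree up to sampling noise. The matrix route gives the exact answer, while simulation only approximates it but needs nothing beyond the ability to draw the next state.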

The R-Podcast Episode 13: Interview with Yihui Xie

May 23, 2013
By

It’s an episode of firsts on the R-Podcast! In this episode recorded on location I had the honor and privilege of interviewing Yihui Xie, author of many innovative packages such as knitr and animation. Some of the topics we discussed include: Yihui’s motivation for creating knitr and some key new features; how markdown plays a

Vote in the KDnuggets poll on Analytics Software

May 22, 2013
By

The 14th annual KDnuggets poll measuring use of analytics software is open for voting. The poll asks, "What Predictive Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project?" and allows up to 20 choices from commercial software, open source software, and "big data" software. R was the leading choice...

Big Data Analytics in R – the tORCH has been lit!

May 22, 2013
By


How Important is Variable Selection?

May 22, 2013
By

Very. If you have 10 possible independent regressors and none of them matters, you still have a good chance of finding that at least one looks important. A good chance being 40%: prob(one or more looks important) = 1 – prob(none looks … Continue reading
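The 40% figure can be checked with a few lines of Python. The excerpt is cut off before it states the significance level, so the 5% threshold used here is an assumption, as is the function name `prob_any_false_positive`; the regressors are taken to be independent, as the excerpt says.

```python
def prob_any_false_positive(n_regressors, alpha=0.05):
    """Chance that at least one of n independent, irrelevant regressors
    looks significant at level alpha (alpha=0.05 is an assumption)."""
    prob_none = (1.0 - alpha) ** n_regressors
    return 1.0 - prob_none


p = prob_any_false_positive(10)
print(round(p, 3))
```

With ten candidates the complement rule gives 1 – 0.95^10, which is about 0.40, matching the figure quoted in the excerpt.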

Operating on files with R: copy and rename

Nowadays, routine operations on files, such as renaming or copying, are performed with a few mouse clicks. Sometimes, it is useful to perform these operations in batch. Linux users perform these operations through the shell. Windows users can also use the shell, … Continue reading
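The post itself works in R, whose base functions `file.copy` and `file.rename` cover this task. As a language-neutral illustration of the batch idea, here is a minimal Python sketch; the helper name `batch_copy_rename`, the file names, and the date prefix are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def batch_copy_rename(src_dir, dst_dir, prefix):
    """Copy every regular file in src_dir into dst_dir,
    adding prefix to each file name. Returns the new names."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.iterdir()):
        if src.is_file():
            target = dst_dir / f"{prefix}{src.name}"
            shutil.copy2(src, target)  # copy2 also keeps timestamps
            copied.append(target.name)
    return copied


# Demo inside a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "raw"
    source.mkdir()
    for name in ("a.csv", "b.csv"):
        (source / name).write_text("x,y\n1,2\n")
    result = batch_copy_rename(source, Path(tmp) / "backup", "2013-05-22_")
    print(result)
```

Copying into a separate directory rather than renaming in place keeps the originals intact if the naming rule turns out to be wrong.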

What happened to six million voters?

May 22, 2013
By

The recent elections in Pakistan on May 11 were a great success by all accounts. In spite of the threats of violence by Al-Qaeda and its local franchises in Pakistan against those who would vote, millions of Pakistanis indeed stepped out to vote for an elected government. The Election Commission of Pakistan (ECP) claimed a voter turnout of 60%....

My Prime Sieve – Homage to Yitang Zhang

May 22, 2013
By

# As a homage to Yitang Zhang who has proven a mind-bending property of Prime Pairs, I have written a prime Sieve to detect all of the prime numbers from 1 to N. # There might very well be a function in the base package that already does this. No...
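The excerpt does not show which sieve the author wrote, so the following is only a sketch of the classic Sieve of Eratosthenes in Python, not the post's R code. The function name `primes_up_to` is hypothetical.

```python
def primes_up_to(n):
    """Return all primes from 2 to n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out
            # by smaller primes, so start at p * p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Stopping the outer loop once p * p exceeds n is enough, because any composite number up to n has a prime factor no larger than its square root.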

Video: R, ProjectTemplate, RStudio and GitHub: Automate the boring bits and get on with the fun stuff

May 22, 2013
By

This post shares the video from the talk presented on 15th May 2013 by Dr Kendra Vant on ProjectTemplate, GitHub and RStudio at Melbourne R Users. Overview: Want to minimise the drudge work of data prep? Get started with test … Continue reading

May 21, 2013
By

The OpenData StackExchange site has just launched in beta, and looks to be a great resource for open data sources. Like StackOverflow for programming and CrossValidated for statistics, OpenData is a question and answer site for developers and researchers interested in open data. There's no R tag yet (though that would be nice for data sources specifically compatible...

Getting to the point – an alternative to the bezier arrow

May 21, 2013
By

(This article was first published on G-Forge » R, and kindly contributed to R-bloggers) An alternative bezier arrow to the regular grid-bezier. Apart from a cool gradient it has the advantages of: exact width, exact start/end points and axis alignment. About two weeks ago I got frustrated with the bezierGrob function in the grid package. The lwd parameter is...

Spatial correlograms in R: a mini overview

May 21, 2013
By

Spatial correlograms are great for examining patterns of spatial autocorrelation in your data or model residuals. They show how correlated pairs of spatial observations are as you increase the distance (lag) between them; they are plots of some index…

Slide: one function for lag/lead variables in data frames, including time-series cross-sectional data

May 21, 2013
By

I often want to quickly create a lag or lead variable in an R data frame. Sometimes I also want to create the lag or lead variable for different groups in a data frame, for example, if I want to lag GDP for each country in a data frame.

I've found the various R methods for doing this hard...
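The grouped lag described above (lagging GDP within each country) can be sketched in plain Python. This is not the signature or implementation of the post's R function; the name `shift_within_groups`, its parameters, and the GDP figures are hypothetical.

```python
from collections import defaultdict


def shift_within_groups(rows, var, group, by=-1):
    """Return a lagged (by < 0) or lead (by > 0) copy of `var`,
    computed separately inside each value of `group`.

    rows must already be sorted by time within each group.
    Positions with no neighbour at the requested offset get None.
    """
    positions = defaultdict(list)  # group value -> row indices, in order
    for i, row in enumerate(rows):
        positions[row[group]].append(i)

    shifted = [None] * len(rows)
    for idx in positions.values():
        for k, i in enumerate(idx):
            j = k + by
            if 0 <= j < len(idx):
                shifted[i] = rows[idx[j]][var]
    return shifted


# Hypothetical time-series cross-sectional data.
data = [
    {"country": "A", "year": 2010, "gdp": 1.0},
    {"country": "A", "year": 2011, "gdp": 1.5},
    {"country": "B", "year": 2010, "gdp": 7.0},
    {"country": "B", "year": 2011, "gdp": 7.2},
]
lagged = shift_within_groups(data, "gdp", "country", by=-1)
lead = shift_within_groups(data, "gdp", "country", by=1)
print(lagged)
print(lead)
```

Shifting inside each group matters: a naive shift over the whole column would hand country B's first year the last value from country A.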

An R debugging example

May 21, 2013
By

The steps taken to fix an R problem. Task To prepare for the Portfolio Probe blog post called “Implied alpha and minimum variance”, I tried to update a matrix of daily stock prices using a function I had written for the purpose. Error When I tried to do what I wanted, I got: > univclose130518


R programming challenge: Escape the zombie horde

May 20, 2013
By

So when the world is taken over by a zombie horde, you're going to want to figure out a way to get the human population to safety. This R script by econometrician Francis Smart won't help you do that exactly, but given a list of waypoints to navigate through zombie-infested lands to a safe house, it will tell you...

Solving Multiple Supplier Selection Problem using R and LP Solve

May 20, 2013
By

(This article was first published on Enterprise Software Doesn't Have to Suck, and kindly contributed to R-bloggers)

Non-Verbal Reasoning Test – Concerto

May 20, 2013
By

I have just released my first complete test of non-verbal problem solving skills. It is run on Concerto (an R-based application development platform targeted primarily at test developers). Try it out by following the link below. Non-Verbal Re...