Machine Learning Examples in R

February 12, 2012

This is a post that has been a long time in the making. Following on from the excellent Stanford Machine Learning Course, I have made examples in R of the main algorithms covered. We have Linear Regression, followed by Neural Networks and Support ...

Classifying Breast Cancer as Benign or Malignant Using RTextTools

RTextTools has largely been used for topic classification in the social sciences. However, recent discussions with researchers at various universities have demonstrated that the package can be applied to a host of problems in the natural sciences as well. One such application is using text classification to identify breast cancer masses as benign or malignant. Using the Wisconsin Diagnostic Breast Cancer...

piecewise regression

February 11, 2012

A stock's beta generally describes its relation to the market: the percentage move we should expect from the stock when the market moves one percent. The market, being a somewhat vague notion, is approximated here, as usual, using …
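As a rough illustration of the idea, here is a minimal sketch in base R. Everything in it is an assumption made for the example and not taken from the post: the returns are simulated, the breakpoint is fixed at a market return of zero, and the names `market`, `stock`, `down` and `up` are hypothetical.

```r
set.seed(1)

# Simulated daily market returns in percent (hypothetical data)
market <- rnorm(500, mean = 0, sd = 1)

# Hypothetical stock: beta of 1.5 when the market falls, 0.8 when it rises
stock <- ifelse(market < 0, 1.5, 0.8) * market + rnorm(500, sd = 0.3)

# Ordinary beta: one slope for all market moves
fit_single <- lm(stock ~ market)

# Piecewise beta: separate slopes below and above the assumed breakpoint at 0
down <- pmin(market, 0)  # market return when negative, else 0
up   <- pmax(market, 0)  # market return when positive, else 0
fit_piecewise <- lm(stock ~ down + up)

betas <- unname(coef(fit_piecewise)[c("down", "up")])
print(betas)
```

The single-slope fit averages the two regimes, while the piecewise fit reports one beta for falling markets and one for rising markets.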

Generating directed Watts-Strogatz network

February 11, 2012

There are two limitations of the Watts-Strogatz network generator in the igraph package: (1) it works only for undirected graphs and (2) the rewiring algorithm can produce loops or multiple edges. You can use the simplify function on such a graph, but then number of ed...

R jags rjags on an ec2 instance

February 11, 2012

WinBUGS and JAGS free Item Response Theory from the dot matrix plots of proprietary software and open up a multicoloured world of posterior predictive model checking. Fitting IRT models using brute force is not for the impatient, however. That’s why, just as early psychometricians shipped off their calculations to teams of monks, I’ve shipped off my model fitting to...

Stupid R tricks: using outer to create many data.frame subsets

February 11, 2012

Selecting subsets of a data.frame is easy in R if you define the predicates manually. But if you need to define many conditions the standard slicing and subsetting methods are cumbersome. For this illustration I want to pick some large number of numerical ranges and label all of the rows that match any of the
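One way the idea could look in base R is sketched below. The data frame, the column `x`, and the `lower`/`upper` range bounds are all hypothetical values chosen for illustration; the post's own code may differ.

```r
# Hypothetical data: ten rows with a numeric column to test against ranges
df <- data.frame(id = 1:10, x = c(1, 4, 7, 12, 15, 21, 26, 33, 38, 45))

# Hypothetical half-open ranges [lower, upper)
lower <- c(0, 10, 30)
upper <- c(5, 16, 40)

# outer() compares every row value with every bound at once, giving a
# logical matrix with one row per data row and one column per range
in_range <- outer(df$x, lower, ">=") & outer(df$x, upper, "<")

# Label rows that fall inside any of the ranges
df$matched <- rowSums(in_range) > 0

# One data.frame subset per range
subsets <- lapply(seq_along(lower), function(j) df[in_range[, j], ])
print(df)
```

Because the comparison is vectorised over both rows and ranges, adding more ranges only means extending `lower` and `upper`.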

Revolution R and Fedora: Revisited

February 10, 2012

A previous post of mine had suggested that, despite them being extremely similar operating systems, and really there being no clear reason why, Revolution R 5.0, which does support Red Hat Enterprise Linux, refused to work on Fedora 16. The installation failed, dependencies could not be installed, tech support was singularly unhelpful because I wasn’t

RTextTools Short Course Materials

Attached are some of the materials from the recent short course at UNC. For confidentiality reasons, we are unable to present all of the materials, but this is enough to get someone started. 1. Lecture; 2. Intro to R; 3. NY Times; 4.

More Thoughts on Potential Audience Metrics for Hashtag Communities

February 10, 2012

Following on from the sketched ideas relating to estimating the Potential Audience Size for a Hashtag Community?, here are a few quick doodles around the graph representation of the tag users – followers graph that explore the extent to which we can use quite simple counts and analyses to get a feel for how the

Simplified Example of Systematic Investor’s Fine Work

February 10, 2012

THIS IS ONLY AN EXAMPLE AND IS NOT INVESTMENT ADVICE. ACTING ON THIS WILL LOSE LOTS OF MONEY. Systematic Investor Blog (be sure to check out the site) offers extremely good examples of how to use R in finance.  Since I firmly believe more examples...

Revisiting homicide rates

February 10, 2012

A pint of R plotted an interesting dataset: intentional homicides in South America. I thought the graphs were pretty but I was unhappy about the way information was conveyed in the plots; relative risk should be very important but number …

Reading Code

February 10, 2012

Code Readability is maybe the most important part of producing reproducible research. If it's impossible (i.e. very costly) for somebody else to read/understand the computer code that underlies your results, then the odds are that they will never be...

Visualising the Metropolis-Hastings algorithm

February 10, 2012

In a previous post, I demonstrated how to use my R package MHadaptive to do general MCMC to estimate Bayesian models. The functions in this package are an implementation of the Metropolis-Hastings algorithm. In this post, I want to provide an intuitive way to picture what is going on ‘under the hood’ in this algorithm. The
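For readers who want to see the moving parts, the following is a minimal random-walk Metropolis-Hastings sketch in base R. It does not use the MHadaptive package and is not the author's code; the standard normal target, the starting value, the proposal width and the burn-in length are all assumptions picked for illustration.

```r
set.seed(42)

# Assumed target: log-density of a standard normal
log_target <- function(x) dnorm(x, mean = 0, sd = 1, log = TRUE)

n_iter <- 5000
chain <- numeric(n_iter)
chain[1] <- 3          # deliberately poor starting value
accepted <- 0

for (i in 2:n_iter) {
  # Propose a move from a symmetric random walk around the current state
  proposal <- chain[i - 1] + rnorm(1, sd = 1)

  # Log acceptance ratio; the proposal densities cancel because the walk is symmetric
  log_ratio <- log_target(proposal) - log_target(chain[i - 1])

  if (log(runif(1)) < log_ratio) {
    chain[i] <- proposal        # accept the proposed value
    accepted <- accepted + 1
  } else {
    chain[i] <- chain[i - 1]    # reject and stay at the current value
  }
}

acceptance_rate <- accepted / (n_iter - 1)
posterior_draws <- chain[-(1:500)]   # drop an assumed burn-in of 500 iterations
print(c(mean = mean(posterior_draws), sd = sd(posterior_draws)))
```

Plotting `chain` against the iteration index shows the sampler walking from the starting value towards the bulk of the target and then wandering around it.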

A new local R user group in Cambridge, UK

February 10, 2012

It turns out there's another local R user group in Cambridge, UK. It's called CambR, and organizing committee member Laurent Gatto described its history to me in an email: After meeting repeatedly at several R related conferences (Bioconductor meetings, useR 2011), some R enthusiasts thought Cambridge deserved a local R user group and founded CambR in September 2011. Since...

R charts used for analysis at Politico

February 10, 2012

Zack Abrahamson, the "data whiz" at political analysis site Politico, is apparently an R user. Politico's Feb 10 2012 chart of the day clearly uses the ggplot2 graphics package and (quoting Politico) looks into the disenchanted slice of the GOP that’s not engaged with its party’s primary. And that slice doesn’t like Mitt Romney. People say turnout's down. When...

managing projects using RStudio

February 10, 2012

We're continually amazed by new developments within RStudio, the integrated development environment for R that we highlighted previously (among others, Andrew Gelman agrees with us about its value). The most recent addition addresses one of our earlie...

MAT8886 exchangeability, credit risk and risk measures

February 10, 2012

Exchangeability is an extremely useful concept, since (most of the time) analytical expressions can be derived. But it can also be used to observe some unexpected behaviors, which we will discuss later on in a more general setting. For instance, in an old...

"R": Predicting a Test Set (Gasoline)

February 9, 2012

> data(gasoline)
> #60 spectra of gasoline (octane is the constituent)
> #We divide the whole Set into a Train Set and a Test Set.
> gasTrain<-gasoline
> gasTest<-gasoline
> #Let's develop the PLSR with the Train Set ...

On Unpublished Software

February 9, 2012

I ran across this post at The Tree of Life entitled ‘Interesting new metagenomics paper w/ one big big big caveat – critical software not available’. The long and short of it? Paper appears in Science, has fancy new methodology, lacks the software for someone else to use their methodology. Blog author understandably annoyed. But I

Daily casualties in Syria

February 9, 2012

Every new day brings its statistics of new deaths in Syria… Here is an attempt to learn about the Syrian uprising by the figures. Data vary among sources: the Syrian opposition provides the number of casualties by day (here on Dropbox), updated on 8 February 2012, with a total exceeding 8 000. We note first

Slides and replay for "A backstage tour of ggplot2"

February 9, 2012

Many thanks to Hadley Wickham for his informative and entertaining webinar yesterday, "A backstage tour of ggplot2". Thanks also to everyone who submitted questions -- with more than 800 attendees live on the line we had many more questions than we had time to answer. For more ggplot2 information, Hadley kindly provided the following resources in his slides: ggplot2...

Monitoring Progress Inside a Foreach Loop

February 9, 2012

The foreach package for R is excellent, and allows for code to easily be run in parallel. One problem with foreach is that it creates new RScript instances for each iteration of the loop, which prevents status messages from being logged to the console output. This is particularly frustrating during long-running tasks, when we are often unsure...

Intentional Homicide in South America 1995-2010

February 9, 2012

Intentional homicide is defined as unlawful death purposefully inflicted on a person by another person. The source of this stat is The United Nations Office on Drugs and Crime (UNODC). I created the above image using ggplot2 which does 98% of the leg-work in most cases. Count is the number of homicides in a calendar year

The reshape function

February 9, 2012

The other day I wrote about the R functions by, apply and friends, which allow me to operate on subsets of data. All those functions work nicely, if the data is given in the right format. More often than not it isn't and I have to reshape the data befo...
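As a small illustration of what base R's `reshape()` does, here is a sketch that converts a wide table to long format and back. The data frame `wide` and its column names are hypothetical and were made up for this example.

```r
# Hypothetical wide data: one row per subject, one column per measurement
wide <- data.frame(id = 1:3,
                   score.1 = c(5, 6, 7),
                   score.2 = c(8, 9, 10))

# Wide to long: the "score.1"/"score.2" names are split at the "."
# into a value column "score" and a time index of 1 or 2
long <- reshape(wide,
                direction = "long",
                varying = c("score.1", "score.2"),
                idvar = "id",
                timevar = "time")

# Long back to wide, naming the id, time and value columns explicitly
wide_again <- reshape(long,
                      direction = "wide",
                      idvar = "id",
                      timevar = "time",
                      v.names = "score")

print(long)
```

The long form has one row per subject and time point, which is the layout that functions such as `by` and `aggregate` tend to expect.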

GARCH estimation using maximum likelihood

February 9, 2012

In my previous post I presented my findings from my finance project under the guidance of Dr Susan Thomas. The results in my paper suggested that there are macroeconomic variables, particularly the INR/USD exchange rates, that help us understand the dynamics of stock returns. Although the results that I obtained were significant at 5%...

Successful Two Day Workshop at UNC-Chapel Hill

This week the Odum Institute at UNC held a two day short course on text classification with RTextTools. The workshop, led by Loren Collingwood, covered the basics of content analysis, supervised learning and text classification, introduction to R, and how to use RTextTools. Participants brought in their own data on the second day, which the instructor helped them classify....

Analyzing Twitter Data in R – Part 1

February 8, 2012

I recently began using the twitteR package in R to examine my tweeting patterns. One of my first projects was to identify each of my Twitter followers, where they were located, how many tweets they had, and then plot their location on a map using a bubble which was related to their total number of

Trust in the EU and National Parliaments

February 8, 2012

I have been playing around with some data from Eurobarometer, to support some arguments for a small comment I am writing for the Maastricht Law Review. I got the data for the following two questions: I would like to ask you a question about how much ...
