Revolution R and Fedora: Revisited

February 10, 2012

A previous post of mine suggested that Revolution R 5.0, which does support Red Hat Enterprise Linux, refused to work on Fedora 16, even though the two operating systems are extremely similar and there was no clear reason for the failure. The installation failed, dependencies could not be installed, and tech support was singularly unhelpful because I wasn’t

RTextTools Short Course Materials

Attached are some of the materials from the recent short course at UNC. For confidentiality reasons, we are unable to present all of the materials, but this is enough to get someone started. 1. Lecture; 2. Intro to R; 3. NY Times; 4.

More Thoughts on Potential Audience Metrics for Hashtag Communities

February 10, 2012

Following on from the sketched ideas on estimating the "Potential Audience Size for a Hashtag Community?", here are a few quick doodles around the graph representation of the tag users–followers graph that explore the extent to which we can use quite simple counts and analyses to get a feel for how the

Simplified Example of Systematic Investor’s Fine Work

February 10, 2012

THIS IS ONLY AN EXAMPLE AND IS NOT INVESTMENT ADVICE. ACTING ON THIS WILL LOSE LOTS OF MONEY. Systematic Investor Blog (be sure to check out the site) offers extremely good examples of how to use R in finance.  Since I firmly believe more examples...

Revisiting homicide rates

February 10, 2012

A pint of R plotted an interesting dataset: intentional homicides in South America. I thought the graphs were pretty but I was unhappy about the way information was conveyed in the plots; relative risk should be very important but number …

Reading Code

February 10, 2012

Code readability is perhaps the most important part of producing reproducible research. If it's impossible (i.e. very costly) for somebody else to read and understand the computer code that underlies your results, then the odds are that they will never be...

Visualising the Metropolis-Hastings algorithm

February 10, 2012

In a previous post, I demonstrated how to use my R package MHadaptive to do general MCMC to estimate Bayesian models. The functions in this package are an implementation of the Metropolis-Hastings algorithm. In this post, I want to provide an intuitive way to picture what is going on ‘under the hood’ in this algorithm. The

A new local R user group in Cambridge, UK

February 10, 2012

It turns out there's another local R user group in Cambridge, UK. It's called CambR, and organizing committee member Laurent Gatto described its history to me in an email: After meeting repeatedly at several R related conferences (Bioconductor meetings, useR 2011), some R enthusiasts thought Cambridge deserved a local R user group and founded CambR in September 2011. Since...

R charts used for analysis at Politico

February 10, 2012

Zack Abrahamson, the "data whiz" at political analysis site Politico, is apparently an R user. Politico's Feb 10 2012 chart of the day clearly uses the ggplot2 graphics package and (quoting Politico) looks into the disenchanted slice of the GOP that’s not engaged with its party’s primary. And that slice doesn’t like Mitt Romney. People say turnout's down. When...

managing projects using RStudio

February 10, 2012

We're continually amazed by new developments within RStudio, the integrated development environment for R that we highlighted previously (among others, Andrew Gelman agrees with us about its value). The most recent addition addresses one of our earlie...

MAT8886 exchangeability, credit risk and risk measures

February 10, 2012

Exchangeability is an extremely useful concept, since (most of the time) analytical expressions can be derived. But it can also be used to observe some unexpected behaviors that we will discuss later on in a more general setting. For instance, in an old...

"R": Predicting a Test Set (Gasoline)

February 9, 2012

> data(gasoline)
> # 60 spectra of gasoline (octane is the constituent)
> # We divide the whole set into a train set and a test set.
> gasTrain <- gasoline
> gasTest <- gasoline
> # Let's develop the PLSR with the train set ...

On Unpublished Software

February 9, 2012

I ran across this post at The Tree of Life entitled “Interesting new metagenomics paper w/ one big big big caveat – critical software not available”. The long and short of it? Paper appears in Science, has fancy new methodology, lacks the software for someone else to use their methodology. Blog author understandably annoyed. But I

Daily casualties in Syria

February 9, 2012

Every new day brings its statistics of new deaths in Syria… Here is an attempt to learn about the Syrian uprising by the figures. Data vary among sources: the Syrian opposition provides the number of casualties by day (here on Dropbox), updated on 8 February 2012, with a total exceeding 8 000. We note first

Slides and replay for "A backstage tour of ggplot2"

February 9, 2012

Many thanks to Hadley Wickham for his informative and entertaining webinar yesterday, "A backstage tour of ggplot2". Thanks also to everyone who submitted questions -- with more than 800 attendees live on the line we had many more questions than we had time to answer. For more ggplot2 information, Hadley kindly provided the following resources in his slides: ggplot2...

Intentional Homicide in South America 1995-2010

February 9, 2012

Intentional homicide is defined as unlawful death purposefully inflicted on a person by another person. The source of this stat is The United Nations Office on Drugs and Crime (UNODC). I created the above image using ggplot2 which does 98% of the leg-work in most cases. Count is the number of homicides in a calendar year

The reshape function

February 9, 2012

The other day I wrote about the R functions by, apply and friends, which allow me to operate on subsets of data. All those functions work nicely, if the data is given in the right format. More often than not it isn't and I have to reshape the data befo...

Monitoring Progress Inside a Foreach Loop

February 9, 2012

The foreach package for R is excellent, and allows for code to easily be run in parallel. One problem with foreach is that it creates new RScript instances for each iteration of the loop, which prevents status messages from being logged to the console output. This is particularly frustrating during long-running tasks, when we are often unsure how much...

GARCH estimation using maximum likelihood

February 9, 2012

In my previous post I presented my findings from my finance project under the guidance of Dr Susan Thomas. The results in my paper suggested that there are macroeconomic variables, particularly the INR/USD exchange rates, that help us understand the dynamics of stock returns. Although the results that I obtained were significant at 5%...

Successful Two Day Workshop at UNC-Chapel Hill

This week the Odum Institute at UNC held a two day short course on text classification with RTextTools. The workshop, led by Loren Collingwood, covered the basics of content analysis, supervised learning and text classification, introduction to R, and how to use RTextTools. Participants brought in their own data on the second day, which the instructor helped them classify....

Analyzing Twitter Data in R – Part 1

February 8, 2012

I recently began using the twitteR package in R to examine my tweeting patterns. One of my first projects was to identify each of my Twitter followers, where they were located, and how many tweets they had, and then plot their location on a map using a bubble which was related to their total number of

Trust in the EU and National Parliaments

February 8, 2012

I have been playing around with some data from Eurobarometer, to support some arguments for a small comment I am writing for the Maastricht Law Review. I got the data for the following two questions: I would like to ask you a question about how much ...

What is the Potential Audience Size for a Hashtag Community?

February 8, 2012

What’s the potential audience size around a Twitter hashtag? Way back when, in the early days of web stats, reported figures tended to centre around the notion of hits, the number of calls made to a server via website activity. I forget the details, but the metric was presumably generated from server logs. This measure

Oracle’s strange understanding of R users

February 8, 2012

After reading David Smith’s tweet on the price of Oracle R Enterprise (actually free, but it requires Oracle Data Mining at $23K/core, as pointed out by Joshua Ulrich), I went to Oracle’s site to see what it was all about. Oracle …

discrimination between CpG islands and random sequences using Markov chains

February 8, 2012

A major part of modern research involves trying to find patterns in a given dataset using learning methods. One method that can use a priori information for this purpose is the Markov chain, in which the probability of symbol occurrence …

Revolution R update adds Red Hat 6 support

February 8, 2012

The Dev Team at Revolution Analytics recently released an update to the Revolution R 5 family. Version 5.0.1 adds compatibility with Red Hat Enterprise Linux 6 for all editions (Community, Academic and Enterprise). This expands the platform support to Red Hat 5, Red Hat 6 and Microsoft Windows. For Revolution R Enterprise customers and users of the free Academic...

"R": PLS Regression (Gasoline) – 005

February 8, 2012

Let's see now how to plot the scores for the 3 PLS components. We can see the explained variance from each component in the diagonal. We can get it from R with:
> explvar(gas1)
  Comp 1      Comp 2 ...

Zero rates with futile.paradigm

February 8, 2012

Here’s a short example of calculating zero rates and discount factors from cash rates using futile.paradigm. Of note is how …
