Benchmarking matrix creation

October 23, 2012

Sometimes it is useful to take a vector, or one column/row of a matrix, and build a new matrix of identical copies of that vector. There are lots of different ways to do this, but I just discovered a new, and very straightforward way to do this with m...
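For readers who want something concrete, here is a minimal sketch of a few well-known base R ways to build such a matrix. It is not the method the post goes on to describe (the excerpt is cut off before naming it), and the vector `v` and copy count `n` are made-up values for illustration.

```r
# Hypothetical inputs: a short vector and the number of copies wanted.
v <- c(1.5, 2.5, 3.5)
n <- 4

# Copies as columns: repeat the whole vector n times and fill column by column.
cols_rep <- matrix(rep(v, times = n), nrow = length(v))

# Same layout via replicate(), which binds the n copies together as columns.
cols_replicate <- replicate(n, v)

# Same layout via an outer product with a vector of ones.
cols_outer <- outer(v, rep(1, n))

# Copies as rows: repeat each element n times, so column j holds v[j] throughout.
rows_rep <- matrix(rep(v, each = n), nrow = n)

# One simple way to time an approach; the numbers will vary by machine.
timing <- system.time(
  for (i in 1:1000) matrix(rep(v, times = n), nrow = length(v))
)
```

Which variant is fastest likely depends on the vector length and the number of copies, so it is worth timing them on data of realistic size.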

The basics of Value at Risk and Expected Shortfall

October 23, 2012

Value at Risk and Expected Shortfall are common risk measures. Here is a quick explanation.

Ingredients

The first two ingredients are each a number:

- The time horizon: how many days do we look ahead?
- The probability level: how far in the tail are we looking?

Ingredient number 3 is a prediction distribution of …
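As a rough illustration of how the ingredients fit together, the sketch below estimates both measures from simulated returns. The normal distribution, its parameters, the 5% probability level, and the sign convention (losses reported as positive numbers) are all assumptions made for the example, not taken from the post.

```r
set.seed(42)

# Assumed stand-in for the prediction distribution: simulated returns over
# the chosen time horizon (hypothetical mean and standard deviation).
sim_returns <- rnorm(10000, mean = 0, sd = 0.02)

# Assumed probability level: look at the worst 5% of outcomes.
prob_level <- 0.05

# Value at Risk: the return quantile at prob_level, sign-flipped into a loss.
var_est <- -quantile(sim_returns, probs = prob_level, names = FALSE)

# Expected Shortfall: the average loss over outcomes at or beyond that quantile.
es_est <- -mean(sim_returns[sim_returns <= -var_est])
```

Under this convention Expected Shortfall is never smaller than Value at Risk, because it averages over the tail that starts at the VaR quantile.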

Presidential Debates 2012

October 23, 2012

I have been playing with the beta version of qdap, using the presidential debates as a data set. qdap is in a beta phase and lacks documentation, though I’m getting there. In previous blog posts (presidential debate 1 LINK and VP …

It Takes 2 Lines of R Code to Discover Interesting Biology

October 23, 2012

The following biological phenomenon demonstrates just how elegant R code can be. In vertebrate genomes, a methyl group (-CH3) can be added to nucleotides. This process of methylation is commonly associated with gene suppression. Most of the cytosines in the …

googleVis 0.3.0/0.3.1 is released: It’s faster!

October 23, 2012

Version 0.3.0 of the googleVis package for R was released on CRAN on 20 October 2012. With this version we have been able to speed up the code considerably: the transformation of R data frames into JSON is significantly faster. The execution of the gvisMotionChart function in the World Bank demo is over 35 times...

ChIP-seq Analysis with Bioconductor

October 22, 2012

Scientists are often interested in finding the genome-wide binding sites of their protein of interest. R offers an easy way to load and process the sequence files from a ChIP-seq experiment. During the next weeks I’m going to present a pipeline that …

Top Facebook Posts During the US Presidential Debate

October 22, 2012

The following data was collected during the Presidential Debate on the 22nd of October by tapping into the Facebook social graph API using R. The top three posted links during the debate for each candidate are:

Obama
#1 http://bit.ly/QCODJg
#2 http://bit.ly/RXstnm
#3 http://bit.ly/P8MmJ1

Romney
#1 http://bit.ly/zDdsKf
#2 http://bit.ly/SjFbKx

Break even ratios for development investment decisions

October 22, 2012

Developers are constantly being told that it is worth making the effort when writing code to make it maintainable (whatever that might be). Looking at this effort as an investment, what kind of return has to be achieved to make it worthwhile? Short answer: The percentage saving during maintenance has to be twice as great

Force R help HTML server to always use the same URL port

October 22, 2012

The code below shows how to configure the 'help.ports' option in R so that the built-in R help server always uses the same URL port. Just add it to the .Rprofile file in your home directory (if it is missing, create it). For more details, see help("Startup").

    # Force the URL of the help to http://127.0.0.1:21510
    options(help.ports=21510)

A slightly fancier version is to use...

Eight new R User Groups worldwide

October 22, 2012

There are new local R user groups in eight (!) countries to announce this month: Sweden is host to the first R user group in Scandinavia. StockholmR has been holding meetings since September, and their next meeting on November 28 will be on Teaching R and Data visualization using R. In Taiwan, the Taipei-based Taiwan useR Group holds regular...

Get an R Data Frame from a MongoDB Query

October 22, 2012

There’s a good FAQ on how to do the MongoDB query -> R data frame but I wanted to post a more complete example that included the database connection and query setup since I suspect there are folks new to Mongo who would appreciate the end-to-end view. The code is fully annotated with comments, and

Resurrect Posts on Japan and the Yen

October 22, 2012

As the Yen and Japan continue to get more interesting in my mind, I just wanted to resurrect some posts that I have done on Japan and the Yen and sort them by my favorites.

- Japan Trade by Geographic Region
- Japanese Trade and the Yen
- Japan Intentional or...

Josh vs. himself (or: Firefly > all)

October 22, 2012

For Jan... I've got no data for "S.H.I.E.L.D." :( Maybe, but just maybe, "Firefly" gets the way-too-early-cancelled bonus from the voting community.

A statistical project bleg (urgent-ish)

October 22, 2012

We all know that politicians can play it a little fast and loose with the truth. This is particularly true in debates, where politicians have to think on their feet and respond to questions from the audience or from each other. Usually, we find out a...

Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?

October 22, 2012

The other day we had a fun little discussion in the comments section of the sister blog about the appropriateness of stating forecast probabilities to the nearest tenth of a percentage point. It started when Josh Tucker posted this graph from Nate Silver: My first reaction was: this looks pretty but it’s hyper-precise. I’m a...

Getting data in and out of R

October 22, 2012

One of the great advantages of R is that it recognizes almost any data format that you can throw at it. There are myriad possible file formats, but I'll concentrate on the four that we see almost exclusively in public health: Excel ...

Predict User’s Return Visit within a day part-3

October 22, 2012

Welcome to the last part of the series on predicting a user’s return visit to a website. In the first part of the series, I built a logistic regression model to predict whether a user will come back to the website within the next 24 hours. In the second part, I discussed model improvement and examined the model's accuracy.

Distribution of colors by flag

October 22, 2012

A story: We showed you how to use R to assess flag similarity and make a scatter plot of raster images. Dr. Wickham referred us to the set of 2400 flag icons made available by GoSquared, and then (probably jokingly) challenged us to replicate the cool...

Going to the Movies…

October 22, 2012

Today, let us have a look at movies. The Internet Movie Database (IMDb) has some data dumps available on their website. It's a subset of the information available on the IMDb site, but it's more than enough. I will spare you my code to convert these da...

Predict User’s Return Visit within a day part-2

October 22, 2012

Welcome to the second part of the series on predicting a user’s return visit to a website. In my earlier blog post, Logistic Regression with R, I discussed what logistic regression is. In the first part of the series, we applied logistic regression to an available data set. The problem statement there was whether a user will return in

Predict User’s Return Visit within a day part-1

October 22, 2012

In my earlier blog post, I discussed what logistic regression is and how a logistic model is generated in R. Now we will apply that learning to a specific prediction problem. In this post, I will create a basic model to predict whether a user will return to a website within the next 24 hours. This

Classes and Objects in R

October 21, 2012

Welcome back! In this blog post I'm going to try to tackle the concept of objects in R. R is said to be an “object oriented” language. I touched on this in my last post when we discussed the concatenate function c() and I'll go a bit beyond that this time. Speaking of the c() function, I'll begin this...

Logistic Regression with R

October 21, 2012

In my first blog post, I explained what regression is and how a linear regression model is generated in R. In this post, I will explain what logistic regression is and how a logistic regression model is generated in R. Let’s first understand logistic regression. Logistic regression is one of the

Basics of JavaScript and D3 for R Users

October 21, 2012

Hadley Wickham, creator of the ggplot2 R package, has been learning JavaScript and its D3 library for the next iteration of ggplot2 (tentatively titled r2d3?)… so I suspect it’s only a matter of time before he pulls the rest of the …

Player timelines with ggplot

October 21, 2012

Timelines can be quite a handy way of getting an overview of a player’s career: when they played, with which team, and who their contemporaries were. As is often the case, I turned to Stackoverflow to set me on my way for an R solution. In this instance, I did not take

ggmcmc – diagnostic plots for MCMC with ggplot2

October 21, 2012

Xavier Fernández i Marín, who maintains the jags package on Gentoo Linux, writes to tell me he is developing the R package ggmcmc. This package is for visualizing Markov Chain Monte Carlo output using ggplot2 graphics and should complement the …

Looking to the PCA scores with GGobi

October 21, 2012

In this post I continue with the unsupervised exploration of oil spectra, which we saw in a previous post (PCA with "ChemoSpec" - 001). In the manual "ChemoSpec: An R Package for Chemometric Analysis of Spectroscopic Data" (page 23) there is a brie...

Momentum in R: Part 2

October 20, 2012

Many of the sites I linked to in the previous post have articles or papers on momentum investing that investigate the typical ranking factors: 3-, 6-, 9-, and 12-month returns. Most (not all) of the articles seek to find the “best” look-back period for ranking the assets. Say that the outcome of …
