Cricket All Round Performances

September 19, 2011

In cricket, a player who can perform well with both bat and ball is a great asset for any team, and across the history of international cricket a number of cricketers have fallen into this bracket. It is difficult to specify a set of criteria to determine whether a player can

Read more »

About commercial publishers

September 19, 2011

Julien Cornebise has pointed out a recent Guardian article. It is about commercial publishers of academic journals, mainly Elsevier, Springer, and Wiley, with a clear stand from its title: “Academic publishers make Murdoch look like a socialist”! The valuable argument therein is that academic publishers make hefty profits (a 40% margin for Elsevier!)

Read more »

Using jri to connect JAVA to R

September 19, 2011

The R package rJava allows R to be accessed in Java programs. The part of the package that allows this is jri. The notes on the rJava site about getting jri to work didn’t help me much getting it to … Continue reading →

Read more »

R 2.14 to be released on October 31; R 2.13 patch on September 13

September 19, 2011

The next major release of R has been announced: R 2.14.0 is scheduled for October 31. Details are still coming in about the new features planned for this release, but R core member Luke Tierney has revealed some of the performance improvements expected, and R core member Brian Ripley has spoken of forthcoming low-level support for multi-threaded computing and...

Read more »

Appendable saving in R

September 19, 2011

One of the most crucial problems in HPC is that every error you make has a much greater impact than in normal computing: there is nothing more amusing than finding out that a few-day simulation broke a few minutes before the end because of an unfortunate value thrown by a random generator, a typo in the result-saving code or

Read more »

Three free books for better programming in R (and any other language)

September 19, 2011

Like many users and producers of R packages, I have never had any formal training in computer science. I’ve come to the conclusion that this is a serious omission in a professional researcher’s training. Computer scientists and professional hackers … Continue reading →

Read more »

rgdal + raster + RCurl = My next package

September 18, 2011

This package has been a long time in the making.  In the end it’s more of a data package than a functional package, but pulling all the pieces together required me to learn some really cool packages: raster (which I already knew), rgdal and RCurl.  I’ll provide a little bit of an overview

Read more »

DTW: dynamic time warping

September 18, 2011

Basically, DTW (dynamic time warping) is an algorithm that outputs the cumulative distance between two time sequences; it is widely used, e.g., for classification and clustering. For example, when using k-means for clustering, we can use DTW as the distance function. Here is one such nice instance (using R): http://www.rdatamining.com/examples/ts-mining
Relevant information from Anshul's email:
A review of DTW: http://csdl.ics.hawaii.edu/techreports/08-04/08-04.pdf
Code: Python code: https://mlpy.fbk.eu/R...

Read more »
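
The cumulative distance mentioned in the DTW excerpt above can be sketched in a few lines of base R. This is only a minimal illustration, not code from the linked post: the function name dtw_distance and the absolute-difference local cost are assumptions made for the example, and dedicated packages offer far more complete implementations.

```r
# Minimal dynamic time warping sketch (dtw_distance is a hypothetical name).
# Local cost: absolute difference. Allowed steps: advance x, advance y, or both.
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # cost[i + 1, j + 1] holds the cumulative distance aligning x[1:i] with y[1:j]
  cost <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- abs(x[i] - y[j])
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1],  # advance x only
                                    cost[i + 1, j],  # advance y only
                                    cost[i, j])      # advance both
    }
  }
  cost[n + 1, m + 1]
}

a <- c(1, 2, 3, 4)
b <- c(1, 1, 2, 3, 4)   # same shape as a, with the first value repeated

dtw_distance(a, b)       # 0: the repeated value is absorbed by the warping
dtw_distance(a, a + 1)   # 2: a vertical shift cannot be warped away
```

A function like this could be used to fill a pairwise distance matrix that is then handed to a clustering routine, which is the use case the excerpt describes.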

Map the distribution of your sample by geolocating IP addresses or zip codes

September 18, 2011

Yesterday I wanted to create a map of participants from a study on social media and partisan selective exposure that Sean Westwood and I ran recently, with participants from Amazon’s Mechanical Turk.  We recorded IP addresses for each Turker participant, so … Continue reading →

Read more »

Implementation of the CDC Growth Charts in R

September 17, 2011

I implemented in R a function to re-create the CDC Growth Chart, according to the data provided by the CDC. In order to use this function, you need to download the .rar file available at this megaupload link. Mirror: mediafire link. Then unrar the file, a...

Read more »

Bayesian Models with Censored Data: A comparison of OLS, tobit and bayesian models

September 17, 2011

The following R code models a censored dependent variable (in this case academic aptitude) using traditional least squares, tobit, and Bayesian approaches.  As depicted below, the OLS estimates (blue) for censored data are inconsistent and will ...

Read more »

The Long Tail of the Pareto Distribution


In my last two posts, I have discussed cases where the mean is of little or no use as a data characterization.  One of the specific examples I discussed last time was the case of the Pareto type I distribution, for which the density is given by: $p(x) = a k^a / x^{a+1}$, defined for all x > k, where k and a...

Read more »
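
As a quick numerical check of the Pareto type I density quoted in the excerpt above, the sketch below evaluates it in base R. The helper name dpareto1 and the parameter values are assumptions made for this illustration. It relies on the standard result that the mean equals a k / (a - 1) when a > 1 and is infinite otherwise.

```r
# Pareto type I density: p(x) = a * k^a / x^(a + 1) for x > k, 0 otherwise.
# dpareto1 is a hypothetical helper name, not a base R function.
dpareto1 <- function(x, k, a) {
  ifelse(x > k, a * k^a / x^(a + 1), 0)
}

k <- 1   # scale (lower bound of the support), chosen for illustration
a <- 3   # shape, chosen for illustration

# The density should integrate to 1 over (k, Inf)
total <- integrate(dpareto1, lower = k, upper = Inf, k = k, a = a)$value

# For a > 1 the mean is a * k / (a - 1); compare with numerical integration
mean_numeric <- integrate(function(x) x * dpareto1(x, k, a),
                          lower = k, upper = Inf)$value
mean_exact <- a * k / (a - 1)

c(total = total, mean_numeric = mean_numeric, mean_exact = mean_exact)
```

With a at or below 1 the second integral diverges, which is one way to see why the mean stops being a useful summary for heavy-tailed cases.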

littler 0.1.5

September 17, 2011

Brown-bag release time for littler. One of the minor cleanups in the 0.1.4 release from Thursday actually introduced a nasty little bug as you can't call Rf_KillAllDevices() when you do not have any graphics device. Doh. So with apologies for the l...

Read more »

UK R Courses – 2012

September 17, 2011

The School of Mathematics & Statistics at Newcastle University (UK) is again running some R courses. In January 2012, we will run: January 16th: Introduction to R; January 17th: Programming with R; January 18th & 19th: Advanced graphics with R. The courses aren’t aimed at teaching statistics; rather, they aim to go through the fundamental

Read more »

Introduction to Beamer

September 17, 2011

A friend of mine, who is quite smart by the way (she is a PhD student in Physics at Cambridge), recently asked me for some help with Beamer. Well, most of my knowledge and code came from Utkarsh when I started about a year ago. Initially, I ha...

Read more »

Elements of Bayesian Econometrics

September 16, 2011

posterior = (likelihood x prior) / integrated likelihood

The combination of a prior distribution and a likelihood function is utilized to produce a posterior distribution.  Incorporating information from both the prior distribution and the likelihood function leads to a reduction in variance and an improved estimator. As n→...

Read more »
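
The variance reduction described in the excerpt above can be illustrated with the simplest conjugate case. This is a sketch under stated assumptions, not the model from the post: a normal prior on the mean, normally distributed data with known variance, and made-up values for m0, s0 and sigma.

```r
# Conjugate normal model with known data variance (an illustrative assumption):
#   prior: mu ~ N(m0, s0^2),  data: y_i ~ N(mu, sigma^2)
set.seed(1)
sigma <- 2                              # known data standard deviation
y <- rnorm(20, mean = 1, sd = sigma)    # simulated data
n <- length(y)

m0 <- 0   # prior mean
s0 <- 1   # prior standard deviation

# Precisions (inverse variances) add up in the posterior
post_prec <- 1 / s0^2 + n / sigma^2
post_var  <- 1 / post_prec
post_mean <- post_var * (m0 / s0^2 + sum(y) / sigma^2)

# Posterior variance is smaller than both the prior variance and the
# sampling variance of the sample mean (sigma^2 / n)
c(prior_var = s0^2, sample_mean_var = sigma^2 / n, post_var = post_var)
```

The posterior mean lands between the prior mean and the sample mean, weighted by their precisions, and as n grows the data term dominates.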

ifelse function in R only returns the first element

September 16, 2011

If you also like to use the ifelse function, be aware of the returned value. For example:

> ifelse(1>0, 3, 4)
 3
> ifelse(1>0, c(2, 3), c(4, 5)) # only the first element returned.
 2
> ifelse(c(1:10)>5, 'on', 'off')
 "off" "off...

Read more »
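
The behaviour in the ifelse excerpt above follows from how ifelse() is documented: the result has the same length as the test argument, so a length-one test yields a length-one result. A short sketch of this and of the usual alternative, with variable names chosen only for this illustration:

```r
# ifelse() returns a value shaped like its first argument (the test).
one_value  <- ifelse(1 > 0, c(2, 3), c(4, 5))           # test has length 1: 2
two_values <- ifelse(c(TRUE, FALSE), c(2, 3), c(4, 5))  # test has length 2: 2 5

# To choose between whole vectors on a single condition, use if / else
whole_vector <- if (1 > 0) c(2, 3) else c(4, 5)         # 2 3

# A vectorised test gives an element-wise result
labels <- ifelse(1:10 > 5, "on", "off")

one_value
two_values
whole_vector
labels
```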

R in the insurance industry

September 16, 2011

Let's talk about R in the insurance industry today.  David Smith's blog entry reminded me about our poster at the R user conference in Warwick in August 2011: Using R in Insurance. We presented examples on how R can be used in the insu...

Read more »

How to extract time series from large timestamped logs with R

September 16, 2011

Revolution Analytics' Joe Rickert has a new post on inside-R.org, demonstrating how you can use R and the RevoScaleR package to extract time series data from time-stamped logs (in this case, the "US Domestic Flights From 1990 to 2009" dataset on Infochimps): Analyzing time series data of all sorts is a fundamental business analytics task to which the R...

Read more »

Backtesting Part 2: Splits, Dividends, Trading Costs and Log Plots

September 16, 2011

Note: This post is NOT financial advice!  This is just a fun way to explore some of the capabilities R has for importing and manipulating data.   In my last post, I demonstrated how to backtest a simple momentum-based stock trading strategy ...

Read more »

Beta and expected returns

September 16, 2011

Some pictures to explore the reality of the theory that stocks with higher beta should have higher expected returns. Figure 2 of “The effect of beta equal 1” shows the return-beta relationship as downward sloping.  That’s a sample of size 1.  In this post we add six more datapoints. Data The exact same betas of … Continue reading...

Read more »

A multidimensional “which” function

September 16, 2011

update Henrik Bengtsson commented that which(x, arr.ind=TRUE) gives the same result, rendering the blog below academic (thanks for the comment!). So, for academic interest, I'll leave it. In my defense, I implemented this kind of functionality in C some time … Continue reading →

Read more »

A multidimensional "which" function

September 16, 2011

The well-known which function accepts a logical vector and returns the indices where its value equals TRUE. Actually, which also accepts matrices or multidimensional arrays. Internally, R uses a single index to run through such two- or higher-dimension...

Read more »
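
The difference between the single index and the array subscripts discussed in the two "which" entries above can be seen with a small example. The matrix x is made up for illustration; which(x, arr.ind = TRUE) is the base R call mentioned in the update, and arrayInd() is the base R helper that performs the same conversion explicitly.

```r
# A small logical matrix, filled column by column
x <- matrix(c(TRUE, FALSE, FALSE,
              FALSE, TRUE, TRUE), nrow = 3, ncol = 2)

single <- which(x)                   # single (column-major) indices: 1 5 6
subs   <- which(x, arr.ind = TRUE)   # one row per TRUE value, columns "row" and "col"

# arrayInd() converts single indices into array subscripts explicitly
subs2 <- arrayInd(single, dim(x))

single
subs
subs2
```

The same calls work unchanged on arrays with three or more dimensions, where the result gains one column per dimension.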

Soil-Landscape Block Diagrams in SoilWeb

September 16, 2011

Users of our Google Earth interface to USDA-NCSS soils information will now see links to soil-landscape block diagrams listed within map unit descriptions. Automated Linking to NCSS Block Diagrams

Read more »

Always put comments in your code!

September 16, 2011

I have a paper which I wrote some years ago, which has not been finished, and which should be accompanied by an R package. So far nothing special, but at that time, I was only at the beginning of my affair with R, and so I made several mistakes (OK – I did also some

Read more »

Soil Series Query for SoilWeb

September 16, 2011

A map depicting the spatial distribution of a given soil series can be very useful when working on a new soil survey, updating an old one, or searching for specific soil characteristics. We have recently added a soil series query facility to SoilWeb, w...

Read more »

Simulation studies in R – Using all cores and other tips

September 16, 2011

After working more seriously with simulations I noticed some updates were necessary to my previous setup. Most notable are the following three: it is very handy to explicitly call the different scenarios instead of using nested loops; storing intermediate results in single files obviates the need to rerun an almost finished but crashed analysis; and

Read more »
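
The first two tips in the simulation excerpt above might look roughly like the sketch below. It is an illustration only, not the setup from the post: the names scenarios, run_scenario and out_dir, the toy t-test simulation and the file naming scheme are all assumptions.

```r
# One row per scenario instead of nested loops, and one result file per scenario.
# All names here (scenarios, run_scenario, out_dir) are hypothetical.
scenarios <- expand.grid(n = c(50, 100), effect = c(0, 0.5))

run_scenario <- function(n, effect, seed) {
  set.seed(seed)                  # reproducible per scenario
  x <- rnorm(n, mean = effect)
  t.test(x)$p.value               # toy "result" of the simulation
}

out_dir <- file.path(tempdir(), "sim_results")
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_len(nrow(scenarios))) {
  out_file <- file.path(out_dir, sprintf("scenario_%02d.rds", i))
  if (file.exists(out_file)) next   # already finished in an earlier run: skip
  res <- run_scenario(scenarios$n[i], scenarios$effect[i], seed = i)
  saveRDS(res, out_file)            # intermediate result survives a later crash
}

# Collect whatever has been finished so far
files <- list.files(out_dir, pattern = "\\.rds$", full.names = TRUE)
results <- vapply(files, readRDS, numeric(1))
```

Because each scenario is an independent row with its own output file, the loop body is also a natural unit to hand to a parallel apply function when several cores are available.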

Beeswarm Plot with ggplot2

September 16, 2011

A colleague showed me the results of his study project with beeswarm plots made by GraphPad. I was wondering if this could be implemented in R, and more specifically with ggplot2. There is an R package for drawing such graphs, the beeswarm package (beeswa...

Read more »

Performance with ggplot2

September 16, 2011

Now after Reporting Good Enough to Share, let’s use ggplot2 and PerformanceAnalytics to turn this into this. I have been notified that the colors aren’t great.  How does everyone like this?

Read more »