R package development

June 11, 2013

Building R packages is not particularly hard, but it can be a bit of a daunting endeavour at the beginning, particularly if you are more of a statistician than a computer scientist or programmer. Some concepts may appear foreign or like red tape, yet man...

Measures of Skewness and Kurtosis

June 10, 2013

Skewness and kurtosis in R are available in the moments package, through the following functions: Skewness - skewness; Kurtosis - kurtosis. Example 1. Mirra is interested in the elapsed time (in minutes) she spends on riding a tricycle fr...
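
As a rough sketch of what these two measures compute, the moment-based definitions can be written in base R. The sample values and the helper names central_moment, skew and kurt are hypothetical. The assumption is that the functions in the moments package follow the same moment-ratio definitions, which should be checked against the package documentation.

```r
# Hypothetical sample; not the data from the truncated example.
x <- c(1, 2, 3, 4, 5)

# k-th central moment, dividing by n (an assumption about the definition used).
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: third central moment scaled by the variance to the power 3/2.
skew <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Kurtosis: fourth central moment scaled by the squared variance (not excess).
kurt <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

skew(x)  # 0 for this symmetric sample
kurt(x)  # 1.7
```

A skewness of zero indicates a symmetric sample; positive values indicate a longer right tail.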

Scenario analysis for option strategies Pt. 2

June 10, 2013

No major improvements since the last post. However, I got it to the stage where I can share the code and let you try it yourself: https://github.com/afraid2trade/SCENARIO_ANALYSIS.git Once you have downloaded it, the only thing you need to open is "sa_work...

Safe Loading of RData Files

June 10, 2013

Unless you have configured R not to ask, every time you close R or RStudio you are prompted to save your workspace. This saves an RData file to the working directory. The functions save.image() and save() offer a little more …
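
The excerpt is cut off before the post's own approach appears, so the following is only one common pattern, offered as an assumption: save named objects with save(), then load the file into a separate environment so that nothing in the current workspace is overwritten. The object name scores, the variable names path, sandbox and loaded_names, and the temporary file are all hypothetical.

```r
# Hypothetical object and file; tempfile() keeps the working directory untouched.
scores <- c(70, 78, 66)
path <- tempfile(fileext = ".RData")

# save() writes only the named objects; save.image() would write the whole workspace.
save(scores, file = path)

# Change the object so that an accidental overwrite would be noticeable.
scores <- "modified after saving"

# Load into a fresh environment instead of the global workspace.
sandbox <- new.env()
loaded_names <- load(path, envir = sandbox)

loaded_names    # "scores"
sandbox$scores  # the saved values: 70 78 66
scores          # still "modified after saving"
```

load() returns the names of the objects it restored, which makes it easy to inspect what a file contains before copying anything into the workspace.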

Microsoft Office Metadata with R

June 10, 2013

Sometimes I need to retrieve various items of metadata from Microsoft Office files. For the 'old-style' (i.e. '.doc' and '.xls') files, a solution in Python, such as hachoir, was perhaps the best way to extract this data from the OLE2 file format -...

Bringing R to the Enterprise – new white paper available

June 10, 2013

Check out this new white paper entitled "Bringing R to the Enterprise - A Familiar R Environment with Enterprise-Caliber Performance, Scalability, and Security." In this white paper, we begin with "Beyond the Laptop", exploring the ability to run R code in the database, working with CRAN packages at the database server, operationalizing R analytics, and...

In case you missed it: May 2013 Roundup

June 10, 2013

In case you missed them, here are some articles from May of particular interest to R users: Billions of geotagged Tweets create a beautiful map of the world when plotted with the ggmap package. A review of Ryan Sheftel's talk at R/Finance, on how he uses R on the trading desk at Credit Suisse. Also, a quick take on...

Where is the R Activity?

June 10, 2013

R has become one of the world’s most widely used

The RStudio CRAN mirror

June 10, 2013

RStudio maintains its own CRAN mirror, http://cran.rstudio.com. The server itself is a virtual machine run by Amazon’s EC2 service, and it syncs with the main CRAN mirror in Austria once per day. When you contact http://cran.rstudio.com, however, you’re probably not talking to our CRAN mirror directly. That’s because we use Amazon CloudFront, a content delivery

Running time

June 10, 2013

Marta and I are doing some re-analysis of our Eurovision contest (some context here and here). We have slightly modified our original model (mostly, I have navigated the mess in Marta's notation - it's OK: I'm not at risk of her mighty wrath, as I've...

Le Monde puzzle [#822]

June 10, 2013

For once, the Le Monde math puzzle is much more easily solved on a piece of paper than in R, even in a plane from Roma: Given a partition of the set {1,…,N} in k groups, one considers the collection of all subsets of the set {1,…,N} containing at least one element from each group. Show

Measure of Relative Variability

June 10, 2013

The measure of relative variability is the coefficient of variation (CV). Unlike measures of absolute variability, the CV is unitless, so it can be used to compare the dispersions of two distributions with different units of measurement. In R, CV ...
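
The excerpt is cut off before it shows how the post computes the CV in R. A minimal sketch, assuming the usual definition (standard deviation divided by the mean), is given here. The helper cv is hypothetical and not a base R function, and the height values are made up for illustration.

```r
# Coefficient of variation: standard deviation relative to the mean.
# cv is a hypothetical helper, not a base R function.
cv <- function(x) sd(x) / mean(x)

# Hypothetical heights, in centimetres and in metres.
heights_cm <- c(150, 160, 165, 170, 180)
heights_m  <- heights_cm / 100

sd(heights_cm)  # depends on the unit of measurement
sd(heights_m)
cv(heights_cm)  # same value in either unit
cv(heights_m)
```

The standard deviation changes by a factor of 100 between the two vectors, while the CV does not, which is what makes it usable across units.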

Ripley Facts

June 10, 2013

Normally, this blog would only contain technical and science-related posts. But this time I would like to share with you a very interesting phenomenon I came across on the R mailing list(s). I call it 'Ripley Facts' after the prolific statistician, ...

Introduction to stable distributions for finance

June 10, 2013

A few basics about the stable distribution. Previously, “The distribution of financial returns made simple” satirized ideas about the statistical distribution of returns, including the stable distribution. Origin: As “A tale of two returns” points out, the log return of a long period of time is the sum of the log returns of the shorter …

Measures of Absolute Variability

June 10, 2013

Measures of absolute variability deal with the dispersion of the data points. These include the following: Range - range; Interquartile Range - IQR; Quartile Deviation; Average Deviation; Standard Deviation - sd. These measures of variability restrict to uniform u...
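
The measures listed in this excerpt can be sketched in base R as follows. The data are hypothetical. The excerpt names no R function for the quartile deviation or the average deviation, so the two expressions used for them are assumptions based on the usual definitions (half the interquartile range, and the mean absolute deviation about the mean).

```r
# Hypothetical sample.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

range(x)                # minimum and maximum: 2 9
diff(range(x))          # range as a single number: 7
IQR(x)                  # interquartile range: 1.5
IQR(x) / 2              # quartile deviation (assumed definition): 0.75
mean(abs(x - mean(x)))  # average deviation about the mean (assumed definition): 1.5
sd(x)                   # sample standard deviation
```

All of these results carry the unit of the data, which is why they are called absolute measures.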

Sobol Sensitivity Analysis

June 10, 2013

Sensitivity analysis is the task of evaluating the sensitivity of a model output Y to input variables (X1,…,Xp). Quite often, it is assumed that this output is related to the input through a known function f: Y = f(X1,…,Xp). Sobol indices generalize the coefficient of determination in regression. The ith first-order index is the proportion of...

You Do Not Need to Tell Me I Have A Typo in My Documentation

June 10, 2013

So I just got yet another comment saying "you have a typo in your documentation". While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try Git and GitHub pull requests, which make it possible for y...

Using Metadata to find Paul Revere

June 9, 2013

London, 1772. I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty's subjects. This is in connection with the discussion of the role of "metadata" in

Why are Birds Dinosaurs?

June 9, 2013

Month after month, one of the most popular posts on the Paleocave blog is the How to Read a Cladogram post I did some time ago. I always intended to follow it up with more cladistic fun. So, hold onto your butts, we’re going to let the dinosaurs loose. Birds are dinosaurs. We’ve all heard

Better Neighborhoods with R: Exploring and Analyzing SeeClickFix Data (part 1)

June 9, 2013

The National Day of Civic Hacking took place …

Improve The Efficiency in Joining Data with Index

June 9, 2013

When managing big data with R, many people like to use the sqldf package for its friendly interface or choose the data.table package for its lightning speed. However, very few pay special attention to small details that might significantly boost the efficiency of these packages, such as adding an index to the data.frame or data.table. In my

Mahout for R Users

June 9, 2013

I have a few posts coming up on Apache Mahout so I thought it might be useful to share some notes. I came at it as primarily an R coder with some very rusty Java and C++ somewhere in the back of my head so that will be my point of reference. I’ve also included …

How to read quickly large dataset in R?

June 9, 2013

Medal Allocations at the Comrades Marathon

June 9, 2013

Following up on my previous post regarding attrition rates at Comrades Marathon 2013, here are the statistics I have gathered for medal allocations. There is some interesting history behind the Comrades Marathon medals. For reference, the medals are allocated as follows: Gold medals to the first ten finishers in the men’s race and the ladies’ race;

Exploratory Data Analysis: Kernel Density Estimation in R on Ozone Pollution Data in New York and Ozonopolis

Introduction Recently, I began a series on exploratory data analysis; so far, I have written about computing descriptive statistics and creating box plots in R for a univariate data set with missing values.  Today, I will continue this series by analyzing the same data set with kernel density estimation, a useful non-parametric technique for visualizing

Quartiles, Deciles, and Percentiles

June 9, 2013

The measures of position such as quartiles, deciles, and percentiles are available in the quantile function. Its arguments include: x - the data points; probs - the location to measure; na.rm - if FALSE, NA (Not Available) data points are not ignored; na...
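
A short sketch of how quantile can produce each of the three measures named in this excerpt. The data vector is hypothetical, and the results rely on the function's default interpolation method; other settings of its type argument can give slightly different values.

```r
# Hypothetical data with one missing value.
x <- c(1:9, NA)

# Quartiles: 25%, 50% and 75%.
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75), na.rm = TRUE)
quartiles  # 3, 5, 7

# Deciles: 10%, 20%, ..., 90%.
deciles <- quantile(x, probs = seq(0.1, 0.9, by = 0.1), na.rm = TRUE)

# A single percentile, here the 90th.
p90 <- quantile(x, probs = 0.90, na.rm = TRUE)
p90  # 8.2
```

With na.rm left at its default of FALSE, the call stops with an error when x contains NA, so the argument is set explicitly here.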

Estimating Finite Mixture Models with Flexmix Package

June 9, 2013

In my post on 06/05/2013 (http://statcompute.wordpress.com/2013/06/05/estimating-composite-models-for-count-outcomes-with-fmm-procedure), I showed how to estimate finite mixture models, e.g. zero-inflated Poisson and 2-class finite mixture Poisson models, with the FMM and NLMIXED procedures in SAS. Today, I am going to demonstrate how to achieve the same results with the flexmix package in R. R Code R Output for 2-Class Finite Mixture

Quick and Simple D3 Network Graphs from R

June 8, 2013

Sometimes I just want to quickly make a simple D3 JavaScript directed network graph with data in R. Because D3 network graphs can be manipulated in the browser–i.e. nodes can be moved around and highlighted–they're really nice for data exploration. They're also really nice in HTML presentations. So I put together a...

Mean and Median

June 8, 2013

The mean in R is computed using the function mean. Consider the scores of 20 MSU-IIT students in a Stat 101 exam with a hundred items: 70, 78, 66, 65, 50, 53, 48, 88, 95, 80, 85, 84, 81, 63, 68, 73, 75, 84, 49, and 77. Compute and interpret the mean and medi...
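
The computation asked for in this excerpt can be sketched with the base R functions mean and median, using the 20 scores as listed. The variable name scores is chosen here for illustration.

```r
# Scores of the 20 students, as listed in the excerpt.
scores <- c(70, 78, 66, 65, 50, 53, 48, 88, 95, 80,
            85, 84, 81, 63, 68, 73, 75, 84, 49, 77)

mean(scores)    # 71.6
median(scores)  # 74
```

On average a student scored 71.6 out of 100, and half of the students scored 74 or below.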
