Monthly Archives: March 2012

Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups

March 4, 2012

After my post on making dotplots with concise code using plyr and ggplot, I got an email from my dad, who practices immigration law and runs a website with a variety of immigration resources and tools. He pointed out that the …

Spurious Regression illustrated

March 4, 2012

The spurious regression problem dates back to Yule (1926): “Why Do We Sometimes Get Nonsense Correlations between Time-series?” Let's see what the problem is and how we can fix it. I am using the Morgan Stanley (MS) symbol for illustration, pre-crisis time …
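
A minimal simulation sketch of the problem, assuming two independent Gaussian random walks stand in for price series (this is not the post's Morgan Stanley data; the series length, number of replications and 5% level are arbitrary choices, and the helper slope_pvalue() is hypothetical). Regressing one walk on the other in levels tends to produce a "significant" slope far more often than the nominal 5%, while regressing first differences brings the rejection rate back near 5%. Differencing is one standard remedy; the post itself may use a different fix.

```r
# Sketch: spurious regression between two independent random walks.
# Assumptions: Gaussian shocks, n_obs = 100, n_reps = 500, 5% test level.

# Hypothetical helper: p-value of the slope in lm(y ~ x)
slope_pvalue <- function(y, x) {
  summary(lm(y ~ x))$coefficients[2, 4]   # column 4 is Pr(>|t|)
}

set.seed(1)
n_obs  <- 100
n_reps <- 500

reject_levels <- logical(n_reps)
reject_diffs  <- logical(n_reps)

for (i in seq_len(n_reps)) {
  # Independent random walks: cumulative sums of independent N(0, 1) shocks
  x <- cumsum(rnorm(n_obs))
  y <- cumsum(rnorm(n_obs))
  # Levels on levels: the nonsense-correlation setting
  reject_levels[i] <- slope_pvalue(y, x) < 0.05
  # First differences are stationary, so the usual t-test is valid again
  reject_diffs[i] <- slope_pvalue(diff(y), diff(x)) < 0.05
}

cat("Share of 'significant' slopes, levels:     ", mean(reject_levels), "\n")
cat("Share of 'significant' slopes, differences:", mean(reject_diffs), "\n")
```

The two series share nothing by construction, so every rejection in levels is a false positive; the gap between the two printed shares is the spurious regression effect.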

Setting the Default RNG Seed in R

March 4, 2012

How do you set the default seed for the RNG behind runif(), sample() and other commands? There are several ways of doing that (such as setting .Random.seed directly), but as the documentation states, set.seed() is the recommended way to specify seeds.

> ?set.seed
> set.seed(0)
> runif(1,0,1)
0.8966972
> set.seed(0)
> runif(1,0,1)
0.8966972

> set.seed(0)
> sample(1:10, 10)
9 3 10 5 ...
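
Setting .Random.seed directly, the alternative mentioned above, can be sketched as follows: snapshot the generator state, draw some numbers, then restore the snapshot to replay the same draws without calling set.seed() again. This is a minimal sketch meant to be run at top level, and the variable names are illustrative. Note also that R 3.6.0 changed the default algorithm used by sample(), so the sample() output above, recorded in 2012, may differ in current versions of R; calling set.seed(0, sample.kind = "Rounding") restores the pre-3.6.0 behaviour.

```r
# Sketch: saving and restoring the RNG state via .Random.seed.
# Run at top level; .Random.seed lives in the global environment.
set.seed(0)

# Snapshot the current generator state (an integer vector)
saved_state <- get(".Random.seed", envir = globalenv())

first_draws <- runif(3)

# Restore the snapshot; the next draws repeat the previous ones
assign(".Random.seed", saved_state, envir = globalenv())
second_draws <- runif(3)

identical(first_draws, second_draws)   # TRUE
```

set.seed() remains the simpler choice for reproducibility; saving and restoring the state is mainly useful when code must replay draws from the middle of a stream.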

My Pocket Change

March 4, 2012

I'm playing around with some personal data collection, and using some cloud computing to visualize it. Following the directions in this blog post, I've written an R function which visualizes data it draws from a Google Docs spreadsheet, and uploaded it...

Capturing Tick Data via C#, Interactive Brokers, and MySQL

March 3, 2012

Interactive Brokers is a discount brokerage that provides a good API for programmatically accessing its platform. The purpose of this post is to create an application that will capture tick-level data and save that data into a database for futur...

Gastwirth’s location estimator

The problem of outliers – data points that are substantially inconsistent with the majority of the other points in a dataset – arises frequently in the analysis of numerical data.  The practical importance of outliers lies in the fact that even a few of these points can badly distort the results of an otherwise reasonable data analysis.  This outlier-sensitivity...
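
Gastwirth's estimator is commonly defined as a weighted combination of three quantiles: 0.3 times the 1/3 quantile, plus 0.4 times the median, plus 0.3 times the 2/3 quantile. A minimal sketch, assuming R's default quantile() interpolation (type 7) for the tertiles; other quantile definitions give slightly different values on small samples, and the helper name gastwirth() is hypothetical rather than the post's own code.

```r
# Sketch of Gastwirth's location estimator:
#   0.3 * Q(1/3) + 0.4 * Q(1/2) + 0.3 * Q(2/3)
# Assumption: quantiles use R's default quantile() type 7 interpolation.
gastwirth <- function(x) {
  q <- quantile(x, probs = c(1/3, 1/2, 2/3), names = FALSE)
  sum(c(0.3, 0.4, 0.3) * q)
}

clean <- 1:9
dirty <- c(1:8, 1000)   # one gross outlier replaces the largest value

c(mean = mean(clean), gastwirth = gastwirth(clean))   # both 5
c(mean = mean(dirty), gastwirth = gastwirth(dirty))   # mean about 115.1, Gastwirth still 5
```

A single bad point drags the mean far from the bulk of the data, while the quantile-based estimator does not move, which is the outlier-sensitivity contrast the post is concerned with.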

R versus Stata Redux

March 3, 2012

I've used both R and Stata for a long time, but these days I use Stata much more frequently than R. While R is useful for some kinds of graphics (especially three-dimensional graphics) and some statistical procedures (for example, finite mixture models...

NIT: Fatty acids study in R – Part 002

March 2, 2012

> library(chemometrics)
> fatmsc_nipals <- nipals(fat_msc, a = 10, it = 160)
> CPs <- seq(1, 10, by = 1)
> matplot(CPs, t(fatmsc_nipals$T), lty = 1, pch = 21,
+   xlab = "PC_number", ylab = "Explained_Var")

In the 2D plot, we can see that with 3 or 4 principal...
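
For readers without the chemometrics package or the fat_msc data, a base-R sketch of the same question (how many components are worth keeping) computes the proportion of variance explained by each principal component with prcomp(). The simulated matrix below, built from three latent factors plus a little noise, is purely an assumption for illustration; it is not the NIR fatty acid data, and prcomp() uses a singular value decomposition rather than NIPALS.

```r
# Sketch: proportion of variance explained per principal component.
# Assumption: simulated data with 3 latent factors, not the post's NIR spectra.
set.seed(42)
n_samples   <- 50
n_variables <- 20

scores   <- matrix(rnorm(n_samples * 3), n_samples, 3)        # latent factors
loadings <- matrix(rnorm(3 * n_variables), 3, n_variables)
noise    <- matrix(rnorm(n_samples * n_variables, sd = 0.1), n_samples, n_variables)
X <- scores %*% loadings + noise

pca <- prcomp(X, center = TRUE, scale. = FALSE)

explained  <- pca$sdev^2 / sum(pca$sdev^2)   # share of variance per PC
cumulative <- cumsum(explained)

round(cumulative[1:5], 3)   # should flatten after about 3 PCs
```

Because the simulated signal has rank 3, the cumulative curve should level off at three components, mirroring the kind of elbow one looks for when choosing 3 or 4 components on real spectra.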

The German DIN33430 – Analysis of acceptance with R

March 2, 2012

The German DIN33430 defines quality standards for “job-related proficiency assessments”, covering the qualifications of the responsible parties involved as well as the creation, execution and evaluation of such assessments. Licensed persons are listed on a website (in German). …

New data visualization features in ggplot2 update

March 2, 2012

Hadley Wickham has just released an update to the ggplot2 graphics package for R. Version 0.9.0 significantly speeds up the process of rendering graphics, and the documentation is much improved (including the addition of many new examples). This update also adds a bunch of new features, which are documented in this 40-page "changes and additions" guide. Here's a sampling...