di Roma

March 4, 2012

It has been a wonderful week in Roma, a mix of pleasant work and enjoyable free-time! I gave the ABC advanced course for the second time in a month so it did not require much in terms of preparation and there was a good sized audience with attentive (if too silent!) students and friends as

Data visualization

March 4, 2012

For those who have not read the seminal works of Tufte and Cleveland, please hang your heads in shame. To salvage some sense of self-worth, you can then head over to Solomon Messing’s blog where he is starting a series on data visualization based on ...

Boxplots and Day of Week Effects

March 4, 2012

THIS BLOG DOES NOT CONSTITUTE INVESTMENT ADVICE. ACTING ON IT WILL MOST LIKELY BE DETRIMENTAL TO YOUR FINANCIAL HEALTH. After following some R-related quant finance blogs like Timely Portfolio, Systematic Investor or Quantitative tho...

googleVis 0.2.15 is released: Improved geo and bubble charts

March 4, 2012

The guys behind the Google Visualisation API don't seem to rest. On 22 February 2012 they released an update of their API. Google added options for a gradient colour axis to bubble chart and a magnifying glass to geo chart, which opens when the user ho...

ggplot2 0.9.0 released

March 4, 2012

This announcement was made by Hadley Wickham on the mailing list. ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and avoid bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends)...

Interpretation of R-index

March 4, 2012

Having introduced the R-index, it is time to look at how it works. For this a simple example is sufficient. What happens if a product is different from another product? To make this at least slightly realistic, three products are needed. Two products will...

Visualization series: Insight from Cleveland and Tufte on plotting numeric data by groups

March 4, 2012

After my post on making dotplots with concise code using plyr and ggplot, I got an email from my dad who practices immigration law and runs a website with a variety of immigration resources and tools. He pointed out that the …

Spurious Regression illustrated

March 4, 2012

The spurious regression problem dates back to Yule (1926): “Why Do We Sometimes Get Nonsense Correlations between Time-series?”. Let's see what the problem is and how we can fix it. I am using the Morgan Stanley (MS) symbol for illustration, pre-crisis time …
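
The effect can be reproduced without any market data. The sketch below is not the post's own code: it assumes simulated independent random walks instead of MS prices, and it assumes first differencing as the fix, which is one standard remedy and may not be the one the author uses. The names `slope_p`, `n_obs` and `n_sim` are made up for illustration.

```r
set.seed(1)

# p-value of the slope coefficient when regressing y on x
slope_p <- function(y, x) summary(lm(y ~ x))$coefficients[2, 4]

n_obs <- 200   # length of each simulated series (assumed value)
n_sim <- 200   # number of simulated pairs (assumed value)

p_levels <- numeric(n_sim)
p_diffs  <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  # two independent random walks: cumulative sums of unrelated white noise
  x <- cumsum(rnorm(n_obs))
  y <- cumsum(rnorm(n_obs))
  p_levels[i] <- slope_p(y, x)              # regression in levels
  p_diffs[i]  <- slope_p(diff(y), diff(x))  # regression in first differences
}

# share of simulations reporting a "significant" slope at the 5% level
reject_levels <- mean(p_levels < 0.05)
reject_diffs  <- mean(p_diffs < 0.05)
c(levels = reject_levels, differences = reject_diffs)
```

Although the two series are unrelated by construction, the regression in levels rejects the null far more often than the nominal 5%, while the regression in differences stays close to it.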

Setting the Default RNG Seed in R

March 4, 2012

How do you set the default seed for the RNG behind runif(), sample() and other commands? Well, there are several ways of doing that (like setting .Random.seed directly), but as the documentation states, set.seed() is the recommended way to specify seeds.

> ?set.seed
> set.seed(0)
> runif(1,0,1)
0.8966972
> set.seed(0)
> runif(1,0,1)
0.8966972
> set.seed(0)
> sample(1:10, 10)
9 3 10 5 ...
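
The reproducibility that set.seed() provides can be checked programmatically. This is a minimal sketch with made-up variable names; it compares draws for equality rather than against fixed numbers, because the exact values depend on the R version and RNG settings (the default sample() algorithm, for example, changed in R 3.6.0).

```r
# Re-seeding with the same value restarts the generator at the same state,
# so the same draws come back.
set.seed(0)
first <- runif(3)
set.seed(0)
second <- runif(3)
identical(first, second)  # TRUE

# Without re-seeding, the stream simply continues, so the next draws differ.
third <- runif(3)
identical(first, third)   # FALSE
```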

My Pocket Change

March 4, 2012

I'm playing around with some personal data collection, and using some cloud computing to visualize it. Following the directions in this blog post, I've written an R function which visualizes data it draws from a Google Docs spreadsheet, and uploaded it...

Capturing Tick Data via C#, Interactive Brokers, and MySQL

March 3, 2012

Interactive Brokers is a discount brokerage that provides a good API for programmatically accessing their platform. The purpose of this post is to create an application that will capture tick level data and save that data into a database for futur...

Gastwirth’s location estimator

The problem of outliers – data points that are substantially inconsistent with the majority of the other points in a dataset – arises frequently in the analysis of numerical data.  The practical importance of outliers lies in the fact that even a few of these points can badly distort the results of an otherwise reasonable data analysis.  This outlier-sensitivity...

R versus Stata Redux

March 3, 2012

I've used both R and Stata for a long time, but these days I use Stata much more frequently than R. While R is useful for some kinds of graphics (especially three-dimensional graphics) and some statistical procedures (for example, finite mixture models...

NIT: Fatty acids study in R – Part 002

March 2, 2012

> library(chemometrics)
> fatmsc_nipals<-nipals(fat_msc,a=10,it=160)
> CPs<-seq(1,10,by=1)
> matplot(CPs,t(fatmsc_nipals$T),lty=1,pch=21,
+ xlab="PC_number",ylab="Explained_Var")

In the 2D plot, we can see that with 3 or 4 principal...

The German DIN33430 – Analysis of acceptance with R

March 2, 2012

The German DIN33430 defines quality standards that must be met in “job-related proficiency assessments”, the qualifications of the responsible parties involved, as well as the creation, execution and evaluation of such assessments. Licensed persons are published on a website (German). …

New data visualization features in ggplot2 update

March 2, 2012

Hadley Wickham has just released an update to the ggplot2 graphics package for R. Version 0.9.0 significantly speeds up the process of rendering graphics, and the documentation is much improved (including the addition of many new examples). This update also adds a bunch of new features, which are documented in this 40-page "changes and additions" guide. Here's a sampling...

What is R-index

March 2, 2012

The R-index was developed for interpreting signal detection data for human perception. In sensory research it is used to interpret ranking data. The value one gets out of an R-index calculation is interpreted as a confusion between samples tested. It has bee...

How to square numbers in your head

March 2, 2012

MENTALLY MULTIPLY NUMBERS BY THEMSELVES

Assume you know your multiplication tables up to 10x10. Here's how to compute the squares of numbers from 11 to 100.

When Venn diagrams are not enough – Visualizing overlapping data with Social Network Analysis in R

March 2, 2012

I recently thought about ways to visualize medications and their co-occurrences in a group of children. As long as you want to visualize up to 4 different medications you can simply use Venn diagrams. There is a very nice R-package to generate these kinds of graphics for you (for a description see: Chen and Boutros, 2011). But

A terrible 2000 words

March 2, 2012

I've only just started looking at the homicide data made available by the Philadelphia Inquirer in my free time (which is hard to come by lately). I've been thinking about what sorts of statistics I could do, or what kinds of additional data sets I cou...

ABC in Roma [R lab #2]

March 2, 2012

Here is the R code for the second R lab organised by Serena Arima as a supplement to my lectures (now completed!). This morning I covered ABC model choice and the following example is the benchmark used in the course (and in the paper) about the impact of summary statistics. (Warning! It takes a while to

Modeling Trick: the Signed Pseudo Logarithm

March 1, 2012

Much of the data that the analyst uses exhibits extraordinary range. For example: incomes, company sizes, popularity of books and any “winner takes all” process (see: Living in A Lognormal World). Tukey recommended the logarithm as an important “stabilizing transform” (a transform that brings data into a more usable form prior to generating exploratory statistics, ...

Download and Parse NAREIT Data

March 1, 2012

This is the first post of a series that describes how to download and parse specific data sets into R. These kinds of scripts can be functionalized further, but I doubt that these will ever find their way into a formal package. They are intended to be helpful to those facing similar tasks, but as

NIT: Fatty acids study in R – Part 001

March 1, 2012

This time I'm going to use my own data to develop a model to predict some fatty acids in the solid fat (pork). Samples were analyzed in a NIT (Near Infrared Transmittance) instrument. The range of the wavelengths is from 850 to 1048 nm (100 data poi...

Poisson approximation of binomial probabilities

March 1, 2012

This is yet another experiment to see how good the approximation of binomial probabilities is when we use the Poisson and normal distributions for scenarios with large $n$, and $p$ close to zero or one. Consider a problem where the random variable $X$ follows a binomial distribution with a known probability of success $p$, and number of trials $n$. If $n$...
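
A comparison of this kind can be sketched with base R's distribution functions. The values of `n`, `p` and the range `k` below are assumptions chosen for illustration, not the ones used in the post, and the continuity correction for the normal approximation is likewise a choice made here.

```r
n <- 1000    # number of trials (assumed value)
p <- 0.003   # success probability close to zero (assumed value)
k <- 0:15    # outcomes to compare

# exact binomial probabilities P(X = k)
exact <- dbinom(k, size = n, prob = p)

# Poisson approximation with lambda = n * p
poisson <- dpois(k, lambda = n * p)

# normal approximation with a continuity correction:
# P(k - 0.5 < Z < k + 0.5) for Z ~ N(n p, n p (1 - p))
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
normal <- pnorm(k + 0.5, mean = mu, sd = sigma) -
  pnorm(k - 0.5, mean = mu, sd = sigma)

# largest absolute error of each approximation over k
err_poisson <- max(abs(exact - poisson))
err_normal  <- max(abs(exact - normal))
c(poisson = err_poisson, normal = err_normal)
```

With $p$ this small the binomial distribution is strongly skewed, so the Poisson error comes out well below the normal one. For $p$ close to one, the same comparison can be run on the number of failures $n - X$.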

R code for Chapter 1 of Non-Life Insurance Pricing with GLM

March 1, 2012

Insurance pricing is backwards and primitive, harking back to an era before computers. One standard (and good) textbook on the topic is Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Björn Johansson. We have been doing some work in this area recently. Needing a robust internal training course...

Parallelizing Voting simulation

March 1, 2012

Last week I compared synchronous and asynchronous implementations of the NetLogo Voting model. An interesting afterthought is that the synchronous model implementation can easily be made much faster using vectorization. The two versions of the Voting synchr...

I see high frequency data

March 1, 2012

In the previous post I shared an example of how to get high frequency data from the IB broker (well, it is a retail version of HFD: it has only best bid/ask and the trades). Now, once you have saved some data, what should you do next? The next logical step would be a data sanity check and visualization.
