# Monthly Archives: February 2012

## Analyzing weblog data with R

February 23, 2012

The R-chart blog explains how to read a weblog file into R so you can analyze traffic to a website; the post includes, for example, a chart of page requests created with R. Charts like this are stock-in-trade for tools like Google Analytics, but this approach is still useful if you want to look at the performance of a site that hasn't...
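
The R-chart code itself is not reproduced in this excerpt. As a rough illustration of the first step, here is a minimal Python sketch that parses log lines and counts page requests per day. It assumes the log is in the Apache/NCSA Common Log Format; the function name `requests_per_day` and the sample lines are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: each line follows the Common Log Format, e.g.
# 127.0.0.1 - - [23/Feb/2012:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 512
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def requests_per_day(lines):
    """Count requests per calendar day from Common Log Format lines."""
    counts = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines instead of failing
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.date()] += 1
    return counts


# Hypothetical sample lines standing in for a real log file
sample = [
    '127.0.0.1 - - [23/Feb/2012:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [23/Feb/2012:11:02:01 +0000] "GET /about.html HTTP/1.1" 200 1024',
    '10.0.0.3 - - [24/Feb/2012:09:00:00 +0000] "GET /index.html HTTP/1.1" 404 -',
    'not a log line',
]
for day, n in sorted(requests_per_day(sample).items()):
    print(day, n)  # 2012-02-23 2, then 2012-02-24 1
```

The per-day counts are what a page-request chart would plot; with a real file you would pass the open file object instead of `sample`.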

## GSoC Project #2 for 2012

February 23, 2012

In my prior post, I discussed the origins of the first GSoC project I posted this year. The second GSoC project I’ve proposed centers on the writing and code of Attilio Meucci, an adjunct professor at Baruch College – CUNY and an excellent speaker (I saw him at the University of Chicago when he spoke...

## Large-scale Inference

February 23, 2012

Large-scale Inference by Brad Efron is the first IMS Monograph in this new series, coordinated by David Cox and published by Cambridge University Press. Since I read this book immediately after Cox and Donnelly’s Principles of Applied Statistics, I was thinking of drawing a parallel between the two books. However, while neither of them can...

## Pocketbook costs of software

February 23, 2012

I have always been provided with SAS as part of my job, so I never really realized how much it cost. I’ve bought Stata before, and of course R. I recently found out how much a reasonable bundle of SAS modules along with base SAS costs per year per seat, at least under the GSA.

## Ternary ifelse ( ?: ) in different languages

February 23, 2012

Examples of the ternary operator (`if ? true : false`) in different languages:

- AWK: `$ awk 'ORS=NR%3?",":"\n"' student-marks`
- Perl / PHP: `$result = ($a > $b) ? $x : $y;`
- Perl 6: use a doubled `??` and `!!` instead: `$result = ($a > $b) ?? $x !! $y;`
- R: `ifelse(a > 0, a, 0)`
- bash/Linux: the ternary operator `? :` is ju...
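
For comparison, Python spells the same construct as a conditional expression, `x if cond else y`. The sketch below (helper names are hypothetical) reproduces the effect of the AWK one-liner, where each line is printed followed by `","` unless its record number `NR` is a multiple of 3, and an element-wise analogue of R's vectorized `ifelse`.

```python
def awk_ors_join(lines, group=3):
    """Mimic awk 'ORS=NR%3?",":"\\n"': each record is followed by ","
    unless its 1-based record number is a multiple of `group`."""
    out = []
    for nr, line in enumerate(lines, start=1):
        # Python's conditional expression plays the role of cond ? a : b
        ors = "," if nr % group else "\n"
        out.append(line + ors)
    return "".join(out)


def clip_negative(values):
    """Element-wise analogue of R's ifelse(a > 0, a, 0)."""
    return [a if a > 0 else 0 for a in values]


def pick(a, b, x, y):
    # Same logic as $result = ($a > $b) ? $x : $y;
    return x if a > b else y


print(repr(awk_ors_join(["a", "b", "c", "d", "e", "f"])))  # 'a,b,c\nd,e,f\n'
print(clip_negative([-2, 0, 3.5]))  # [0, 0, 3.5]
```

As with the AWK version, if the number of lines is not a multiple of three, the output ends with a trailing comma rather than a newline.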

## PCA for NIR Spectra_part 002: "Score planes"

February 23, 2012

The idea of this post is to compare the score plots for the first three principal components, obtained with the “svd” algorithm, with the score plots from other chemometric software (Win ISI in this case). Previously I had exported the yarn spectra t...
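
For readers who want to see what an SVD-based score computation looks like, here is a minimal NumPy sketch. It is not the post's R code; it assumes spectra are stored one per row, mean-centres the columns, and uses random numbers as a hypothetical stand-in for the yarn spectra. The function name `pca_scores_svd` is illustrative.

```python
import numpy as np


def pca_scores_svd(X, n_components=3):
    """PCA scores of the rows of X via a thin singular value decomposition.

    X is an (n_samples, n_wavelengths) matrix, e.g. NIR spectra stored row-wise.
    Columns are mean-centred before the decomposition.
    """
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # equals Xc @ Vt[:k].T
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, loadings, explained[:n_components]


# Hypothetical stand-in for real spectra: 20 samples x 100 wavelengths
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
scores, loadings, explained = pca_scores_svd(X)

# Score planes: pairs of score columns (PC1 vs PC2, PC1 vs PC3, PC2 vs PC3)
planes = {(i + 1, j + 1): scores[:, [i, j]] for i in range(3) for j in range(i + 1, 3)}
print(scores.shape, sorted(planes))  # (20, 3) [(1, 2), (1, 3), (2, 3)]
```

When comparing score plots across software, keep in mind that the sign of each principal component is arbitrary, so one package's plot may be a mirror image of another's along some axes.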

## Prediction: the Lasso vs. just using the top 10 predictors

February 23, 2012

One incredibly popular tool for the analysis of high-dimensional data is the lasso. The lasso is commonly used when you have many more predictors than independent samples (the "n ≪ p" problem). It is also often used in the context of predictio...
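
The excerpt does not say how the "top 10 predictors" are chosen. A common baseline, assumed here, is to rank predictors by their absolute marginal correlation with the response and fit ordinary least squares on the top k. The NumPy sketch below shows only that baseline on simulated data, with hypothetical helper names; a lasso fit for comparison would come from a dedicated solver such as R's glmnet or scikit-learn's `Lasso`.

```python
import numpy as np


def top_k_predictors(X, y, k=10):
    """Indices of the k columns of X with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]


def fit_top_k(X, y, k=10):
    """Ordinary least squares (with intercept) on the top-k screened predictors."""
    idx = top_k_predictors(X, y, k)
    design = np.column_stack([np.ones(len(y)), X[:, idx]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return idx, coef


def predict_top_k(X_new, idx, coef):
    """Predictions from the screened least-squares fit."""
    return coef[0] + X_new[:, idx] @ coef[1:]


# Hypothetical n << p setting: 50 samples, 1000 predictors, 5 of them informative
rng = np.random.default_rng(1)
n, p = 50, 1000
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)

idx, coef = fit_top_k(X, y, k=10)
y_hat = predict_top_k(X, idx, coef)
print(sorted(idx.tolist()))
```

To compare prediction accuracy fairly, both this baseline and the lasso should be evaluated on held-out data rather than on the samples used for screening and fitting.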

## Visualization in regression analysis

February 23, 2012

Visualization is a key to success in regression analysis. This is one of the (many) reasons I am suspicious when I read an article with a quantitative (econometric) analysis but no graphs. Consider, for instance, the following dataset, obtai...

## Example 9.21: The birthday "problem" re-examined

February 23, 2012

The so-called birthday paradox, or birthday problem, is simply the counter-intuitive discovery that the probability of (at least) two people in a group sharing a birthday goes up surprisingly fast as the group size increases. If the group is only 23 peo...
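
The classical closed-form calculation (not necessarily the approach the post itself takes) multiplies the chances that each successive person misses all earlier birthdays, giving P(n) = 1 − ∏ (365 − k)/365 for k = 0, …, n − 1. This Python sketch assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming `days` equally likely birthdays (leap years ignored)."""
    if n > days:
        return 1.0  # pigeonhole: a shared birthday is certain
    p_distinct = 1.0
    for k in range(n):
        # k people already have distinct birthdays; the next must avoid them
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct


smallest = next(n for n in range(1, 366) if prob_shared_birthday(n) > 0.5)
print(smallest, round(prob_shared_birthday(23), 4))  # 23 0.5073
```

Twenty-three is the smallest group for which a shared birthday is more likely than not; with 22 people the probability is still just under one half.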