One-liners which make me love R: twitteR’s searchTwitter() #rstats

July 21, 2011

R reminds me a lot of English. It’s easy to get started, but very difficult to master. So for all those times I’ve spent… well, forever… trying to figure out the “R way” of doing something, I’m glad to share these quick wins. My recent R tutorial on mining Twitter for consumer sentiment wouldn’t have

Smoothing temporally correlated data

July 21, 2011

I have recently been doing a lot of work with time series data, to which I have been fitting additive models to describe trends and other features of the data. When modelling temporally dependent data, we often need to adjust our fitted models to account for the lack of independence in the model residuals. When smoothing such...

Showcasing the latest phylogenetic methods: AUTEUR

July 20, 2011

While high-speed fish feeding videos may be the signature of the lab, dig a bit deeper and you’ll find a wealth of comparative phylogenetic methods sneaking in.  It’s a natural union — expert functional morphology is the key to good comparative methods, just as phylogenies hold the key to untangling the evolutionary origins of that

Regional differences on what drives CO2 emissions

July 20, 2011

If you are investigating the change of CO2 emissions, then you might ask: Where do the changes occur? Well, here is the answer. The staircase plots show the contributing factors to CO2 emissions for each continent. In these plots, population refers to population effects, gdp_pcap refers to income per capita, energy_intensity refers to energy used per dollar of added value, and carbon intensity...

Slides for Reproducible Research Talk at Interface 2011

July 20, 2011

I gave a talk at the Interface Symposium on reproducible research in practice. I went first in the session, so the slides have a bit more background and philosophy. It was a great session: Sergey Fomel, one of Jon Claerbout's colleagues and a founding author of Madagascar, spoke; Sorin Mitran from UNC Chapel Hill talked about

Simpson’s Paradox

July 20, 2011

A few days ago I heard a talk about Simpson's paradox, and I decided to write a little example in R:

    library(MASS) # For multivariate normals
    # List of (vectors of) means
    mu <- list(c(5, 175), c(6.25, 110))
    # List of covariance matrices
    sigma ...

Visualizing Kickstarter Projects with R

July 20, 2011

Kickstarter, a social funding platform where individuals can chip in cash to get a worthy project going, just celebrated its 10,000th kickstarted project. Kickstarter employee Fred Benenson recognized the achievement by visualizing the funding of music, design, art, game and many other kinds of projects using R and ggplot2. For example, here's a chart that shows the increasing rate...

Shorting Mebane Faber

July 19, 2011

Although I do not personally know Mebane Faber, I know enough that I do not want to short him. However, I thought it would be insightful to see how the short side of his “A Quantitative Approach To Tactical Asset Allocation” might look.  Once ...

The Road to Default: The Other Side of the Story

July 19, 2011

Okay so I was gliding through the articles of CNBC.com and stumbled upon one titled, "A Downgrade of U.S. Debt Won't Matter as Much as You Think." The argument laid down in this piece is that insurance companies and pension funds are required to hold h...

Looking for NppToR beta testers.

July 19, 2011

NppToR 2.6 is coming with improved flexibility and speed. Testers are needed before it is set as the default.

Version 1.3 Verified

July 19, 2011

Whew. For a few hours last night I struggled with a bug, err, several bugs. In the end some of the bugs were mine, some of them were “upstream” and some of them were far “upstream”. Image bugs in R 2.13.1. But the math bugs (or flat-out errors on my part) have all

Growth in data-related jobs

July 19, 2011

At job-search site indeed.com, you can take a look at trends in the keywords used in job postings. As you might expect, job postings containing terms related to making sense of data are on the rise. Here's the growth in job postings mentioning big data: And here's statistician: The drop-off in demand for statisticians in 2011 seems to...

Extracting EOD Data from BSE

July 19, 2011

Earlier, I worked out how to download the Bhavcopy from NSE. Now I will make a similar attempt to download the BSE Bhavcopy. Objective: Download Bhavcopy (Equity) from http://www.bseindia.com and save only relevant columns Date, Symbol, Name, Open, High...

Extracting EOD Data from NSE

July 19, 2011

My prime interest being the Indian financial markets, the first step would be to get the data to play around with. NSE India provides EOD data as bhavcopies, which are stored as zipped files on its servers. Downloading them one by one for a larger t...

Geocoding addresses from Missouri Sex Offender Registry

July 19, 2011

Computer Assisted Reporting: This is the second of four articles about analyzing distances between sex offenders and child daycare centers in Missouri as part of a joint project with KSHB NBC Action News in Kansas City. The previous article gave details...

Analysis of Missouri Sex Offender Registry Data

July 18, 2011

Computer Assisted Reporting: This is the first of three articles about analyzing distances between sex offenders and child daycare centers in Missouri as part of a joint project with KSHB NBC Action News in Kansas City. The Missouri State Highway Patrol...

The foundations of Statistics [reply]

July 18, 2011

Shravan Vasishth has written a response to my review, both published on the Statistics Forum. His response is quite straightforward and honest. In particular, he acknowledges not being a statistician and that he “should spend more time studying statistics”. I also understand the authors’ frustration at trying “to recruit several statisticians (at different points) to

GigaOm article on R, Big Data and Data Science

July 18, 2011

I'm really pleased that an article I wrote, "5 real-world uses of big data", has been published in the widely-read technology blog GigaOm. In the article, I review five examples of using data science techniques and R to make sense of some large real-world data sets: Drew Conway's analysis of the Afghanistan attacks data released by Wikileaks; Benetech's use...

Registration closing for UseR! 2011

July 18, 2011

Friday July 22 is the last day on which you can register for UseR! 2011 at the University of Warwick. The conference will be held August 16-18, 2011. You can peruse the book of abstracts and view the draft schedule. I am scheduled to give a talk on “Random input testing with R”. The abstract is: …

Model Validation: Interpreting Residual Plots

July 18, 2011

When conducting any statistical analysis, it is important to evaluate how well the model fits the data and whether the data meet the assumptions of the model. There are numerous ways to do this and a variety of statistical tests to evaluate deviations from model assumptions. However, there is little general acceptance of any of the statistical tests. Generally...

Example 9.3: augmented display of contingency table

July 18, 2011

SAS and R often provide different levels of detail in their output. This is particularly true for the descriptive analysis of contingency tables, where SAS makes it easy to display tables with additional quantities (such as the observed cell count). The m...

The Road to Default: Puppy Power!

July 18, 2011

Although Congress can technically dilly-dally until August 2nd to come up with an agreement and raise the debt ceiling, markets have anticipated the inevitable. They haven't sat back and decided to wait till August 2nd to panic; they are already in "oh...

Fast logistic regression on Big Data with commodity hardware? No problem.

July 18, 2011

You might think that doing advanced statistical analysis on Big Data is out of reach for those of us without access to expensive hardware and software. For example, back in April SAS was proud to demonstrate being able to run logistic regression on a billion records (and "just a few" variables) in less than 80 seconds. But that feat...

Avoiding Loops in R: An Example with Principal Minors

July 18, 2011

Yesterday, I found myself wanting to compute a large subset of the second order principal minors of a matrix (diagonal-preserving minors; the ones for which the rows and columns kept are the same). Don't judge me for wanting to do this, and bear with ...

1st Data Analysis Contest Using R

Emilio Torres Manzanera has just announced the 1st Data Analysis Contest Using R: “Nestoria (http://www.nestoria.com/) is a specialized web search engine platform in house prices. Nestoria and Lokku Labs aim to improve the understanding of the public of the value of its databases. The company aims to engage a few brilliant statisticians in the expectation

On “Stock correlation has been rising”

July 17, 2011

Ticker Sense posted about the mean correlation of the S&P 500. The plot there, similar to Figure 1, shows that correlation has been on the rise after a low in February.

Figure 1: Mean 50-day rolling correlation of S&P 500 constituents to the index.

For me, this post raised a whole lot more …

The method in the mirror: reflection in R

July 17, 2011

Reflection is a programming concept that sounds scarier than it is. There are three related concepts that fall under the umbrella of reflection, and I’ll be surprised if you haven’t come across most of these ideas already, even if you didn’t know they were called reflection. The first concept is examination of your variables.
