1539 search results for "regression"

What is Correctness for Statistical Software?

December 14, 2012

Introduction: A few months ago, Drew Conway and I gave a webcast that tried to teach people about the basic principles behind linear and logistic regression. To illustrate logistic regression, we worked through a series of progressively more complex spam detection problems. The simplest data set we used was the following: This data set has


Trading with SVMs: Performance

December 13, 2012

To get a feel for SVM performance in trading, I ran different setups on the S&P 500 historical data from … the 50s. The main motive for using this decade was to decide which parameters to vary and which to keep steady before running the most important tests. Treat it as an “in-sample” test


Testing Assumption Testing

December 13, 2012

I’ve been doing a lot of linear modeling this year. That’s not much different than any ordinary year, but now I’m doing it in R. I had spent a bit of time in recent years trying to look at loss reserving as a multivariate regression. Excel is happy to do that, but testing various predictor


Linear Models with Multiple Fixed Effects

December 11, 2012

Estimating a least squares linear regression model with fixed effects is a common task in applied econometrics, especially with panel data. For example, one might have a panel of countries and want to control for fixed country factors. In this case the researcher will effectively include this fixed identifier as a factor variable, and then proceed to


A Simple Model for Realized Volatility

December 9, 2012

The post has two goals: (1) Explain how to forecast volatility using a simple Heterogeneous Auto-Regressive (HAR) model (Corsi, 2002). (2) Check if higher moments like skewness and kurtosis add forecast value to this model. It will be a high …


Handling missing data with Amelia

December 9, 2012

So, what if you have data, but some of the observations are missing? Many statistical techniques assume no missingness, so we might want to “fill in” or rectangularize our data, by replacing missing observations with plausible substitutes....


Please stop using Excel-like formats to exchange data

December 7, 2012

I know “officially” data scientists all always work in “big data” environments with data in a remote database, streaming store or key-value system. But in day to day work Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to make a plea to my


Tibshirani’s original paper on the lasso. Breiman’s…

December 6, 2012

Tibshirani’s original paper on the lasso. Breiman’s Garotte (1993); Tibshirani lasso paper submitted (1994); Tibshirani lasso paper revised (1995); Tibshirani lasso paper accepted (1996). This is one of those papers that I’m so excited about, I feel like “You should just read the whole thing! It’s all good!” But I realise that’s less than reasonable. Here is a bit of summary,...


How a Mexican state ended up with more drug war homicides than total homicides

December 5, 2012

During 2007 and 2008 the Mexican state of Sinaloa had more drug war-related homicides than total homicides. This should in theory be impossible since drug war homicides are a subset of total homicides. How did this happen? Here is a chart from my old post highlighting the monthly difference between the...


Modis QC Bits

December 5, 2012

In the course of working through my MODIS LST project and reviewing the steps that Imhoff and Zhang took, as well as the data preparations other researchers have taken (Neteler), the issue of MODIS quality control bits came up. Every MODIS HDF file comes with multiple SDS or multiple layers of data. For
