# 2071 search results for "regression"

## Unprincipled Component Analysis

February 10, 2014

As a data scientist I have seen variations of principal component analysis and factor analysis so often blindly misapplied and abused that I have come to think of the technique as unprincipled component analysis. PCA is a good technique, often used to reduce sensitivity to overfitting. But this stated design intent leads many to (falsely)…
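The kind of workflow the excerpt alludes to can be sketched in a few lines of base R — this is an illustrative principal-components-regression example on synthetic data (all variable names and the 90% variance cutoff are choices made here, not anything from the post):

```r
# Sketch: PCA as a pre-processing step before regression (synthetic data).
# prcomp() centers (and here also scales) the columns, then returns the
# rotated scores in pca$x.
set.seed(42)
X <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
X[, 2] <- X[, 1] + rnorm(100, sd = 0.01)   # nearly collinear column
y <- X[, 1] + rnorm(100)

pca <- prcomp(X, scale. = TRUE)

# Keep only the leading components explaining 90% of the variance
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(var_explained >= 0.9)[1]
scores <- pca$x[, 1:k, drop = FALSE]

fit <- lm(y ~ scores)   # regression on k principal-component scores
```

Whether this actually reduces overfitting depends entirely on the data — which is presumably the author's point.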

## After 1st semester of Statistics PhD program

February 9, 2014

Have you ever wondered whether the first semester of a PhD is really all that busy? My complete lack of posts last fall should prove it. Some thoughts on the Fall term, now that Spring is well under way: The …

## N2 with runlm()

February 9, 2014

Introduction The default swN2() calculation in Oce uses a smoothing spline. One disadvantage of this is that few readers will know how it works. A possible alternative is to compute d(rho)/dz using the slope inferred from a running-window linear regression. Such a slope is provided by the new Oce function runlm(), which is tested here. (Note...
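The idea of a slope from a running-window linear regression can be sketched in base R — this is not `runlm()` from oce itself, just a minimal illustration of the technique (the window half-width `L`, the function name, and the linear test profile are all assumptions made here):

```r
# Sketch (not oce::runlm): estimate d(rho)/dz at each depth as the slope of a
# linear regression fitted over a running window of 2L+1 points.
run_slope <- function(z, rho, L = 5) {
  n <- length(z)
  slope <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    j <- max(1, i - L):min(n, i + L)        # window, truncated at the ends
    slope[i] <- coef(lm(rho[j] ~ z[j]))[2]  # local d(rho)/dz
  }
  slope
}

z <- seq(0, 100, by = 1)
rho <- 1025 + 0.01 * z        # exactly linear density profile
drhodz <- run_slope(z, rho)   # recovers the slope 0.01 everywhere
```

Unlike a smoothing spline, every number here is the output of an ordinary regression over a visible window, which is the readability advantage the post mentions.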

## Another skewed normal distribution

February 8, 2014

At the CLRS last year, Glenn Meyers talked about something very near to my heart: a skewed normal distribution. In loss reserving (and, I'm sure, many other contexts) standard linear regression is less than ideal, as it presumes that deviations from the mean are symmetrically distributed. We rarely expect this assumption to hold (though we…
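The problem the excerpt describes is easy to reproduce — here is a small base-R sketch (the exponential error distribution and all constants are illustrative choices, not anything from the talk) showing how skewed errors leave a clear fingerprint in the residuals of an ordinary regression:

```r
# Sketch: linear regression with skewed (non-symmetric) errors.
# Exponential errors, shifted to mean zero, violate the symmetric-deviation
# assumption of ordinary least squares.
set.seed(1)
n <- 500
x <- runif(n)
e <- rexp(n, rate = 1) - 1     # right-skewed, mean-zero errors
y <- 2 + 3 * x + e

fit <- lm(y ~ x)
r <- residuals(fit)

# Sample skewness of the residuals: far from 0 (a symmetric-error fit
# would leave this near zero)
skew <- mean(r^3) / sd(r)^3
```

The slope estimate itself is still unbiased here; it is the error model — and hence intervals and tail behaviour, which is what matters in reserving — that the symmetric assumption gets wrong.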

## Random Forest Almighty

February 6, 2014

Random Forests are awesome. They do not overfit, they are easy to tune, they tell you about important variables, they can be used for classification and regression, they are implemented in many programming languages and they are faster than their competitors (neural nets, boosting, support vector machines, ...). Let us take a moment to appreciate them: The...
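The classification/regression/importance trio from the excerpt fits in a few lines with the `randomForest` package (assumed installed; the datasets and `ntree` value are choices made here for illustration):

```r
# Sketch: the features praised above, via the randomForest package.
library(randomForest)
set.seed(7)

# Classification on a factor response
rf_cls <- randomForest(Species ~ ., data = iris, ntree = 200)

# "They tell you about important variables"
imp <- importance(rf_cls)

# Regression on a numeric response, same interface
rf_reg <- randomForest(mpg ~ ., data = mtcars, ntree = 200)
```

The claim that they "do not overfit" is the author's; in practice the out-of-bag error printed by these fits is the thing to check.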

## In case you missed it: January 2014 roundup

February 5, 2014

In case you missed them, here are some articles from January of particular interest to R users: Princeton’s Germán Rodríguez has published a useful “Introduction to R” guide, with a focus on linear and logistic regression. The rxDForest function in the RevoScaleR package fits random forests of histogram-binning trees. A tutorial on using the xts package to analyze and...

## An Inconvenient Statistic

February 4, 2014

As I sit here waiting on more frigid temperatures subsequent to another 10 inches of snow, suffering from metastatic cabin fever, I can't help but ponder what I can do to examine global warming/climate change. Well, as luck would have it, R has the tools to explore this controversy. Using two packages, vars and forecast, I will see if I...
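The forecast side of that toolkit can be sketched on a built-in temperature series — this uses `nottem` (monthly Nottingham air temperatures) rather than whatever climate data the post analyses, so treat it purely as an illustration of the package mechanics:

```r
# Sketch: fitting and forecasting a temperature series with the forecast
# package (nottem stands in for the post's climate data).
library(forecast)

fit <- auto.arima(nottem)       # automatic ARIMA model selection
fc  <- forecast(fit, h = 12)    # forecast one year ahead
```

The vars package plays the complementary role for multivariate series (e.g. temperature against a CO2 series), which is presumably where the post is headed.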

## Data Analysis Steps

February 3, 2014

After going through the overview of the tools & technologies needed to become a data scientist in my previous blog post, in this post we shall see how to tackle a data analysis problem. Any data analysis project starts with identifying a business problem for which historical data exists. A business problem can be anything, which can include prediction...

## Bad Bayes: an example of why you need hold-out testing

February 1, 2014

We demonstrate a dataset that causes many good machine learning algorithms to horribly overfit. The example is designed to imitate a common situation found in predictive analytic natural language processing. In this type of application you are often building a model using many rare text features. The rare text features are often nearly unique k-grams…
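The failure mode can be reproduced in a few lines of base R — this is a sketch in the same spirit as the post, not its actual dataset (the feature count, rarity rate, and train/test split are all choices made here): many rare binary features let a model memorize the training labels, and only a hold-out set reveals it.

```r
# Sketch: rare, nearly unique features cause severe overfit that only
# hold-out testing exposes.
set.seed(123)
n <- 200
y <- rbinom(n, 1, 0.5)                        # labels carry no real signal
X <- matrix(rbinom(n * 500, 1, 0.01), n)      # 500 rare binary features
d <- data.frame(y = y, X)

train <- 1:100
test  <- 101:200
fit <- glm(y ~ ., data = d[train, ], family = binomial)  # warns: separation

acc <- function(idx) {
  p <- predict(fit, d[idx, ], type = "response") > 0.5
  mean(p == d$y[idx])
}
train_acc <- acc(train)   # near-perfect: rare features memorized
test_acc  <- acc(test)    # close to chance
```

With 500 predictors and 100 training rows the logistic fit can interpolate the training labels (R warns about separation and rank deficiency), so training accuracy is meaningless — exactly the trap the post warns about.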

## Inference for ARMA(p,q) Time Series

January 30, 2014

As we mentioned in our previous post, as soon as we have a moving average part, inference becomes more complicated. Again, to illustrate, we do not need too general a model. Consider, here, some ARMA(1,1) process, where the innovation is some white noise, and assume further that …

```r
> theta=.7
> phi=.5
> n=1000
> Z=rep(0,n)
> set.seed(1)
> e=rnorm(n)
> for(t...
```
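The loop is cut off by the excerpt; assuming the standard ARMA(1,1) recursion Z[t] = phi·Z[t-1] + e[t] + theta·e[t-1] (consistent with the theta and phi set above, but a reconstruction, not the post's exact code), the simulation can be completed as:

```r
# Sketch: simulate an ARMA(1,1) process, assuming the standard recursion
# Z[t] = phi*Z[t-1] + e[t] + theta*e[t-1] with Gaussian white noise.
theta <- 0.7
phi   <- 0.5
n     <- 1000
Z <- rep(0, n)
set.seed(1)
e <- rnorm(n)
for (t in 2:n) Z[t] <- phi * Z[t - 1] + e[t] + theta * e[t - 1]

# The same process via base R's built-in simulator, for comparison
Z2 <- arima.sim(n = n, model = list(ar = phi, ma = theta))
```

With |phi| < 1 the process is stationary, so the hand-rolled series and `arima.sim` should have comparable sample moments once the initial transient dies out.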