2047 search results for "regression"

Random Forest Almighty

February 6, 2014

Random Forests are awesome. They do not overfit, they are easy to tune, they tell you about important variables, they can be used for classification and regression, they are implemented in many programming languages and they are faster than their competitors (neural nets, boosting, support vector machines, ...). Let us take a moment to appreciate them: The...

In case you missed it: January 2014 roundup

February 5, 2014

In case you missed them, here are some articles from January of particular interest to R users: Princeton’s Germán Rodríguez has published a useful “Introduction to R” guide, with a focus on linear and logistic regression. The rxDForest function in the RevoScaleR package fits random forests of histogram-binning trees. A tutorial on using the xts package to analyze and...

An Inconvenient Statistic

February 4, 2014

As I sit here waiting on more frigid temperatures subsequent to another 10 inches of snow, suffering from metastatic cabin fever, I can't help but ponder what I can do to examine global warming/climate change. Well, as luck would have it, R has the tools to explore this controversy. Using two packages, vars and forecast, I will see if I...

Data Analysis Steps

February 3, 2014

After going through the overview of the tools and technologies needed to become a data scientist in my previous blog post, in this post we shall see how to tackle a data analysis problem. Any data analysis project starts with identifying a business problem for which historical data exist. A business problem can be anything, including prediction...

Bad Bayes: an example of why you need hold-out testing

February 1, 2014

We demonstrate a dataset that causes many good machine learning algorithms to horribly overfit. The example is designed to imitate a common situation found in predictive analytic natural language processing. In this type of application you are often building a model using many rare text features. The rare text features are often nearly unique k-grams.
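The excerpt does not include the post's dataset or its Naive Bayes model, so here is a minimal base-R sketch of the same failure mode under a swapped-in technique: ordinary least squares with many features relative to the number of rows. The response is pure noise by construction (a hypothetical setup), so any apparent fit on the training rows is overfitting, and only the hold-out score reveals it.

```r
# Hypothetical illustration: y is pure noise, unrelated to every feature.
set.seed(42)
n_train <- 100
n_test  <- 1000
p       <- 90   # many features relative to the number of training rows

make_data <- function(n, p) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  colnames(X) <- paste0("x", seq_len(p))
  data.frame(y = rnorm(n), X)
}

train <- make_data(n_train, p)
test  <- make_data(n_test, p)   # hold-out set, never seen during fitting

fit <- lm(y ~ ., data = train)

# R^2 = 1 - SSE / SST on whichever data set is supplied
r_squared <- function(y, y_hat) 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)

train_r2   <- r_squared(train$y, fitted(fit))
holdout_r2 <- r_squared(test$y, predict(fit, newdata = test))

cat("training R^2:", round(train_r2, 3), "\n")
cat("hold-out R^2:", round(holdout_r2, 3), "\n")
```

The training R² comes out high even though there is nothing to learn, while the hold-out R² is negative, i.e. worse than predicting the mean. Judging the model on the training rows alone would have missed this.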

Inference for ARMA(p,q) Time Series

January 30, 2014

As we mentioned in our previous post, as soon as we have a moving average part, inference becomes more complicated. Again, to illustrate, we do not need too general a model. Consider, here, some ARMA(1,1) process $(Z_t)$, where $(\varepsilon_t)$ is some white noise, and assume further that . > theta=.7 > phi=.5 > n=1000 > Z=rep(0,n) > set.seed(1) > e=rnorm(n) > for(t...
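The excerpt cuts off before the simulation loop and the estimation step. Here is a minimal sketch that reuses the snippet's parameters. It *assumes* the sign convention of R's `arima()`, $Z_t = \phi Z_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}$, which may not be the post's own convention. The estimation is delegated to `arima()` rather than a regression on lags, because the MA term makes the model depend on unobserved past shocks.

```r
# Assumed model (sign convention of R's arima()):
#   Z_t = phi * Z_{t-1} + e_t + theta * e_{t-1}
theta <- 0.7
phi   <- 0.5
n     <- 1000
set.seed(1)
e <- rnorm(n)
Z <- rep(0, n)
for (t in 2:n) Z[t] <- phi * Z[t - 1] + e[t] + theta * e[t - 1]

# The e_{t-1} term is not observed, so we cannot simply regress Z_t on its lags.
# arima()'s default method "CSS-ML" gets starting values by conditional sum of
# squares, then maximises the exact Gaussian likelihood via a Kalman filter.
fit <- arima(Z, order = c(1, 0, 1), include.mean = FALSE)

print(fit$coef)                    # estimates, named "ar1" and "ma1"
print(sqrt(diag(fit$var.coef)))    # asymptotic standard errors
```

With n = 1000, both estimates should land close to the true values 0.5 and 0.7.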

A First Look at rxDForest()

January 30, 2014

by Joseph Rickert

Last July, I blogged about rxDTree(), the RevoScaleR function for building classification and regression trees on very large data sets. As I explained then, this function is an implementation of the algorithm introduced by Ben-Haim and Yom-Tov in their 2010 paper that builds trees on histograms of the data rather than on the raw data itself. This...
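To make "trees on histograms rather than raw data" concrete, here is a minimal base-R sketch of the core idea for a single regression split. It is *not* RevoScaleR's implementation or the exact Ben-Haim/Yom-Tov streaming-histogram algorithm. It simply bins one feature into equal-frequency bins (a simplifying assumption), keeps only per-bin counts, sums and sums of squares, and scores candidate splits at bin edges from those summaries alone. The function name `best_split_binned` is hypothetical.

```r
# Hypothetical helper: best regression-stump split on one numeric feature,
# scored only from per-bin summaries (count, sum, sum of squares).
best_split_binned <- function(x, y, n_bins = 32) {
  # Equal-frequency bin edges; duplicates removed for heavily tied data
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE))
  k <- length(breaks) - 1
  if (k < 2) stop("need at least two distinct bins")
  bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

  # Histogram summaries: after this point the raw data are no longer used
  cnt <- tabulate(bin, nbins = k)
  s1  <- vapply(seq_len(k), function(b) sum(y[bin == b]), numeric(1))
  s2  <- vapply(seq_len(k), function(b) sum(y[bin == b]^2), numeric(1))

  # Within-group sum of squared errors from count, sum and sum of squares
  sse <- function(n, s, q) ifelse(n > 0, q - s^2 / pmax(n, 1), 0)

  # Candidate split j sends bins 1..j left (x <= breaks[j + 1]) and the rest right
  j  <- seq_len(k - 1)
  nl <- cumsum(cnt)[j]; sl <- cumsum(s1)[j]; ql <- cumsum(s2)[j]
  total <- sse(nl, sl, ql) + sse(sum(cnt) - nl, sum(s1) - sl, sum(s2) - ql)

  best <- which.min(total)
  list(threshold = breaks[best + 1], sse = total[best])
}

# Usage: a step in y at x = 50 is recovered at the bin edge 50.5
x <- 1:100
y <- ifelse(x <= 50, 0, 10)
best_split_binned(x, y, n_bins = 4)
```

The payoff is that the split search costs O(number of bins) per feature instead of O(number of rows), and the bin summaries can be computed in chunks and added together. That additivity is what makes the approach attractive for data too large for memory.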

Comparing multiple (g)lm in one graph #rstats

January 29, 2014

It’s been a while since a user of my plotting functions asked whether it would be possible to compare multiple (generalized) linear models in one graph (see comment). While it is already possible to compare multiple models as table output, I have now managed to build a function that plots several (g)lm objects in a single ggplot graph. The
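The author's own function (which draws a ggplot graph) is not shown in the excerpt. The following is a minimal base-graphics sketch of the same idea, not that function. It collects the estimates and 95% Wald intervals from several (g)lm objects into one data frame and draws them as a single dot-and-whisker plot. The models on the built-in `mtcars` data are hypothetical examples.

```r
# Hypothetical example models on the built-in mtcars data
models <- list(
  lm1  = lm(mpg ~ wt + hp, data = mtcars),
  glm1 = glm(mpg ~ wt + hp + qsec, data = mtcars, family = gaussian)
)

# Collect estimates and 95% Wald intervals from each model into one data frame
coef_table <- function(models) {
  rows <- lapply(names(models), function(nm) {
    m  <- models[[nm]]
    ci <- confint.default(m)  # Wald intervals; works for both lm and glm objects
    data.frame(model = nm, term = rownames(ci), estimate = unname(coef(m)),
               lower = unname(ci[, 1]), upper = unname(ci[, 2]))
  })
  do.call(rbind, rows)
}

tab  <- coef_table(models)
tab  <- tab[tab$term != "(Intercept)", ]
ypos <- seq_len(nrow(tab))
cols <- as.integer(factor(tab$model))   # one colour per model

# One dot-and-whisker plot for all models together
op <- par(mar = c(4, 9, 1, 1))          # wide left margin for the term labels
plot(tab$estimate, ypos, xlim = range(tab$lower, tab$upper), yaxt = "n",
     pch = 19, col = cols, xlab = "Estimate with 95% Wald CI", ylab = "")
segments(tab$lower, ypos, tab$upper, ypos, col = cols)
axis(2, at = ypos, labels = paste(tab$model, tab$term, sep = ": "),
     las = 1, cex.axis = 0.7)
abline(v = 0, lty = 2)
par(op)
```

Building the tidy table first keeps the plotting step independent of the model class, so further models can be added to the list without touching the plotting code.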

Inference for AR(p) Time Series

January 28, 2014

Consider a (stationary) autoregressive process, say of order 2, $Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \varepsilon_t$, for some white noise $(\varepsilon_t)$ with variance $\sigma^2$. Here is some code to generate such a process: > phi1=.25 > phi2=.7 > n=1000 > set.seed(1) > e=rnorm(n) > Z=rep(0,n) > for(t in 3:n) Z[t]=phi1*Z[t-1]+phi2*Z[t-2]+e[t] > Z=Z > n=length(Z) > plot(Z,type="l") Here, we have to estimate two sets of parameters: the autoregressive...
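The excerpt stops before the estimation step. One standard, consistent estimator for a stationary AR(p) is ordinary least squares of $Z_t$ on its own lags, and the sketch below uses it. This is not necessarily the estimator used in the post. Dropping the first 200 simulated values as burn-in is our own assumption, made so that the zero starting values do not bias the sample.

```r
phi1 <- 0.25
phi2 <- 0.7
n    <- 1000
set.seed(1)
e <- rnorm(n)
Z <- rep(0, n)
for (t in 3:n) Z[t] <- phi1 * Z[t - 1] + phi2 * Z[t - 2] + e[t]
Z <- Z[-(1:200)]   # assumption: discard 200 burn-in values

# embed() returns the columns Z_t, Z_{t-1}, Z_{t-2}
lagged <- embed(Z, 3)
df <- data.frame(z = lagged[, 1], z1 = lagged[, 2], z2 = lagged[, 3])

# The autoregressive parameters are the regression slopes (no intercept: zero-mean process)
fit     <- lm(z ~ z1 + z2 - 1, data = df)
phi_hat <- coef(fit)

# The white-noise variance sigma^2 is estimated from the residuals
sigma2_hat <- mean(residuals(fit)^2)

print(phi_hat)
print(sigma2_hat)
```

This covers both sets of parameters mentioned above: the slopes estimate $(\phi_1, \phi_2)$, and the residual variance estimates $\sigma^2$.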

How to convert odds ratios to relative risks

January 27, 2014

My short paper on this came out on Friday in the British Medical Journal. The aim is to help both authors and readers of research make sense of this rather confusing but unavoidable statistic, the odds ratio (OR). The fundamental …
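The excerpt does not show the paper's method. One standard conversion, derived directly from the definitions of odds and risk, needs the baseline risk $p_0$ in the reference group: $RR = OR / (1 - p_0 + p_0 \cdot OR)$. Below is a minimal R sketch of that formula, checked against a direct calculation. The baseline risks in the last line are hypothetical values, chosen only to show that the same OR maps to different RRs.

```r
# Convert an odds ratio to a relative risk, given the baseline risk p0
# in the reference (unexposed) group: RR = OR / (1 - p0 + p0 * OR)
or_to_rr <- function(or, p0) {
  stopifnot(all(or > 0), all(p0 >= 0 & p0 <= 1))
  or / (1 - p0 + p0 * or)
}

# Direct check: turn p0 into odds, multiply by OR, turn back into a risk
p0    <- 0.2
or    <- 3
odds1 <- or * p0 / (1 - p0)   # odds in the exposed group
p1    <- odds1 / (1 + odds1)  # risk in the exposed group
c(direct = p1 / p0, formula = or_to_rr(or, p0))   # both equal 3 / 1.4

# The same OR implies different RRs depending on the (hypothetical) baseline risk
or_to_rr(3, c(0.01, 0.1, 0.3, 0.5))
```

When the outcome is rare ($p_0$ near 0), the RR is close to the OR. As $p_0$ grows, the OR increasingly overstates the RR. This is why a single OR can correspond to a range of plausible relative risks.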