Blog Archives

Using geom_step

June 3, 2016

geom_step is an interesting geom supplied by the R package ggplot2. It is an appropriate rendering option for financial market data and we will show how and why to use it in this article. Let’s take a simple example of plotting market data. In this case we are plotting the "ask price" (the publicly published … Continue reading...

A demonstration of vtreat data preparation

June 1, 2016

This article is a demonstration of the use of the R vtreat variable preparation package followed by caret-controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show what building a predictive model … Continue reading...

On ranger respect.unordered.factors

May 30, 2016

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appearance is … Continue reading...

Installing WVPlots and “knitting R markdown”

May 20, 2016

Some readers have been having a bit of trouble using devtools to install WVPlots. I thought I would write a note with a few instructions to help. These are things you should not have to do often, and things those of us already running R have stumbled through and forgotten about. First you will need … Continue reading...

For a short time: Half Off Some Manning Data Science Books

May 12, 2016

Our publisher Manning Publications is celebrating the release of a new data science in Python title Introducing Data Science by offering it and other Manning titles at half off until Wednesday, May 18. As part of the promotion you can also use the supplied discount code mlcielenlt for half off some R titles including R … Continue reading...

Coming up: principal components analysis

May 7, 2016

Just a “heads-up.” I’ve been editing a two-part series Nina Zumel is writing on some of the pitfalls of improperly applied principal components analysis/regression and how to avoid them (we are using the plural spelling, following Everitt, The Cambridge Dictionary of Statistics). The series is looking absolutely fantastic and I think it … Continue reading...

vtreat cross frames

May 5, 2016

John Mount, Nina Zumel, 2016-05-05. As a follow-on to “On Nested Models” we work R examples demonstrating “cross validated training frames” (or “cross frames”) in vtreat. Consider the following data frame. The outcome only depends on the “good” variables, not on the (high degree of freedom) “bad” variables. Modeling such a … Continue reading...

On Nested Models

April 26, 2016

We have recently been working on, and presenting on, nested modeling issues. These are situations where the output of one trained machine learning model is part of the input of a later model or procedure. I am now of the opinion that correct treatment of nested models is one of the biggest opportunities for improvement … Continue reading...

Improved vtreat documentation

April 17, 2016

Nina Zumel has donated some time to greatly improve the vtreat R package documentation (now available as pre-rendered HTML here). vtreat is an R data.frame processor/conditioner package that helps prepare real-world data for predictive modeling in a statistically justifiable manner. Even with modern machine learning techniques (random forests, support vector machines, neural nets, gradient boosted … Continue reading...

Free data science video lecture: debugging in R

April 9, 2016

We are pleased to release a new free data science video lecture: Debugging R code using R, RStudio and wrapper functions. In this 8 minute video we demonstrate the incredible power of R using wrapper functions to catch errors for later reproduction and debugging. If you haven’t tried these techniques this will really improve your … Continue reading...
