# 2207 search results for "regression"

## Model Segmentation with Cubist

March 18, 2015

Cubist is a tree-based model with an OLS regression attached to each terminal node and is somewhat similar to the mob() function in the party package (https://statcompute.wordpress.com/2014/10/26/model-segmentation-with-recursive-partitioning). Below is a demonstration of the cubist() model with the classic Boston housing data.

## Seven Ways You Can Use A Linear, Polynomial, Gaussian, & Exponential Line Of Best Fit

March 18, 2015

A line of best fit lets you model, predict, forecast, and explain data. This post shows how you can use a line of best fit to explain college tuition, rats, turkeys, burritos, and the NHL draft. Read on or see our tutorials for more. Contact us if you’re interested in a trial of plotly on-premise....

## Part 3b: EDA with ggplot2

March 16, 2015

In Part 3a I introduced the plotting system ggplot2. I discussed its concept and syntax in some detail, and then created a few general plots using the weather data set we've been working with in this series of tutorials. My goal was to show ...

## Matrix factorization

March 10, 2015

Or fancy words that mean very simple things. At the heart of most data mining, we are trying to represent complex things in a simple way. The simpler you can explain the phenomenon, the better you understand it. It’s a little zen – compression is the same as understanding. Warning: Some math ahead... but stick with it, it’s worth

## Econometrics Sim – 1: Endogeneity

March 9, 2015

Introduction: This is the first post in a series devoted to explaining basic econometric concepts using R simulations. The topic in this post is endogeneity, which can severely bias regression estimates. I will specifically simulate endogeneity caused by an omitted variable. In future posts in this series, I’ll simulate other specification issues such as heteroskedasticity, multicollinearity, and collider … Continue reading...

## Simulating Endogeneity

March 9, 2015

Introduction: The topic in this post is endogeneity, which can severely bias regression estimates. I will specifically simulate endogeneity caused by an omitted variable. In future posts in this series, I’ll simulate other specification issues such as heteroskedasticity, multicollinearity, and collider bias. The Data-Generating Process: Consider the data-generating process (DGP) of some outcome variable : For the … Continue reading...
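
For readers skimming this result, here is a minimal sketch of how an omitted variable biases a regression estimate. It is not the post's own code: the post uses R, while this sketch uses plain Python, and the data-generating process, the function names (`ols_slope`, `simulate`), and the parameter values (`beta`, `gamma`, `rho`) are all assumptions chosen for illustration.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=1.0, gamma=2.0, rho=0.5, seed=42):
    """Return (short, long) slope estimates on x.

    Assumed DGP (illustrative only):
        z ~ N(0, 1)                 # the variable that gets omitted
        x = rho * z + noise         # regressor correlated with z
        y = beta * x + gamma * z + noise
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [rho * zi + rng.gauss(0, 1) for zi in z]
    y = [beta * xi + gamma * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

    # Short regression: y on x only, so z ends up in the error term.
    short = ols_slope(x, y)

    # Long regression coefficient on x, obtained by partialling z out of x
    # (Frisch-Waugh-Lovell) and regressing y on the residual.
    b_xz = ols_slope(z, x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    x_resid = [xi - mean_x - b_xz * (zi - mean_z) for xi, zi in zip(x, z)]
    long = ols_slope(x_resid, y)
    return short, long


if __name__ == "__main__":
    short, long = simulate()
    print(f"slope omitting z:      {short:.3f}")
    print(f"slope controlling z:   {long:.3f}")
```

Under these assumed values, the slope from the short regression converges to beta + gamma * rho / (1 + rho^2) = 1.8 rather than the true beta = 1, while the estimate that controls for z stays close to 1.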

## Some Intuition About the Theory of Statistical Learning

March 7, 2015

While I was working on the theory of statistical learning, and the concept of consistency, I found the following popular graph (e.g. from these slides, here in French). The lower curve is the error on the training sample, as a function of the size of the training sample. The upper curve is the error on a validation sample. Our learning...

## Visualising a Classification in High Dimension

March 6, 2015

So far, when discussing classification, we’ve been playing with my toy dataset (actually, I should not claim it’s mine, it is inspired by the one used in the introduction of Boosting, by Robert Schapire and Yoav Freund). But in real life, there are more observations, and more explanatory variables. With more than two explanatory variables, it starts to be more complicated...

## Beautiful tables for linear model summaries #rstats

March 6, 2015

Beautiful HTML tables of linear models: In this blog post I’d like to show some (old and) new features of the sjt.lm function from my sjPlot package. These features are currently only implemented in the development snapshot on GitHub. A package update will be submitted to CRAN soon. There are two new major features