2046 search results for "regression"

Predicting optimal number of iterations and completion time for GBM

November 20, 2013

When choosing the hyperparameters for Generalized Boosted Regression Models, two important choices are shrinkage and the number of trees. Generally a smaller shrinkage with more trees produces a better model, but modeling time increases significantly. Building a model with too many trees that are heavily cut back by cross-validation wastes time, while building a model...

Art of Statistical Inference

November 20, 2013

(This article was first published on MATHEMATICS IN MEDICINE, and kindly contributed to R-bloggers.) This post was written by me a few years ago, when I started learning the art and science of data analysis. It will be a good starter for amateur data analysts. Introduction: What is statistics? There are about a dozen...

On the use of marginal posteriors in marginal likelihood estimation via importance-sampling

November 19, 2013

Perrakis, Ntzoufras, and Tsionas just arXived a paper on marginal likelihood (evidence) approximation (with the above title). The idea behind the paper is to base importance sampling for the evidence on simulations from the product of the (block) marginal posterior distributions. Those simulations can be directly derived from an MCMC output by randomly permuting the

Simulation (is where it’s happening)

November 18, 2013

Jim Silverton wrote to the Allstat mailing list recently: “Hi, Anyone up for a challenge? Suppose we have random variables that are random points on the surface of a sphere. What is the probability that the tetrahedron made by joining these …

Some Options for Testing Tables

November 18, 2013

Contingency tables are a very good way to summarize discrete data. They are quite easy to construct and reasonably easy to understand. However, tables have many nuances, and care should be taken when drawing conclusions from the data. Here are just a few thoughts on the topic. Dealing with sparse data On

Visualizing neural networks in R – update

November 14, 2013

In my last post I said I wasn’t going to write anymore about neural networks (i.e., multilayer feedforward perceptron, supervised ANN, etc.). That was a lie. I’ve received several requests to update the neural network plotting function described in the original post. As previously explained, R does not provide a lot of options for visualizing

Calibration of p-value under variable selection: an example

November 14, 2013

Very often people report p-values for linear regression estimates after performing a variable selection step. Here is a simple simulation showing that such a procedure can leave those tests miscalibrated. Consider a simple data generating pro...
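
The excerpt above is cut off before its simulation begins, so the following is only a rough sketch of the general effect, not the post's own setup. It is written in Python using the standard library only. It assumes a pure-noise data generating process (the response is independent of every candidate predictor), a hypothetical selection rule that keeps the predictor with the smallest p-value, and a normal approximation to the t distribution for the slope test. All function names and parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_p_value(x, y):
    """Two-sided p-value for the slope in a simple regression of y on x.

    The t statistic is computed from the correlation; the normal
    distribution stands in for t with n - 2 degrees of freedom
    (an approximation, reasonable for n around 100).
    """
    n = len(x)
    r = correlation(x, y)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def rejection_rates(n_sims=500, n=100, n_predictors=10, alpha=0.05, seed=1):
    """Return (rate for a pre-specified predictor, rate after selection).

    Assumed setup: y and all predictors are independent standard normal
    noise, so every rejection is a false positive.
    """
    rng = random.Random(seed)
    fixed = selected = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n)]
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_predictors)]
        p_values = [slope_p_value(x, y) for x in xs]
        fixed += p_values[0] < alpha       # predictor chosen before seeing the data
        selected += min(p_values) < alpha  # predictor chosen because it looked best
    return fixed / n_sims, selected / n_sims


fixed_rate, selected_rate = rejection_rates()
print("pre-specified predictor:", fixed_rate)
print("selected predictor:", selected_rate)
```

Under these assumptions the pre-specified predictor should be rejected at roughly the nominal 5% rate, while the selected predictor should be rejected far more often, since with ten independent candidates the chance that at least one falls below 0.05 is about 1 - 0.95^10, or roughly 40%.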

A slightly different introduction to R, part V: plotting and simulating linear models

November 11, 2013

In the last episode (which was quite some time ago) we looked into comparisons of means with linear models. This time, let’s visualise some linear models with ggplot2, and practice another useful R skill, namely how to simulate data from known models. While doing this, we’ll learn some more about the layered structure of a

A statistical review of ‘Thinking, Fast and Slow’ by Daniel Kahneman

November 11, 2013

I failed to find Kahneman’s book in the economics section of the bookshop, so I had to ask where it was. “Oh, that’s in the psychology section.” It should have also been in the statistics section. He states that his collaboration with Amos Tversky started with the question: Are humans good intuitive statisticians? The wrong...

Key Driver vs. Network Analysis in R

November 8, 2013

When marketing researchers speak of driver analysis, they are referring to an input-output model with overall satisfaction as the output and performance ratings of specific product and service components as the inputs. The causal model is straightforwa...