A better ‘nls’ (?)

July 5, 2012

Those who do a lot of nonlinear regression will love the nls function of R. In most cases it works really well, but some mishaps can occur when bad starting values are used for the parameters. One of the most dreaded is the “singular gradient matrix at initial parameter estimates” which

Read more »

Health Care Costs – Part 1, "The Problem"

July 5, 2012

The Problem: In the United States, health care costs have been going up for a number of years, even when adjusted for inflation. Not unlike a runaway freight train, this rampant inflation cannot continue indefinitely without crashing. ...

Read more »

New R User Group in Leipzig, Germany

July 5, 2012

Leipzig R Statistical Computing is the sixth local R user group in Germany, and has been holding meetings since February. In the next meeting on July 12, member Claudia Beleites will talk about her packages softclassval (for classifier performance measures) and hyperspec (for hyperspectral data). meetup.com: Leipzig R Statistical Computing

Read more »

Validating email addresses in R

July 5, 2012

I am currently programming automated report generation in R: participants fill out a questionnaire, and they receive a nicely formatted PDF with their personality profile. I use knitr, LaTeX, and the sendmailR package. Some participants did not provide valid email addresses, which caused the sendmail function to crash. Therefore I wanted some validation of

Read more »

A tiny RCurl headache ;)

July 4, 2012

As more and more data go online (plus we love Google Drive), we are forced to connect to our data over the net. We mostly do this via RCurl (but we could do this using RGoogleDocs as well). In that case all that is required to get the data into R is the two lines of ...

Read more »

A new open journal on Data Science

July 4, 2012

Springer has introduced a new open, peer-reviewed journal focused on Data Science: EPJ Data Science. What makes this a Data Science journal is novel uses of statistics, data analysis, computer techniques and public data sources to research a topic in another domain, rather than methodological research. Here are a few examples of the papers you'll find in the journal:...

Read more »

Alternative to Monte Carlo Testing

July 4, 2012

When we backtest a strategy on a portfolio, it is a simple analysis of a single period in time. There are ways to “stress test” a strategy, such as Monte Carlo, random portfolios, or shuffling the returns in a random order. I could never really wrap my head around Monte Carlo and shuffling the returns …

Read more »

Three Questions about a Matrix of Coefficient Plots

July 4, 2012

It's Independence Day in the U.S., so I am taking the day off, but I received the following request for advice and thought I'd pass it along to my readers. I wonder if you could help – I am trying to create 9 different coefficient plots, which repr...

Read more »

A tutorial on outlier detection techniques

July 4, 2012

by Yanchang Zhao, RDataMining.com There is an excellent tutorial on outlier detection techniques, presented by Hans-Peter Kriegel et al. at ACM SIGKDD 2010. It presents many popular outlier detection algorithms, most of which were published between the mid-1990s and 2010, …

Read more »

The Higgs boson: 5-sigma and the concept of p-values

July 4, 2012

Why are physicists talking about 5-sigma, and what's it got to do with statistics? In this short post I'll explain what 5-sigma is and why it's not a measure of how certain scientists are that they've found the Higgs boson.

Read more »
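For readers who want a number to attach to "5-sigma", the short sketch below computes the tail probability of a standard normal distribution beyond five standard deviations. It assumes the one-sided convention commonly quoted in particle physics; the linked post may frame it differently, and the variable names are made up for this illustration.

```r
# Tail area beyond 5 standard deviations of a standard normal distribution.
# Assumption: one-sided convention; a two-sided test doubles the value.
p_one_sided <- pnorm(5, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided

print(p_one_sided)  # roughly 2.9e-07, about 1 in 3.5 million
print(p_two_sided)
```

This is the probability of seeing data at least this extreme if there were no signal. It is not the probability that the Higgs boson exists.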

Glmnet_1.8 uploaded to CRAN

July 4, 2012

(by Trevor Hastie) Glmnet_1.8 uploaded to CRAN – This is a major revision, with two additional models included. 1) Multiresponse regression – family="mgaussian" Here we have a matrix of M responses, and we fit a series of linear models in parallel. We use a group-lasso penalty on the set of M coefficients for each variable. This means they are...

Read more »

To the Basics: Bayesian Inference on A Binomial Proportion

July 4, 2012

Think of something observable – countable – that you care about with only one outcome or another. It could be the votes cast in a two-way election in your town, or the free throw shots the center on your favorite...

Read more »

Example of Factor Attribution

July 3, 2012

In the prior post, Factor Attribution 2, I showed how Factor Attribution can be applied to decompose a fund’s returns into Market, Capitalization, and Value factors, the “three-factor model” of Fama and French. Today, I want to show you a different application of Factor Attribution. First, let’s run Factor Attribution on each of the stocks

Read more »

RcppBDT 0.2.0

A new release of the RcppBDT package appeared on CRAN earlier today. RcppBDT uses Rcpp, and in particular the nifty Rcpp modules feature of wrapping C++ code for R just by declaring the (class or function) interfaces. It uses this to bring in some useful functions from Boost Date.Time to R so that one can do things like R> library(RcppBDT) R> sapply(2012:2016, function(year) +...

Read more »

The role of Statistics in the Higgs Boson discovery

July 3, 2012

News is starting to leak that the Large Hadron Collider may have accomplished its primary mission of confirming the existence of the hypothesised and heretofore elusive subatomic particle, the Higgs Boson. And sure, billions of Euros worth of state-of-the-art high-energy machinery and an army of experimental and theoretical physicists probably had something to do with the discovery. But did...

Read more »

An Improvement to Coefficient Plots

July 3, 2012

I recently posted about coefficient plots, discussing my approach and providing some example R code to create the graphs. I had the good fortune of hearing Amanda Driscoll give a talk recently, and she made a small, but really nice …

Read more »

Combining ggplot Images

July 3, 2012

The ggplot2 package provides an excellent platform for data visualization. One (minor) drawback of this package is that combining ggplot images into one plot, like the par() function does for regular plots, is not a straightforward procedure. Fortunately, R user Stephen Turner has kindly provided a function called “arrange” that does exactly this. The function,

Read more »

Blog with Knitr and Jekyll

July 3, 2012

The knitr package provides an easy way to embed R code in a Jekyll-Bootstrap blog post. The only required input is an R Markdown source file. The name of the source file used to generate this post is 2012-07-03-knitr-jekyll.Rmd, available here. Steps taken to build this post are as follows: Step 1 Create a Jekyll-Bootstrap blog if...

Read more »

Applying a function successively in R

July 3, 2012

At the R in Finance conference Paul Teetor gave a fantastic talk about Fast(er) R Code. Paul mentioned the common higher-order function Reduce, which I hadn't used before. Reduce allows me to apply a function successively over a vector. What does that...

Read more »
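As a rough illustration of the Reduce function mentioned in the teaser above, the sketch below folds a vector with `+`. The vector `x` and the variable names are invented for this example and are not taken from the original post.

```r
# Hypothetical input vector, chosen only for illustration
x <- 1:5

# Reduce applies `+` successively: ((((1 + 2) + 3) + 4) + 5)
total <- Reduce(`+`, x)

# accumulate = TRUE keeps every intermediate result of the fold
running <- Reduce(`+`, x, accumulate = TRUE)

print(total)
print(running)
```

Any two-argument function can take the place of `+` here, which is what makes Reduce useful for chaining an operation over a vector or list without writing an explicit loop.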

RcppArmadillo 0.3.2.3

Conrad released version 3.2.3 of Armadillo a few days ago, and the corresponding RcppArmadillo package 0.3.2.3 is now on CRAN. (For those keeping score, 3.2.1 never was a full release, and 3.2.2 contained fixes for a build issue that did not affect the ...

Read more »

A big list of the things R can do

July 2, 2012

R is an incredibly comprehensive statistics package. Even if you just look at the standard R distribution (the base and recommended packages), R can do pretty much everything you need for data manipulation, visualization, and statistical analysis. And for everything else, there are more than 5000 packages on CRAN and other repositories, and the big-data capabilities of Revolution R Enterprise....

Read more »

precise pangolin (Ubuntu 12.04)

July 2, 2012

Following the crash of my hard drive right before leaving Kyoto, I bought a cheap Compaq Presario CQ57 to reinstall Ubuntu 12.04 over the weekend (and have a laptop available before leaving for Australia…)  It took about one hour to install from the DVD and everything seems to be working out of the box. The

Read more »

Graphics Artifacts from Quarterly Commentary

July 2, 2012

For my Q2 2012 commentary, I tried multiple graphs to illustrate the disconnect of the US stock markets with the rest of the world.  I think I finally settled on this simple Excel bar graph populated by Bloomberg data, but I thought some might lik...

Read more »

Project Euler — problem 11

July 2, 2012

It’s been a while since I last solved a Project Euler problem; I have been busy. Now I’m back to continue with the next problem, which is to find the maximum. Let’s take a look at the 11th problem: What …

Read more »

Citing R or SAS

July 2, 2012

One of us recently read a colleague's first draft of a paper, in which she had written: "All analyses were done in R 2.14.0." We assume we're preaching to the converted here, when we say that the enormous amount of work that goes into R needs to be re...

Read more »

My first competition at Kaggle

July 2, 2012

For me, Kaggle is becoming a social network for data scientists, as stackoverflow.com or github.com are for programmers. If you are a data scientist, machine learner or statistician, you had better have a profile there, otherwise you do not exist. Nevertheless, I won’t bet on as rosy a future for data scientists as journalists suggest (sexy job for next

Read more »

Popularity of R continues

July 2, 2012

No doubt those who read my blog know that the tools I use for my Industrial Engineering and Operations Research work rely heavily on open source software. That is why I try to support as many open source projects such as COIN-OR, G...

Read more »

Moving beyond hopeless graphics

July 2, 2012

I was at a talk a while ago where the speaker presented tables with 4, 5, 6, even 8 significant digits even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit ...

Read more »

Random portfolios versus Monte Carlo

July 2, 2012

What is the difference between Monte Carlo — as it is usually defined in finance — and random portfolios? The meaning of “Monte Carlo”: The idea of “Monte Carlo” is very simple. It is a fancy word for “simulation”. As usual, it is all too possible to find incredibly muddied explanations of such a simple …

Read more »
