## Does It Make Sense to Segment Using Individual Estimates from a Hierarchical Bayes Choice Model?

March 24, 2013
I raise this question because we see calls for running segmentation with individual estimates from hierarchical Bayes choice models without any mention of the possible complications that might accompany such an approach.  Actually, all the calls seem to be from those using MaxDiff to analyze the data from incomplete block designs.  For example, if one were to...

## Writing a MS-Word document using R (with as little overhead as possible)

March 24, 2013
The problem: producing a Word (.docx) file of a statistical report created in R, with as little overhead as possible. The solution: combining R+knitr+rmarkdown+pander+pandoc (it is easier than it is spelled). If you get what this post is about, just …

## Using R: reading tables that need a little cleaning

March 24, 2013
Sometimes one needs to read tables that are a bit messy, so that read.table doesn’t immediately recognize the content as numerical. Maybe some weird characters are sprinkled in the table (ever been given a table with significance stars in otherwise numerical columns?). Some search and replace is needed. You can do this by hand, and
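A minimal sketch of the search-and-replace idea, assuming a small made-up table in which significance stars have crept into numeric columns. The data and column names are hypothetical, not taken from the post.

```r
# Hypothetical messy input: numeric columns with significance stars mixed in.
raw <- c("name estimate p_value",
         "a 1.25* 0.03",
         "b 0.40 0.51",
         "c 2.10*** 0.0004")

# Read everything as character so that nothing is coerced behind our back.
messy <- read.table(text = raw, header = TRUE,
                    colClasses = "character", stringsAsFactors = FALSE)

# Remove the stars from the columns that should be numeric, then convert.
numeric_cols <- c("estimate", "p_value")
cleaned <- messy
cleaned[numeric_cols] <- lapply(messy[numeric_cols], function(column) {
  as.numeric(gsub("*", "", column, fixed = TRUE))
})

str(cleaned)
```

Using `fixed = TRUE` treats the star as a literal character, so it does not need to be escaped as a regular expression.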

## R Help tooltips

March 24, 2013
I created a simple jQuery plugin to display some information when hovering over links to R documentation files hosted at help.r-enthusiasts.com. Below is a snapshot from highlight.r-enthusiasts.com that uses the tooltips. See also a live example here: data.frame. Using this feature …

## Tupper’s self-referential formula

March 24, 2013
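I can't remember where I first came across it, but Tupper's self-referential formula is a very interesting formula: when graphed in a two-dimensional plane, it reproduces the formula itself. \[ \frac{1}{2} < \left\lfloor \mathrm{mod}\left( \left\lfloor \frac{y}{17} \right\rfloor 2^{-17 \lfloor x \rfloor - \mathrm{mod}(\lfloor y \rfloor, 17)}, 2 \right) \right\rfloor \] I first thought this would be...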

## Not all proportion data are binomial outcomes

March 24, 2013
It really is trivial. Not every proportion is a frequency. There are things that have values bounded between 0 and 1 and yet are neither probabilities nor frequencies. Why do I even bother to write this? Because some kinds of…

## Rcpp 0.10.3

March 24, 2013
Rcpp 0.10.3 is on CRAN. Here is the part of the NEWS file related to this release. Changes in R code: Prevent build failures on Windows when Rcpp is installed in a library path with spaces (transform paths in the …

## Moving

March 24, 2013
This blog is moving to blog.r-enthusiasts.com. The new one is powered by WordPress and gets a subdomain of r-enthusiasts.com. See you there.

## Web Hosted R Syntax Highlighter

March 24, 2013
highlight uses a simple jQuery command to syntax-highlight R code contained in any regular <pre> element. For example, this chunk of code, from the datasets::cars help file: require(stats); require(graphics) plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)", las …

## Automatic ARMA/GARCH selection in parallel

March 24, 2013
In the original ARMA/GARCH post I outlined the implementation of the garchSearch function. There have been a few requests for the code so … here it is. Quite easy to use too: After the last code line above, fit contains the best (according to the AIC statistic) model, which is the return value of garchFit.

## Estimating the Decay Rate and the Half-Life of DDT in Trout – Applying Simple Linear Regression with Logarithmic Transformation

This blog post uses a function and a script written in R that were displayed in an earlier blog post. Introduction This is the second of a series of blog posts about simple linear regression; the first was written recently on some conceptual nuances and subtleties about this model.  In this blog post, I will use
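As a rough illustration of the approach named in the title, here is a sketch that fits a straight line to log-transformed concentrations and converts the slope into a decay rate and a half-life. It assumes a simple exponential decay model, C(t) = C0 * exp(-k * t), and uses simulated numbers; these are not the DDT measurements from the post.

```r
# Simulated (hypothetical) data following exponential decay with noise.
set.seed(1)
years <- 0:10
true_rate <- 0.2
concentration <- 10 * exp(-true_rate * years) *
  exp(rnorm(length(years), sd = 0.05))

# Taking logs turns the decay curve into a straight line:
# log(C) = log(C0) - k * t, so the slope estimates -k.
fit <- lm(log(concentration) ~ years)
decay_rate <- -unname(coef(fit)["years"])

# The half-life is the time needed for the concentration to halve.
half_life <- log(2) / decay_rate

cat("Estimated decay rate:", round(decay_rate, 3), "\n")
cat("Estimated half-life :", round(half_life, 2), "years\n")
```

The half-life formula follows from solving exp(-k * t) = 1/2 for t.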

## My Own R Function and Script for Simple Linear Regression – An Illustration with Exponential Decay of DDT in Trout

Here is the function that I wrote for doing simple linear regression, as alluded to in my blog post about simple linear regression on log-transformed data on the decay of DDT concentration in trout in Lake Michigan.  My goal was to replicate the 4 columns of the output from applying summary() to the output of lm().
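For readers who want to try the same exercise, the following is one possible sketch of such a function, not the author's own code. The name `simple_lm_table` is hypothetical, and the built-in `cars` data set stands in for the DDT data. It computes the four columns of the coefficient table from the textbook formulas and compares them with `summary(lm())`.

```r
# Hypothetical helper: rebuild the coefficient table of a simple linear
# regression (estimate, standard error, t value, p-value) by hand.
simple_lm_table <- function(x, y) {
  n <- length(x)
  sxx <- sum((x - mean(x))^2)

  # Least-squares estimates of slope and intercept.
  slope <- sum((x - mean(x)) * (y - mean(y))) / sxx
  intercept <- mean(y) - slope * mean(x)

  # Residual variance with n - 2 degrees of freedom.
  residuals <- y - (intercept + slope * x)
  sigma2 <- sum(residuals^2) / (n - 2)

  estimate <- c(intercept, slope)
  std_error <- c(sqrt(sigma2 * (1 / n + mean(x)^2 / sxx)),
                 sqrt(sigma2 / sxx))
  t_value <- estimate / std_error
  p_value <- 2 * pt(abs(t_value), df = n - 2, lower.tail = FALSE)

  result <- cbind(estimate, std_error, t_value, p_value)
  dimnames(result) <- list(c("(Intercept)", "x"),
                           c("Estimate", "Std. Error", "t value", "Pr(>|t|)"))
  result
}

manual <- simple_lm_table(cars$speed, cars$dist)
reference <- coef(summary(lm(dist ~ speed, data = cars)))

print(manual)
print(all.equal(unname(manual), unname(reference)))
```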

## Parsing complex text files using regular expressions and vectorization

March 24, 2013
When text data is in a nice CSV format, read.csv is enough to parse it into a useable format. But if this is not the case, getting the data into a useable format is not so straightforward. In this post…

## A Merging Test Bench

March 24, 2013
As requested, here's the packed data and a test bench with which you can test your own merging function ideas and (hopefully) replicate my results. If you want the plots, you can use the end part of the scripts in part1 and part2. The data is a bunch of super secret Eve...

## Writing a for-loop in R

March 23, 2013
There may be no R topic that is more controversial than the humble for-loop. And, to top it off, good help is hard to find. I was astounded by the lack of useful posts when I googled “for loops in R” (the top return linked to a page that did not exist). In fact, even

## Introduction to Simulation using R

March 23, 2013
We had a great turnout yesterday for our Zero to R Hero workshop at the Quebec Centre for Biodiversity Science. We went from the absolute basics of the command line, to the intricacies of importing data, and finally we had a look at plotting using ggplot2. We didn’t have time to get to this extra module

## Predicting who will win a NFL match at half time

March 23, 2013
It was great to have a little break, Spring break, although the weather didn’t feel like spring at all! During the early part of the break I worked on my final project for Jeff Leek’s data analysis class, which we call 140.753 here. Continuing my previous posts on the topic, this time I’ll share the results of my...

## Production Quality Report with R and knitr on Yen

March 22, 2013
Sometimes I actually use my experiments for real work. For example, I wanted to send an update on the Japanese Yen. This was a great opportunity to use the chart created in Shading and Points with xtsExtra plot.xts. I was fairly please...

## Using Norms to Understand Linear Regression

March 22, 2013
Introduction In my last post, I described how we can derive modes, medians and means as three natural solutions to the problem of summarizing a list of numbers, $$(x_1, x_2, \ldots, x_n)$$, using a single number, $$s$$. In particular, we measured the quality of different potential summaries in three different ways, which led us to
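The idea of scoring candidate summaries can be tried out numerically. The sketch below assumes the three measures are the count of values that differ from $$s$$, the sum of absolute differences, and the sum of squared differences; the excerpt does not spell them out, so treat this as one plausible reading. The data vector is made up.

```r
# Made-up data whose mode, median and mean are all different.
x <- c(1, 1, 2, 3, 13)

# Candidate summaries s on a grid that contains all three answers.
candidates <- seq(0, 14, by = 0.5)

# Three ways of scoring how badly a single number s summarizes x.
count_mismatches <- sapply(candidates, function(s) sum(x != s))
sum_abs_error    <- sapply(candidates, function(s) sum(abs(x - s)))
sum_sq_error     <- sapply(candidates, function(s) sum((x - s)^2))

best_by_count   <- candidates[which.min(count_mismatches)]
best_by_abs     <- candidates[which.min(sum_abs_error)]
best_by_squares <- candidates[which.min(sum_sq_error)]

cat("Fewest mismatches at   :", best_by_count, "\n")
cat("Smallest absolute error:", best_by_abs, " (median is", median(x), ")\n")
cat("Smallest squared error :", best_by_squares, " (mean is", mean(x), ")\n")
```

A grid search is used only to keep the example transparent; it works here because the grid happens to contain the exact minimizers.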

## Split, Apply, and Combine for ffdf

March 22, 2013
Call me incompetent, but I just can’t get ffdfdply to work with my ffdf dataframes.  I’ve tried repeatedly and it just doesn’t seem to work!  I’ve seen numerous examples on stackoverflow, but maybe I’m applying them incorrectly.  Wanting to do some …

## Explore March Madness face-offs with this NCAA data visualizer

March 22, 2013
If you're laying down a friendly bet on the March Madness games or just tweaking your fantasy roster, this NCAA Data Visualizer by Rodrigo Zamith will be a boon. Just choose two teams to compare head-to-head, and choose an attribute to compare them on. You can look at more than a dozen individual player attributes (e.g. points scored, assists, 3-point...

## Are you a Type I or Type II Data Scientist?

March 22, 2013
The role of Data Scientist has been getting a lot of attention lately. Brendan Tierney's blog post titled Type I and Type II Data Scientists adds an interesting perspective by defining and characterizing two key types of Data Scientist, both of which are needed in an organization. Tierney writes about Type I Data Scientists, "These are...

## Veterinary Epidemiologic Research: GLM (part 4) – Exact and Conditional Logistic Regressions

March 22, 2013
Next topic on logistic regression: the exact and the conditional logistic regressions. Exact logistic regression When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation.

## Modes, Medians and Means: A Unifying Perspective

March 22, 2013
Introduction / Warning Any traditional introductory statistics course will teach students the definitions of modes, medians and means. But, because introductory courses can’t assume that students have much mathematical maturity, the close relationship between these three summary statistics can’t be made clear. This post tries to remedy that situation by making it clear that all

## Plotting lm and glm models with ggplot #rstats

March 22, 2013
Update I followed the advice from Tim’s comment and changed the scaling in the sjPlotOdds-function to logarithmic scaling. The screenshots below showing the plotted glm’s have been updated. Summary In this posting I will show how to plot results from …

## Data visualisation talk: Presentation using reports package

March 21, 2013
Why did I use HTML5 for today’s talk? My last presentation was in HTML5. This time I wanted to do my slides in something new. I prepared the first few slides in Jessyink. Then I got to know that my friend …

## Maximum Sharpe Portfolio

March 21, 2013
The Maximum Sharpe Portfolio, or Tangency Portfolio, is the portfolio on the efficient frontier at the point where the line drawn from the point (0, risk-free rate) is tangent to the efficient frontier. There is a great discussion about the Maximum Sharpe Portfolio or Tangency Portfolio at quadprog optimization question. In the general case, finding the Maximum Sharpe Portfolio
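To make the definition concrete, here is a small sketch for the simplest setting only: weights must sum to one, short positions are allowed, and there are no other constraints. Under those assumptions the tangency weights are proportional to the inverse covariance matrix times the excess returns. The expected returns, covariance matrix and risk-free rate are invented for illustration, and this is not the quadprog-based approach discussed in the post.

```r
# Hypothetical inputs: expected returns, covariance matrix, risk-free rate.
expected_return <- c(0.08, 0.10, 0.12)
covariance <- matrix(c(0.04, 0.01, 0.00,
                       0.01, 0.09, 0.02,
                       0.00, 0.02, 0.16), nrow = 3, byrow = TRUE)
risk_free <- 0.02

# Unconstrained tangency weights: solve covariance %*% w = excess return,
# then rescale so that the weights sum to one.
excess <- expected_return - risk_free
unscaled <- solve(covariance, excess)
weights <- unscaled / sum(unscaled)

# Sharpe ratio of a fully invested portfolio w.
sharpe <- function(w) {
  sum(w * excess) / sqrt(drop(t(w) %*% covariance %*% w))
}

# Sanity check against randomly drawn fully invested portfolios.
set.seed(42)
random_sharpes <- replicate(1000, {
  w <- runif(3)
  sharpe(w / sum(w))
})

cat("Tangency weights:", round(weights, 3), "\n")
cat("Tangency Sharpe :", round(sharpe(weights), 4), "\n")
cat("Best random     :", round(max(random_sharpes), 4), "\n")
```

Once constraints such as no short selling are added, the closed form no longer applies and a numerical optimizer is needed.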

March 21, 2013
Needless to say, it is with great pleasure I am back in beautiful Padova for the workshop Recent Advances in statistical inference: theory and case studies, organised by Laura Ventura and Walter Racugno. Esp. when considering this is one of the last places I met with George Casella, in June 2010. As we have plenty

## Using R: Correlation heatmap with ggplot2

March 21, 2013
Just a short post to celebrate that I learned today how incredibly easy it is to make a heatmap of correlations with ggplot2 (and reshape2, of course). So, what is going on in that short passage? cor makes a correlation matrix with all the pairwise correlations between variables (twice; plus a diagonal of ones). melt
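A sketch of the two steps mentioned above, using a few columns of the built-in `mtcars` data as a stand-in for the post's data. To keep the example runnable without extra packages, the reshaping uses base R's `as.data.frame(as.table(...))` in place of `melt`, which is an assumption of this sketch rather than what the post does; the plot is drawn only if ggplot2 is installed.

```r
# Correlation matrix: every pair appears twice, plus a diagonal of ones.
cor_matrix <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])

# Long format: one row per (Var1, Var2) pair with its correlation.
# This is a base-R stand-in for reshape2::melt on a matrix.
cor_long <- as.data.frame(as.table(cor_matrix), responseName = "value")
head(cor_long)

# Draw the heatmap only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  heatmap_plot <- ggplot2::ggplot(
    cor_long,
    ggplot2::aes(x = Var1, y = Var2, fill = value)
  ) +
    ggplot2::geom_tile()
  print(heatmap_plot)
}
```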