## Are you a Type I or Type II Data Scientist?

March 22, 2013
The role of Data Scientist has been getting a lot of attention lately. Brendan Tierney's blog post titled Type I and Type II Data Scientists adds an interesting perspective by defining and characterizing two key types of Data Scientist, both of which are needed in an organization. Tierney writes about Type I Data Scientists, "These are...

## Veterinary Epidemiologic Research: GLM (part 4) – Exact and Conditional Logistic Regressions

March 22, 2013
Next topic on logistic regression: the exact and the conditional logistic regressions. Exact logistic regression When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation.

## Modes, Medians and Means: A Unifying Perspective

March 22, 2013
Introduction / Warning Any traditional introductory statistics course will teach students the definitions of modes, medians and means. But, because introductory courses can’t assume that students have much mathematical maturity, the close relationship between these three summary statistics can’t be made clear. This post tries to remedy that situation by making it clear that all
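The excerpt is cut off, but one standard way to see the close relationship is that each statistic minimizes a total discrepancy of the form sum |x - s|^p: p = 0 gives the mode, p = 1 the median, and p = 2 the mean. The sketch below checks this numerically in Python (not the R used by most posts listed here). The data, the candidate grid, and the function names are illustrative assumptions, not taken from the post.

```python
from statistics import mean, median, mode


def total_discrepancy(data, s, p):
    """Sum of |x - s|^p over the data, using the convention 0^0 = 0 when p = 0."""
    if p == 0:
        # With 0^0 = 0, the p = 0 discrepancy just counts values that differ from s.
        return sum(1 for x in data if x != s)
    return sum(abs(x - s) ** p for x in data)


def best_summary(data, candidates, p):
    """Return the candidate value with the smallest total discrepancy."""
    return min(candidates, key=lambda s: total_discrepancy(data, s, p))


# Hypothetical data, chosen so that mode, median, and mean all differ.
data = [1, 1, 2, 3, 13]

# Brute-force search over a grid from 0.0 to 13.0 in steps of 0.1.
candidates = [i / 10 for i in range(0, 131)]

best_l0 = best_summary(data, candidates, 0)  # should match the mode
best_l1 = best_summary(data, candidates, 1)  # should match the median
best_l2 = best_summary(data, candidates, 2)  # should match the mean

print(best_l0, mode(data))
print(best_l1, median(data))
print(best_l2, mean(data))
```

A grid search is used only to keep the example transparent. With an even number of observations the p = 1 minimizer is not unique, so the comparison with `median` would need more care.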

## Plotting lm and glm models with ggplot #rstats

March 22, 2013
Update: I followed the advice from Tim’s comment and changed the scaling in the sjPlotOdds-function to logarithmic scaling. The screenshots below showing the plotted glms have been updated. Summary: In this posting I will show how to plot results from … Continue reading →

## Data visualisation talk: Presentation using reports package

March 21, 2013
Why did I use html5 for today’s talk? My last presentation was in html5. This time I wanted to do my slides in something new. I prepared the first few slides in Jessyink. Then I got to know that my friend … Continue reading →

## Maximum Sharpe Portfolio

March 21, 2013
Maximum Sharpe Portfolio or Tangency Portfolio is a portfolio on the efficient frontier at the point where line drawn from the point (0, risk-free rate) is tangent to the efficient frontier. There is a great discussion about Maximum Sharpe Portfolio or Tangency Portfolio at quadprog optimization question. In general case, finding the Maximum Sharpe Portfolio
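As a rough illustration of the definition, here is a Python sketch for the simplest case only: two assets, short selling allowed, and weights summing to one. In that unconstrained case the tangency weights are proportional to the inverse covariance matrix times the excess returns. This closed form is swapped in for illustration and is not the quadprog-based approach the excerpt mentions; the returns, covariances, risk-free rate, and function names are all made-up assumptions.

```python
def solve_2x2(a, b):
    """Solve a @ x = b for a 2x2 matrix a using Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det]


def tangency_weights(mu, cov, rf):
    """Unconstrained tangency weights: inv(cov) @ (mu - rf), rescaled to sum to 1."""
    excess = [m - rf for m in mu]
    raw = solve_2x2(cov, excess)
    total = sum(raw)
    if total <= 0:
        # The rescaling below only yields the maximum Sharpe portfolio when total > 0.
        raise ValueError("closed form does not apply to these inputs")
    return [r / total for r in raw]


def sharpe(w, mu, cov, rf):
    """Sharpe ratio of a two-asset portfolio with weights w."""
    ret = sum(wi * mi for wi, mi in zip(w, mu))
    var = sum(w[i] * cov[i][j] * w[j] for i in range(2) for j in range(2))
    return (ret - rf) / var ** 0.5


# Hypothetical inputs: expected returns, covariance matrix, risk-free rate.
mu = [0.08, 0.12]
cov = [[0.04, 0.01],
       [0.01, 0.09]]
rf = 0.02

w = tangency_weights(mu, cov, rf)
print(w, sharpe(w, mu, cov, rf))
```

Once constraints such as no short selling are added, this closed form generally no longer applies and a numerical optimizer is needed.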

March 21, 2013
Needless to say, it is with great pleasure I am back in beautiful Padova for the workshop Recent Advances in statistical inference: theory and case studies, organised by Laura Ventura and Walter Racugno. Esp. when considering this is one of the last places I met with George Casella, in June 2010. As we have plenty

## Using R: Correlation heatmap with ggplot2

March 21, 2013
Just a short post to celebrate that I learned today how incredibly easy it is to make a heatmap of correlations with ggplot2 (and reshape2, of course). So, what is going on in that short passage? cor makes a correlation matrix with all the pairwise correlations between variables (twice; plus a diagonal of ones). melt
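The two steps named in the excerpt, building a correlation matrix and then melting it into long form, can be sketched without any plotting. The Python below only mirrors the idea behind R's `cor` and reshape2's `melt`; the helper functions and the toy data are assumptions for illustration, not the post's code.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(columns):
    """All pairwise correlations, as a nested dict keyed by column name."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


def melt(matrix):
    """Flatten a nested-dict matrix into (row, column, value) records."""
    return [(r, c, v) for r, row in matrix.items() for c, v in row.items()]


# Hypothetical data: three variables, four observations each.
columns = {
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.5],
    "age": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(columns)
long_form = melt(matrix)

for row_name, col_name, value in long_form:
    print(row_name, col_name, round(value, 3))
```

Each off-diagonal pair appears twice in the long form and the diagonal entries are ones, matching the excerpt's description. One record per cell is the shape a tile-based heatmap layer expects.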

## RMark: data.table merge vs core merge

March 21, 2013
This is the third post concerning fast merging in R, first here and second here. This time we are going to look at how the merge function from data.table package works in our case, requested by Uwe Block. As a reminder the first post concerns doing a...

## R’s Garden of Probability Distributions

March 21, 2013
by Joseph Rickert If you type ?Distributions at the R console you get a list of the 21 probability distributions included in the stats package that ships with base R. The same list appears in the Introduction to R Manual on CRAN and in most of the many fine introductory books available for the R language. These are indeed...

## And so begins English Composition I

March 21, 2013
This week started the English Composition I: Achieving Expertise course (Comer, 2013) that I have been looking forward to. I am not sure yet how long I will last, but I hope to enjoy it as much as I can. Plus, it should help me with my...

## Video: High scale in-database modeling in Greenplum with R

March 20, 2013
The following post presents the video of a talk by Hong Ooi who presented at Melbourne R Users, March 2013. Content: Greenplum is a massively parallel relational database platform. R is one of the top languages in the data scientist/applied … Continue reading →

## RserveCLI2, a .net client for Rserve

March 20, 2013
RserveCLI is a .net/cli client for Rserve, created by Oliver M. Haynold. Oliver has done a great job with this project. I forked this project to add features, fix bugs, and do some restructuring. I thought it was a significant enough departure to cre...

March 20, 2013
It is time for the NCAA Basketball Tournament. Sixty-four teams dream big (er…I mean 68…well actually by now, 64) and schools like Iona and Florida Gulf Coast University (go Eagles!) are hoping that Robert Morris’s astounding victory in the N.I.T. … Continue reading →

## Normalized Frequency of Terrorism in the US

March 20, 2013
I’ve been using the Global Terrorism Database a lot lately so I decided to share an interesting plot I made with the data. The GTD provides over 100,000 observations of terrorist incidents between 1970 and 2011. Of these, there are about 2400 observations in the USA. While this is not a large number, the graph still provides some interesting...

## Find the fairest place to meet on the Paris Métro

March 20, 2013
When I lived in Paris years ago, I worked near Gare du Nord, but my friend Jenny lived near République. If we wanted to meet up after work, we'd just meet halfway along the Orange Métro line, around Gare de l'Est. Easy. Since that's within walking distance we wouldn't actually take the Métro, but Métro stations are useful waypoints...

## Violin plots and regional income distribution

March 20, 2013
While preparing my slides for statistical graphics, a plot really caught my eye when I was playing around with the data. I started off by plotting the time series of GNI per capita by country, and as expected it got quite messy and...

## XLConnect on github

March 20, 2013
Mirai Solutions GmbH (http://www.mirai-solutions.com) is pleased to announce the availability of XLConnect on github. Whether you want to browse the code or simply want access to the latest development version of XLConnect, visit us on github. XLConnect can be directly … Continue reading →

## High Frequency GARCH: The multiplicative component GARCH (mcsGARCH) model

March 20, 2013
The interest in high frequency trading and models has grown exponentially in the last decade. While I have some doubts about the validity of any signals emerging from all the noise at higher and higher frequencies, I have nevertheless decided to look at the statistical modelling of intraday returns using GARCH models. Unlike daily and

## GeoCoding, R, and The Rolling Stones – Part 2

March 20, 2013
Welcome to Part 2 of the GeoCoding, R, and the Rolling Stones blog. Let’s apply some of the things we learned in Part 1 to a practical real world example. Mapping the Stones – A Real Example The Rolling Stones have toured for many years. You can go to Wikipedia and see information on the

## On the acceptance of R

March 20, 2013
Some history and a prediction. Past: A discussion broke out on the R-help mailing list in January 2006 about a technical report put out by the statistical computing group at UCLA. The report in question talked mainly about SAS, SPSS and Stata. It talked briefly, and not especially positively, about R. Someone accused ...

## Stan at Google this Thurs and at Berkeley this Fri noon

March 20, 2013
Michael Betancourt will be speaking at Google and at the University of California, Berkeley. The Google talk is closed to outsiders (but if you work at Google, you should go!); the Berkeley talk is open to all: Friday March 22, 12:10 pm, Evans Hall 1011. Title of talk: Stan: Practical Bayesian Inference with Hamiltonian Monte ...

## Fifth Torino R net meeting details – and Milano R announcement

March 20, 2013
The fifth Torino R net meeting, on 11 Apr 2013 at Campus Luigi Einaudi, Università degli Studi di Torino, will have three presentations: Winning with R (and friends) – How data analysts affect the standings in sports championships, Massimilano Marchi, Regione Emilia-Romagna; Predictive … Continue reading →

## Decisionstats/OpenCPU interview: R, D3, security, the cloud, and snacks.

March 20, 2013
I had the pleasure of being interviewed by Ajay Ohri from decisionstats.com earlier this week. Ajay is a great interviewer and writer and has extensive knowledge and experience on how R fits into the BI tool kit. His book R for Business Analytics (Springer, 2012) is a good read for anyone in industry looking to ...

## Optimal Meeting Point on the Paris Metro

March 20, 2013
tl;dr: Play with the app here When you live in Paris, chances are you are (home or work) very close to a metro station, so when you want to meet with some friends, you usually end up picking another metro station as a meeting point. Yet, finding the optimal place to meet can easily become a complex problem considering...

## Behavioral Economics and Beer… highly correlated

March 19, 2013
Short: I plot the frequency of Wikipedia searches of “Behavioral Economics” and “Beer” – who knew the correlation would be 0.7! Data reference: Data on any Wikipedia searches (back to 2007) are available at http://glimmer.rstudio.com/pssguy/wikiSearchRates/. The website allows you to download frequency hits per day as a csv, which is what I've done here....

## Animating neural networks from the nnet package

March 19, 2013
My research has allowed me to implement techniques for visualizing multivariate models in R and I wanted to share some additional techniques I’ve developed, in addition to my previous post. For example, I think a primary obstacle towards developing a useful neural network model is an under-appreciation of the effects model parameters have on model

## Samsung Phone Data Analysis Project

March 19, 2013
Below are my findings from the second data analysis project in Dr. Jeffrey Leek’s Johns Hopkins Coursera class. Introduction: I used the “Human Activity Recognition Using Smartphones Dataset” (UCI, 2013) to build a model. This data was recorded from a Samsung prototype smartphone with a built-in accelerometer. The purpose of my model was to recognize the type

## R’s 2012 Growth in Capability Exceeds SAS’ All Time Total

March 19, 2013
by Robert A. Muenchen I’m slowly gathering all the data needed to update my ongoing article, The Popularity of Data Analysis Software. The section below is the latest installment. Growth in Capability The capability of all the software in this … Continue reading →