Sometimes I actually use my experiments for real work. For example, I wanted to send an update on the Japanese Yen. This was a great opportunity to use the chart created in Shading and Points with xtsExtra plot.xts. I was fairly please...

Introduction In my last post, I described how we can derive modes, medians and means as three natural solutions to the problem of summarizing a list of numbers, \((x_1, x_2, \ldots, x_n)\), using a single number, \(s\). In particular, we measured the quality of different potential summaries in three different ways, which led us to
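To make the idea of scoring a summary \(s\) concrete, here is a minimal R sketch. The excerpt is cut off before it names the three quality measures, so the zero-one, absolute and squared losses used below are an assumption, as are the sample data and the helper names (`loss_zero_one`, `loss_absolute`, `loss_squared`, `best`).

```r
# Hypothetical sample; any numeric vector would do.
x <- c(1, 2, 2, 3, 10)

# Three assumed ways to score a candidate summary s against x.
loss_zero_one <- function(s, x) sum(x != s)      # number of values that differ from s
loss_absolute <- function(s, x) sum(abs(x - s))  # total absolute deviation
loss_squared  <- function(s, x) sum((x - s)^2)   # total squared deviation

# Candidate summaries on a grid of step 0.1 between min(x) and max(x).
candidates <- (10:100) / 10

# Return the candidate with the smallest total loss.
best <- function(loss) {
  scores <- sapply(candidates, loss, x = x)
  candidates[which.min(scores)]
}

best(loss_zero_one)  # compare with the most frequent value of x
best(loss_absolute)  # compare with median(x)
best(loss_squared)   # compare with mean(x)
```

A grid search is crude, but it keeps the link between each loss and its minimizer visible without any calculus.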

If you're laying down a friendly bet on the March Madness games or just tweaking your fantasy roster, this NCAA Data Visualizer by Rodrigo Zamith will be a boon. Just choose two teams to compare head-to-head and an attribute to compare them on. You can look at more than a dozen individual player attributes (e.g. points scored, assists, 3-point...

The role of Data Scientist has been getting a lot of attention lately. Brendan Tierney's blog post titled Type I and Type II Data Scientists adds an interesting perspective by defining and characterizing two key types of Data Scientist, both of which are needed in an organization. Tierney writes about Type I Data Scientists, "These are...

Next topic on logistic regression: the exact and the conditional logistic regressions. Exact logistic regression When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation.

Introduction / Warning Any traditional introductory statistics course will teach students the definitions of modes, medians and means. But, because introductory courses can’t assume that students have much mathematical maturity, the close relationship between these three summary statistics can’t be made clear. This post tries to remedy that situation by making it clear that all

Why did I use html5 for today’s talk? My last presentation was in html5. This time I wanted to do my slides in something new. I prepared the first few slides in Jessyink. Then I got to know that my friend …

The Maximum Sharpe Portfolio, or Tangency Portfolio, is the portfolio on the efficient frontier at the point where a line drawn from the point (0, risk-free rate) is tangent to the efficient frontier. There is a great discussion about the Maximum Sharpe Portfolio at quadprog optimization question. In the general case, finding the Maximum Sharpe Portfolio
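As a rough illustration of the tangency idea, the sketch below uses the textbook closed form for the simplest case: no constraints other than full investment, with short positions allowed. This is not necessarily the approach the post goes on to describe. The expected returns `mu`, covariance matrix `sigma` and risk-free rate `rf` are made-up numbers, and `sharpe` is a hypothetical helper.

```r
# Hypothetical inputs for three assets.
mu    <- c(0.08, 0.12, 0.10)                       # expected returns
sigma <- matrix(c(0.040, 0.006, 0.004,
                  0.006, 0.090, 0.010,
                  0.004, 0.010, 0.060),
                nrow = 3, byrow = TRUE)            # covariance matrix
rf    <- 0.02                                      # risk-free rate

# Solve sigma %*% z = (mu - rf), then rescale so the weights sum to 1.
z <- solve(sigma, mu - rf)
w <- z / sum(z)

# Sharpe ratio of a fully invested portfolio with weights w.
sharpe <- function(w) {
  excess <- sum(w * (mu - rf))
  risk   <- sqrt(drop(t(w) %*% sigma %*% w))
  excess / risk
}

sharpe(w)              # tangency portfolio
sharpe(rep(1 / 3, 3))  # equal weights, for comparison
```

Once constraints such as long-only weights are added, the closed form no longer applies and a quadratic programming solver becomes the natural tool.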

Needless to say, it is with great pleasure that I am back in beautiful Padova for the workshop Recent Advances in statistical inference: theory and case studies, organised by Laura Ventura and Walter Racugno, especially considering that this is one of the last places I met with George Casella, in June 2010. As we have plenty

Just a short post to celebrate that I learned today how incredibly easy it is to make a heatmap of correlations with ggplot2 (and reshape2, of course). So, what is going on in that short passage? cor makes a correlation matrix with all the pairwise correlations between variables (twice; plus a diagonal of ones). melt
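The passage the post refers to is not included in this excerpt, so here is a small stand-in sketch of the same idea. It assumes the built-in `mtcars` data and, to stay runnable without extra packages, reshapes the matrix with base R's `as.data.frame(as.table(...))` in place of `melt`. The plot is only built if ggplot2 happens to be installed.

```r
# Correlation matrix: every pair appears twice, plus a diagonal of ones.
cor_mat <- cor(mtcars)

# Long format: one row per (Var1, Var2) pair, with the correlation in column r.
# This base R call stands in for reshape2::melt here.
long <- as.data.frame(as.table(cor_mat), responseName = "r")
head(long)

# Draw one tile per pair, coloured by the correlation.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = Var1, y = Var2, fill = r)) +
    ggplot2::geom_tile()
  print(p)
}
```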

This is the third post concerning fast merging in R, first here and second here. This time we are going to look at how the merge function from the data.table package works in our case, as requested by Uwe Block. As a reminder, the first post concerns doing a...

by Joseph Rickert If you type ?Distributions at the R console you get a list of the 21 probability distributions included in the stats package that ships with base R. The same list appears in the Introduction to R Manual on CRAN and in most of the many fine introductory books available for the R language. These are indeed...
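For readers who have not used these functions, each distribution in the stats package follows the same naming pattern: a `d`, `p`, `q` or `r` prefix for the density, cumulative probability, quantile and random draws. A short sketch with arbitrarily chosen parameter values:

```r
# Cumulative probability and its inverse for the standard normal.
p <- pnorm(1.96)   # P(Z <= 1.96)
q <- qnorm(p)      # recovers 1.96 up to rounding error

# Probability mass function of a Binomial(10, 0.3) over its whole support.
d <- dbinom(0:10, size = 10, prob = 0.3)
sum(d)             # the probabilities add up to 1

# Random draws from a Poisson(4); the seed is fixed only for reproducibility.
set.seed(1)
draws <- rpois(1000, lambda = 4)
mean(draws)        # should be close to lambda
```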

RserveCLI is a .net/cli client for Rserve, created by Oliver M. Haynold. Oliver has done a great job with this project. I forked this project to add features, fix bugs, and do some restructuring. I thought it was a significant enough departure to cre...

When I lived in Paris years ago, I worked near Gare du Nord, but my friend Jenny lived near République. If we wanted to meet up after work, we'd just meet halfway along the Orange Métro line, around Gare de l'Est. Easy. Since that's within walking distance we wouldn't actually take the Métro, but Métro stations are useful waypoints...

The interest in high frequency trading and models has grown exponentially in the last decade. While I have some doubts about the validity of any signals emerging from all the noise at higher and higher frequencies, I have nevertheless decided to look at the statistical modelling of intraday returns using GARCH models. Unlike daily and

In this article I discuss a general approach for Geocoding a location from within R, processing XML reports, and using R packages to create interactive maps. There are various ways to accomplish this, though using Google’s GeoCoding service is a good place to start. We’ll also talk a bit about the XML package that is

Some history and a prediction. Past A discussion broke out on the R-help mailing list in January 2006 about a technical report put out by the statistical computing group at UCLA. The report in question talked mainly about SAS, SPSS and Stata. It talked briefly, and not especially positively, about R. Someone accused...