## “Show me the way to the next whiskey bar” (The Doors – Alabama Song) – Interactive Location Recommendation using Tableau

February 2, 2014
Since I started using Tableau, I’ve been fascinated by the capabilities of this piece of software. Before Christmas I was looking into how I could build an interactive visualization that helps me explore the relationships between different objects, in a form that shows which objects are very close to each other according to some similarity measure, or vice versa...
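
The post builds this interactively in Tableau. The core idea, ranking objects by how close they are to a reference point, can be sketched in a few lines of Python. This is a minimal sketch that assumes each object is a location given as (latitude, longitude) and uses great-circle (haversine) distance as the closeness measure. The bar names and coordinates are made up, and `haversine_km` and `nearest` are hypothetical helpers, not part of Tableau.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest(origin, places, k=3):
    """Return the k places closest to origin as (name, distance_km), nearest first."""
    ranked = sorted(
        ((name, haversine_km(origin, coords)) for name, coords in places.items()),
        key=lambda item: item[1],
    )
    return ranked[:k]


# Hypothetical bars with made-up (lat, lon) coordinates.
bars = {
    "Bar A": (48.137, 11.575),
    "Bar B": (48.150, 11.580),
    "Bar C": (48.100, 11.500),
}

me = (48.140, 11.570)  # hypothetical current position
for name, dist in nearest(me, bars, k=2):
    print(f"{name}: {dist:.2f} km")
```

Swapping `haversine_km` for any other distance or dissimilarity function gives the "according to some similarity measure" variant; the ranking step stays the same.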

## Boxplot with mean and standard deviation in ggplot2 (plus Jitter)

February 2, 2014
When you create a boxplot in R, it automatically computes the median, the first and third quartiles ("hinges"), and a 95% confidence interval for the median ("notches"). But we would like to replace the default values in the boxplot graphics with the mean, the mean + st...
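
To make the contrast concrete, here is a minimal Python sketch (not the post's ggplot2 code) that computes the default box statistics, median and quartile hinges, next to a mean-based alternative. Because the excerpt is cut off, the alternative here assumes the box runs from mean − sd to mean + sd using the sample standard deviation; the helper names are hypothetical. The quartiles use the `inclusive` method of `statistics.quantiles`, which on this sample gives the same values as R's default `quantile()` (type 7): 4, 4.5 and 5.5.

```python
import statistics


def default_box_stats(values):
    """Median and first/third quartiles: what a default boxplot box shows."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"lower": q1, "middle": median, "upper": q3}


def mean_sd_box_stats(values):
    """Assumed alternative: box from mean - sd to mean + sd, centre line at the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"lower": mean - sd, "middle": mean, "upper": mean + sd}


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(default_box_stats(data))
print(mean_sd_box_stats(data))
```

In ggplot2 itself, values like these would be passed to the boxplot layer in place of the computed quantiles; the sketch only shows which numbers change.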

## Know India through Visualisations – 1

February 1, 2014
In this post I'm going to produce just a couple of charts, a teaser of sorts. In forthcoming posts I'll dig deeper. I was amazed by the range of existing R packages for working with spatial data without needing to get into much of the technical detail...

## Bad Bayes: an example of why you need hold-out testing

February 1, 2014
We demonstrate a dataset that causes many good machine learning algorithms to overfit horribly. The example is designed to imitate a common situation in predictive analytics for natural language processing. In this type of application you are often building a model from many rare text features, which are often nearly unique k-grams...
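
The excerpt does not show the post's own data or code, so here is a hypothetical Python illustration of the same failure mode. Labels are pure coin flips, every row carries its own never-repeated features (a stand-in for nearly unique k-grams), and a hand-rolled multinomial naive Bayes is scored on its training rows and on a separate hold-out set. The data generator, the add-one smoothing, and skipping tokens unseen in training are all assumptions of this sketch.

```python
import math
import random
from collections import Counter


def make_rows(n_rows, rare_per_row, rng, prefix):
    """Coin-flip labels; every row gets features that occur in no other row."""
    rows = []
    for i in range(n_rows):
        label = rng.random() < 0.5
        features = [f"{prefix}{i}_k{j}" for j in range(rare_per_row)]
        rows.append((features, label))
    return rows


def train_naive_bayes(rows):
    """Multinomial naive Bayes counts (smoothing is applied at prediction time)."""
    class_counts = Counter(label for _, label in rows)
    token_counts = {True: Counter(), False: Counter()}
    for features, label in rows:
        token_counts[label].update(features)
    token_totals = {label: sum(c.values()) for label, c in token_counts.items()}
    vocab = set(token_counts[True]) | set(token_counts[False])
    return {"class_counts": class_counts, "token_counts": token_counts,
            "token_totals": token_totals, "vocab": vocab}


def predict(model, features):
    """Pick the label with the highest log prior + smoothed log likelihood."""
    n_rows = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in (True, False):
        score = math.log(model["class_counts"][label] / n_rows)
        for tok in features:
            if tok not in model["vocab"]:
                continue  # tokens never seen in training are skipped (one common convention)
            count = model["token_counts"][label][tok]
            score += math.log((count + 1) / (model["token_totals"][label] + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows):
    return sum(predict(model, f) == y for f, y in rows) / len(rows)


rng = random.Random(0)
train = make_rows(1000, 10, rng, "train")
holdout = make_rows(1000, 10, rng, "holdout")
model = train_naive_bayes(train)
print(f"training accuracy: {accuracy(model, train):.3f}")
print(f"hold-out accuracy: {accuracy(model, holdout):.3f}")
```

Every training row is "explained" by its own unique features, so training accuracy is essentially perfect, while hold-out rows share no features with the training set and the model falls back on the class prior, landing near coin-flip accuracy. Only the hold-out number reveals that nothing was learned.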

## Stick Figure Function Fun – R

January 31, 2014
I have created a stick-figure-generating function so that I can add a human figure to some of my graphs as a demonstration of scale, and potentially use it for emoticons in my Shiny/Concerto applications. You can change basic graphing parameters li...

## Python and R: Is Python really faster than R?

January 30, 2014
A friend of mine asked me to code the following in R: generate samples of size 10 from a Normal distribution with $\mu = 3$ and $\sigma^2 = 5$; compute $\bar{x}$ and $\bar{x} \mp z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ using the 95% confidence...
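
The exercise is easy to sketch in Python as well. This minimal version assumes $\sigma$ is known (so the z interval in the formula applies), uses a hypothetical 1,000 repetitions with a fixed seed, and adds a coverage check that is not part of the quoted exercise. It is not the post's benchmark code, and `z_interval` is a hypothetical helper.

```python
import math
import random
import statistics

MU = 3.0         # population mean from the exercise
SIGMA2 = 5.0     # population variance from the exercise
N = 10           # sample size from the exercise
CONFIDENCE = 0.95


def z_interval(sample, sigma, confidence=CONFIDENCE):
    """Return (xbar, lower, upper) for the known-sigma interval xbar -/+ z * sigma / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}, about 1.96 at 95%
    half_width = z * sigma / math.sqrt(n)
    return xbar, xbar - half_width, xbar + half_width


rng = random.Random(42)
sigma = math.sqrt(SIGMA2)  # random.gauss takes the standard deviation, not the variance
n_samples = 1000           # hypothetical number of repetitions

intervals = []
for _ in range(n_samples):
    sample = [rng.gauss(MU, sigma) for _ in range(N)]
    intervals.append(z_interval(sample, sigma))

# Added check: share of intervals that contain the true mean (should be near 0.95).
covered = sum(lo <= MU <= hi for _, lo, hi in intervals) / n_samples
print(f"first interval: {intervals[0]}")
print(f"share of intervals containing mu: {covered:.3f}")
```

Every interval has the same width, $2 z_{\alpha/2}\sigma/\sqrt{n}$, because $\sigma$ is treated as known; only the centre $\bar{x}$ moves from sample to sample.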

## Introduction to dplyr: data manipulation made easy(er) and fun(er)

January 30, 2014
If you are just getting started in R, check out my post on good references for beginners. Hadley Wickham has come out with yet another R package that is destined to improve my workflow and let me concentrate less on getting R to do things and more on my research questions. The package is dplyr, a reboot...

## dplyr 0.1.1

January 30, 2014
We’re pleased to announce a new minor version of dplyr. It fixes a few bugs that crashed R, adds a few minor new features (like a sort argument to tally()), and uses shallow copying in a few more places. There is one backward-incompatible change: explain_tbl() has been renamed to explain(). For a complete list...

## roxygen2 3.1.0

January 30, 2014
We’re pleased to announce a new version of roxygen2. The biggest news is that roxygen2 now recognises reference class method docstrings and automatically adds them to the documentation. Version 3.1.0 also offers a number of minor improvements and bug fixes, as listed in the GitHub release notes. As always, you can install the latest version with install.packages("roxygen2").

## Introducing the ecoengine package

January 30, 2014
Natural history museums have long been valuable repositories of data on species diversity. These data have been critical for fostering and shaping the development of fields such as biogeography and systematics. These repositories are becoming increasingly important, especially in the context of climate change, where a strong understanding of how species responded to past...