September 15, 2014
During July I was working with a commercial data source that provides extra data around IP addresses and it dawned on me: rather than pinging billions of IP addresses and creating a map, I could create a map from all the geolocation data I had at my fingertips. At a high level I could answer “Where are all the IPv4 addresses worldwide?” But in...

## PCA / EOF for data with missing values – a comparison of accuracy

September 15, 2014
Not all Principal Component Analysis (PCA) (also called Empirical Orthogonal Function analysis, EOF) approaches are equal when it comes to dealing with a data field that contains missing values (i.e. is "gappy"). The following post compares several methods by assessing the accuracy of the derived PCs to reconstruct the "true" data set, as was similarly...

## How do you say π^π^π?

September 15, 2014
Well, not that you really probably want to know how to say such an absurdly large number. However, for those of you who are interested (allowing for rounding), it is: one quintillion, three hundred forty quadrillion, one hundred sixty-four trillion, one h...
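As a quick check on the magnitude quoted in this excerpt, here is a minimal sketch in Python. It assumes the title means the right-associative tower π^(π^π), which is the reading consistent with a number in the quintillions. The variable names are illustrative only and do not come from the original post.

```python
import math

# Exponentiation is right-associative, so this evaluates pi ** (pi ** pi).
tower = math.pi ** math.pi ** math.pi

# The other grouping, (pi ** pi) ** pi, gives a much smaller number.
left_grouped = (math.pi ** math.pi) ** math.pi

# Print both with thousands separators; only the leading digits of the
# tower are reliable, because a double carries roughly 16 significant digits.
print(f"{tower:,.0f}")
print(f"{left_grouped:,.0f}")
```

The grouping matters: with explicit parentheses around the first power, the result no longer reaches the quintillions.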

## One datavis for you, ten for me

September 14, 2014
Over the years of my graduate studies I made a lot of plots. I mean tonnes. To get an extremely conservative estimate I grep’ed for every instance of “plot(” in all of the many R scripts I wrote over the past five years. The actual number is very likely orders of magnitude larger as 1) many

## Trying a prefmap

September 14, 2014
Preference mapping is a key technique in sensory and consumer research. It links the sensory perception of products to the liking of products and hence provides clues to the development of new, well-tasting products. Even though it is a key technique,...

## RDataMining Slides Series

September 14, 2014
by Yanchang Zhao, RDataMining.com I have made a series of slides on R and data mining, based on my book titled R and Data Mining — Examples and Case Studies. The slides will be used at my presentations at seminars …

## Newcastle R course, a write-up

September 13, 2014
I recently attended a week-long R course in Newcastle, taught by Colin Gillespie. It went from “An Introduction to R” to “Advanced Graphics” via a day each on modelling, efficiency and programming. Suffice to say it was an intense 5 days! Overall it was the best R course I’ve been on so far. I’d recommend it to others,...

## The Ecology of Data Matrices: A Metaphor for Simultaneous Clustering

September 13, 2014
"...a metaphor is an affair between a predicate with a past and an object that yields while protesting." Nelson Goodman (1976) It is, as if, data matrices were alive. The rows are species, and the columns are habitats. At least that seems to be the...

September 12, 2014
Google has just released a new package for R: CausalImpact. Amongst many other things, this package allows Google to resolve the classical conundrum: how can we assess the impact of an intervention (for example, the effect of an advertising campaign on website clicks) when we can't know what would have happened if we hadn't run the campaign? For a...

## R: k-Means Clustering on an Image

September 12, 2014
Enough with the theory we recently published; let's take a break and have fun with an application of statistics used in data mining and machine learning: k-means clustering. k-means clustering is a method of vector quantization, originally from signa...

## Conor Atom, a book for “children scientists” (an indiegogo campaign)

September 12, 2014
Mario Morales, a Colombian-American statistician-bioinformatician, member of the R community and a regular attendee of the UseR conference since 2007, has launched a book for children called “Conor Atom, The child scientist”. While R is not the official language in the book, it is depicted as one of the interests of the character. The project has some goals such as not using...

## Embedding RData files in Rmarkdown files for more reproducible analyses

September 12, 2014
For those of us interested in reproducible analysis, Rmarkdown is a great way of communicating our code to other researchers. Rstudio, in particular, makes it very easy to create attractive HTML documents containing text, code, and figures, which can then be sent to colleagues or put on the internet for anyone to see. If you aren't using...

## Read sas7bdat files in R with GGASoftware Parso library

September 12, 2014
... using the new R package sas7bdat.parso. The software company GGASoftware has extended the work of myself and others on the sas7bdat R package by developing a Java library called Parso, which also reads sas7bdat files. They have worked out most of the remaining kinks. For example, the Parso library reads sas7bdat files with compressed

## Princess Jasmine’s Trick

September 12, 2014
I’m history! No, I’m mythology! Nah, I don’t care what I am; I’m free hee! (Genie, when he is released from the magical oil lamp by Aladdin) A long time ago, in a kingdom far away, lived a beautiful princess named Jasmine. There also lived a very rich and evil wizard named Jafar, who was

## Using colorized PNG pictograms in R base plots

September 12, 2014
Today I stumbled across a figure in an explanation on multiple factor analysis which contained pictograms. Figure 1 from Abdi & Valentin (2007), p. 8. I wanted to reproduce a similar figure in R using pictograms and additionally color them, e.g. by group membership. I have almost no knowledge about image processing, so

## shinyStore – Persistent Client-Side Storage in Shiny

September 11, 2014
We’re thrilled to announce the availability of shinyStore, an R package that enables HTML5 Web Storage from Shiny, an interactive web application framework for R. A live demo of an example application is available here. Set a text value then refresh the page, or close the tab and come back in a new tab. You’ll

## UVA / Charlottesville R Meetup

September 11, 2014
TL;DR? We started an R Users group, awesome community, huge turnout at first meeting, lots of potential. I've sat through many hours of meetings where faculty lament the fact that their trainees (and the faculty themselves!) are woefully ill-prepared...

## Martin Maechler Invited Talk at useR! 2014 – Good Practices in R Programming

September 11, 2014
Martin Maechler is a member of R-Core. This distinction puts him in the very apex...

## What makes a good academic conference?

September 11, 2014
What makes a good academic conference? Here's what we like. The post What makes a good academic conference? appeared first on Decision Science News.

## Save your simulation study seeds

September 11, 2014
Here in the Northern hemisphere, gardeners are gathering seeds from their prize-winning vegetables and storing them away for next year’s crop. Today at the 20th London Stata Users’ Group meeting, I learnt a similar trick. It’s strange I never thought …

## pkgKitten 0.1.2: Still creating R Packages that purr

September 11, 2014
A brown bag release 0.1.2 of pkgKitten is now on CRAN, following yesterday's 0.1.1 upload. Next time I'll try to remember that when I have parameters name and path, it won't work so well to call them as path and name ... Changes in version 0.1.2 (20...

## R at Conferences this Fall

September 11, 2014
by Joseph Rickert The days are getting shorter here in California and the summer R conferences UseR!2014 and JSM are behind us, but there are still some very fine conferences for R users to look forward to before the year ends. DataWeek starts in San Francisco on September 15th. I will be conducting a bootcamp for new R users,...

## R User Group in Birmingham, AL

September 11, 2014
If Birmingham, UK has one, then Birmingham, AL, USA should too. There is a big gaping hole in R user groups between Georgia and Texas that I think needs filling.  Way back in 2011, I weakly posted R User Group Birmingham Alabama, but this time I am mo...

## “Probabilizing” uncertainty in the Brazilian Presidential Election

September 11, 2014
The following figure shows the probability distributions of vote intentions for the main candidates after distributing the stock of undecided voters. As Marina (PSB) is getting back to her track, a question that comes to light is whether Dilma will get more votes than the sum of the others, and what is the probability that

## Generalized Double Pareto Priors for Regression

September 10, 2014
This post is a review of the “GENERALIZED DOUBLE PARETO SHRINKAGE” Statistica Sinica (2012) paper by Armagan, Dunson and Lee. Consider the regression model \(Y = X\beta + \varepsilon\) where we put a generalized double Pareto distribution as the prior on the regression coefficients \(\beta\). The GDP distribution has density $$f(\beta \mid \xi, \alpha) = \frac{1}{2\xi}\left(1 + \frac{|\beta|}{\alpha\xi}\right)^{-(\alpha+1)}.$$ GDP as Scale The post

## Visualizing Website Pathing With Sankey Charts

September 10, 2014
In my prior post on visualizing website structure using network graphs, I referenced that network graphs showed the pairwise relationships between two pages (in a bi-directional manner). However, if you want to analyze how your visitors are pathing through your site, you can visualize your data using a Sankey chart. Visualizing Single Page-to-Next Page Pathing

## pkgKitten 0.1.1: Still creating R Packages that purr

September 10, 2014
A maintenance release 0.1.1 of pkgKitten is now on CRAN. It has only one small change: the function playWithPerPackageHelpPage() was factored out of the main function kitten() as I happened to be needing something just like playWithPerPackageHelpPage...

## CausalImpact: A new open-source package for estimating causal effects in time series

September 10, 2014
How can we measure the number of additional clicks or sales that an AdWords campaign generated? How can we estimate the impact of a new feature on app downloads? How do we compare the effectiveness of publicity across countries? In principle, all of these questions can be answered through causal inference. In practice, estimating a causal effect...