R speeds up when the Basic Linear Algebra Subprograms (BLAS) library it uses is well tuned. The reference BLAS that comes with R and Ubuntu isn’t very fast. On my machine, it takes 9 minutes to run a well-known R …

Introduction Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on histograms, which are very useful plots for visualizing the distribution of a data set. I will discuss how histograms are constructed and use histograms to assess the distribution of the “Ozone” data from the built-in “airquality” data set in R. In

Most of the time, I don’t need to deal with different encodings at all. When possible, I use ASCII characters. And when there is a little processing of Chinese characters or other Unicode characters, I use .NET languages or JVM languages, in which every string is Unicode and, of course, when the characters are displayed, they are displayed as characters...

(This article was first published on Milano R net, and kindly contributed to R-bloggers) As mentioned in the previous article, a possibility for dealing with some Big Data problems is to integrate R within the Hadoop ecosystem. Therefore, it's necessary to have a bridge between the two environments. It means that R should be capable of handling data the...

(This article was first published on Commodity Stat Arb, and kindly contributed to R-bloggers) I can’t believe it has been nearly 6 months since I last posted. Given the sustained heat, it seemed like a good idea to finish off this subject. As hinted at in my last post, temperature is the missing variable to make sense of Residential electrical...

the american time use survey collects information about how we spend our time. it's a pretty simple setup: sampled individuals write down everything they do for a single twenty-four hour period, in ten minute intervals. those diaries are a...

