## Which functions in plyr do people use?

November 2, 2012
This is the question that Hadley Wickham recently set out to answer by asking frequent R and plyr users, in an online survey, how they use it. Once a decent number of people had responded, Hadley quickly produced a short analysis of the plyr usage survey and published it on RPubs. With his permission, I am...

## googleVis 0.3.3 is released and on its way to CRAN

November 2, 2012
I am very grateful to all who provided feedback over the last two weeks and tested the previous versions 0.3.1 and 0.3.2, which were not released on CRAN. So, what has changed since version 0.3.2? Not much, but plot.gvis didn't open a browser window when op...

## Ryan Peek on Customizing Your R Setup

November 2, 2012
Ryan Peek showed us how to use an .Rprofile file to customize your R setup. Here are his instructions and script. For Windows: to change the profile for R, go to C:\Program Files\R\R-2.15.1\etc (or whatever version you are using), edit the “Rprofile.site” file, and restart R. For Macs: create your Rprofile file, using TextEdit or another editor to create a file called Rprofile.txt. In a...

## Slides and replay for "The Rise of Data Science"

November 2, 2012
I had a great time presenting my new webinar yesterday. Thanks to everyone who attended "The Rise of Data Science in the Age of Big Data Analytics", and especially to those who submitted questions. Sorry I didn't have time to get to them all, but feel free to ask here in the comments. There's been some discussion recently about whether...

## The New Madrid Fault – Past, Present and Future

November 2, 2012
New Madrid, Territory of Missouri, March 22, 1816 Dear Sir, In compliance with your request, I will now give you a history, as full in detail as the limits of the letter will permit, of the late awful visitation of Providence in this place and vicinity.  On the 16th of December, 1811, about two o'clock, A.M., we were visited by a violent...

## Mapping Capabilities in R

November 2, 2012
From time to time, creating a basic map of the United States or other parts of the world to complement some statistical analysis is useful for emphasizing a point. The maps package in R provides a good way to produce these maps. The map axes are based on latitude and longitude, so overlaying other information on

## GGtutorial: Day 5 – Gradient Colors and Brewer Palettes

November 2, 2012
So, continuing with the short tutorials on how to do relatively simple (but sometimes very frustrating) things in ggplot, today’s post looks at how to use gradient colors and Brewer colors to color either continuous or discrete dependent variab...

## RAppArmor video tutorials: security in R!

November 2, 2012
Security and R: One of the more challenging aspects of OpenCPU is security in R (or lack thereof). This is actually one of the reasons OpenCPU runs on Linux only at this point; other operating systems simply lack superpowers to implement open computing. (Maybe one exception is BSD, for which I lack superpowers). Security is ...

## PrettyR R

November 1, 2012
When it comes to R blogging I'm a complete newbie, so I'm still struggling with the technical details. Part of the process is prettifying the code snippets. One of the standard ways of doing this involves copying and pasting the R code into the Pretty R ...

## Data types, part 1: Ways to store variables

November 1, 2012
I've been alluding to different R data types, or classes, in various posts, so I want to go over them in more detail. This is part 1 of a 3-part series on data types. In this post, I'll describe and give a general overview of useful data types. I...

## Watch Obama and Romney criss-cross the US

November 1, 2012
The Washington Post has an interactive graphic showing the rate at which the US presidential candidates Barack Obama and Mitt Romney have visited the various states for campaign rallies and fundraisers. Here's how it looks today: You can clearly see the focus on key swing states like Florida and Ohio, as well as non-competitive (but donor-rich) states like California...

## R in the Press

November 1, 2012
Here is the list of press reports and news about R. Bits (a blog under The New York Times): R you ready for R? by Ashlee Vance, published January 8, 2009, 1:52 PM. The New York Times: Data Analysts Captivated by R’s Power by Ashlee Vance, published January 6, 2009. InfoWorld: The BI battle isn’t

## Variable probability Bernoulli outcomes – Fast and Slow

November 1, 2012
I am working on a project that requires the generation of Bernoulli outcomes. Typically, I would go about this using the built-in sample() function like so: This works great and is fast, even for large n. The problem is, I want to generate each sample with its own unique probability. Seems straightforward enough, I

## Correlation: Easy as 1-2-3?

November 1, 2012
I recently had the task of taking a look at some assessment (audit) data. I was assuming, or rather hoping for, data with a normal distribution, and thought it would be a quick case of Pearson correlation between two columns: "Duration" and "Score". Just conjecture at this point as I did not understand what the assessment process

## Upcoming R training by Hadley Wickham: SF Dec 3-4, DC Dec 10-11

November 1, 2012
(By Hadley Wickham) Hi all, I’d like to let you know about four R training courses that RStudio will be offering in December:

* Effective data visualization (http://bit.ly/TY2ONI) Dec 3. San Francisco, CA
* Reports and reproducible research (http://bit.ly/RsZmYr) Dec 4. San Francisco, CA
* Advanced R programming (http://bit.ly/RvZDsd) Dec 10. Washington, DC
* Package development (http://bit.ly/UhTIWz) Dec 11....

## New version of RStudio (v0.97)

November 1, 2012
Today a new version of RStudio (v0.97) is available for download from our website.  The principal focus of this release was creating comprehensive tools for R package development. We also implemented many other frequently requested enhancements including a new Vim editing mode and a much improved Find and Replace pane. Here’s a summary of what’s

## GGtutorial: Day 4 – More Colors

November 1, 2012
So far we’ve covered melting and casting data using the reshape package, and today we’re going to look at different ways of coloring and selecting palettes for plots. For these plots, we’re going to use the built-in diamonds data...

## Why pictures are so important when modeling data?

October 31, 2012
$R^2$

(bis repetita) Consider the following regression summary: Call: lm(formula = y1 ~ x1) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.0001 1.1247 2.667 0.02573 * x1 0.5001 0.1179 4.241 0.00217 **...

## Regime Detection

October 31, 2012
Regime Detection comes in handy when you are trying to decide which strategy to deploy. For example, there are periods (regimes) when Trend Following strategies work better and there are periods when Mean Reversion strategies work better. Today I want to show you one way to detect market Regimes. To detect market Regimes, I will fit

## More data apps spawned by Sandy

October 31, 2012
As the clean-up continues on the eastern seaboard, I wanted to follow up on Monday's post on tracking Hurricane Sandy with Open Data with a couple of other R-based data applications spawned by the storm. Josef Fruehwald created an R script to tap into local weather sensors to keep track of air pressure, wind speed and rainfall near his...

## draw figures in CMYK mode in R

October 31, 2012
Print publications usually ask for figures in CMYK (instead of RGB) color mode (because not every color can be printed), while we usually use RGB for screen reading (because screens have a larger range of colors). Of course we can convert RGB to ...

## Using R with Routino to provide road network paths between random Tweets and an iconic Smiths landmark

October 31, 2012
A couple of days ago I posted how you can go about installing Routino on OS X, and now I have just finished writing a quick post over on my Rpubs blog about how you go about using it from within R. I also wanted to know a bit more about how R and Twitter play

## Hierarchical linear models and lmer

October 31, 2012
Article by Ben Ogorek. Graphics by Bob Forrest. Background: My last article featured linear models with random slopes. For estimation and prediction, we used the lmer function from the lme4 package. Today we'll consider another level in the hierarchy, one...

## GGtutorial: Day 3 – Introduction to Colors

October 31, 2012
So, where does ggplot get its colors? If you’ve ever asked ggplot to color on the basis of a factor, you might have been surprised by the default color choices. The fact is, ggplot colors factors on the basis of finding evenly spaced colors a...

## Fitting Distributions to Data with R

October 31, 2012
In “Fitting Distributions with R” Vito Ricci writes: “Fitting distributions consists in finding a mathematical function which represents in a good way a statistical variable. A statistician often is facing with this problem: he has some observations of a quantitative character and he wishes to test if those observations, being a sample of an unknown population, belong from a...

## Edmonton R User Group is going live

October 30, 2012
Edmonton has made a name for itself as the City of Champions, The Gateway to the North and the most northern city in North America with a population of over 1

## Makefiles for R/LaTeX projects

October 30, 2012
Updated: 21 November 2012. Make is a marvellous tool used by programmers to build software, but it can be used for much more than that. I use make whenever I have a large project involving R files and LaTeX files, which means I use it for almost all of the papers I write, and almost all of the consulting reports...

## R among TechCrunch’s 5 Trendy Open-Source Techs for Big Data

October 30, 2012
Tim Gasper (Product Manager at Big Data platform Infochimps) has an informative article at TechCrunch that provides an overview of five open-source technologies trending now for Big Data applications. They are: Storm and Kafka (for processing stream data); Drill and Dremel (for ad-hoc queries of big data); R (for data science with big data); Gremlin and Giraph (for graph...

## visit to ISU

October 30, 2012
A short visit to ISU, and therefore a busy and profitable day! About ten appointments in Snedecor Hall after a nice morning run, a highly attended Zyskind Lecture, and many interesting discussions throughout the day: e.g., I had a great time discussing using null recurrent Markov chains for integral approximations with Krishna