R is Hot: Part 4

October 28, 2010
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

This is Part 4 of a five-part article series, with new parts published each Thursday. You can download the complete article from the Revolution Analytics website.

High Quality Graphics, Made Easy

R is especially useful for generating charts and graphics quickly and easily. The ability to plot complex data is more than a handy trick; it is an important step in data analysis because it lets you see the patterns and anomalies hidden within the data.

The New York Times has been a leader in the use of charts and graphics that make it easier for readers to get the gist of complicated stories. Amanda Cox, a graphics editor at the Times, says R is particularly valuable in deadline situations when data is scant and time is precious. “If you can picture it in your head, chances are good that you can make it work in R,” says Cox. “R makes it easy to read data, generate lines and points, and place them where you want them. It’s very flexible and super quick. When you’ve only got two or three hours until deadline, R can be brilliant.”

When Michael Jackson died in 2009, the Times quickly prepared a graphic timeline showing how the artist’s songs had performed on the Billboard Hot 100 chart from 1971 to the present. It would have been difficult or impossible to prepare a similar chart on deadline using other analytic techniques.

Peter Aldhous, the San Francisco bureau chief of New Scientist magazine, has used R to generate output that graphic designers then turn into some of the charts that illustrate his articles. He also uses R to generate simple plots that quickly show him what is really going on in the data he collects. For a journalist, the ability to draw quick insights from data is invaluable.

“I’ve got a Ph.D. in animal behavior, so I have some statistical training,” says Aldhous. “R is great for doing exploratory work that gives me an idea of what the distributions look like. I’ve found it incredibly useful for processing data quickly.”

Recently, Aldhous investigated complaints about certain academic papers on stem cell research being subjected to “obstructive” reviews, resulting in delays or spurious rejections by peer-reviewed journals. Using an R package to generate a quick series of box plots and scatter plots, he saw that papers from scientists outside the US seemed to take longer to get accepted and published. He was then able to follow up and analyze the data in R, using the most appropriate statistical and graphical methods: Cox proportional hazards regression and Kaplan-Meier curves.
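As a rough illustration of that workflow, the sketch below moves from a quick box plot to Kaplan-Meier curves and a Cox proportional hazards model. It is not Aldhous's analysis: the `papers` data frame, its column names and the simulated delays are all hypothetical, invented only so the example runs. It assumes the `survival` package, which ships with standard R installations.

```r
library(survival)  # provides Surv(), survfit() and coxph()

set.seed(1)
n <- 200

# Hypothetical, simulated data: NOT the data from the New Scientist investigation.
papers <- data.frame(
  region = factor(rep(c("US", "Non-US"), each = n / 2),
                  levels = c("US", "Non-US")),
  days   = c(rexp(n / 2, rate = 1 / 90),    # simulated delays, US authors
             rexp(n / 2, rate = 1 / 120))   # simulated delays, non-US authors
)
papers$accepted <- 1  # every simulated paper is eventually accepted (no censoring)

# Step 1: a quick exploratory look at the distribution of delays by region.
boxplot(days ~ region, data = papers,
        ylab = "Days from submission to acceptance")

# Step 2: Kaplan-Meier curves, the share of papers still waiting over time.
km <- survfit(Surv(days, accepted) ~ region, data = papers)
plot(km, lty = 1:2,
     xlab = "Days since submission",
     ylab = "Proportion not yet accepted")
legend("topright", legend = levels(papers$region), lty = 1:2)

# Step 3: Cox proportional hazards regression comparing the two groups.
cox <- coxph(Surv(days, accepted) ~ region, data = papers)
summary(cox)
```

The box plot is the fast first look; the survival methods treat "time until acceptance" as the outcome, which is why they suit this kind of question better than a simple comparison of averages.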

In an article headlined “Hey, Green Spender,” Aldhous and colleague Phil McKenna examined the gap between consumer perception and environmental realities across multiple industries such as retail, media, travel and leisure, food and beverages, technology, construction and chemicals.

When the data was plotted, the differences between the perceptions and the realities were immediately visible – and the reporters knew they were on the right track. 

“It’s not just about producing graphics for publication,” Aldhous explains. “It’s about playing around and making a bunch of graphics that help you explore your data. This kind of graphical analysis is a really useful way to help you understand what you’re dealing with, because if you can’t see it, you can’t really understand it. But when you start graphing it out, you can really see what you’ve got.”

R makes it possible for people who aren’t professional analysts to create high quality charts and graphics such as maps, 3-D surfaces, image plots, scatter plots, histograms, bar plots and pie charts. 

“R is far from easy when you first encounter it, especially if you’re not a programmer who is used to working in the command line. It has a very steep initial learning curve,” says Aldhous. “But once you get to grips with its conventions and quirks, and if you study the documentation, then it becomes easy to plug different variables into the same code to create a series of related graphics.”
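The idea of plugging different variables into the same code can be sketched in a few lines. This example is only an illustration: it uses R's built-in `mtcars` data set rather than anything from Aldhous's reporting, and `plot_against_weight` is a hypothetical helper name.

```r
# Hypothetical helper: one plotting recipe, reused for several variables.
plot_against_weight <- function(data, variable) {
  plot(data$wt, data[[variable]],
       xlab = "Weight (1000 lbs)",
       ylab = variable,
       main = paste(variable, "vs. weight"))
  invisible(variable)  # return the variable name without printing it
}

variables <- c("mpg", "hp", "qsec")

# Draw the related plots side by side, then restore the previous layout.
old_par <- par(mfrow = c(1, length(variables)))
for (v in variables) {
  plot_against_weight(mtcars, v)
}
par(old_par)
```

Adding another chart to the series is then a matter of adding one more column name to `variables`.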

When Aldhous hit a snag or got in over his head, he reached out to the R community for help. “The community is delightful and incredibly helpful,” he says. “I could not have done all of this without expert help.”

Continued Thursdays
