304 search results for "hadoop"

In case you missed it: September Roundup

October 12, 2010

In case you missed them, here are some articles from September of particular interest to R users. We presented a profile of Hadley Wickham, author of many popular R packages including ggplot2 and reshape. We riffed the design of the new Twitter website into a discussion on calculating the Golden Mean with R. Several readers contributed one-liners based on...

The R-Files: Saptarshi Guha

October 11, 2010

"The R-Files" is an occasional series from Revolution Analytics, where we profile prominent members of the R Community.

Name: Saptarshi Guha
Background: Ph.D. in Statistics, Purdue University
Nationality: India
Years Using R: 6
Known for: Developing the RHIPE package for R + Hadoop integration

At just 31 years old, Saptarshi Guha has emerged as a cutting-edge contributor to the R...

Making sense of MapReduce

September 24, 2010

From guest blogger Joseph Rickert. Last night I went to hear Ken Krugler of Bixolabs talk about Hadoop at the monthly meeting of the Software Developers Forum. Maybe because Ken is an unusually lucid speaker, or maybe because I just reached some sort of cumulative tipping point through the prep work of all those patient people who have tried...

Taking R to the Limit: Parallelism and Big Data

August 23, 2010

In a two-part series at the Los Angeles R User Group, Ryan Rosario took a look at the many ways you can take the R language to the limits of high-performance computing. In Part I (see video at this link; slides and code also available), Ryan focuses on the various methods of parallel computing in R. There's some great...

Taking R to the Limit, Part II – Large Datasets in R

August 20, 2010

For Part I, Parallelism in R, click here. Tuesday night I again had the opportunity to present on high-performance computing in R at the Los Angeles R Users’ Group. This was the second part of a two-part series called “Taking R to the Limit: High Performance Computing in R.” Part II discussed ways to work with large datasets...

Announcing Big Data for Revolution R

August 3, 2010

I've hinted this was coming a few times before, but with today's press release the announcement is official: the next release of Revolution R Enterprise will include "Big Data" capabilities thanks to the new RevoScaleR package. We're pretty excited at how it's turned out: it's kinda amazing to be able to use R's formula syntax like this: arrDelayLm2 <-...

Taking R to the Limit, Part I – Parallelization in R

July 28, 2010

Tuesday night I had the opportunity to present on high-performance computing in R at the Los Angeles R Users’ Group. There was so much to talk about that I had to split my talk into two parts. The first part was parallelization and the second ...

An experiment in A/B Testing my Résumé

July 1, 2010

Objective

I’ll admit it: my résumé doesn’t stand out. I’ve had some great internships, but also a tendency to work for companies that aren’t (yet!) household names. And though I’m doing fine academically, it’s not well enough to stand out …

Thoughts on Making Data Work

June 9, 2010

I really enjoyed all four talks at today's online conference, Making Data Work. (Disclosure: Revolution sponsored this conference.) I thought the four speakers together gave a great overview of issues related to the processing, analysis, and visualization of big data. Mike Driscoll started off with a useful categorization for data size. "Small Data" (<10 GB) fits in the memory of...

Data preparation for Social Network Analysis using R and Gephi

June 2, 2010

I want to share my experience in generating the data for social network analysis using R and analyzing it using Gephi...

WHICH DATA STRUCTURE TO USE FOR LARGE GRAPHS?

I quickly realized that using edge lists and adjacency matrices gets difficult as the g...
