202 search results for "hadoop"

My Day at ACM Data Mining Camp III

November 13, 2010
By
My Day at ACM Data Mining Camp III

My first time at ACM Data Mining Camp was so awesome, that I was thrilled the make the trip up to San Jose for the November 2010 version. In July, I gave a talk at the Emerging Technologies for Online Learning Symposium conference with a faculty member in the Department of Statistics, at the Fairmont. The place was amazing,...

Read more »

Using R and Hadoop to analyze VOIP data

November 8, 2010
By

Last month, the newest member of Revolution's engineering team, Saptarshi Guha, gave a presentation at Hadoop World 2010 on using R and Hadoop to analyze 1.3 billion voice-over-IP packets to identify calls and measure call quality. Saptarshi, of course, is the author of RHIPE, which lets R programmers write map-reduce algorithms in the Hadoop framework without needing to learn...

Read more »

Handling Large Datasets in R

Handling large dataset in R, especially CSV data, was briefly discussed before at Excellent free CSV splitter and Handling Large CSV Files in R. My file at that time was around 2GB with 30 million number of rows and 8 columns. Recently I started to collect and analyze US corporate bonds tick data from year...

Read more »

RHIPE in the SD Times

October 12, 2010
By

Saptarshi Guha, who we profiled yesterday, is at the Hadoop World conference in New York City today. At 4PM, Saptarshi will give a presentation on RHIPE, his link between R and Hadoop. Saptarashi was interviewed yesterday by Alex Handy of the SD Times, where he talked about his background and his motivation to create RHIPE. Saptarshi was sponsored by...

Read more »

In case you missed it: September Roundup

October 12, 2010
By

In case you missed them, here are some articles from August of particular interest to R users. We presented a profile of Hadley Wickham, author of many popular R packages including ggplot2 and reshape. We riffed the design of the new Twitter website into a discussion on calculating the Golden Mean with R. Several readers contributed 1-liners based on...

Read more »

The R-Files: Saptarshi Guha

October 11, 2010
By
The R-Files: Saptarshi Guha

"The R-Files" is an occasional series from Revolution Analytics, where we profile prominent members of the R Community. Name: Saptarshi Guha Background: Ph.D. in Statistics, Purdue University Nationality: India Years Using R: 6 Known for: Developing RHIPE package for R + Hadoop integration At just 31 years old, Saptarshi Guha has emerged as a cutting-edge contributor to the R...

Read more »

Making sense of MapReduce

September 24, 2010
By

From guest blogger Joseph Rickert. Last night I went to hear Ken Krugler of Bixolabs talk about Hadoop at the monthly meeting of the Software Developers Forum. Maybe because Ken is an unusually lucid speaker, or maybe because I just reached some sort of cumulative tipping point through the prep work of all those patient people who have tried...

Read more »

Saptarshi Guha on Hadoop, R

September 20, 2010
By

Saptarshi Guha (author of the Rhipe package) joins the likes of Ebay, Yahoo, Twitter and Facebook and as one of just 37 presenters at the Hadoop World conference. (Revolution Analytics is proud to sponsor Saptarshi's presence at this event, which take place in New York on October 12.) He'll be talking about using R and Hadoop to analyze Voice-over-IP...

Read more »

Taking R to the Limit: Parallelism and Big Data

August 23, 2010
By

In a two-part series at the Los Angeles R User Group, Ryan Rosario took a look at the many ways you can take the R language to the limits of high-performance computing. In Part I (see video at this link; slides and code also available), Ryan focuses on the various methods of parallel computing in R. There's some great...

Read more »

Taking R to the Limit, Part II – Large Datasets in R

August 20, 2010
By
Taking R to the Limit, Part II – Large Datasets in R

For Part I, Parallelism in R, click here.

Tuesday night I again had the opportunity to present on high performance computing in R, at the Los Angeles R Users’ Group. This was the second part of a two part series called “Taking R to the Limit: High Performance Computing in R.” Part II discussed ways to work with large datasets...

Read more »