205 search results for "hadoop"

Saptarshi Guha on Hadoop, R

September 20, 2010

Saptarshi Guha (author of the Rhipe package) joins the likes of eBay, Yahoo, Twitter and Facebook as one of just 37 presenters at the Hadoop World conference. (Revolution Analytics is proud to sponsor Saptarshi's presence at this event, which takes place in New York on October 12.) He'll be talking about using R and Hadoop to analyze Voice-over-IP...

Taking R to the Limit: Parallelism and Big Data

August 23, 2010

In a two-part series at the Los Angeles R User Group, Ryan Rosario took a look at the many ways you can take the R language to the limits of high-performance computing. In Part I (see video at this link; slides and code also available), Ryan focuses on the various methods of parallel computing in R. There's some great...

Taking R to the Limit, Part II – Large Datasets in R

August 20, 2010

For Part I, Parallelism in R, click here.

Tuesday night I again had the opportunity to present on high performance computing in R, at the Los Angeles R Users’ Group. This was the second part of a two-part series called “Taking R to the Limit: High Performance Computing in R.” Part II discussed ways to work with large datasets...

Announcing Big Data for Revolution R

August 3, 2010

I've hinted this was coming a few times before, but with today's press release the announcement is official: the next release of Revolution R Enterprise will include "Big Data" capabilities thanks to the new RevoScaleR package. We're pretty excited at how it's turned out: it's kinda amazing to be able to use R's formula syntax like this: arrDelayLm2 <-...

Taking R to the Limit, Part I – Parallelization in R

July 28, 2010

Tuesday night I had the opportunity to present on high performance computing in R, at the Los Angeles R Users’ Group. There was so much to talk about that I had to split my talk into two parts. The first part was parallelization and the second ...

An experiment in A/B Testing my Résumé

July 1, 2010

Objective: I’ll admit it: my résumé doesn’t stand out. I’ve had some great internships, but also a tendency to work for companies that aren’t (yet!) household names. And though I’m doing fine academically, it’s not well enough to stand out …

My Experience at Hadoop Summit 2010 #hadoopsummit

June 30, 2010

This week I had the opportunity to trek up north to Silicon Valley to attend Yahoo’s Hadoop Summit 2010. I love Silicon Valley. The few times I’ve been there the weather was perfect (often warmer than LA), little to no traffic, no road rage and people overall seem friendly and happy. Not to mention there are so many trees...

Thoughts on Making Data Work

June 9, 2010

I really enjoyed all four talks at today's online conference, Making Data Work. (Disclosure: Revolution sponsored this conference.) I thought the four speakers together gave a great overview of issues related to the processing, analysis, and visualization of big data. Mike Driscoll started off with a useful categorization for data size. "Small Data" (<10Gb) fits in the memory of...

Data preparation for Social Network Analysis using R and Gephi

June 2, 2010

I want to share my experience in generating the data for social network analysis using R and analyzing it using Gephi... WHICH DATA STRUCTURE TO USE FOR LARGE GRAPHS? I quickly realized that using edge lists and an adjacency matrix gets difficult as the g...

The Next Big Thing: SAS and SPSS!…wait, what?

April 15, 2010

Thanks to the R Bloggers aggregator I came across Yihui Xie’s post on a piece currently making the rounds about statistical analysis platforms. In The Next Big Thing, AnnMaria De Mars makes the argument that R—as a statistical computing platform—is not well suited for what she views as the next big things in data
