350 search results for "hadoop"

Taking R to the Limit, Part II – Large Datasets in R

August 20, 2010

For Part I, Parallelization in R, see the previous post. Tuesday night I again had the opportunity to present on high performance computing in R, at the Los Angeles R Users’ Group. This was the second part of a two-part series called “Taking R to the Limit: High Performance Computing in R.” Part II discussed ways to work with large datasets...

Announcing Big Data for Revolution R

August 3, 2010

I've hinted this was coming a few times before, but with today's press release the announcement is official: the next release of Revolution R Enterprise will include "Big Data" capabilities thanks to the new RevoScaleR package. We're pretty excited at how it's turned out: it's kinda amazing to be able to use R's formula syntax like this: arrDelayLm2 <-...

Taking R to the Limit, Part I – Parallelization in R

July 28, 2010

Tuesday night I had the opportunity to present on high performance computing in R at the Los Angeles R Users’ Group. There was so much to talk about that I had to split my talk into two parts. The first part was parallelization and the second ...

An experiment in A/B Testing my Résumé

July 1, 2010

Objective: I’ll admit it: my résumé doesn’t stand out. I’ve had some great internships, but also a tendency to work for companies that aren’t (yet!) household names. And though I’m doing fine academically, it’s not well enough to stand out …

Thoughts on Making Data Work

June 9, 2010

I really enjoyed all four talks at today's online conference, Making Data Work. (Disclosure: Revolution sponsored this conference.) I thought the four speakers together gave a great overview of issues related to the processing, analysis, and visualization of big data. Mike Driscoll started off with a useful categorization for data size. "Small Data" (<10Gb) fits in the memory of...

Data preparation for Social Network Analysis using R and Gephi

June 2, 2010

I want to share my experience in generating the data for social network analysis using R and analyzing it using Gephi... WHICH DATA STRUCTURE TO USE FOR LARGE GRAPHS? I quickly realized that using edge lists and an adjacency matrix gets difficult as the g...

The Next Big Thing: SAS and SPSS!…wait, what?

April 15, 2010

Thanks to the R Bloggers aggregator I came across Yihui Xie’s post on a piece currently making the rounds about statistical analysis platforms. In The Next Big Thing, AnnMaria De Mars makes the argument that R, as a statistical computing platform, is not well suited for what she views as the next big things in data...

Lessons Learned from EC2

March 24, 2010

A week or so ago I had my first experience using someone else’s cluster on Amazon EC2. EC2 is the Amazon Elastic Compute Cloud. Users set up a virtual computing platform that runs on Amazon’s servers “in the cloud.” Amazon EC2 is not just another cluster. EC2 allows the user to create a disk image containing an operating system...

My Experience at ACM Data Mining Camp #DMcamp

March 21, 2010

My parents and I made plans to visit San Jose and Saratoga on my grandmother’s birthday, March 19, since that is where she grew up. I randomly saw someone tweet about the ACM Data Mining Camp unconference that happened to be the next day, March 20, only a couple of miles from our hotel in Santa Clara. This was...

Open Source is Opening Data to Predictive Analytics

March 9, 2010

This article by REvolution Computing CEO Norman Nie is crossposted from the Future of Open Source Forum. The R Project: despite there being over 2 million users of this open-source language for statistical data analysis, you might not have heard of it ... yet. You might have seen this feature in the New York Times last year, and you...