Posts Tagged ‘ MapReduce ’

SIGKDD 2011 Conference — Days 2/3/4 Summary

August 27, 2011
By
SIGKDD 2011 Conference — Days 2/3/4 Summary

<< My review of Day 1.

I am summarizing all of the days together since each talk was short, and I was too exhausted to write a post after each day. Due to the broken-up schedule of the KDD sessions, I group everything together instead of switching back and forth among a dozen different topics....

Read more »

SIGKDD 2011 Conference — Day 1 (Graph Mining and David Blei/Topic Models)

August 22, 2011
By
SIGKDD 2011 Conference — Day 1 (Graph Mining and David Blei/Topic Models)

I have been waiting for the KDD conference to come to California, and I was ecstatic to see it held in San Diego this year. AdMeld did an awesome job displaying KDD ads on the sites that I visit, sometimes multiple times per page. That’s good targeting!

Mining and Learning on Graphs Workshop 2011

I had...

Read more »

Review of 2011 Data Scientist Summit

May 13, 2011
By
Review of 2011 Data Scientist Summit

Some time over the past 6 weeks I randomly saw a tweet announcing the “Data Scientist Summit” and shortly below it I saw that it would be held in Las Vegas at the Venetian. Being a Data Scientist myself is reason enough to not pass up this opportunity, but Vegas definitely sweetens the deal!...

Read more »

RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

February 16, 2011
By
RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

RHIPE: An Interface Between Hadoop and R Presented by Saptarshi Guha About the Video: I filmed the event using LectureMaker’s live event recording technique. One special feature I add to my R video recordings is the addition of my own R source code … Continue reading

Read more »

Abusing Amazon’s Elastic MapReduce Hadoop service… easily, from R

January 10, 2011
By
Abusing Amazon’s Elastic MapReduce Hadoop service… easily, from R

JD Long's experimental segue package makes it easy to use Amazon's Elastic MapReduce service to fire up a Hadoop cluster and use it for non-Big Data, computationally-intensive tasks. The package provides a cluster-aware version of lapply() which "just works".

Read more »

My Day at ACM Data Mining Camp III #DMCAMP

November 13, 2010
By
My Day at ACM Data Mining Camp III #DMCAMP

My first time at ACM Data Mining Camp was so awesome, that I was thrilled the make the trip up to San Jose for the November 2010 version. In July, I gave a talk at the Emerging Technologies for Online Learning Symposium conference with a faculty member in the Department of Statistics, at the...

Read more »

Taking R to the Limit, Part II – Large Datasets in R

August 20, 2010
By
Taking R to the Limit, Part II – Large Datasets in R

For Part I, Parallelism in R, click here.

Tuesday night I again had the opportunity to present on high performance computing in R, at the Los Angeles R Users’ Group. This was the second part of a two part series called “Taking R to the Limit: High Performance Computing in R.” Part II discussed ways...

Read more »

Lessons Learned from EC2

March 24, 2010
By
Lessons Learned from EC2

A week or so ago I had my first experience using someone else’s cluster on Amazon EC2. EC2 is the Amazon Elastic Compute Cloud. Users set up a virtual computing platform that runs on Amazon’s servers “in the cloud.” Amazon EC2 is not just another cluster. EC2 allows the user to create a disk...

Read more »

My Experience at ACM Data Mining Camp #DMcamp

March 21, 2010
By
My Experience at ACM Data Mining Camp #DMcamp

My parents and I made plans to visit San Jose and Saratoga on my grandmother’s birthday, March 19, since that is where she grew up. I randomly saw someone tweet about the ACM Data Mining Camp unconference that happened to be the next day, March 20, only a couple of miles from our hotel...

Read more »