Posts Tagged ‘ Hadoop ’

My Day at ACM Data Mining Camp III #DMCAMP

November 13, 2010
By
My Day at ACM Data Mining Camp III #DMCAMP

My first time at ACM Data Mining Camp was so awesome, that I was thrilled the make the trip up to San Jose for the November 2010 version. In July, I gave a talk at the Emerging Technologies for Online Learning Symposium conference with a faculty member in the Department of Statistics, at the Fairmont. The place was amazing,...

Read more »

Taking R to the Limit, Part II – Large Datasets in R

August 20, 2010
By
Taking R to the Limit, Part II – Large Datasets in R

For Part I, Parallelism in R, click here.

Tuesday night I again had the opportunity to present on high performance computing in R, at the Los Angeles R Users’ Group. This was the second part of a two part series called “Taking R to the Limit: High Performance Computing in R.” Part II discussed ways to work with large datasets...

Read more »

My Experience at Hadoop Summit 2010 #hadoopsummit

June 30, 2010
By
My Experience at Hadoop Summit 2010 #hadoopsummit

This week I had the opportunity the trek up north to Silicon Valley to attend Yahoo’s Hadoop Summit 2010. I love Silicon Valley. The few times I’ve been there the weather was perfect (often warmer than LA), little to no traffic, no road rage and people overall seem friendly and happy. Not to mention there are so many trees...

Read more »

Lessons Learned from EC2

March 24, 2010
By
Lessons Learned from EC2

A week or so ago I had my first experience using someone else’s cluster on Amazon EC2. EC2 is the Amazon Elastic Compute Cloud. Users set up a virtual computing platform that runs on Amazon’s servers “in the cloud.” Amazon EC2 is not just another cluster. EC2 allows the user to create a disk image containing an operating system...

Read more »

My Experience at ACM Data Mining Camp #DMcamp

March 21, 2010
By
My Experience at ACM Data Mining Camp #DMcamp

My parents and I made plans to visit San Jose and Saratoga on my grandmother’s birthday, March 19, since that is where she grew up. I randomly saw someone tweet about the ACM Data Mining Camp unconference that happened to be the next day, March 20, only a couple of miles from our hotel in Santa Clara. This was...

Read more »

You can Hadoop it! It’s elastic! Boogie woogie woog-ie!

February 16, 2010
By
You can Hadoop it! It’s elastic! Boogie woogie woog-ie!

I just came back from the future and let me be the first to tell you this: Learn some Chinese. And more than just cào nǐ niáng (肏你娘) which your friend in grad school told you means “Live happy with many blessings”. Trust me, I’ve been hanging with Madam Wu and she told me

Read more »

Analytic Infrastructure – Three Trends

May 11, 2009
By
Analytic Infrastructure – Three Trends

This is a post about systems, applications, services and architectures for building and deploying analytics. Sometimes this is called analytic infrastructure. In this post, we look at several trends impacting analytic infrastructure. Trend 1. Open source analytics has reached Main Street. R, which was first released in 1996, is now

Read more »

Analytic Infrastructure – Three Trends

May 11, 2009
By
Analytic Infrastructure – Three Trends

This is a post about systems, applications, services and architectures for building and deploying analytics. Sometimes this is called analytic infrastructure. In this post, we look at several trends impacting analytic infrastructure. Trend 1. Open source analytics has reached Main Street. R, which was first released in 1996, is now

Read more »

Analytic Infrastructure – Three Trends

May 11, 2009
By

This is a post about systems, applications, services and architectures for building and deploying analytics. Sometimes this is called analytic infrastructure. In this post, we look at several trends impacting analytic infrastructure. Trend 1. Open source analytics has reached Main Street. R, which was first released in 1996, is now

Read more »

Streaming Hadoop Data Into R Scripts

March 23, 2009
By
Streaming Hadoop Data Into R Scripts

Along the lines of Mongo Measurement Requires Mongo Management, the HadoopStreaming package on CRAN provides utilities for applying R scripts to Hadoop streaming. Hadoop is used on Amazon's EC2.

Read more »