Posts Tagged ‘Hadoop’

Learning R has really made me appreciate SAS

July 25, 2012

For the past 18 months, it seems like all I’ve heard about in the digital marketing industry is “big data”, and with that, mentions of using Hadoop and R to solve these sorts of problems. Why are these tools the … “Learning R has really made me appreciate SAS” is an article from randyzwitch.com, ...

Heartbeat of a Cycling City: Bixi data at Hack/Reduce

May 8, 2012

The recent Hack/Reduce hackathon in Montreal was a tonne of fun. Our team tackled a data set consisting of Bixi (Montreal’s bicycle share system) station states at one-minute temporal resolution. We used Hadoop and MapReduce to pull out some features of user behaviours. One of the things we extracted was the flux at ...

Computational Journalism Server – The Way Forward

May 2, 2012

As I’ve noted here, the Computational Journalism Server “wants to be a Platform-as-a-Service (PaaS) when it grows up.” In plotting the way forward to that goal, I’ve looked at three options: Remain on openSUSE / SUSE Studio and ...

Software tools for data analysis – an overview

February 19, 2011

By Szilard Pafka. Discussions on various software tools (C, C++, Perl, Python, Unix shell, R, Matlab, SAS, SPSS, Excel, databases, Hadoop, etc.) used in data analysis. Szilard Pafka (founder and co-organizer of the Los Angeles R users group) presents an …

RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

February 16, 2011

RHIPE: An Interface Between Hadoop and R, presented by Saptarshi Guha. About the video: I filmed the event using LectureMaker’s live event recording technique. One special feature of my R video recordings is the addition of my own R source code …

Abusing Amazon’s Elastic MapReduce Hadoop service… easily, from R

January 10, 2011

JD Long's experimental segue package makes it easy to use Amazon's Elastic MapReduce service to fire up a Hadoop cluster and use it for non-Big Data, computationally-intensive tasks. The package provides a cluster-aware version of lapply() which "just works".

My Experience at Hadoop Summit 2010 #hadoopsummit

June 30, 2010

This week I had the opportunity to trek up north to Silicon Valley to attend Yahoo’s Hadoop Summit 2010. I love Silicon Valley. The few times I’ve been there, the weather was perfect (often warmer than LA), there was little to no traffic and no road rage, and people overall seemed friendly and happy. Not to mention there are so many trees...

Lessons Learned from EC2

March 24, 2010

A week or so ago I had my first experience using someone else’s cluster on Amazon EC2. EC2 is the Amazon Elastic Compute Cloud. Users set up a virtual computing platform that runs on Amazon’s servers “in the cloud.” Amazon EC2 is not just another cluster. EC2 allows the user to create a disk image containing an operating system...

You can Hadoop it! It’s elastic! Boogie woogie woog-ie!

February 16, 2010

I just came back from the future and let me be the first to tell you this: Learn some Chinese. And more than just cào nǐ niáng (肏你娘) which your friend in grad school told you means “Live happy with many blessings”. Trust me, I’ve been hanging with Madam Wu and she told me

Analytic Infrastructure – Three Trends

May 11, 2009

This is a post about systems, applications, services and architectures for building and deploying analytics. Sometimes this is called analytic infrastructure. In this post, we look at several trends impacting analytic infrastructure. Trend 1. Open source analytics has reached Main Street. R, which was first released in 1996, is now
