468 search results for "hadoop"

You can Hadoop it! It’s elastic! Boogie woogie woog-ie!

February 16, 2010
By
You can Hadoop it! It’s elastic! Boogie woogie woog-ie!

I just came back from the future and let me be the first to tell you this: Learn some Chinese. And more than just cào nǐ niáng (肏你娘) which your friend in grad school told you means “Live happy with many blessings”. Trust me, I’ve been hanging with Madam Wu and she told me

Read more »

Streaming Hadoop Data Into R Scripts

March 23, 2009
By
Streaming Hadoop Data Into R Scripts

Along the lines of Mongo Measurement Requires Mongo Management, the HadoopStreaming package on CRAN provides utilities for applying R scripts to Hadoop streaming. Hadoop is used on Amazon's EC2.

Read more »

Databases in containers

February 8, 2016
By
Databases in containers

A great number of readers reacted very positively to Nina Zumel‘s article Using PostgreSQL in R: A quick how-to. Part of the reason is she described an incredibly powerful data science pattern: using a formerly expensive permanent system infrastructure as a simple transient tool. In her case the tools were the data manipulation grammars SQL … Continue reading...

Read more »

A Million Text Files And A Single Laptop

January 28, 2016
By
A Million Text Files And A Single Laptop

More often that I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands…even millions of tiny files. Usually when this happens, the data all have the same format (such as having being generated by sensors or other memory-constrained devices). The problem with data

Read more »

Some Comments on Donaho’s “50 Years of Data Science”

January 23, 2016
By
Some Comments on Donaho’s “50 Years of Data Science”

An old friend recently called my attention to a thoughtful essay by Stanford statistics professor David Donaho, titled “50 Years of Data Science.” Given the keen interest these days in data science, the essay is quite timely. The work clearly shows that Donaho is not only a grandmaster theoretician, but also a statistical philosopher. The … Continue reading...

Read more »

Running R jobs quickly on many machines

January 22, 2016
By
Running R jobs quickly on many machines

As we demonstrated in “A gentle introduction to parallel computing in R” one of the great things about R is how easy it is to take advantage of parallel processing capabilities to speed up calculation. In this note we will show how to move from running jobs multiple CPUs/cores to running jobs multiple machines (for … Continue reading...

Read more »

A gentle introduction to parallel computing in R

January 19, 2016
By
A gentle introduction to parallel computing in R

by John Mount Ph.D. Data Scientist at Win-Vector LLC Let's talk about the use and benefits of parallel computation in R. IBM's Blue Gene/P massively parallel supercomputer (Wikipedia). Parallel computing is a type of computation in which many calculations are carried out simultaneously." Wikipedia quoting: Gottlieb, Allan; Almasi, George S. (1989). Highly parallel computing The reason we care is:...

Read more »

A gentle introduction to parallel computing in R

January 18, 2016
By
A gentle introduction to parallel computing in R

Let’s talk about the use and benefits of parallel computation in R. IBM’s Blue Gene/P massively parallel supercomputer (Wikipedia). Parallel computing is a type of computation in which many calculations are carried out simultaneously.” Wikipedia quoting: Gottlieb, Allan; Almasi, George S. (1989). Highly parallel computing The reason we care is: by making the computer work … Continue reading...

Read more »

Microsoft R Server available free to students with DreamSpark

January 12, 2016
By
Microsoft R Server available free to students with DreamSpark

by Joseph Rickert Over the last 6 years, thousands of students and faculty have downloaded Revolution R Enterprise (RRE) from Revolution Analytics for free, making it possible for them to do statistical modeling on large data sets with the same R language used by savvy statisticians and data scientists in business and industry. In addition to this individual scholar...

Read more »

Revolution R renamed Microsoft R, available free to developers and students

January 12, 2016
By
Revolution R renamed Microsoft R, available free to developers and students

In the nine months since Microsoft acquired Revolution Analytics, there have been a steady stream of updates to Revolution R Open and Revolution R Enterprise (not to mention integration of R with SQL Server, PowerBI, Azure and Cortana Analytics). Now, we have yet more updates to announce along with fresh new names. Revolution R Open is now Microsoft R...

Read more »

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)