515 search results for "hadoop"

A Million Text Files And A Single Laptop

January 28, 2016

More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands…even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices). The problem with data

Read more »

Some Comments on Donoho’s “50 Years of Data Science”

January 23, 2016

An old friend recently called my attention to a thoughtful essay by Stanford statistics professor David Donoho, titled “50 Years of Data Science.” Given the keen interest these days in data science, the essay is quite timely. The work clearly shows that Donoho is not only a grandmaster theoretician, but also a statistical philosopher. The … Continue reading...

Read more »

Running R jobs quickly on many machines

January 22, 2016

As we demonstrated in “A gentle introduction to parallel computing in R,” one of the great things about R is how easy it is to take advantage of parallel processing capabilities to speed up calculation. In this note we will show how to move from running jobs on multiple CPUs/cores to running jobs on multiple machines (for … Continue reading...

Read more »

A gentle introduction to parallel computing in R

January 19, 2016

by John Mount Ph.D., Data Scientist at Win-Vector LLC. Let's talk about the use and benefits of parallel computation in R. IBM's Blue Gene/P massively parallel supercomputer (Wikipedia). "Parallel computing is a type of computation in which many calculations are carried out simultaneously." Wikipedia quoting: Gottlieb, Allan; Almasi, George S. (1989). Highly parallel computing. The reason we care is:...

Read more »

A gentle introduction to parallel computing in R

January 18, 2016

Let’s talk about the use and benefits of parallel computation in R. IBM’s Blue Gene/P massively parallel supercomputer (Wikipedia). “Parallel computing is a type of computation in which many calculations are carried out simultaneously.” Wikipedia quoting: Gottlieb, Allan; Almasi, George S. (1989). Highly parallel computing. The reason we care is: by making the computer work … Continue reading...

Read more »

Microsoft R Server available free to students with DreamSpark

January 12, 2016

by Joseph Rickert. Over the last 6 years, thousands of students and faculty have downloaded Revolution R Enterprise (RRE) from Revolution Analytics for free, making it possible for them to do statistical modeling on large data sets with the same R language used by savvy statisticians and data scientists in business and industry. In addition to this individual scholar...

Read more »

Revolution R renamed Microsoft R, available free to developers and students

January 12, 2016

In the nine months since Microsoft acquired Revolution Analytics, there has been a steady stream of updates to Revolution R Open and Revolution R Enterprise (not to mention integration of R with SQL Server, PowerBI, Azure and Cortana Analytics). Now, we have yet more updates to announce along with fresh new names. Revolution R Open is now Microsoft R...

Read more »

Online R courses at Udemy – for only $10 (“New Year Deal”) until Jan 11th

January 8, 2016

tl;dr: $10 New Year deal at Udemy – until Jan 11, 2016. For the next 9 days (until 2016-01-11), Udemy is offering readers of R-bloggers access to its global online learning marketplace with a (special) $10 (up to 97% off) deal on hundreds of their courses (including many on R programming, data science, machine learning, etc.). Click here to browse ALL (R and non-R) courses...

Read more »

Set up Sublime Text for light-weight all-in-one data science IDE

December 23, 2015

tl;dr: Sublime Text is a powerful text editor. Here I show how to add a custom REPL config for remote/local R, Python, Scala, Spark, Hive, you name it (this is only tested for OS X). The example below interprets local Python (top), R (middle) and Hive (bottom) code on remote. IDE for everything: Good IDEs are everywhere. RStudio for R, PyCharm for...

Read more »

All I want for Christmas is you big data analytics!

December 21, 2015

By Hannah Evans. Sound familiar? All businesses have data. But whether it is used to drive business value is another question entirely. Traditionally, technical analysts have made decisions about data technology, without truly understanding the business challenges beforehand, meaning that … Continue reading →

Read more »
