487 search results for "hadoop"

Data Science Virtual Machine updated with Microsoft R Server

March 2, 2016

Microsoft has updated the Data Science Virtual Machine, a data science toolkit-in-a-box that you can easily spin up on the Microsoft Azure cloud service. The virtual machine now comes pre-configured with Microsoft R Server Developer Edition (upgraded from Microsoft R Open), Anaconda Python, Jupyter notebooks for Python and R, Visual Studio Community Edition, Power BI Desktop, and SQL Server...

Read more »

Nairobi Data Science Meet Up: Finding deep structures in data with Chris Orwa

February 22, 2016

I sat down with a former school rugby captain whose rugby career was cut short by a shoulder injury while he was playing for Black Blad at Kenyatta University. It is always a great pleasure to talk to someone who is extremely passionate about what he does, and his passion for Data Science was evident during my chat with “BlackOrwa” at iHub...

Read more »

Read from HDFS with R. Brief overview of SparkR.

February 19, 2016

Disclaimer: originally I planned to write a post about R functions/packages that allow reading data from HDFS (with benchmarks), but in the end it became more of an overview of SparkR capabilities. Nowadays working with “big data” almost always means working with the Hadoop ecosystem. A few years ago this also meant that you would have to be a good...

Read more »

Discount promo code for R courses at Statistics.com

February 15, 2016

Statistics.com is an online learning website with 100+ courses in statistics, analytics, data mining, text mining, forecasting, social network analysis, spatial analysis, etc. They have kindly agreed to offer R-Bloggers readers a reduced rate of $399 for any of their 23 courses in R, Python, SQL or SAS (a saving of $150-$200). These are high-impact courses, each 4 weeks long (normally costing...

Read more »

Using Microsoft R Server to Address Scalability Issues in R

February 15, 2016

If you missed the recent webinar presented by Derek Norton, Using Microsoft R Server to Address Scalability Issues in R, you can now catch up with the replay below. In the webinar, Derek compares Microsoft R Open and Microsoft R Server, and demonstrates using Microsoft R Server to model a 40-million-row data file using logistic regression. (The demo begins...

Read more »

Databases in containers

February 8, 2016

A great number of readers reacted very positively to Nina Zumel’s article Using PostgreSQL in R: A quick how-to. Part of the reason is she described an incredibly powerful data science pattern: using a formerly expensive permanent system infrastructure as a simple transient tool. In her case the tools were the data manipulation grammars SQL … Continue reading...

Read more »

A Million Text Files And A Single Laptop

January 28, 2016

More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands… even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices). The problem with data

Read more »

Some Comments on Donoho’s “50 Years of Data Science”

January 23, 2016

An old friend recently called my attention to a thoughtful essay by Stanford statistics professor David Donoho, titled “50 Years of Data Science.” Given the keen interest these days in data science, the essay is quite timely. The work clearly shows that Donoho is not only a grandmaster theoretician, but also a statistical philosopher. The … Continue reading...

Read more »

Running R jobs quickly on many machines

January 22, 2016

As we demonstrated in “A gentle introduction to parallel computing in R”, one of the great things about R is how easy it is to take advantage of parallel processing capabilities to speed up calculation. In this note we will show how to move from running jobs on multiple CPUs/cores to running jobs on multiple machines (for … Continue reading...

Read more »
