501 search results for "hadoop"

Discount promo code for R courses in Statistics.com

February 15, 2016

Statistics.com is an online learning website with 100+ courses in statistics, analytics, data mining, text mining, forecasting, social network analysis, spatial analysis, etc. They have kindly agreed to offer R-Bloggers readers a reduced rate of $399 for any of their 23 courses in R, Python, SQL or SAS (a saving of $150-$200). These are high-impact courses, each 4 weeks long (normally costing...

Querying Big Data SQL tables with Oracle R Enterprise

February 15, 2016

I was wondering recently if I could use Oracle R Enterprise (ORE) to query Big Data SQL tables (i.e. Oracle Database external tables based on HDFS or Hive data), since I have never seen such a combination mentioned in the relevant Oracle documentation and white papers. I am happy to announce that the answer is an unconditional yes. In...

Using Microsoft R Server to Address Scalability Issues in R

February 15, 2016

If you missed the recent webinar presented by Derek Norton, Using Microsoft R Server to Address Scalability Issues in R, you can now catch up with the replay below. In the webinar, Derek compares Microsoft R Open and Microsoft R Server, and demonstrates using Microsoft R Server to model a 40-million-row data file using logistic regression. (The demo begins...

Databases in containers

February 8, 2016

A great number of readers reacted very positively to Nina Zumel’s article Using PostgreSQL in R: A quick how-to. Part of the reason is she described an incredibly powerful data science pattern: using a formerly expensive permanent system infrastructure as a simple transient tool. In her case the tools were the data manipulation grammars SQL …

Plotting Lots of Data with rbokeh

February 4, 2016

A common issue when dealing with more than a few thousand data points is how to effectively make scatterplots. There is a lot of research on this topic that I won’t go into in any detail, but in this post I’ll just point out a few features that co...

A Million Text Files And A Single Laptop

January 28, 2016

More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands… even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices). The problem with data

Some Comments on Donoho’s “50 Years of Data Science”

January 23, 2016

An old friend recently called my attention to a thoughtful essay by Stanford statistics professor David Donoho, titled “50 Years of Data Science.” Given the keen interest these days in data science, the essay is quite timely. The work clearly shows that Donoho is not only a grandmaster theoretician, but also a statistical philosopher. The …

Running R jobs quickly on many machines

January 22, 2016

As we demonstrated in “A gentle introduction to parallel computing in R”, one of the great things about R is how easy it is to take advantage of parallel processing capabilities to speed up calculation. In this note we will show how to move from running jobs on multiple CPUs/cores to running jobs on multiple machines (for …

A gentle introduction to parallel computing in R

January 19, 2016

by John Mount Ph.D., Data Scientist at Win-Vector LLC. Let's talk about the use and benefits of parallel computation in R. [Image: IBM's Blue Gene/P massively parallel supercomputer (Wikipedia).] “Parallel computing is a type of computation in which many calculations are carried out simultaneously.” (Wikipedia, quoting: Gottlieb, Allan; Almasi, George S. (1989). Highly parallel computing.) The reason we care is:...

Microsoft R Server available free to students with DreamSpark

January 12, 2016

by Joseph Rickert. Over the last 6 years, thousands of students and faculty have downloaded Revolution R Enterprise (RRE) from Revolution Analytics for free, making it possible for them to do statistical modeling on large data sets with the same R language used by savvy statisticians and data scientists in business and industry. In addition to this individual scholar...
