406 search results for "hadoop"

useR 2015: Computational

July 1, 2015

These are my initial notes from useR 2015. I will/may revise when I have time.
Computational Performance; Chair: Dirk Eddelbuettel
Running R+Hadoop using Docker Containers (E. James Harner)
Introduction
Big data architectures:
HDFS/Hadoop: software framework for distributed storage and distributed processing
Tachyon/Spark: uses in-memory
Rc2 server (R cloud computing)
Has an editor & output panel.

Exploring SparkR

A colleague from work asked me to investigate Spark and R. So the most obvious thing to do was to investigate SparkR ;-) I installed Scala, Hadoop, Spark and SparkR... not sure Hadoop is needed for this... but I wanted to have the full picture -...

SparkR: Distributed data frames with Spark and R

June 12, 2015

R is now integrated with Apache Spark, the open-source cluster computing framework. The Databricks blog announced this week that yesterday's release of Spark 1.4 would include SparkR, "an R package that allows data scientists to analyze large datasets and interactively run jobs on them from the R shell". The SparkR 1.4 announcement led with the news: Spark 1.4 introduces...

Estimating Analytics Software Market Share by Counting Books

June 9, 2015

Below is the latest update to The Popularity of Data Analysis Software. Books The number of books published on each software package or language reflects its relative popularity. Amazon.com offers an advanced search method which works well for all the software except R …

R in a 64 bit world

June 8, 2015

32 bit data structures (pointers, integer representations, single precision floating point) have been past their “best before date” for quite some time. R itself moved to a 64 bit memory model some time ago, but still has only 32 bit integers. This is going to get more and more awkward going forward. What is R …

Any R code as a cloud service: R demonstration at BUILD

June 5, 2015

At last month's BUILD conference for Microsoft developers in San Francisco, R was front-and-center on the keynote stage. In the keynote, Microsoft CVP Joseph Sirosh introduced the "language of data": open source R. Sirosh encouraged the audience to learn R, saying "if there is a single language that you choose to learn today .. let it be R". The...

Update on Snowdoop, a MapReduce Alternative

May 29, 2015

In blog posts a few months ago, I proposed an alternative to MapReduce, e.g. to Hadoop, which I called “Snowdoop.” I pointed out that systems like Hadoop and Spark are very difficult to install and configure, are either too primitive (Hadoop) or too abstract (Spark) to program, and above all, are SLOW. Spark is of …

SparkR preview by Vincent Warmerdam

May 28, 2015

SparkR preview in RStudio. Apache Spark is the hip new technology on the block. It allows you to write scripts in a functional style, and the technology behind it lets you run iterative tasks very quickly on a cluster of machines. It’s benchmarked to be quicker than Hadoop for most machine learning use

RevoScaleR’s Naive Bayes Classifier rxNaiveBayes()

May 28, 2015

by Joseph Rickert. Because of its simplicity and good performance over a wide spectrum of classification problems, the Naïve Bayes classifier ought to be on everyone's short list of machine learning algorithms. Now, with version 7.4, we have a high-performance Naïve Bayes classifier in Revolution R Enterprise too. Like all Parallel External Memory Algorithms (PEMAs) in the RevoScaleR...

R tops 2015 KDnuggets Software Poll

May 27, 2015

R is the leading choice for Predictive Analytics / Data Mining / Data Science software according to the results of the 2015 KDnuggets Software Poll, now in its 16th year. Each of the 28,000 participants selected one or more tools they had used in the last year from a list of 93 options, and R was selected by 46.9%...
