508 search results for "hadoop"

Extending sparklyr to Compute Cost for K-means on YARN Cluster with Spark ML Library

August 24, 2016

Machine and statistical learning wizards are becoming more eager to perform analysis with the Spark ML library whenever possible. It’s trendy, posh, spicy and gives the feeling of doing state-of-the-art machine learning and being up to date with ...

Building a Data Science Platform for R&D, Part 3 – R, R Studio Server, SparkR & Sparklyr

August 22, 2016

Part 1 and Part 2 of this series dealt with setting up AWS, loading data into S3, deploying a Spark cluster and using it to access our data. In this part we will deploy R and R Studio Server to …

Five Questions about Data Science

August 21, 2016

From Safari Books Online (https://www.safaribooksonline.com/blog/2016/02/10/data-science-qa/): Recently, we were able to ask five questions of Murtaza Haider about the new book from IBM Press called “Getting Started with Data Science: Making Sense of Data with Analytics.” Below, the author talks about the benefits of data science in today’s professional world.

Tuning Apache Spark for faster analysis with Microsoft R Server

August 12, 2016

My colleagues Max Kaznady, Jason Zhang, Arijit Tarafdar and Miguel Fierro recently posted a really useful guide with lots of tips to speed up prototyping models with Microsoft R Server on Apache Spark. These tips apply when using Spark on Azure HDInsight, where you can spin up a Spark cluster in the cloud with Microsoft R installed on the head...

So you want to be a data scientist

August 10, 2016

From HuffingtonPost: The New York Times made it look so easy. Take a few courses in data science and a web-based startup will readily pay top dollar for your newly acquired skills. Since the McKinsey Global Institute reported on the impending shortage of data crunchers, the wannabe data scientists are searching for...

Deep Learning Part 1: Comparison of Symbolic Deep Learning Frameworks

August 9, 2016
by Anusua Trivedi, Microsoft Data Scientist. Background and Approach: This blog series is based on my upcoming talk on re-usability of Deep Learning Models at the Hadoop+Strata World Conference in Singapore. This blog series will be in several parts, in which I describe my experiences and go deep into the reasons behind my choices. Deep learning is an emerging...

New cheat-sheet for the dplyrXdf package

August 8, 2016

Hadley Wickham's dplyr package is an amazing tool for restructuring, filtering, and aggregating data sets using its elegant grammar of data manipulation. By default, it works on in-memory data frames, which means you're limited to the amount of data you can fit into R's memory. Hadley also provided an extension mechanism to make dplyr work with external data sources,...

stacksurveyr: An R package with the 2016 Developer Survey Results

July 18, 2016

This year, more than fifty thousand programmers answered the Stack Overflow 2016 Developer Survey, in the largest survey of professional developers in history. Last week Stack Overflow released the full (anonymized) results of the survey at stackoverf...

New Release of partools Package

July 17, 2016

My new release of partools is now on CRAN. The package is aimed at doing parallel data science in what I call an “un-MapReduce” manner. It takes the point of view that MapReduce-based frameworks such as Hadoop and Spark are fine for the types of applications their designers had in mind, namely rather simple SQL …

Notes from the Kölner R meeting, 9 July 2016

July 13, 2016

Last Thursday the Cologne R user group came together again. This time, our two speakers arrived from Bavaria to talk about Spark and R Server. Introduction to Apache Spark: Dubravko Dulic gave an introduction to Apache Spark and why Spark might be of interest to data scientists using...
