544 search results for "hadoop"

Learning Statistics on Youtube

September 19, 2016

Youtube.com is the second most accessed website in the world (surpassed only by its parent, google.com). It has a whopping 1 billion unique views a month. It is a force to be reckoned with. On this video-sharing platform, many brilliant and hard-working content creators produce free, high-quality educational videos...

A few thoughts on the existing code parallelization

September 17, 2016

A few weeks ago I worked on parallelizing some old code. The whole process made me think about how efficient the parallelization of existing code in R can really be, and what should be considered efficient. There is a lot …

GoodReads: Exploratory data analysis and sentiment analysis (Part 2)

September 14, 2016

After scraping reviews from Goodreads in the first installment of this series, we are now ready to do some exploratory data analysis to get a better sense of the data we have. This will also allow us to create features that we will use in future analyses. Setup and data preparation: We start by loading …

GoodReads: Webscraping and Text Analysis with R (Part 1)

September 8, 2016

Inspired by this article about sentiment analysis and this guide to webscraping, I have decided to get my hands dirty by scraping and analyzing a sample of reviews on the website Goodreads. The goal of this project is to demonstrate a complete example, going from data collection to machine learning analysis, and to illustrate a …

Classification in Spark 2.0: “Input validation failed” and other wondrous tales

September 6, 2016

Spark 2.0 was released last July but, despite the numerous improvements and new features, several annoyances remain and can cause headaches, especially in the Spark machine learning APIs. Today we’ll have a look at some of them, inspired by a recent answer of mine to a Stack Overflow question (the question was about Spark 1.6 but,...

Extending sparklyr to Compute Cost for K-means on YARN Cluster with Spark ML Library

August 24, 2016

Machine and statistical learning wizards are increasingly eager to perform analysis with the Spark ML library whenever possible. It’s trendy, posh, spicy and gives the feeling of doing state-of-the-art machine learning and being up to date with ...

Building a Data Science Platform for R&D, Part 3 – R, R Studio Server, SparkR & Sparklyr

August 22, 2016

Part 1 and Part 2 of this series dealt with setting up AWS, loading data into S3, deploying a Spark cluster and using it to access our data. In this part we will deploy R and R Studio Server to …

Five Questions about Data Science

August 21, 2016

From Safari Books Online (https://www.safaribooksonline.com/blog/2016/02/10/data-science-qa/): Recently, we were able to ask five questions of Murtaza Haider about the new book from IBM Press called “Getting Started with Data Science: Making Sense of Data with Analytics.” Below, the author talks about the benefits of data science in today’s professional world.

Tuning Apache Spark for faster analysis with Microsoft R Server

August 12, 2016

My colleagues Max Kaznady, Jason Zhang, Arijit Tarafdar and Miguel Fierro recently posted a really useful guide with lots of tips to speed up prototyping models with Microsoft R Server on Apache Spark. These tips apply when using Spark on Azure HDInsight, where you can spin up a Spark cluster in the cloud with Microsoft R installed on the head...

So you want to be a data scientist

August 10, 2016

From Huffington Post: The New York Times made it look so easy. Take a few courses in data science and a web-based startup will readily pay top dollar for your newly acquired skills. Since the McKinsey Global Institute reported on the impending shortage of data crunchers, the wannabe data scientists are searching for...
