545 search results for "Hadoop"

Using Microsoft R Server on a single machine for experiments with 600 million taxi rides.

June 14, 2016

by Dmitry Pechyoni, Microsoft Data Scientist. The New York City taxi dataset is one of the largest publicly available datasets. It contains about 1.1 billion taxi rides. Previously this dataset was explored and visualized in a number of blog posts, where the authors used various technologies (e.g., PostgreSQL and Elasticsearch). Moreover, in a...

R holds top ranking in KDnuggets software poll

June 13, 2016

The open-source R language is the most frequently used analytics / data science software, selected by 49% of the 2895 voters of the 2016 KDnuggets Software Poll. (R was also the top selection in last year's poll.) Python was a close second at 45.8%, and SQL was third at 35.5%. (Respondents could select multiple tools in the poll, and...

R Passes SAS in Scholarly Use (finally)

Way back in 2012 I published a forecast that showed that the use of R for scholarly publications would likely pass the use of SAS in 2015. But I didn’t believe the forecast since I expected the sharp decline in SAS …

Adobe Analytics Clickstream Data Feed: Calculations and Outlier Analysis

May 24, 2016

In a previous post, I outlined how to load daily Adobe Analytics Clickstream data feeds into a PostgreSQL database. While this isn’t a long-term scalable solution for large e-commerce companies doing millions of page views per day, for exploratory analysis a relational database structure can work well until a more robust solution is put into

Spark 2.0: more performance, more statistical models

May 18, 2016

Apache Spark, the open-source cluster computing framework, will soon see a major update with the upcoming release of Spark 2.0. This update promises to be faster than Spark 1.6, thanks to a run-time compiler that generates optimized bytecode. It also promises to be easier for developers to use, with streamlined APIs and a more complete SQL implementation. (Here's a...

Online R courses at Udemy – 30% promo code ($14-$35 per course)

May 16, 2016

Udemy is offering readers of R-bloggers access to its global online learning marketplace with a special 30% off promo code (price range of $14-$35 per course). The deal covers hundreds of courses, including many on R programming, data science, and machine learning. Use the code RBLOGGERS30 for an extra 30% discount. Advanced R courses: The...

Documentation for Microsoft R Server now online

May 16, 2016

If you've been thinking about trying the big-data capabilities of Microsoft R Server but wanted to check out the documentation first, you're in luck: the complete Microsoft R Server documentation is now available on MSDN (and is accessible to anyone). There's lots to explore here, but a few highlights you might want to check out include: Getting Started with...

R 3.3.0 is another motivation for Docker

May 12, 2016

Have you ever encountered R package versioning issues when one application required different versions of its dependent packages than another? Have you ever gotten stuck on a project because the wrong software versions were pre-installed on the machine where you had to run your code? Or maybe you had heavy adventures installing R on a new machine because...

Bike Rental Demand Estimation with Microsoft R Server

May 10, 2016

by Katherine Zhao, Hong Lu, Zhongmou Li, Data Scientists at Microsoft. Bicycle rental has become popular as a convenient and environmentally friendly transportation option. Accurate estimation of bike demand at different locations and different times would help bicycle-sharing systems better meet rental demand and allocate bikes to locations. In this blog post, we walk through how to use Microsoft...

In case you missed it: April 2016 roundup

May 9, 2016

In case you missed them, here are some articles from April of particular interest to R users. Lukasz Piwek recreates classic graphs from Tufte's 'The Visual Display of Quantitative Information' in R. A preview of upcoming R conferences in Europe. Andrie de Vries updates the data on R package growth on CRAN, and finds a segmented regression model with...
