515 search results for "hadoop"

In case you missed it: June 2016 roundup

July 8, 2016

In case you missed them, here are some articles from June of particular interest to R users. A preview of the tutorials presented at the useR! 2016 conference. An "advanced beginner's" guide to R published by ComputerWorld includes guides on data wrangling, visualization, and data APIs. Microsoft R Server now runs on Apache Spark, bringing high performance to big-data...


Euro 2016 analytics: Who’s playing the toughest game?

July 1, 2016

I am really enjoying the UEFA Euro 2016 football competition, not least because our national team has done pretty well so far. That’s why, after browsing the statistics section of the official EURO 2016 website for a while, I decided to do some analysis on the data they share. Just to be clear from the beginning: we are not talking...


Microsoft Analytics in 2016

June 23, 2016

If you had asked me two years ago if Microsoft was a serious vendor for data science and analytics infrastructure and tools, I would have laughed. At the time their offering seemed to me to consist of Excel against SQL Server. There is nothing really wrong (or exciting) about SQL Server, but friends don’t let friends use Excel for...


Using Microsoft R Server on a single machine for experiments with 600 million taxi rides.

June 14, 2016

by Dmitry Pechyoni, Microsoft Data Scientist. The New York City taxi dataset is one of the largest publicly available datasets. It has about 1.1 billion taxi rides in New York City. Previously this dataset was explored and visualized in a number of blog posts, where the authors used various technologies (e.g., PostgreSQL and Apache Elastic Search). Moreover, in a...


R holds top ranking in KDnuggets software poll

June 13, 2016

The open-source R language is the most frequently used analytics / data science software, selected by 49% of the 2895 voters of the 2016 KDnuggets Software Poll. (R was also the top selection in last year's poll.) Python was a close second at 45.8%, and SQL was third at 35.5%. (Respondents could select multiple tools in the poll, and...


R Passes SAS in Scholarly Use (finally)


Way back in 2012 I published a forecast that showed that the use of R for scholarly publications would likely pass the use of SAS in 2015. But I didn’t believe the forecast since I expected the sharp decline in SAS...


Adobe Analytics Clickstream Data Feed: Calculations and Outlier Analysis

May 24, 2016

In a previous post, I outlined how to load daily Adobe Analytics Clickstream data feeds into a PostgreSQL database. While this isn’t a long-term scalable solution for large e-commerce companies doing millions of page views per day, for exploratory analysis a relational database structure can work well until a more robust solution is put into...


Spark 2.0: more performance, more statistical models

May 18, 2016

Apache Spark, the open-source cluster computing framework, will soon see a major update with the upcoming release of Spark 2.0. This update promises to be faster than Spark 1.6, thanks to a run-time compiler that generates optimized bytecode. It also promises to be easier for developers to use, with streamlined APIs and a more complete SQL implementation. (Here's a...


Online R courses at Udemy – 30% promo code ($14-$35 per course)

May 16, 2016

Udemy is offering readers of R-bloggers access to its global online learning marketplace with a special 30% off promo code (price range of $14-$35 per course). The deal covers hundreds of their courses, including many on R programming, data science, and machine learning. Use the code RBLOGGERS30 for an extra 30% discount. Advanced R courses: The...


Documentation for Microsoft R Server now online

May 16, 2016

If you've been thinking about trying the big-data capabilities of Microsoft R Server but wanted to check out the documentation first, you're in luck: the complete Microsoft R Server documentation is now available on MSDN (and is accessible to anyone). There's lots to explore here, but a few highlights you might want to check out include: Getting Started with...


