501 search results for "hadoop"

New cheat-sheet for the dplyrXdf package

August 8, 2016

Hadley Wickham's dplyr package is an amazing tool for restructuring, filtering, and aggregating data sets using its elegant grammar of data manipulation. By default, it works on in-memory data frames, which means you're limited to the amount of data you can fit into R's memory. Hadley also provided an extension mechanism to make dplyr work with external data sources,...

stacksurveyr: An R package with the 2016 Developer Survey Results

July 18, 2016

This year, more than fifty thousand programmers answered the Stack Overflow 2016 Developer Survey, in the largest survey of professional developers in history. Last week Stack Overflow released the full (anonymized) results of the survey at stackoverf...

New Release of partools Package

July 17, 2016

My new release of partools is now on CRAN. The package is aimed at doing parallel data science in what I call an “un-MapReduce” manner. It takes the point of view that MapReduce-based frameworks such as Hadoop and Spark are fine for the types of applications their designers had in mind, namely rather simple SQL …

Notes from the Kölner R meeting, 9 July 2016

July 13, 2016

Last Thursday the Cologne R user group came together again. This time, our two speakers arrived from Bavaria to talk about Spark and R Server.

Introduction to Apache Spark

Dubravko Dulic gave an introduction to Apache Spark and why Spark might be of interest to data scientists using...

Introducing the free Microsoft R Client

July 11, 2016

Over the years, we've shared several posts on using the ScaleR package to import, process, visualize and analyze large data sets with R. Until now, you needed to have access to a Microsoft R Server license to take advantage of the package. Now, you can use all of the capabilities of ScaleR free of charge with Microsoft R Client...

In case you missed it: June 2016 roundup

July 8, 2016

In case you missed them, here are some articles from June of particular interest to R users. A preview of the tutorials presented at the useR! 2016 conference. An "advanced beginner's" guide to R published by ComputerWorld includes guides on data wrangling, visualization, and data APIs. Microsoft R Server now runs on Apache Spark, bringing high performance to big-data...

Euro 2016 analytics: Who’s playing the toughest game?

July 1, 2016

I am really enjoying the UEFA Euro 2016 football competition, not least because our national team has done pretty well so far. That’s why, after browsing the statistics section of the official EURO 2016 website for a while, I decided to do some analysis on the data they share. Just to be clear from the beginning: we are not talking...

Microsoft Analytics in 2016

June 23, 2016

If you had asked me two years ago if Microsoft was a serious vendor for data science and analytics infrastructure and tools, I would have laughed. At the time their offering seemed to me to consist of Excel against SQL Server. There is nothing really wrong (or exciting) about SQL Server, but friends don’t let friends use Excel for...

Using Microsoft R Server on a single machine for experiments with 600 million taxi rides.

June 14, 2016

by Dmitry Pechyoni, Microsoft Data Scientist

The New York City taxi dataset is one of the largest publicly available datasets. It has about 1.1 billion taxi rides in New York City. Previously this dataset was explored and visualized in a number of blog posts, where the authors used various technologies (e.g., PostgreSQL and Apache Elastic Search). Moreover, in a...

R holds top ranking in KDnuggets software poll

June 13, 2016

The open-source R language is the most frequently used analytics / data science software, selected by 49% of the 2895 voters in the 2016 KDnuggets Software Poll. (R was also the top selection in last year's poll.) Python was a close second at 45.8%, and SQL was third at 35.5%. (Respondents could select multiple tools in the poll, and...
