500 search results for "hadoop"

My Experience at Hadoop Summit 2010 #hadoopsummit

June 30, 2010

This week I had the opportunity to trek up north to Silicon Valley to attend Yahoo’s Hadoop Summit 2010. I love Silicon Valley. The few times I’ve been there, the weather was perfect (often warmer than LA), with little to no traffic, no road rage, and people who overall seem friendly and happy. Not to mention there are so many trees...

Read more »

You can Hadoop it! It’s elastic! Boogie woogie woog-ie!

February 16, 2010

I just came back from the future and let me be the first to tell you this: Learn some Chinese. And more than just cào nǐ niáng (肏你娘) which your friend in grad school told you means “Live happy with many blessings”. Trust me, I’ve been hanging with Madam Wu and she told me

Read more »

Streaming Hadoop Data Into R Scripts

March 23, 2009

Along the lines of Mongo Measurement Requires Mongo Management, the HadoopStreaming package on CRAN provides utilities for applying R scripts to Hadoop streaming. Hadoop is used on Amazon's EC2.

Read more »

Microsoft Analytics in 2016

June 23, 2016

If you had asked me two years ago if Microsoft was a serious vendor for data science and analytics infrastructure and tools, I would have laughed. At the time their offering seemed to me to consist of Excel against SQL Server. There is nothing really wrong (or exciting) about SQL Server, but friends don’t let friends use Excel for...

Read more »

Using Microsoft R Server on a single machine for experiments with 600 million taxi rides.

June 14, 2016

by Dmitry Pechyoni, Microsoft Data Scientist. The New York City taxi dataset is one of the largest publicly available datasets. It has about 1.1 billion taxi rides in New York City. Previously this dataset was explored and visualized in a number of blog posts, where the authors used various technologies (e.g., PostgreSQL and Apache Elastic Search). Moreover, in a...

Read more »

R holds top ranking in KDnuggets software poll

June 13, 2016

The open-source R language is the most frequently used analytics / data science software, selected by 49% of the 2895 voters of the 2016 KDnuggets Software Poll. (R was also the top selection in last year's poll.) Python was a close second at 45.8%, and SQL was third at 35.5%. (Respondents could select multiple tools in the poll, and...

Read more »

R Passes SAS in Scholarly Use (finally)


Way back in 2012 I published a forecast that showed that the use of R for scholarly publications would likely pass the use of SAS in 2015. But I didn’t believe the forecast since I expected the sharp decline in SAS …

Read more »

Adobe Analytics Clickstream Data Feed: Calculations and Outlier Analysis

May 24, 2016

In a previous post, I outlined how to load daily Adobe Analytics Clickstream data feeds into a PostgreSQL database. While this isn’t a long-term scalable solution for large e-commerce companies doing millions of page views per day, for exploratory analysis a relational database structure can work well until a more robust solution is put into

Read more »

Spark 2.0: more performance, more statistical models

May 18, 2016

Apache Spark, the open-source cluster computing framework, will soon see a major update with the upcoming release of Spark 2.0. This update promises to be faster than Spark 1.6, thanks to a run-time compiler that generates optimized bytecode. It also promises to be easier for developers to use, with streamlined APIs and a more complete SQL implementation. (Here's a...

Read more »

Online R courses at Udemy – 30% promo code ($14-$35 per course)

May 16, 2016

Udemy is offering readers of R-bloggers access to its global online learning marketplace with a special 30% off promo code (price range of $14-$35 per course). This deal covers hundreds of their courses (including many on R programming, data science, machine learning, etc.). Use the code RBLOGGERS30 for an extra 30% discount. Advanced R courses: The...

Read more »

