534 search results for "hadoop"

sparklyr 0.5

January 24, 2017

We’re happy to announce that version 0.5 of the sparklyr package is now available on CRAN. The new version comes with many improvements over the first release, including extended dplyr support through implementations of do() and n_distinct(); new functions, including sdf_quantile(), ft_tokenizer() and ft_regex_tokenizer(); and improved compatibility, as sparklyr now respects the value of the ‘na.action’ R option and supports dim(), nrow() and ncol(). Experimental...

Create Parquet Files From R Data Frames With sergeant & Apache Drill (a.k.a. Make Parquet Files Great Again in R)

January 22, 2017

Apache Drill is a nice tool to have in the toolbox, as it provides a SQL front-end to a wide array of database and file back-ends and runs in standalone/embedded mode on every modern operating system (i.e., you can get started with, or experiment locally with, Drill without needing a Hadoop cluster, but scale up...

Microsoft R Server tips from the Tiger Team

January 13, 2017

The Microsoft R Server Tiger Team helps customers around the world implement large-scale analytic solutions. Along the way, they discover useful tips and best practices, and share them on the Tiger Team blog. Here are a few recent tips from the Tiger Team on using Microsoft R Server: Gather metadata and explore numeric summaries of large data sets...

Last day: Online R courses at Udemy for only $10 (until Jan 10th)

January 10, 2017

Udemy is offering readers of R-bloggers access to its global online learning marketplace for only $10 per course! This deal (a 50% to 90% discount) covers hundreds of their courses, including many on R programming, data science, machine learning, etc. Click here to browse ALL (R and non-R) courses. Advanced R courses: The Comprehensive Programming in R Course (25 Hours of video), Graphs in R (ggplot2,...

Technologies worth learning for data science

As a complement to my note on R as a data science language, this note lists ten other technologies that you might want to learn to use, or at least monitor, if you are interested in learning data science. Communication: Git is a concurrent versioning...

Sparse matrices, k-means clustering, topic modelling with posts on the 2004 US Presidential election

December 30, 2016

Daily Kos bags of words from the time of the 2004 Presidential election.

This is a bit of a rambly blog entry today. My original motivation was to just explore moving data around from R into the H2O machine learning software. While successful on this,...

Did you say SQL Server? Yes, I did…

December 23, 2016

Introduction: my last blog post of 2016, on SQL Server 2016. Some years ago, I heard predictions from ‘experts’ that within a few years Hadoop/Spark systems would take over from traditional RDBMSs like SQL Server. I don’t think …

DataCamp’s 2017 Conference Guide

December 22, 2016

2017 is bound to be an exciting year in Data Science. Here's DataCamp's list of conferences that we're most excited about in the new year. Whether you're an R user, a Python hacker, or just a general data science fan, you're sure to find a great confe...

R for Enterprise: How to Scale Your Analytics Using R

December 21, 2016

by Sean Lopp

At RStudio, we work with many companies interested in scaling R. They typically want to know: How can R scale for big data or big computation? How can R scale for a growing team of data scientists? This post provides a framework for answering both questions. Scaling R for Big Data or...

Introducing the AzureSMR package: Manage Azure services from your R session

December 21, 2016

by Alan Weaver, Advanced Analytics Specialist at Microsoft

Very often data scientists and analysts require access to back-end resources on Azure. For example, they may need to start a virtual machine or resize a Hadoop cluster. This typically requires making a request to the IT department and patiently waiting. AzureSMR is a simple R package that enables those users...