Blog Archives

Compare outlier detection methods with the OutliersO3 package

March 8, 2018

by Antony Unwin, University of Augsburg, Germany There are many different methods for identifying outliers and a lot of them are available in R. But are outliers a matter of opinion? Do all methods give the same results? Articles on outlier methods use a mixture of theory and practice. Theory is all very well, but outliers are outliers because...

DataExplorer: Fast Data Exploration With Minimum Code

February 8, 2018

by Boxuan Cui, Data Scientist at Smarter Travel Once upon a time, there was a joke: "In Data Science, 80% of time spent prepare data, 20% of time spent complain about need for prepare data." (Big Data Borat (@BigDataBorat), February 27, 2013) According to a Forbes article, cleaning and organizing data is the most time-consuming and least enjoyable...

An introduction to seplyr

December 14, 2017

by John Mount, Win-Vector LLC `seplyr` (https://winvector.github.io/seplyr/) is an R (https://www.r-project.org) package that supplies improved standard evaluation interfaces for many common data wrangling tasks. The core of `seplyr` is a re-skinning of the functionality of `dplyr` (https://CRAN.R-project.org/package=dplyr) to `seplyr` conventions (similar to how `stringr` (https://CRAN.R-project.org/package=stringr) re-skins the implementing package `stringi` (https://CRAN.R-project.org/package=stringi)). ## Standard Evaluation and Non-Standard Evaluation "Standard evaluation" is the name we are using for...

How to make Python easier for the R user: revoscalepy

November 28, 2017

by Siddarth Ramesh, Data Scientist, Microsoft I’m an R programmer. To me, R has been great for data exploration, transformation, statistical modeling, and visualizations. However, there is a huge community of Data Scientists and Analysts who turn to Python for these tasks. Moreover, both R and Python experts exist in most analytics organizations, and it is important for both...

Scale up your parallel R workloads with containers and doAzureParallel

November 21, 2017

by JS Tan (Program Manager, Microsoft) The R language is by far the most popular statistical language, and has seen massive adoption in both academia and industry. In our new data-centric economy, the models and algorithms that data scientists build in R are not just being used for research and experimentation. They are now also being deployed into...

Recap: EARL Boston 2017

November 9, 2017

By Emmanuel Awa, Francesca Lazzeri and Jaya Mathew, data scientists at Microsoft A few of us got to attend the EARL conference in Boston last week, which brought together a group of talented users of R from academia and industry. The conference highlighted various Enterprise Applications of R. Despite being a small conference, the quality of the talks was great...

Role Playing with Probabilities: The Importance of Distributions

November 2, 2017

by Jocelyn Barker, Data Scientist at Microsoft I have a confession to make. I am not just a statistics nerd; I am also a role-playing games geek. I have been playing Dungeons and Dragons (DnD) and its variants since high school. While playing with my friends the other day, it occurred to me that DnD may have some lessons to...

Estimating mean variance and mean absolute bias of a regression tree by bootstrapping using foreach and rpart packages

October 26, 2017

by Błażej Moska, computer science student and data science intern One of the most important things in predictive modelling is how our algorithm will cope with various datasets, both training and testing (previously unseen). This is closely connected with the concept of the bias-variance tradeoff. Roughly speaking, the variance of an estimator describes how the estimator's value varies from dataset to...

Calculating a fuzzy kmeans membership matrix with R and Rcpp

August 24, 2017

by Błażej Moska, computer science student and data science intern Suppose that we have performed K-means clustering in R and are satisfied with our results, but later we realize that it would also be useful to have a membership matrix. Of course it would be easier to repeat clustering using one of the fuzzy kmeans functions available in...

Tutorial: Deep Learning with R on Azure with Keras and CNTK

August 9, 2017

by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft) Microsoft's Cognitive Toolkit (better known as CNTK) is a commercial-grade and open-source framework for deep learning tasks. At present CNTK does not have a native R interface but can be accessed through Keras, a high-level API which wraps various deep learning backends including CNTK, TensorFlow,...
