eXtremely Boost your machine learning Exercises (Part-1)

September 24, 2017

eXtreme Gradient Boosting is a machine learning model that became really popular a few years ago after winning several Kaggle competitions. It is a very powerful algorithm that uses an ensemble of weak learners to obtain a strong learner. Its R implementation is available in the xgboost package, and it is really worth including in anyone’s machine learning Related exercise sets: Model Evaluation...

Read more »

RcppGSL 0.3.3

September 24, 2017

A maintenance update, RcppGSL 0.3.3, is now on CRAN. It switched the vignette to our new pinp package and its two-column pdf default. The RcppGSL package provides an interface from R to the GNU GSL using the Rcpp package. No user-facing new code or...

Read more »

Postgresql + R Sandbox

September 23, 2017

ElephantSQL: ElephantSQL offers a free instance of Postgresql, with a limit of 20 MB and 5 concurrent connections. For example, you can upload a shiny application that depends on data from ElephantSQL. You only need to register to the site and automat...

Read more »

RcppCNPy 0.2.7

September 23, 2017

A new version of the RcppCNPy package arrived on CRAN yesterday. RcppCNPy provides R with read and write access to NumPy files thanks to the cnpy library by Carl Rogers. This version updates internals for function registration, but otherwise mostly s...

Read more »

RcppClassic 0.9.8

September 23, 2017

A bug-fix release, RcppClassic 0.9.8, follows the very recent 0.9.7 release and fixes a build issue on macOS introduced in 0.9.7. No other changes. Courtesy of CRANberries, there are changes relative to the previous release. Questions, comments etc shoul...

Read more »

Upcoming data preparation and modeling article series

September 23, 2017

I am pleased to announce that vtreat version 0.6.0 is now available to R users on CRAN. vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytic projects. If you are an R user we strongly suggest you incorporate vtreat into your projects. vtreat handles, in a statistically sound … Continue reading Upcoming...

Read more »

Hacking statistics or: How I Learned to Stop Worrying About Calculus and Love Stats Exercises (Part-9)

September 23, 2017

Statistics is often taught in school by and for people who like Mathematics. As a consequence, in those classes emphasis is put on learning equations, solving calculus problems and creating mathematical models instead of building an intuition for probabilistic problems. But, if you read this, you know a bit of R programming and have access Related exercise sets: Hacking statistics...

Read more »

How Random Forests improve simple Regression Trees?

September 22, 2017

By Gabriel Vasconcelos. Regression Trees: In this post I am going to discuss some features of Regression Trees and Random Forests. Regression Trees are known to be very unstable; in other words, a small change in your data may drastically … Continue reading →

Read more »

Welcome to R/exams

September 22, 2017

Welcome everybody, we are proud to introduce the brand new web page and blog http://www.R-exams.org/. This provides a central access point for the open-source software “exams” implemented in the R system for statistical computing. R/exams is a one-...

Read more »

Big Data Analytics with H20 in R Exercises -Part 1

September 22, 2017

We have dabbled with RevoScaleR before. In this exercise we will work with H2O, another high-performance R library which can handle big data very effectively. It will be a series of exercises with increasing degree of difficulty, so please do them in sequence. H2O requires you to have Java installed Related exercise sets: Big Data...

Read more »

Tutorial: Launch a Spark and R cluster with HDInsight

September 22, 2017

If you'd like to get started using R with Spark, you'll need to set up a Spark cluster and install R and all the other necessary software on the nodes. A really easy way to achieve that is to launch an HDInsight cluster on Azure, which is just a managed Spark cluster with some useful extra components. You'll just...

Read more »

Multi-Dimensional Reduction and Visualisation with t-SNE

September 22, 2017

t-SNE is a very powerful technique that can be used for visualising (looking for patterns) in multi-dimensional data. Great things have been said about this technique. In this blog post I did a few experiments with t-SNE in R to learn about this technique and its uses. Its power to visualise complex multi-dimensional data is Related Post Comparing Trump and...

Read more »

My advice on dplyr::mutate()

September 22, 2017

There are substantial differences between ad-hoc analyses (be they: machine learning research, data science contests, or other demonstrations) and production worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; obviously the idea of reproducible … Continue reading My...

Read more »

Mining USPTO full text patent data – Analysis of machine learning and AI related patents granted in 2017 so far – Part 1

September 21, 2017

The United States Patent and Trademark Office (USPTO) provides immense amounts of data (the data I used are in the form of XML files). After coming across these datasets, I thought that it would be a good idea to explore where and how my areas of interest fall into the intellectual property space; my areas of interest being machine...

Read more »

Will Stanton hit 61 home runs this season?

September 21, 2017

So far, Giancarlo Stanton has hit 56 home runs in 555 at bats over 149 games. Miami has 10 games left to play. What’s the chance he’ll The post Will Stanton...
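
As a rough illustration of the question in this excerpt, the figures quoted above (56 home runs over 149 games, 10 games left) can be plugged into a simple Poisson model. This is only a sketch under stated assumptions: the Poisson model, the constant per-game rate, and the assumption that Stanton plays all 10 remaining games are ours, and the linked post may well use a different method.

```r
# Sketch only: a Poisson model for the remaining games.
# The model choice is an assumption, not necessarily the linked post's method.
hr_so_far  <- 56   # home runs to date (from the excerpt)
games      <- 149  # games played to date (from the excerpt)
games_left <- 10   # games remaining (from the excerpt)
target     <- 61   # season total asked about in the title

rate_per_game <- hr_so_far / games           # observed home runs per game
expected_left <- rate_per_game * games_left  # expected home runs in the remaining games
needed        <- target - hr_so_far          # additional home runs required

# P(X >= needed) for X ~ Poisson(expected_left);
# ppois(q, lower.tail = FALSE) returns P(X > q), hence needed - 1.
p_reach <- ppois(needed - 1, lambda = expected_left, lower.tail = FALSE)
p_reach
```

A per-at-bat binomial model would be an equally reasonable alternative; it would additionally need an assumed number of remaining at bats.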

Read more »

Pirating Pirate Data for Pirate Day

September 21, 2017

This past Tuesday was Talk Like A Pirate Day, the unofficial holiday of R (aRRR!) users worldwide. In recognition of the day, Bob Rudis used R to create this map of worldwide piracy incidents from 2013 to 2017. The post provides a useful and practical example of extracting data from a website without an API, otherwise known as "scraping"...

Read more »

Exploratory Data Analysis of Tropical Storms in R

September 21, 2017

The disastrous impact of recent hurricanes, Harvey and Irma, generated a large influx of data within the online community. I was curious about the history of hurricanes and tropical storms, so I found a data set on data.world and started some basic exploratory data analysis (EDA). EDA

Read more »

Gold-Mining – Week 3 (2017)

September 21, 2017

Week 3 Gold Mining and Fantasy Football Projection Roundup now available. Go get that free agent gold! The post Gold-Mining – Week 3 (2017) appeared first on Fantasy Football Analytics.

Read more »

Don’t teach students the hard way first

September 21, 2017

Imagine you were going to a party in an unfamiliar area, and asked the host for directions to their house. It takes you thirty minutes to get there, on a path that takes you on a long winding road with slow traffic. As the party ends, the host tells you “You can take the highway on your way back,...

Read more »

ggformula: another option for teaching graphics in R to beginners

September 21, 2017

A previous entry (http://sas-and-r.blogspot.com/2017/07/options-for-teaching-r-to-beginners.html) describes an approach to teaching graphics in R that also “get students doing powerful things quickly”, as David Robinson suggested. In this guest blog entry, Randall Pruim offers an alternative way based on a different formula interface. Here's Randall: For a number of years I and several of my colleagues have been teaching R to beginners...

Read more »

Comparing Trump and Clinton’s Facebook pages during the US presidential election, 2016

September 21, 2017

R has a lot of packages for users to analyse posts on social media. As an experiment in this field, I decided to start with the biggest one: Facebook. I decided to look at the Facebook activity of Donald Trump and Hillary Clinton during the 2016 presidential election in the United States. The winner may Related Post Analyzing Obesity across...

Read more »

Visualizing the Spanish Contribution to The Metropolitan Museum of Art

September 21, 2017

Well I walk upon the river like it’s easier than land (Love is All, The Tallest Man on Earth) The Metropolitan Museum of Art provides a dataset with information on more than 450,000 artworks in its collection. You can do anything you want with these data: there are no restrictions of use. Each record … Continue reading Visualizing...

Read more »

Pandigital Products: Euler Problem 32

September 20, 2017

Euler Problem 32 returns to pandigital numbers, which are numbers that contain one of each digit. Like so many of the Euler Problems, these numbers serve no practical purpose whatsoever, other than some entertainment value. You can find all pandigital … Continue reading → The post Pandigital Products: Euler Problem 32 appeared first on The Devil is in the Data.
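
The definition in this excerpt ("one of each digit") can be sketched as a small check in R. The helper name `is_pandigital` and its `digits` argument are hypothetical, introduced here for illustration; the digit set 1 through 9 follows the Project Euler statement of Problem 32 and is not taken from the linked post's code.

```r
# Hypothetical helper: TRUE if x uses each digit in `digits` exactly once.
is_pandigital <- function(x, digits = 1:9) {
  # format() avoids scientific notation for large numeric input
  chars <- strsplit(format(x, scientific = FALSE, trim = TRUE), "")[[1]]
  d <- as.integer(chars)
  length(d) == length(digits) && all(sort(d) == sort(digits))
}

is_pandigital(123456789)    # every digit 1-9 appears once
is_pandigital("391867254")  # 39 x 186 = 7254, the example from the problem statement
is_pandigital(112345678)    # digit 1 repeated, 9 missing
```

Concatenating multiplicand, multiplier and product into one string, as in the second call, is one simple way to test a candidate identity.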

Read more »

Report from Mexico City

September 20, 2017

Editor's Note: It has been heartbreaking watching the images from México City. Teresa Ortiz, co-organizer of R-Ladies CDMX, reports on efforts of data scientists to help. Our thoughts are with them, and with the people of México. It has been a hard couple of days around here. In less than 2 weeks, México has gone through two devastating earthquakes and the...

Read more »

Monte Carlo Simulations & the "SimDesign" Package in R

September 20, 2017

Past posts on this blog have included several relating to Monte Carlo simulation - e.g., see here, here, and here. Recently I came across a great article by Matthew Sigal and Philip Chalmers in the Journal of Statistics Education. It's titled, "Play it Again: Teaching Statistics With Monte Carlo Simulation", and the full reference appears below. The authors provide a really...

Read more »

Answer probability questions with simulation (part-2)

September 20, 2017

This is the second exercise set on answering probability questions with simulation. Finishing the first exercise set is not a prerequisite. The difficulty level is about the same; thus, if you are looking for a challenge, aim at writing faster, more elegant algorithms. As always, it pays off to read the instructions carefully Related exercise sets: Answer probability...

Read more »

EARL London 2017 – That’s a wrap!

September 20, 2017

...

Read more »

Preview: ALTREP promises to bring major performance improvements to R

September 20, 2017

Changes are coming to the internals of the R engine which promise to improve performance and reduce memory use, with dramatic impacts in some circumstances. The changes were first proposed by Gabe Becker at the DSC Conference in 2016 (and updated in 2017), and the implementation by Luke Tierney and Gabe Becker is now making its way into the...

Read more »

pinp 0.0.2: Onwards

September 20, 2017

A first update 0.0.2 of the pinp package arrived on CRAN just a few days after the initial release. We added a new vignette for the package (see below), extended a few nice features, and smoothed a few corners. The NEWS entry for this release f...

Read more »
