R code to accompany Real-World Machine Learning (Chapter 3)

October 15, 2016
(This article was first published on data prone - R, and kindly contributed to R-bloggers)

Abstract

The rwml-R Github repo is updated with R code to accompany Chapter 3 of the book “Real-World Machine Learning” by Henrik Brink, Joseph W. Richards, and Mark Fetherolf.

Survivors on the Titanic

The Titanic Passengers dataset is used to illustrate several steps in
preparing data for modeling, including the conversion of factor variables
to dummy variables. For example, the repo provides the code that produces
the following table of processed data:

Survived.yes  Pclass  Sex.male  Age  SibSp  Parch  Embarked.Q  Embarked.S  sqrtFare
           0       3         1   22      1      0           0           1  2.692582
           1       1         0   38      1      0           0           0  8.442944
           1       3         0   26      0      0           0           1  2.815138
           1       1         0   35      1      0           0           1  7.286975
           0       3         1   35      0      0           0           1  2.837252
           0       3         1   -1      0      0           1           0  2.908316
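
As a rough illustration of the factor-to-dummy conversion described above, here is a minimal sketch in base R. It is not the code from the rwml-R repo: the three-row `passengers` data frame is a hypothetical sample made up for this example, and `model.matrix()` is simply one way to build indicator columns.

```r
# Hypothetical three-passenger sample; the values are illustrative only
passengers <- data.frame(
  Survived = factor(c("no", "yes", "yes"), levels = c("no", "yes")),
  Pclass   = c(3, 1, 3),
  Sex      = factor(c("male", "female", "female"), levels = c("female", "male")),
  Embarked = factor(c("S", "C", "Q"), levels = c("C", "Q", "S")),
  Fare     = c(7.25, 71.2833, 7.925)
)

# model.matrix() expands each factor into 0/1 indicator columns and
# drops the first level of each factor as the reference category
dummies <- model.matrix(~ Survived + Pclass + Sex + Embarked, data = passengers)

# Drop the intercept column and add the square root of the fare
processed <- data.frame(dummies[, -1], sqrtFare = sqrt(passengers$Fare))
print(processed)
```

The column names produced this way (`Sexmale`, `EmbarkedQ`) differ slightly from those in the table (`Sex.male`, `Embarked.Q`), but the encoding follows the same idea: one 0/1 column per non-reference factor level.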

I also go “off-script” a bit, doing some things that are not covered in the
book, and demonstrate some useful visualization, modeling, and
performance-measurement techniques available in the caret and
AppliedPredictiveModeling packages.

MNIST database of handwritten digits

A k-nearest neighbors classifier (from the kknn package) is used to
predict the numbers represented in the MNIST database of handwritten digits.
Examples of the types of digits present in the dataset are plotted, along
with the R code to display them.

[Figure: examples of handwritten digits from the MNIST dataset]
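
To give a feel for how a k-nearest neighbors classifier makes its predictions, here is a small hand-rolled sketch in base R. It is an assumption-laden stand-in rather than the post's approach: it uses R's built-in `iris` data instead of MNIST, plain unweighted majority voting instead of the kknn package, and a hypothetical helper named `knn_predict`.

```r
# Classify one observation by majority vote among its k nearest training rows
# (knn_predict is a hypothetical helper written for this sketch)
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  # Euclidean distance from new_x to every training row
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  nearest <- order(dists)[seq_len(k)]
  # Count the class labels among the k nearest neighbors
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

set.seed(1)
x <- as.matrix(iris[, 1:4])   # numeric features (stand-in for pixel values)
y <- iris$Species             # class labels (stand-in for digits 0-9)
test_idx <- sample(nrow(x), 30)

# Predict each held-out row from the remaining rows
predicted <- apply(x[test_idx, , drop = FALSE], 1, knn_predict,
                   train_x = x[-test_idx, ], train_y = y[-test_idx], k = 5)

accuracy <- mean(predicted == as.character(y[test_idx]))
print(accuracy)
```

With image data such as MNIST, each row of `x` would hold the pixel intensities of one digit, and the same distance-and-vote logic would apply.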

Auto MPG dataset

As an example of a linear regression analysis, the Auto MPG dataset introduced
in Chapter 2 resurfaces, and fuel economy is predicted from origin, year of
production, and performance characteristics such as horsepower and engine
displacement.
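
A regression of this kind can be sketched in a few lines of base R. The example below is only an illustration under stated assumptions: it uses the built-in `mtcars` data as a stand-in for the Auto MPG dataset (so origin and year of production are not available), and the 24/8 train/test split is an arbitrary choice for this sketch.

```r
set.seed(42)

# Hold out 8 of the 32 cars for testing (arbitrary split for illustration)
test_idx <- sample(nrow(mtcars), 8)
train <- mtcars[-test_idx, ]
test  <- mtcars[test_idx, ]

# Fit fuel economy as a linear function of horsepower and displacement
fit <- lm(mpg ~ hp + disp, data = train)
print(coef(fit))

# Evaluate on the held-out cars with root-mean-square error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

Adding further predictors, such as a factor for origin, would only require extending the formula, since `lm()` handles the dummy encoding of factors internally.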

As always, feedback is welcome

As always, I’d love to hear from you if you find the project helpful or if you
have any suggestions. Please leave a comment below or use the Tweet button.
Also, feel free to fork the rwml-R repo
and submit a pull request if you wish to contribute.
