Example of linear regression and regularization in R

April 28, 2014

(This article was first published on R - Data School, and kindly contributed to R-bloggers)

When getting started in machine learning, it’s often helpful to see a worked example of a real-world problem from start to finish. But it can be hard to find an example with the “right” level of complexity for a novice. Here’s what I look for:

  • uses real-world data, not artificially simple data
  • demonstrates multiple models on the same data and compares them using a reasonable evaluation metric
  • explains the thinking of the modeler at each step in the process
  • includes readable, commented code

My linear regression example

In my Data Science class, we were assigned to perform linear regression on a dataset based on Kaggle’s Job Salary Prediction competition. I posted my solution on RPubs, and thought it might be helpful as a regression example for other machine learning novices. Here’s what my solution entails:

  • reading in the data from a CSV file
  • visualizing the data using the ggplot2 package
  • exploring the data using the table() and tapply() functions
  • creating text-based features using regular expressions
  • building linear models with different features, and comparing their performance using RMSE on a validation set
  • building regularized models using ridge regression and lasso (from the glmnet package)
  • selecting features using a forward stepwise approach (from the leaps package)
  • choosing the best model, training it on the full training set, and predicting on the test set
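
The modeling steps in the list above can be sketched roughly as follows. This is not the code from my RPubs solution: the data is simulated, every column name is made up for illustration, and the glmnet part only runs if that package is installed.

```r
# Synthetic stand-in data, NOT the Kaggle job salary data.
# All column names (title, category, senior, salary) are hypothetical.
set.seed(42)
n <- 500
titles <- sample(c("Senior Engineer", "Junior Analyst", "Engineer",
                   "Senior Nurse", "Analyst", "Nurse"), n, replace = TRUE)
jobs <- data.frame(
  title    = titles,
  category = factor(sub("^(Senior|Junior) ", "", titles)),
  stringsAsFactors = FALSE
)

# Text-based feature from a regular expression: does the title mention "senior"?
jobs$senior <- grepl("senior", jobs$title, ignore.case = TRUE)

# Simulated salary: depends on category and seniority, plus noise
base_pay <- c(Analyst = 30000, Engineer = 40000, Nurse = 28000)
jobs$salary <- unname(base_pay[as.character(jobs$category)]) +
  20000 * jobs$senior + rnorm(n, sd = 5000)

# Hold out 30% of the rows as a validation set
train_idx <- sample(n, size = round(0.7 * n))
train <- jobs[train_idx, ]
valid <- jobs[-train_idx, ]

# Root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Two linear models with different features, compared on the validation set
fit_base <- lm(salary ~ category, data = train)
fit_text <- lm(salary ~ category + senior, data = train)
rmse_base <- rmse(valid$salary, predict(fit_base, newdata = valid))
rmse_text <- rmse(valid$salary, predict(fit_text, newdata = valid))
print(c(base = rmse_base, with_text_feature = rmse_text))

# Ridge (alpha = 0) and lasso (alpha = 1), with lambda chosen by cross-validation
if (requireNamespace("glmnet", quietly = TRUE)) {
  x <- model.matrix(salary ~ category + senior, data = jobs)[, -1]
  y <- jobs$salary
  cv_ridge <- glmnet::cv.glmnet(x[train_idx, ], y[train_idx], alpha = 0)
  cv_lasso <- glmnet::cv.glmnet(x[train_idx, ], y[train_idx], alpha = 1)
  pred_ridge <- as.numeric(predict(cv_ridge, newx = x[-train_idx, ], s = "lambda.min"))
  pred_lasso <- as.numeric(predict(cv_lasso, newx = x[-train_idx, ], s = "lambda.min"))
  print(c(ridge = rmse(y[-train_idx], pred_ridge),
          lasso = rmse(y[-train_idx], pred_lasso)))
}
```

Comparing every model on the same validation rows is what makes the RMSE values comparable. The test set stays untouched until a final model has been chosen.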

Please check it out, and let me know what you think! You can also run the code yourself if you download the data files into your working directory in R.
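
If you want a feel for the reading and exploring steps before downloading anything, here is a minimal sketch. The file name and columns are placeholders that I made up for this example, and the snippet writes its own tiny CSV to a temporary folder so that it runs as-is. With the real data files, you would point read.csv() at a file in your working directory instead.

```r
# Hypothetical file name and columns; the real data files are named differently.
csv_path <- file.path(tempdir(), "example_jobs.csv")

# Write a tiny stand-in file so the sketch runs without the real download
writeLines(c(
  "title,category,salary",
  "Senior Engineer,Engineering,52000",
  "Engineer,Engineering,40000",
  "Senior Nurse,Healthcare,30000",
  "Nurse,Healthcare,28000",
  "Analyst,Finance,35000",
  "Senior Analyst,Finance,45000"
), csv_path)

# Read the CSV into a data frame
jobs <- read.csv(csv_path, stringsAsFactors = FALSE)
str(jobs)

# table() counts the rows in each category
category_counts <- table(jobs$category)
print(category_counts)

# tapply() applies a function (here, mean) to salary within each category
mean_salary <- tapply(jobs$salary, jobs$category, mean)
print(mean_salary)
```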

I’m happy to answer your questions! I admit that I didn’t include nearly enough explanation for someone who is unfamiliar with these techniques, though I hope you find it useful in any case.

Publishing your own document to RPubs

If you’ve never used RPubs, it’s an easy (and free) way to publish “R Markdown” documents directly from RStudio. It allows you to weave together your code, output (including plots), and explanation (written in standard Markdown) into a single document. Here’s how to get started with R Markdown, and how to publish to RPubs.
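
To give a rough idea of what such a document looks like, the sketch below writes a minimal R Markdown file from R and renders it to HTML if the rmarkdown package and pandoc are available. The file name and contents are invented for illustration. In RStudio you would normally just create the file through the menus and click the Knit button, so treat this as one possible way to try it out, not the required workflow.

```r
# Build the code-chunk delimiter (three backticks) programmatically
fence <- strrep("`", 3)

# A minimal R Markdown document: YAML header, Markdown text, and one R chunk.
# The title and chunk contents are made up for this example.
rmd_lines <- c(
  "---",
  "title: \"A tiny regression example\"",
  "output: html_document",
  "---",
  "",
  "This paragraph is ordinary Markdown explaining the analysis.",
  "",
  paste0(fence, "{r}"),
  "fit <- lm(dist ~ speed, data = cars)  # 'cars' ships with R",
  "summary(fit)",
  "plot(cars)",
  fence
)

rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Render to HTML only if rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(html_path)
}
```

The rendered HTML contains the explanation, the code, the model summary, and the plot in a single page, which is the kind of document that RPubs hosts.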
