Blog Archives

Restricted Boltzmann Machines in R

January 14, 2013

Restricted Boltzmann Machines (RBMs) are an unsupervised learning method (like principal component analysis). An RBM is a probabilistic, undirected graphical model. RBMs are becoming more popular in machine learning due to recent success in training them with contrastive divergence. They have proven useful in collaborative filtering, being one of the most successful methods...


Factor Analysis of Baseball’s Hall of Fame Voters

January 9, 2013

Recently, Nate Silver wrote a post which analyzed how voters who voted for and against Barry Bonds for Baseball's Hall of Fame differed. Not surprisingly, those who voted for Bonds were more likely to vote for other suspected steroid users (like Roger Clemens). This got...


Quick Post About Getting and Plotting Polls in R

November 5, 2012

With the election nearly upon us, I wanted to share an easy way I just found to download polling data and graph a few polls with ggplot2. dlinzer at GitHub created a function to download poll data from the Huffington Post's Pollster API. The default is to dow...


Finding the Best Subset of a GAM using Tabu Search and Visualizing It in R

August 24, 2012

Finding the best subset of variables for a regression is a very common task in statistics and machine learning. There are statistical methods based on asymptotic normal theory that can help you decide whether to add or remove one variable at a time. The ...


Random Forest Variable Importance

July 19, 2012

Random forests™ are great. They are one of the best "black-box" supervised learning methods. If you have lots of data and lots of predictor variables, you can do worse than random forests. They can deal with messy, real data. If there are lots of extraneous predictors, they have no problem. They automatically do a good job...


Rounding in R

June 15, 2012

Forgive me if you are already aware of this, but I found it quite alarming. I know that computers store numbers in binary while we enter them in decimal, so problems can arise in conversion and with floating point. But the example I have below is so simple that it really surprised me. I was converting...


Space Time Swing Probability Plot for Ichiro

May 30, 2012

I was having some fun with PITCHf/x data and generalized additive models. PITCHf/x keeps track of the trajectory, path, and location of every pitch in the MLB. It is pretty accurate and opens up baseball to more analyses than ever before. Generalized additi...


Sending a Text in R

May 25, 2012

Don't you hate it when you are running a long piece of code and you keep checking the results every 15 minutes, hoping it will finish? There is a better way. I got the idea from here. He uses a Python script and the text interface is not free. I thought...


Cleveland Indians’ Attendance

May 20, 2012

Recently, Chris Perez, the closer for the Indians, displayed some frustration with the fans for not supporting the team. Currently, they have the lowest attendance in the majors -- by a decent margin. The Indians are averaging about 15,000 fans per hom...


What’s Up with Albert Pujols?

May 5, 2012

After signing a huge deal with the Angels, Pujols has been having a really bad year. He hasn't hit a home run this year, breaking a career-long streak. So I thought it would be a good idea to use some statistics to estimate how good or bad Pujols will actually be this year. Coming into the year,...
