Blog Archives

Lists of English Words

October 12, 2010
By
Lists of English Words

When I was a kid, I went through an 80s music phase…well, some things never change. “People just love to play with words…” Know that song? Anyway…

One of the biggest pains of text mining and NLP is colloquialism — language that is only appropriate in casual language and not in formal speech or writing. Words such as informal contractions...

Read more »

My Crappy Fantasy Football Draft

September 22, 2010
By
My Crappy Fantasy Football Draft

I compared the results of my fantasy football draft with the results of more than 1500 mock drafts at the Fantasy Football Calculator (FFC).  I looked at where player X was drafted in our league, subtracted off the average draft … Continue reading

Read more »

Using XML package vs. BeautifulSoup

August 31, 2010
By
Using XML package vs. BeautifulSoup

A while back I posted something about scraping a webpage using the BeautifulSoup module in Python.  One of the comments to that post was by Larry — a blogger over at IEORTools — suggesting that I take a look at … Continue reading

Read more »

Taking R to the Limit: Large Datasets; Predictive modeling with PMML and ADAPA

August 30, 2010
By
Taking R to the Limit: Large Datasets; Predictive modeling with PMML and ADAPA

During the first part of our meeting, Ryan Rosario presented on the topic of large datasets in R. Video, slides and code of the talk “Taking R to the Limit: Large Datasets” by Ryan Rosario at the Los Angeles area … Continue reading

Read more »

A Rule Change in Major League Soccer?

August 23, 2010
By
A Rule Change in Major League Soccer?

I have to admit that working with my Major League Soccer data set has been slow going.  There are a few reasons:  (1) I have a full-time job at the National Renewable Energy Lab and (2) the data isn’t quite … Continue reading

Read more »

Goals per Game in MLS

August 16, 2010
By
Goals per Game in MLS

I promised something related to Major League Soccer and here it is.  Caveat:  It’s not much.  Why so sparse?  (1) The data is a bit messy due to teams folding, expansion, name changes, etc.  (2)  I was backpacking all weekend and didn’t have time to work on this side project.  Yes, I have a real

Read more »

Apologies and Style Guides

August 13, 2010
By
Apologies and Style Guides

I have to say that it’s pretty exciting to watch your blog go from a few hits over its lifetime to getting almost 200 in a single day.  I am currently negotiating with Google over the purchase of this blog.  Or maybe not.  Again, thanks be to @revodavid for posting to the Revolution Analytics Blog.

Read more »

Are MLB Games Getting Longer?

August 5, 2010
By
Are MLB Games Getting Longer?

On July 29, 2010, I had a flight from Denver to Cincinnati.  About an hour before boarding, I went to ESPN’s website and found a new article by Bill Simmons, a.k.a The Sports Guy (@sportsguy33 on Twitter).  The basic premise of this article is that a core group of fans is losing interest in Red

Read more »

Taking R to the Limit, Part I – Parallelization in R

July 28, 2010
By
Taking R to the Limit, Part I – Parallelization in R

Tuesday night I had the opportunity to present on high performance computing in R, and the Los Angeles R Users’ Group. There was so much to talk about that I had to split my talk into two parts. The first part was parallelization and the second ...

Read more »

My Experience at Hadoop Summit 2010 #hadoopsummit

June 30, 2010
By
My Experience at Hadoop Summit 2010 #hadoopsummit

This week I had the opportunity the trek up north to Silicon Valley to attend Yahoo’s Hadoop Summit 2010. I love Silicon Valley. The few times I’ve been there the weather was perfect (often warmer than LA), little to no traffic, no road rage and people overall seem friendly and happy. Not to mention there are so many trees...

Read more »