Blog Archives

Accessing R from Python using RPy2

October 24, 2010

This past Tuesday I had the opportunity to present a short talk (which ran a bit long) on text mining at the Los Angeles R Users’ Group. Since I do most of my text mining in Python, I used the occasion to discuss RPy2, an interface to R from Python. My slides are below: Accessing R from Python...

Transactions, and Pondering their Use in Casinos

October 20, 2010

A couple of weeks ago, Bradford Cross of FlightCaster posted in Measuring Measures that transactions are the next big data category. I argue that they already are. His post seems to suggest this as well, though I admit I may have missed his point. There are some clear examples of transactions and...

Introduction to statistical finance with R

October 19, 2010

During the first part of our meeting, Nicolas Christou gave an introduction to statistical finance in R and presented a package he co-authored with former PhD student David Diez (2010). Video of the talk is below. During the second part, …

Lists of English Words

October 12, 2010

When I was a kid, I went through an 80s music phase…well, some things never change. “People just love to play with words…” Know that song? Anyway… One of the biggest pains of text mining and NLP is colloquialism: language that is appropriate only in casual conversation, not in formal speech or writing. Words such as informal contractions...

My Crappy Fantasy Football Draft

September 22, 2010

I compared the results of my fantasy football draft with the results of more than 1,500 mock drafts at the Fantasy Football Calculator (FFC). I looked at where player X was drafted in our league, subtracted off the average draft …

Using XML package vs. BeautifulSoup

August 31, 2010

A while back I posted something about scraping a webpage using the BeautifulSoup module in Python. One of the comments on that post was by Larry, a blogger over at IEORTools, suggesting that I take a look at …

Taking R to the Limit: Large Datasets; Predictive modeling with PMML and ADAPA

August 30, 2010

During the first part of our meeting, Ryan Rosario presented on large datasets in R. Video, slides, and code for the talk “Taking R to the Limit: Large Datasets” by Ryan Rosario at the Los Angeles area …

A Rule Change in Major League Soccer?

August 23, 2010

I have to admit that working with my Major League Soccer data set has been slow going. There are a few reasons: (1) I have a full-time job at the National Renewable Energy Lab and (2) the data isn’t quite …

Goals per Game in MLS

August 16, 2010

I promised something related to Major League Soccer and here it is. Caveat: It’s not much. Why so sparse? (1) The data is a bit messy due to teams folding, expansion, name changes, etc. (2) I was backpacking all weekend and didn’t have time to work on this side project. Yes, I have a real

Apologies and Style Guides

August 13, 2010

I have to say that it’s pretty exciting to watch your blog go from a few hits over its lifetime to getting almost 200 in a single day. I am currently negotiating with Google over the purchase of this blog. Or maybe not. Again, thanks be to @revodavid for posting to the Revolution Analytics Blog.
