2157 search results for "twitter"

Pre-processing text: R/tm vs. python/NLTK

February 16, 2011

Let’s say that you want to take a set of documents and apply a computational linguistic technique. If your method is based on the bag-of-words model, you probably need to pre-process these documents first by segmenting, tokenizing, stripping, stopwording, and …

Mixed models – Part 2: lme lmer

February 15, 2011

Getting more into mixed models, I’ve been playing around with both nlme::lme and lme4::lmer. The post at http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3345.html explains the differences quite well; from what I gather, they are largely performance-based when using crossed or partially crossed models. In the models I am tinkering with at the moment I am noticing differences in

ABC in London

February 15, 2011

After the very exciting and, I think, quite successful ABC in Paris meeting two years ago, Michael Stumpf from Imperial College London suggested a second edition in London along the same lines. Michael kindly involved me in the planning of this meeting. It is (logically) called ABC in London (or ABCiL) and will take place

Reaching 1000

February 14, 2011

This is the 1000th post on the ‘Og! Here are the entries that have had above 1000 views (not viewers) so far:
In{s}a(ne)!! 5,353
“simply start over and build something better” 4,345
Julien on R shortcomings 1,966
Sudoku via simulated annealing 1,762
Of black swans and bleak prospects 1,462
Do we need an integrated Bayesian/likelihood

The Most Romantic Electro-Grunge Statistical Computing Song Ever Made

February 14, 2011

Warning message: This song contains highly suggestive coefficients and graphic depictions of exuberant R-core lovin’. “Plotting Ihaka” is based on Rotting Piñata by Sponge, and reflects a small measure of my boundless joy in the world of R. Despite being a firm proponent of muffins, I can confidently say that I would rather live in

Another Bernoulli factory

February 13, 2011

The paper “Exact sampling for intractable probability distributions via a Bernoulli factory” by James Flegal and Radu Herbei got posted on arXiv without me noticing, presumably because it came out just between Larry Brown’s conference in Philadelphia and my skiing vacations! I became aware of it only yesterday and find it quite interesting in that

Visualize NHL Play-by-Play using Tableau Public and R

February 13, 2011

Nothing like a little Sunday morning data hacking before a big game! I have been wanting to play with the NHL play-by-play event files for some time now. The JSON datasets provide a wealth of information about each event in the game, including the location, as defined by the fields xcoord and ycoord. I am

Parallel computation [back]

February 12, 2011

We have now received reports back from JCGS for our parallel MCMC paper and they all are very nice and supportive! The reviewers essentially all like the Rao-Blackwellisation concept we developed in the paper and ask for additions towards a more concrete feeling for the practical consequences of the method. We should thus be able

Le Monde puzzle [#5]

February 10, 2011

Another Sudoku-like puzzle from the weekend edition of Le Monde. The object it starts with is a 9×9 table where each entry is an integer and where neighbours take adjacent values. (Neighbours are defined as north, west, south and east of an entry.) The question is about whether or not it is possible to find

Model weights for model choice

February 9, 2011

An ‘Og reader, Emmanuel Charpentier, sent me the following email about model choice: I read with great interest your critique of Peter Congdon’s 2006 paper (CSDA, 50(2):346-357) proposing a method of estimation of posterior model probabilities based on improper distributions for parameters not present in the model under examination, as well as a more general
