Blog Archives

A bit of benchmarking with string distances

September 7, 2013

After my last post about the stringdist package, Zachary Mayer pointed out to me that the Levenshtein and Jaro-Winkler distances implemented in the RecordLinkage package are about two to three times faster. His benchmark compares randomly generated character strings … Continue reading →

Read more »

Approximate string matching in R

August 9, 2013

I have released a new version of the stringdist package. Besides some new string distance algorithms, it now contains two convenient matching functions. amatch: equivalent to R's match function, but allowing for approximate matching. ain: similar to R's %in% … Continue reading →

Read more »
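
To give an idea of what approximate matching means, here is a minimal base-R sketch. It is not the package's amatch or ain: the helper names approx_match and approx_in are hypothetical, and the sketch uses utils::adist (a generalized Levenshtein distance) instead of stringdist's own distance functions. Like match, approx_match returns, for each element of x, the index of the first closest element of table, or NA when nothing lies within max_dist edits.

```r
# Hypothetical helpers sketching the idea behind amatch/ain with base R only.
# utils::adist returns a length(x) by length(table) matrix of edit distances.
approx_match <- function(x, table, max_dist = 1) {
  d <- adist(x, table)
  apply(d, 1, function(row) {
    i <- which.min(row)              # first closest element, like match()
    if (length(i) == 1 && row[i] <= max_dist) i else NA_integer_
  })
}

# Analogue of %in%: TRUE when some element of table is within max_dist edits
approx_in <- function(x, table, max_dist = 1) {
  !is.na(approx_match(x, table, max_dist))
}

approx_match("leia", c("uhura", "leela"), max_dist = 2)          # 2
approx_in(c("leia", "xyz"), c("uhura", "leela"), max_dist = 2)   # TRUE FALSE
```

Picking the first minimum mirrors match, which also returns the first hit in table.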

The stringdist package

February 26, 2013

String metrics have important applications in web search, spelling correction and computational biology, amongst others. Many different metrics exist, but the best-known ones are based on counting the number of basic edit operations it takes to turn one string into … Continue reading →

Read more »
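
The classic edit-based metric is the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal base-R sketch of the textbook dynamic-programming recurrence; the function name lev is hypothetical and this is not the stringdist implementation. Base R's utils::adist computes the same quantity, which makes for an easy check.

```r
# Textbook dynamic-programming Levenshtein distance (hypothetical helper).
# D[i + 1, j + 1] holds the distance between the first i characters of a
# and the first j characters of b.
lev <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  m <- length(s)
  n <- length(t)
  D <- matrix(0L, m + 1, n + 1)
  D[, 1] <- 0:m                      # delete all i characters of a
  D[1, ] <- 0:n                      # insert all j characters of b
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      cost <- if (s[i] == t[j]) 0L else 1L
      D[i + 1, j + 1] <- min(D[i, j + 1] + 1L,   # deletion
                             D[i + 1, j] + 1L,   # insertion
                             D[i, j] + cost)     # substitution or match
    }
  }
  D[m + 1, n + 1]
}

lev("kitten", "sitting")     # 3
adist("kitten", "sitting")   # same value, returned as a 1 x 1 matrix
```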

Learning RStudio for R Statistical Computing

December 31, 2012

I am happy to announce that our book on RStudio was released last week.

Read more »

Representation of numerical NA’s in R and the 1954 enigma

July 8, 2012

I've always wondered how exactly the missing value (NA) in R is represented under the hood. Last weekend I was working on a little project that gave me a good excuse to spend some time finding this out. So, I … Continue reading →

Read more »
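
For readers who want to poke at this themselves: R stores NA_real_ as an IEEE 754 NaN and tells it apart from an ordinary NaN by checking whether the lower 32-bit word of the double equals 1954. The base-R snippet below writes the double's bytes in big-endian order so the two words can be inspected. Which NaN bits are set in the upper word may vary by platform, so only the lower word is interpreted here.

```r
# Inspect the raw bytes of R's numeric NA (big-endian: high word first).
bytes <- writeBin(NA_real_, raw(), endian = "big")
bytes                      # 8 raw bytes; the last four hold the lower word

# Interpret the lower 32-bit word as an integer.
low_word <- readBin(bytes[5:8], what = "integer", size = 4, endian = "big")
low_word                   # 1954

# To the hardware NA is a NaN, but R distinguishes it from a plain NaN.
is.nan(NA_real_)           # FALSE
is.na(NaN)                 # TRUE
```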

Deductive imputation with the deducorrect package

November 26, 2011

Missing data hinders statistical analyses. Estimating missing values (imputation) prior to analysis is one way to deal with that. In some cases, however, the missing values need not be estimated at all, since they can be derived with certainty from other … Continue reading →

Read more »
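
As a toy illustration of the idea (not the deducorrect API): if a record must satisfy a linear equality such as total = staff + other and exactly one of those variables is missing, its value follows from the rule without any estimation. The function impute_linear below is a hypothetical base-R sketch that solves a single equality sum(a * x) == b for its one missing entry.

```r
# Hypothetical sketch: deduce the single missing value in x from one linear
# equality sum(a * x) == b. Returns x unchanged if it cannot be derived.
impute_linear <- function(x, a, b) {
  miss <- which(is.na(x))
  # Derivable only if exactly one value is missing and the rule involves it
  if (length(miss) != 1 || a[miss] == 0) return(x)
  known <- setdiff(seq_along(x), miss)
  x[miss] <- (b - sum(a[known] * x[known])) / a[miss]
  x
}

# Rule: total - staff - other == 0, i.e. total = staff + other
rec <- c(total = 100, staff = 60, other = NA)
impute_linear(rec, a = c(1, -1, -1), b = 0)   # other becomes 40
```

With two or more missing values in the same rule, the sketch leaves the record alone, since a single equality no longer pins down a unique value.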

What do your rules look like? editrules 1.8-x answers with the help of igraph

October 26, 2011

We (Edwin de Jonge and I) have recently updated our editrules package. The most important new features include (beta) support for categorical data. However, in this post I'm going to show some visualizations we included, made possible by Gabor Csardi's … Continue reading →

Read more »

A multidimensional “which” function

September 16, 2011

Update: Henrik Bengtsson commented that which(x, arr.ind=TRUE) gives the same result, rendering the post below academic (thanks for the comment!). So, for academic interest, I'll leave it up. In my defense, I implemented this kind of functionality in C some time … Continue reading →

Read more »
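
As the update notes, base R already covers this: which() with arr.ind = TRUE returns one row of array indices per TRUE element, and arrayInd() converts linear indices into array indices directly. A short illustration on a matrix and a three-dimensional array:

```r
m <- matrix(1:6, nrow = 2)       # columns (1, 2), (3, 4), (5, 6)
which(m > 4)                     # linear indices: 5 6
which(m > 4, arr.ind = TRUE)     # one row per hit, columns "row" and "col"
arrayInd(5, dim(m))              # linear index 5 is row 1, column 3

a <- array(1:24, dim = c(2, 3, 4))
which(a == 24, arr.ind = TRUE)   # indices 2, 3, 4
```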

Fourier-Motzkin elimination with the editrules package

August 26, 2011

Last week I talked about our editrules package at the useR!2011 conference. In the near future I plan to write a short series of blog posts about the functionality of editrules. Below I describe the eliminate and isFeasible functions. But first: … Continue reading →

Read more »
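
For readers unfamiliar with the technique: Fourier-Motzkin elimination removes a variable from a system of linear inequalities A x <= b by pairing every inequality in which the variable has a positive coefficient with every one in which it has a negative coefficient, scaled so the variable cancels. Once all variables are eliminated, the (real-valued) system is feasible exactly when every remaining inequality reads 0 <= b_i with b_i >= 0. The base-R sketch below works on a plain matrix; fm_eliminate and fm_feasible are hypothetical helpers, not the editrules functions eliminate and isFeasible.

```r
# Hypothetical helpers: Fourier-Motzkin elimination on a system A %*% x <= b.
fm_eliminate <- function(A, b, j) {
  a <- A[, j]
  P <- which(a > 0); N <- which(a < 0); Z <- which(a == 0)
  # Inequalities not involving variable j are kept as they are
  newA <- A[Z, , drop = FALSE]
  newb <- b[Z]
  # Each (positive, negative) pair yields one inequality free of variable j:
  # dividing by |a| keeps both inequality directions, and adding the scaled
  # rows cancels the j-th coefficient (1 + (-1) = 0).
  for (p in P) {
    for (n in N) {
      newA <- rbind(newA, A[p, ] / a[p] - A[n, ] / a[n])
      newb <- c(newb, b[p] / a[p] - b[n] / a[n])
    }
  }
  newA[, j] <- 0   # clear floating-point residue in the eliminated column
  list(A = newA, b = newb)
}

fm_feasible <- function(A, b, tol = 1e-8) {
  for (j in seq_len(ncol(A))) {
    s <- fm_eliminate(A, b, j)
    A <- s$A
    b <- s$b
  }
  all(b >= -tol)   # every remaining row reads 0 <= b_i
}

# x + y <= 4, x >= 0, y >= 0 and x + y >= 5 cannot all hold
A <- rbind(c( 1,  1),
           c(-1,  0),
           c( 0, -1),
           c(-1, -1))
fm_feasible(A, b = c(4, 0, 0, -5))   # FALSE
fm_feasible(A, b = c(6, 0, 0, -5))   # TRUE
```

The number of inequalities can grow quickly with each elimination step, which is why this simple version is only suitable for small systems.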

useR!2011

August 19, 2011

useR!2011 ended yesterday. First of all, many thanks to the organizers, who smoothly ran a conference with 400+ participants from 41 countries. Thumbs up! It was great to meet some people from the R blog-O-sphere in person, like … Continue reading →

Read more »