Blog Archives

Easy data validation with the validate package

March 25, 2016

The validate package is our attempt to make checking data against domain knowledge as easy as possible. Here is an example. The summary gives an overview of the number of items checked. For an aggregated test, such as the … Continue reading →

settings 0.2.3

October 27, 2015

An updated version of the settings package has been accepted on CRAN. The settings package provides an alternative approach to options management in R. It aims to support layered options management, where global options are the defaults that can easily … Continue reading →

stringdist 0.9.4 and 0.9.3: distances between integer sequences

October 27, 2015

A new release of stringdist has been accepted on CRAN. stringdist offers a number of popular distance functions between sequences of integers or characters that are independent of character encoding. Version 0.9.4 bugfix: edge case for zero-size for lower tridiagonal … Continue reading →

Stringdist 0.9.2: dist objects, string similarities and some deprecated arguments

June 24, 2015

On 24-06-2015 stringdist 0.9.2 was accepted on CRAN. A summary of new features can be found in the NEWS file; here I discuss the changes with some examples. Computing 'dist' objects with 'stringdistmatrix' The R dist object is used as … Continue reading →

stringdist 0.9: exercise all your cores

January 26, 2015

The latest release of the stringdist package for approximate text matching has two performance-enhancing novelties. First of all, encoding conversion got a lot faster since this is now done from C rather than from R. Secondly, stringdist now employs multithreading … Continue reading →

Easy to use option settings management with the ‘settings’ package

November 5, 2014

Last week I released a new package called settings. It grew out of my frustration built up during several small projects where I'm generating heavily parameterized d3/js output. What I wanted was support to define a whole bunch of option … Continue reading →

stringdist 0.8: now with soundex

August 22, 2014

An update to the stringdist package was released earlier this month. Thanks to a contribution from Jan van der Laan, the package now includes a method to compute soundex codes as defined here. Briefly, soundex encoding aims to translate words … Continue reading →

sort.data.frame

August 15, 2014

I came across this post on SO, where several solutions to sorting data.frames are presented. It must have been solved a million times, but here's a solution I like to use. It benefits from the fact that sort is an … Continue reading →

Review of “Building interactive graphs with ggplot2 and shiny”

August 4, 2014

Recently, Packt published a video course with the above title, and I've just spent a pleasant morning reviewing it at Packt's request. Pleasant, because I think the course gives an excellent introduction to both ggplot2 and shiny. The course is … Continue reading →

A bit of benchmarking with string distances

September 7, 2013

After my last post about the stringdist package, Zachary Mayer pointed out to me that the Levenshtein and Jaro-Winkler distances implemented in the RecordLinkage package are about two to three times faster. His benchmark compares randomly generated character strings … Continue reading →
