Articles by mark

Track changes in data with the lumberjack %>>%

June 23, 2017 | mark

So you are using this pipeline to have data treated by different functions in R. For example, you may be imputing some missing values using the simputation package. Let us first load the only realistic dataset in R: data(retailers, …

Announcing the simputation package: make imputation simple

September 13, 2016 | mark

I am happy to announce that my simputation package has appeared on CRAN this weekend. This package aims to simplify missing value imputation. In particular it offers standardized interfaces that make it easy to define both imputation method and imputation …

stringdist 0.9.4.2 released

September 11, 2016 | mark

stringdist 0.9.4.2 was accepted on CRAN at the end of last week. This release just fixes a few bugs affecting the stringdistmatrix function when called with a single argument. From the NEWS file: bugfix in stringdistmatrix(a): value of p, for …

validate version 1.5 is out

June 24, 2016 | mark

A new version of the validate package for data validation was just accepted on CRAN and will be available on all mirrors in a few days. The most important addition is that you can now reference the data set as …

Easy data validation with the validate package

March 25, 2016 | mark

The validate package is our attempt to make checking data against domain knowledge as easy as possible. Here is an example. [code example] The summary gives an overview of the number of items checked. For an aggregated test, such as the …

settings 0.2.3

October 27, 2015 | mark

An updated version of the settings package has been accepted on CRAN. The settings package provides an alternative options management system for R. It aims to allow layered options management, where global options are the default that can easily …

stringdist 0.9.4 and 0.9.3: distances between integer sequences

October 27, 2015 | mark

A new release of stringdist has been accepted on CRAN. stringdist offers a number of popular distance functions between sequences of integers or characters that are independent of character encoding. Version 0.9.4 bugfix: edge case for zero-size for lower tridiagonal …

Stringdist 0.9.2: dist objects, string similarities and some deprecated arguments

June 24, 2015 | mark

On 24-06-2015 stringdist 0.9.2 was accepted on CRAN. A summary of new features can be found in the NEWS file; here I discuss the changes with some examples. Computing 'dist' objects with 'stringdistmatrix' The R dist object is used as …

stringdist 0.9: exercise all your cores

January 26, 2015 | mark

The latest release of the stringdist package for approximate text matching has two performance-enhancing novelties. First of all, encoding conversion got a lot faster since this is now done from C rather than from R. Secondly, stringdist now employs multithreading …

Easy to use option settings management with the ‘settings’ package

November 5, 2014 | mark

Last week I released a new package called settings. It grew out of my frustration built up during several small projects where I'm generating heavily parameterized d3/js output. What I wanted was support to define a whole bunch of option …

stringdist 0.8: now with soundex

August 22, 2014 | mark

An update to the stringdist package was released earlier this month. Thanks to a contribution by Jan van der Laan, the package now includes a method to compute soundex codes as defined here. Briefly, soundex encoding aims to translate words …

sort.data.frame

August 15, 2014 | mark

I came across this post on SO, where several solutions to sorting data.frames are presented. It must have been solved a million times, but here's a solution I like to use. It benefits from the fact that sort is an …

Review of “Building interactive graphs with ggplot2 and shiny”

August 4, 2014 | mark

Recently, Packt published a video course with the above title, and I've just spent a pleasant morning reviewing it at Packt's request. Pleasant, because I think the course gives an excellent introduction to both ggplot2 and shiny. The course is …

A bit of benchmarking with string distances

September 7, 2013 | mark

After my last post about the stringdist package, Zachary Mayer pointed out to me that the implementations of the Levenshtein and Jaro-Winkler distances in the RecordLinkage package are about two to three times faster. His benchmark compares randomly generated character strings …

Approximate string matching in R

August 9, 2013 | mark

I have released a new version of the stringdist package. Besides some new string distance algorithms, it now contains two convenient matching functions: amatch: Equivalent to R's match function but allowing for approximate matching. ain: Similar to R's %in% …

The stringdist package

February 26, 2013 | mark

String metrics have important applications in web search, spelling correction and computational biology amongst others. Many different metrics exist, but the most well-known are based on counting the number of basic edit operations it takes to turn one string into …

Representation of numerical NA’s in R and the 1954 enigma

July 8, 2012 | mark

I've always wondered how exactly the missing value (NA) in R is represented under the hood. Last weekend I was working on a little project that gave me enough of an excuse to spend some time on finding this out. So, I …

Deductive imputation with the deducorrect package

November 26, 2011 | mark

Missing data hinders statistical analyses. Estimating missing values (imputation) prior to analysis is one way to deal with that. In some cases, however, the missing values need not be estimated at all, since they can be derived with certainty from other …

What do your rules look like? editrules 1.8-x answers with the help of igraph

October 26, 2011 | mark

We (Edwin de Jonge and I) have recently updated our editrules package. The most important new features include (beta) support for categorical data. However, in this post I'm going to show some visualizations we included, made possible by Gabor Csardi's …

A multidimensional “which” function

September 16, 2011 | mark

Update: Henrik Bengtsson commented that which(x, arr.ind=TRUE) gives the same result, rendering the post below academic (thanks for the comment!). So, for academic interest, I'll leave it. In my defense, I implemented this kind of functionality in C some time …

R-bloggers

R news and tutorials contributed by hundreds of R bloggers
