A new version of the validate package for data validation was just accepted on CRAN and will be available on all mirrors in a few days.
The most important addition is that you can now reference the data set as a whole, using the “dot” syntax like so:
library(validate)
library(magrittr)
iris %>% check_that(nrow(.) > 100, "Sepal.Width" %in% names(.)) %>% summary()
##   rule items passes fails nNA error warning                  expression
## 1   V1     1      1     0   0 FALSE   FALSE               nrow(.) > 100
## 2   V2     1      1     0   0 FALSE   FALSE "Sepal.Width" %in% names(.)
Also, it is now possible to return TRUE or FALSE, even when a rule evaluates to NA, by passing the na.value option to confront:
dat = data.frame(x = c(1, NA, -1))
v = validator(x > 0)
values(confront(dat, v))
##         V1
## [1,]  TRUE
## [2,]    NA
## [3,] FALSE
values(confront(dat, v, na.value = FALSE))
##         V1
## [1,]  TRUE
## [2,] FALSE
## [3,] FALSE
A complete list of changes and bugfixes can be found in the NEWS file. Below I also include the changes in version 1.4, since I did not write about that release before.
– The ‘.’ is now used to reference the validated data set as a whole.
– Small change in the output of ‘compare’ to match the table in van den Broek et al. (2013).
– ‘confront’ now emits a warning when a variable name conflicts with the name of a reference data set.
– Deprecated ‘validate_reset’, in favour of the shorter ‘reset’ (use ‘validate::reset’ in case of ambiguity)
– Deprecated ‘validate_options’ in favour of the shorter ‘voptions’
– New option na.value with default value NA, controlling the output when a rule evaluates to NA.
– Added rules from the ESSnet on validation (deliverable 17) to automated tests.
– Added ‘grepl’ to the allowed validation syntax (suggested by Dusan Sovic).
– Exported a few functions, documented with keyword ‘internal’, for extensibility.
– Bugfix: ‘blocks’ sometimes reported the wrong number of blocks (in the case of a single connected block).
– Bugfix: macro expansion failed when macros were reused in other macros.
– Bugfix: certain nonlinear relations were incorrectly recognized as linear.
– Bugfix: rules that use (anonymous) function definitions raised an error when printed.
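To illustrate the newly allowed ‘grepl’, here is a minimal sketch of a pattern-matching rule. The data set and the column name ‘id’ are hypothetical, made up for this example; the rule simply checks that each identifier starts with an upper-case letter.

```r
library(validate)

# Hypothetical data: identifiers that should start with an upper-case letter
dat = data.frame(id = c("A1", "B2", "x3"), stringsAsFactors = FALSE)

# 'grepl' can be used directly inside a validation rule
v = validator(grepl("^[A-Z]", id))

# One logical result per record
values(confront(dat, v))
```

The third record fails the rule because its identifier starts with a lower-case letter.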