Articles by John Mount

Binning Data in a Database

February 28, 2019 | John Mount

Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. He compares a case-based approach (where the bin divisions are stuffed into code) with a join-based approach. He shares code and timings. Best of all: rquery gets some attention and turns ...
[Read more...]

R Journal Volume 10/2, December 2018 is out!

February 25, 2019 | John Mount

We forgot to say: R Journal Volume 10/2, December 2018 is out! A huge thanks to the editors who work very hard to make this possible. And a big “thank you” to the editors, referees, and journal for helping improve, and for including, our note on pipes in R.
[Read more...]

More on Macros in R

February 23, 2019 | John Mount

Recently ran into something interesting on the R macros/quasi-quotation/substitution/syntax front: Romain François: “.@_lionelhenry reveals planned double curly syntax At #satRdayParis as a possible replacement, addition to !! and enquo()” It appears !! is no longer the last word in substitution (it certainly wasn’t the first). The described ...
[Read more...]

Getting Started With rquery

February 20, 2019 | John Mount

To make getting started with rquery (an advanced query generator for R) easier we have re-worked the package README for various data-sources (including SparkR!). Here are our current examples: rquery and MonetDBLite; rquery and RPostgreSQL; rquery and RSQLite; rquery and SparkR; rquery and sparklyr. For the MonetDBLite the query diagrammer ...
[Read more...]

Playing With Pipe Notations

February 19, 2019 | John Mount

Recently Hadley Wickham prescribed pronouncing the magrittr pipe as “then” and using right-assignment as follows: I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section. Right assignment ...
[Read more...]

Query Generation in R

February 16, 2019 | John Mount

R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use. Introduction SQL represents value use by nesting. To use a ...
[Read more...]

cdata Control Table Keys

February 11, 2019 | John Mount

In our cdata R package and training materials we emphasize record-oriented thinking and how to design a transform control table. We now have an exciting new feature: control table keys. The user can now control which columns of a cdata control table are the keys, including now using ...
[Read more...]

Function Objects and Pipelines in R

February 3, 2019 | John Mount

Composing functions and sequencing operations are core programming concepts. Some notable realizations of sequencing or pipelining operations include: Unix’s |-pipe. CMS Pipelines. F#’s forward pipe operator |>. Haskell’s Data.Function & operator. The R magrittr forward pipe. Scikit-learn’s sklearn.pipeline.Pipeline. The idea is: many important calculations can ...
[Read more...]

Fully General Record Transforms with cdata

January 20, 2019 | John Mount

One of the design goals of the cdata R package is that very powerful and arbitrary record transforms should be convenient and take only one or two steps. In fact it is the goal to take just about any record shape to any other in two steps: first convert to ...
[Read more...]

Make Teaching R Quasi-Quotation Easier

January 17, 2019 | John Mount

To make teaching R quasi-quotation easier it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts. So some commonality of notation would actually be clarifying, and help teach the concepts. We will define both of the above terms, and demonstrate the relation ...
[Read more...]

R Tip: Use Inline Operators For Legibility

January 14, 2019 | John Mount

R Tip: use inline operators for legibility. A Python feature I miss when working in R is the convenience of Python’s inline + operator. In Python, + does the right thing for some built-in data types: It concatenates lists: [1,2] + [3] is [1, 2, 3]. It concatenates strings: 'a' + 'b' is 'ab'. ...
[Read more...]

Practical Data Science with R, 2nd Edition discount!

January 12, 2019 | John Mount

Please help share our news and this discount. The second edition of our best-selling book Practical Data Science with R (Zumel, Mount) is featured as deal of the day at Manning. The second edition isn’t finished yet, but chapters 1 through 4 are available in the Manning Early Access Program (MEAP), ...
[Read more...]

R Tip: Use seqi() For Indexes

January 11, 2019 | John Mount

R Tip: use seqi() for indexing. R’s “1:0 trap” is a mal-feature that confuses newcomers and is a reliable source of bugs. This note will show how to use seqi() to write more reliable code and document intent. The issue is, contrary to expectations (formed in working with other programming ...
[Read more...]

A Beautiful 2 by 2 Matrix Identity

January 8, 2019 | John Mount

While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices: The superscript “top” denoting the transpose operation, the ||.||^2_2 denoting sum of squares norm, and the single |.| denoting determinant. This is derived from one of the check equations for the Moore–Penrose ...
[Read more...]

Timing the Same Algorithm in R, Python, and C++

January 6, 2019 | John Mount

While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to ...
[Read more...]

What does it mean to write “vectorized” code in R?

January 3, 2019 | John Mount

One often hears that R cannot be fast (false), or more correctly that for fast code in R you may have to consider “vectorizing.” A lot of knowledgeable R users are not comfortable with the term “vectorize”, and not really familiar with the method. “Vectorize” is just a slightly ...
[Read more...]

Introducing RcppDynProg

December 31, 2018 | John Mount

RcppDynProg is a new Rcpp-based R package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum cost partition into intervals problem (described below) and is useful in building piecewise estimates of functions (shown in this note). The abstract problem The ...
[Read more...]

vtreat Variable Importance

December 17, 2018 | John Mount

vtreat’s purpose is to produce pure numeric R data.frames that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing-values re-encoded with indicators, and high-degree categorical ...
[Read more...]