Why We Did Not Name the cdata Transforms wide/tall/long/short
We recently saw this UX (user experience) question from the tidyr author as he adapts tidyr to cdata techniques.
The terminology that he is not adopting from cdata is “unpivot_to_blocks()” and “pivot_to_rowrecs()”. One of the research ideas in the cdata package is that the important thing to call out is record structure.
The key question is: are we in a very de-normalized form where all facts about an instance are in a single row (which we called “row records”), or are we in a record-oriented form where all the facts about an instance are spread over several rows (which we called “block records”)? Crucially, row records don’t necessarily have more columns than block records. This makes shape-based naming of the transforms problematic, no matter what names you pick for the shapes. There is an advantage to using intent-based or semantic naming.
Below is a simple example.
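The following is a minimal sketch in plain Python, not the cdata R API. The helper names `to_row_records()`, `to_block_records()`, and `width()` are hypothetical stand-ins chosen for illustration, and the measurement values are made-up toy data. It applies the same block-records-to-row-records transform to two inputs that share the same three columns.

```python
def to_row_records(rows, id_col, key_col, value_col):
    """Block records -> row records: one output row per instance.

    Hypothetical helper; loosely mirrors the intent of a pivot, not cdata's API.
    """
    by_id = {}
    for row in rows:
        rec = by_id.setdefault(row[id_col], {id_col: row[id_col]})
        rec[row[key_col]] = row[value_col]
    return list(by_id.values())


def to_block_records(rows, id_col, key_name, value_name):
    """Row records -> block records: one output row per (instance, measurement)."""
    out = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                out.append({id_col: row[id_col], key_name: col, value_name: val})
    return out


def width(rows):
    """Number of distinct columns across all rows."""
    return len(set().union(*(r.keys() for r in rows)))


# Both inputs are block records with the same three columns: id, measure, value.
few = [
    {"id": 1, "measure": "AUC", "value": 0.6},
    {"id": 2, "measure": "AUC", "value": 0.7},
]
many = [
    {"id": 1, "measure": "AUC", "value": 0.6},
    {"id": 1, "measure": "R2", "value": 0.2},
    {"id": 1, "measure": "RMSE", "value": 1.3},
    {"id": 1, "measure": "MAE", "value": 0.9},
]

narrow_result = to_row_records(few, "id", "measure", "value")
wide_result = to_row_records(many, "id", "measure", "value")

print(width(few), "->", width(narrow_result))   # 3 -> 2 (result is narrower)
print(width(many), "->", width(wide_result))    # 3 -> 5 (result is wider)
```

With one measurement per instance the row-record result has fewer columns than the block-record input, and with four measurements it has more. The call is identical in both cases; only the data differs.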
Notice that the width of the result relative to the width of the input varies as a function of the input data, even though we are always calling the same transform. This makes it incorrect to characterize these transforms as merely widening or narrowing.
There are still some subtle points (for instance, row records are in fact instances of block records), but overall the scheme we (Nina Zumel and I, John Mount) worked out, tested, and promoted is pretty good. A lot of our work researching this topic can be found here.