Articles by John Mount

New Introduction to rquery

October 27, 2019 | John Mount

Introduction: rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform() or dplyr’s dplyr::mutate(), and uses a pipe in the style popularized in R with magrittr. The operators themselves ... [Read more...]

Practical Data Science with R 2nd Edition update

October 17, 2019 | John Mount

We are in the last stages of proofing the galleys/typesetting of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019. So this edition will definitely be out soon! If you ever wanted to see what Nina Zumel and John Mount are like when we have the help of ... [Read more...]

vtreat Cross Validation

October 5, 2019 | John Mount

Nina Zumel finished new documentation on how vtreat’s cross validation works, which I want to share here. vtreat is a system that makes data preparation for machine learning a “one-liner” (available in R or available in Python). We have a set of starting off points here. These documents describe ... [Read more...]

You Can Override Just About Anything in R

October 2, 2019 | John Mount

To understand computations in R, two slogans are helpful: “Everything that exists is an object. Everything that happens is a function call.” (John Chambers) In R, the “[” array access operator is a function call. And it is one a user can re-bind to the new effect of their own choosing. ... [Read more...]

How to Prepare Data

September 26, 2019 | John Mount

Real world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in distinct columns) can present problems, such as missing values and high cardinality categorical variables. In this note we describe some great tools for working with such data. ... [Read more...]

Preparing Data for Supervised Classification

September 24, 2019 | John Mount

Nina Zumel has been polishing up new vtreat for Python documentation and tutorials. They are coming out so well that, to be fair to the R community, I find I must start to back-port this new documentation to vtreat for R. vtreat is a package for systematically preparing data for ... [Read more...]

The Advantages of Record Transform Specifications

September 18, 2019 | John Mount

Nina Zumel had a really great article on how to prepare a nice Keras performance plot using R. I will use this example to show some of the advantages of cdata record transform specifications. The model performance data from Keras is in the following format: # R code library(wrapr) df
[Read more...]

Practical Data Science with R update

September 15, 2019 | John Mount

Just got the following note from a new reader: Thank you for writing Practical Data Science with R. It’s challenging for me, but I am learning a lot by following your steps and entering the commands. Wow, this is exactly what Nina Zumel and I hoped for. We wish ... [Read more...]

Advanced Data Reshaping in Python and R

September 4, 2019 | John Mount

This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through the use of coordinatized data concepts (relying heavily on Codd’s “rule of access”). The advantages of data_algebra and ... [Read more...]

Why R?

August 30, 2019 | John Mount

I was working with our copy editor on Appendix A of Practical Data Science with R, 2nd Edition; Zumel, Mount; Manning 2019, and ran into this little point (unfortunately) buried in the back of the book. In our opinion the R ecosystem is the fastest path to substantial data science, statistical, ... [Read more...]

It is Time for CRAN to Ban Package Ads

August 30, 2019 | John Mount

NPM (a popular JavaScript package repository) just banned package advertisements. I feel the CRAN repository should do the same. Not all R-users are fully aware of package advertisements. But they clutter up work, interfere with reproducibility, and frankly are just wrong. Here is an example which could be considered to ... [Read more...]

Introducing data_algebra

August 26, 2019 | John Mount

This article introduces the data_algebra project: a data processing tool family available in R and Python. These tools are designed to transform data either in-memory or on remote databases. In particular we will discuss the Python implementation (also called data_algebra) and its relation to the mature R implementations (...
[Read more...]

What is vtreat?

August 14, 2019 | John Mount

vtreat is a DataFrame processor/conditioner that prepares real-world data for supervised machine learning or predictive modeling in a statistically sound manner. vtreat takes an input DataFrame that has a specified column called “the outcome variable” (or “y”) that is the quantity to be predicted (and must not have missing ...
[Read more...]

Speaking at BARUG

August 13, 2019 | John Mount

We will be speaking at the Tuesday, September 3, 2019 BARUG. If you are in the Bay Area, please come see us. Nina Zumel & John Mount: “Practical Data Science with R”. Practical Data Science with R (Zumel and Mount) was one of the first, and most widely read, books on the practice of ... [Read more...]

vtreat up on PyPi

August 11, 2019 | John Mount

I am excited to announce vtreat is now available for Python on PyPi, in addition to R on CRAN. vtreat is: A data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. vtreat prepares variables so that data has fewer exceptional cases, making it ...
[Read more...]

Returning to Tides

August 10, 2019 | John Mount

Fred Viole shared a great “data only” R solution to the forecasting tides problem. The methodology comes from a finance perspective, and has some great associated notes and articles. This gives me a chance to comment on the odd relation between prediction and profit in finance. If there really was ...
[Read more...]

Lord Kelvin, Data Scientist

August 6, 2019 | John Mount

In 1876 A. Légé & Co., 20 Cross Street, Hatton Gardens, London completed the first “tide calculating machine” for William Thomson (later Lord Kelvin) (ref). [Figure: Thomson’s (Lord Kelvin) First Tide Predicting Machine, 1876] The results were plotted on the paper cylinders, and one literally “turned the crank” to perform the calculations. The ...
[Read more...]

Some Notes on GNU Licenses in R Packages

July 30, 2019 | John Mount

I was recently asked if Win-Vector LLC would move the R wrapr package from a GPL-3 license to an LGPL license. In the end I decided to move wrapr distribution to a “GPL-2 | GPL-3” license. This means the package is now available under both GPL-2 and GPL-3 licensing, allowing the ...
[Read more...]
