Articles by John Mount

Lord Kelvin, Data Scientist

August 6, 2019 | John Mount

In 1876 A. Légé & Co., 20 Cross Street, Hatton Gardens, London completed the first “tide calculating machine” for William Thomson (later Lord Kelvin) (ref). [Image: Thomson’s (Lord Kelvin) First Tide Predicting Machine, 1876] The results were plotted on the paper cylinders, and one literally “turned the crank” to perform the calculations. The ...
[Read more...]

Some Notes on GNU Licenses in R Packages

July 30, 2019 | John Mount

I was recently asked if Win-Vector LLC would move the R wrapr package from a GPL-3 license to an LGPL license. In the end I decided to move wrapr distribution to a “GPL-2 | GPL-3” license. This means the package is now available under both GPL-2 and GPL-3 licensing, allowing the ...
[Read more...]

A Kind Note That We Really Appreciate

July 25, 2019 | John Mount

The following really made my day: “I tell every data scientist I know about vtreat and urge them to read the paper.” (Jason Wolosonovich) Jason, thanks for your support and thank you so much for taking the time to say this (and for your permission to quote you on this). ... [Read more...]

R Books Discount!

July 21, 2019 | John Mount

We, the community of Manning R and data science authors, have talked Manning into offering a catalog-wide 40% discount on all books. Please take a look at some great deals on some great technical books here: http://mng.bz/adRj !
[Read more...]

Big News: Porting vtreat to Python

July 20, 2019 | John Mount

We at Win-Vector LLC have some big news. We are finally porting a streamlined version of our R vtreat variable preparation package to Python. vtreat is a great system for preparing messy data for supervised machine learning. The new implementation is based on Pandas, and we are experimenting with pushing ... [Read more...]

Some Details on Running xgboost

July 14, 2019 | John Mount

While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she establishes the issue before discussing mitigation). In doing that I ran into one more avoidable ... [Read more...]

Programming Over lm() in R

July 6, 2019 | John Mount

Here is a simple modeling problem in R. We want to fit a linear model where the names of the data columns carrying the outcome to predict (y), the explanatory variables (x1, x2), and per-example row weights (wt) are given to us as strings. Let’s start with our example data and ... [Read more...]
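A minimal sketch of one base-R way to set this up, assuming small made-up example data (the values and the helper variable names `outcome`, `variables`, and `weight_col` are hypothetical, not taken from the post). `reformulate()` builds the formula from strings, and `bquote()` splices the weight column into the call as a symbol so `lm()` looks it up among the columns of `data`. The post itself may use a different technique.

```r
# Hypothetical toy data; only the column names matter here.
d <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(0, 1, 0, 1, 1),
  y  = c(2.1, 3.9, 6.2, 8.1, 9.8),
  wt = c(1, 1, 2, 2, 1)
)

# Column names supplied as strings (hypothetical variable names).
outcome    <- "y"
variables  <- c("x1", "x2")
weight_col <- "wt"

# Build the formula y ~ x1 + x2 from the strings.
f <- reformulate(termlabels = variables, response = outcome)

# Splice the formula and the weight column (as a symbol) into the call,
# so lm() evaluates `weights = wt` against the columns of d.
fit <- eval(bquote(
  lm(.(f), data = d, weights = .(as.name(weight_col)))
))

print(coef(fit))
```

The resulting fit should match writing `lm(y ~ x1 + x2, data = d, weights = wt)` by hand.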

Replicating a Linear Model

July 3, 2019 | John Mount

For a few of my commercial projects I have been in the seemingly strange position of being asked to port a linear model from one data science system to another. Now I try to emphasize that it is better going forward to port procedures and build new models with training data. ... [Read more...]

My Favorite data.table Feature

June 29, 2019 | John Mount

My favorite R data.table feature is the “by” grouping notation when combined with the := notation. Let’s take a look at this powerful notation. First, let’s build an example data.frame. ... [Read more...]
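A minimal sketch of the combined notation, assuming the data.table package is installed and using a small made-up table (the columns `group`, `value`, and `group_mean` are hypothetical, not the post's example). `:=` adds a column by reference, and `by` evaluates the right-hand side separately within each group, writing each group's result back onto that group's rows.

```r
library(data.table)

# Hypothetical example data.
d <- data.table(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Compute the per-group mean and store it as a new column, in place.
d[, group_mean := mean(value), by = group]

print(d)
```

Because results are written back row by row, the table keeps its original five rows (unlike a grouped summary), which is convenient for adding per-group features.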

data.table is Much Better Than You Have Been Told

June 26, 2019 | John Mount

There is interest in converting relational query languages (that work both over SQL databases and on local data) into data.table commands, to take advantage of data.table’s superior performance. Obviously if one wants to use data.table it is best to learn data.table. But if we want ...
[Read more...]

Technical books are amazing opportunities

June 6, 2019 | John Mount

Nina and I have been sending out drafts of our book Practical Data Science with R 2nd Edition for technical review. A few of the reviews came back from reviewers who described themselves with variations of: Senior Business Analyst for COMPANYNAME. I have been involved in presenting graphs of data ...
[Read more...]

Practical Data Science with R, half off sale!

May 24, 2019 | John Mount

Our publisher, Manning, is running a Memorial Day sale this weekend (May 24-27, 2019), with a new offer every day. Fri: half off all eBooks; Sat: half off all MEAPs; Sun: half off all pBooks and liveVideos; Mon: half off everything. The discount code is: wm052419au. Many great opportunities to ...
[Read more...]

What is “Tidy Data”?

May 11, 2019 | John Mount

I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term “tidy data.” For example, in an eponymous paper he wrote: “In tidy data: Each variable forms a column. Each observation forms a row. Each type of ... [Read more...]

Could not Resist

April 29, 2019 | John Mount

Also, Practical Data Science with R, 2nd Edition; Zumel, Mount; Manning 2019 is now content complete! It is deep into editing and soon into production!
[Read more...]

Data Layout Exercises

April 27, 2019 | John Mount

John Mount, Nina Zumel; Win-Vector LLC; 2019-04-27. In this note we will use five real-life examples to demonstrate data layout transforms using the cdata R package. The examples for this note are all demo-examples from tidyr/demo/, and are mostly based on questions posted to StackOverflow. They represent ... [Read more...]

Practical Data Science with R Book Update (April 2019)

April 22, 2019 | John Mount

I thought I would give a personal update on our book: Practical Data Science with R 2nd edition; Zumel, Mount; Manning 2019. The second edition should be fully available this fall! Nina and I have finished up through chapter 10 (of 12), and Manning has released previews of up through chapter 7 (with more ... [Read more...]