Articles by John Mount

Touching the 3rd Rail of Data Science: “R or Python?”

December 13, 2022 | John Mount

I’ve been seeing a lot of hot takes on whether one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro-benchmark. I’ll jump in: If learning the language is the big step: then ... [Read more...]

Y-Aware PCA

September 8, 2022 | John Mount

We have had some trouble with some articles being damaged or hard to access on the Win Vector blog. I (John Mount) do want to apologize for that. In particular, the graphs are missing for Dr. Nina Zumel’s wonderful y-aware Principal Components regression series. The complete R .md and .... [Read more...]

Separating Code from Presentation in Jupyter Notebooks

April 30, 2022 | John Mount

One of the great conveniences of performing a data science style analysis using Jupyter is that Jupyter notebooks are literate containers that combine code, text, results, and graphs. This is also one of the pain points in working with Jupyter notebooks with partners or with source control. That is: Jupyter […]

[Read more...]

Working in CRAN’s World

February 28, 2022 | John Mount

Part of the deal of having a package up on CRAN is: at any time one may be sent an automated email like the following. Dear maintainer, Please see the problems shown on URL. Please correct before TODAY+14DAYS to safely retain your package on CRAN. The CRAN Team If ... [Read more...]

How to Read Sourav Chatterjee’s Basic XICOR Definition

December 26, 2021 | John Mount

Introduction Professor Sourav Chatterjee recently published a new coefficient of correlation called XICOR (refs: JASA, R package, arXiv, Hacker News, and a Python package (different author)). The basic formula (in the tie-free case) is: Take X and Y as n-vectors of observations of random variables. Compute the ranks r(i) ... [Read more...]
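The excerpt cuts off mid-definition, so here is a minimal Python sketch of Chatterjee's standard tie-free formula, not the article's own code: sort the pairs by X, let r(i) be the rank of the Y value paired with the i-th smallest X, and compute xi_n = 1 - 3 * sum_{i=1}^{n-1} |r(i+1) - r(i)| / (n^2 - 1). The function name xicor_tie_free is hypothetical, and the code assumes there are no ties in X or in Y.

```python
from typing import Sequence


def xicor_tie_free(x: Sequence[float], y: Sequence[float]) -> float:
    """Chatterjee's xi_n correlation, assuming no ties in x or in y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 entries")
    # Indices of the observations, sorted by increasing x.
    order_by_x = sorted(range(n), key=lambda i: x[i])
    # rank_of_y[i] = 1-based rank of y[i] among all y values (no ties assumed).
    rank_of_y = [0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda i: y[i]), start=1):
        rank_of_y[i] = rank
    # r[k] = rank of the y value paired with the k-th smallest x.
    r = [rank_of_y[i] for i in order_by_x]
    # Sum of absolute rank jumps between x-adjacent observations.
    total_jump = sum(abs(r[k + 1] - r[k]) for k in range(n - 1))
    return 1.0 - 3.0 * total_jump / (n * n - 1)


print(xicor_tie_free([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 0.5
```

For a perfectly monotone relationship the ranks step by exactly 1, so xi_n = 1 - 3/(n + 1): only 0.5 at n = 5, but approaching 1 as n grows. Note the coefficient is not symmetric in X and Y.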

Don’t Feel Guilty About Selecting Variables

May 30, 2020 | John Mount

We have an exciting new article to share: Don’t Feel Guilty About Selecting Variables. If you are at all interested in the probabilistic justification of important data science techniques, such as variable selection or pruning, this should be an informative and fun read. “Data Science” is often criticized with ... [Read more...]

Data engineering and data shaping in Practical Data Science with R 2nd Edition

May 23, 2020 | John Mount

A kind reader recently shared the following comment on the Practical Data Science with R 2nd Edition live-site. Thanks for the chapter on data frames and data.tables. It has helped me overcome an obstacle freeing me from a lot of warnings telling me my data table was not a ... [Read more...]

General Data Science Means Cross-Language Tools, Training, and Documentation

May 18, 2020 | John Mount

Data science is often a case of bringing the tools to the problems and data, instead of insisting on bringing the problems and data to the tools. To support cross-language data science we have been working on cross-language tools, documentation, and training. For example: vtreat data preparation package for supervised ... [Read more...]

Deal of the Day May 10: Half off Practical Data Science with R, Second Edition

May 9, 2020 | John Mount

Deal of the Day May 10: Half off Practical Data Science with R, Second Edition. Use code dotd051020au at https://bit.ly/2xLRPCk

[Read more...]

Thank you “Why R?” for Being Awesome Hosts

May 7, 2020 | John Mount

Thank you very much Why R? for being awesome hosts. We are really pleased with how your virtual MeetUp went. For those who missed it, here is a link. [Read more...]

Nina and John Speaking at Why R? Webinar Thursday, May 7, 2020

April 29, 2020 | John Mount

Nina Zumel and John Mount will be speaking on advanced data preparation for supervised machine learning at the Why R? Webinar Thursday, May 7, 2020. This is 8 PM in a GMT+2 time zone, which for us is 11 AM Pacific Time. Hope to see you there!

[Read more...]

Discount on Manning Books, Including our own Practical Data Science with R 2nd Edition

April 6, 2020 | John Mount

We have a discount on Manning Books, including our own Practical Data Science with R 2nd Edition! Manning.com is offering FREE shipping with code SHIP35 for US residents only. Use this link to purchase: http://www.manning.com/?a_aid=zm. And, Manning.com is offering 50% off ...

[Read more...]

R Tip: How To Look Up Matrix Values Quickly

March 30, 2020 | John Mount

R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values. Of course, sometimes it takes a while ... [Read more...]
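The excerpt stops before the article's R details, so the following is only a Python/numpy sketch of the general idea it introduces (numpy is one of the vectorized systems the excerpt names): looking up many matrix cells with one vectorized indexing operation instead of a per-element loop. It is not the R technique from the post, and the function names lookup_loop and lookup_vectorized are hypothetical.

```python
import numpy as np


def lookup_loop(m: np.ndarray, rows: np.ndarray, cols: np.ndarray) -> np.ndarray:
    """Slow reference version: one Python-level index operation per cell."""
    return np.array([m[r, c] for r, c in zip(rows, cols)])


def lookup_vectorized(m: np.ndarray, rows: np.ndarray, cols: np.ndarray) -> np.ndarray:
    """Vectorized version: numpy 'fancy' indexing returns m[rows[k], cols[k]] for every k."""
    return m[rows, cols]


m = np.arange(12).reshape(3, 4)  # [[0 1 2 3] [4 5 6 7] [8 9 10 11]]
rows = np.array([0, 2, 1])
cols = np.array([3, 0, 2])
print(lookup_vectorized(m, rows, cols))  # [3 8 6]
```

Both functions return the same values; the vectorized one pushes the whole loop into compiled code, which is where the speed-up on large inputs comes from.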

wrapr 2.0.0 up on CRAN

March 29, 2020 | John Mount

wrapr 2.0.0 is now up on CRAN. This means the := variant of unpack[] is now easy to install. Please give it a try! [Read more...]

Re-Share: vtreat Data Preparation Documentation and Video

March 26, 2020 | John Mount

I would like to re-share the documentation for vtreat (R version, Python version), a data preparation system for machine learning tasks. vtreat is a system for preparing messy real-world data for predictive modeling tasks (classification, regression, and so on). In particular, it is very good at re-coding high-cardinality string-valued (or categorical) variables ... [Read more...]
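As a rough illustration of what re-coding a high-cardinality categorical variable can mean, here is a minimal Python sketch of simple impact (target) coding: each level is replaced by a shrunken difference between its mean outcome and the grand mean. This is not vtreat's API or its exact procedure (vtreat also uses cross-validated "cross-frames" to limit over-fit, which this sketch omits); fit_impact_code, apply_impact_code, and the smoothing parameter are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_impact_code(
    levels: Sequence[str], y: Sequence[float], smoothing: float = 1.0
) -> Tuple[Dict[str, float], float]:
    """Map each categorical level to a shrunken (level mean - grand mean)."""
    if len(levels) != len(y) or len(y) == 0:
        raise ValueError("levels and y must be non-empty and the same length")
    grand_mean = sum(y) / len(y)
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for level, target in zip(levels, y):
        sums[level] += target
        counts[level] += 1
    codes = {
        # Blend the level mean with the grand mean; rare levels are pulled toward 0.
        level: (sums[level] + smoothing * grand_mean) / (counts[level] + smoothing)
        - grand_mean
        for level in sums
    }
    return codes, grand_mean


def apply_impact_code(codes: Dict[str, float], levels: Sequence[str]) -> List[float]:
    """Re-code levels as numbers; levels unseen during fitting get 0.0."""
    return [codes.get(level, 0.0) for level in levels]


codes, _ = fit_impact_code(["a", "a", "b"], [1.0, 1.0, 0.0], smoothing=0.0)
print(apply_impact_code(codes, ["b", "a", "never-seen"]))
```

With smoothing=0.0 level "a" codes to about 1/3 and "b" to about -2/3; a positive smoothing value shrinks both toward 0, which matters most for levels seen only a few times.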

Version Control is a Time Machine That Translates Common Hindsight Into Valuable Foresight

March 22, 2020 | John Mount

For data science projects I recommend using source control or version control, and committing changes at a very fine level of granularity. This means checking in possibly broken code, and possibly weak commit messages (so when working in a shared project, you may want a private branch or second ...

[Read more...]

Free Coupon for our R Video Course: Introduction to Data Science

March 16, 2020 | John Mount

For all our remote learners, we are sharing a free coupon code for our R video course Introduction to Data Science. The code is ITDS2020, and can be used at this URL https://www.udemy.com/course/introduction-to-data-science/?couponCode=ITDS2020 . Pleas... [Read more...]

A Little Something From Practical Data Science with R Chapter 1

March 16, 2020 | John Mount

Here is a small quote from Practical Data Science with R Chapter 1. It is often too much to ask for the data scientist to become a domain expert. However, in all cases the data scientist must develop strong domain empathy to help define and solve the ... [Read more...]

Keep Calm and Use vtreat (in R and in Python)

March 12, 2020 | John Mount

A big thank you to Dmytro Perepolkin for sharing a “Keep Calm and Use vtreat” poster! Also, we have translated the Python vtreat steps from our recent “Cross-Methods are a Leak/Variance Trade-Off” article into R vtreat steps here. This R port demonstrates the fit/prepare notation that is new to R! We ...

[Read more...]


R-bloggers

R news and tutorials contributed by hundreds of R bloggers
