Articles by John Mount

Speaking at BARUG

August 13, 2019 | John Mount

We will be speaking at the Tuesday, September 3, 2019 BARUG. If you are in the Bay Area, please come see us. Nina Zumel & John Mount: Practical Data Science with R. Practical Data Science with R (Zumel and Mount) was one of the first, and most widely read, books on the practice of ... [Read more...]

vtreat up on PyPi

August 11, 2019 | John Mount

I am excited to announce vtreat is now available for Python on PyPi, in addition to R on CRAN. vtreat is: A data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. vtreat prepares variables so that data has fewer exceptional cases, making it ...

[Read more...]

Returning to Tides

August 10, 2019 | John Mount

Fred Viole shared a great “data only” R solution to the forecasting tides problem. The methodology comes from a finance perspective, and has some great associated notes and articles. This gives me a chance to comment on the odd relation between prediction and profit in finance. If there really was ...

[Read more...]

Lord Kelvin, Data Scientist

August 6, 2019 | John Mount

In 1876 A. Légé & Co., 20 Cross Street, Hatton Gardens, London completed the first “tide calculating machine” for William Thomson (later Lord Kelvin) (ref). [Figure: Thomson’s (Lord Kelvin) First Tide Predicting Machine, 1876] The results were plotted on the paper cylinders, and one literally “turned the crank” to perform the calculations. The ...

[Read more...]

Some Notes on GNU Licenses in R Packages

July 30, 2019 | John Mount

I was recently asked if Win-Vector LLC would move the R wrapr package from a GPL-3 license to an LGPL license. In the end I decided to move wrapr distribution to a “GPL-2 | GPL-3” license. This means the package is now available under both GPL-2 and GPL-3 licensing, allowing the ...

[Read more...]

A Comment on Data Science Integrated Development Environments

July 27, 2019 | John Mount

A point that differs from our experience struck us in the recent note: A development environment specifically tailored to the data science sector on the level of RStudio, for example, does not (yet) exist. “What’s the Best Statistical Software? A Comparison of R, Python, SAS, SPSS and STATA” Amit ... [Read more...]

A Kind Note That We Really Appreciate

July 25, 2019 | John Mount

The following really made my day. I tell every data scientist I know about vtreat and urge them to read the paper. Jason Wolosonovich Jason, thanks for your support and thank you so much for taking the time to say this (and for your permission to quote you on this). ... [Read more...]

R Books Discount!

July 21, 2019 | John Mount

We, the community of Manning R and data science authors, have talked Manning into offering a catalog-wide 40% discount on all books. Please take a look at some great deals on some great technical books here: http://mng.bz/adRj !

[Read more...]

Big News: Porting vtreat to Python

July 20, 2019 | John Mount

We at Win-Vector LLC have some big news. We are finally porting a streamlined version of our R vtreat variable preparation package to Python. vtreat is a great system for preparing messy data for supervised machine learning. The new implementation is based on Pandas, and we are experimenting with pushing ... [Read more...]

Some Details on Running xgboost

July 14, 2019 | John Mount

While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation). In doing that I ran into one more avoidable ... [Read more...]

Programming Over lm() in R

July 6, 2019 | John Mount

Here is a simple modeling problem in R. We want to fit a linear model where the names of the data columns carrying the outcome to predict (y), the explanatory variables (x1, x2), and per-example row weights (wt) are given to us as strings. Let’s start with our example data and ... [Read more...]

Replicating a Linear Model

July 3, 2019 | John Mount

For a few of my commercial projects I have been in the seemingly strange place of being asked to port a linear model from one data science system to another. Now I try to emphasize that it is better going forward to port procedures and build new models with training data. ... [Read more...]

My Favorite data.table Feature

June 29, 2019 | John Mount

My favorite R data.table feature is the “by” grouping notation when combined with the := notation. Let’s take a look at this powerful notation. First, let’s build an example data.frame. ... [Read more...]

data.table is Much Better Than You Have Been Told

June 26, 2019 | John Mount

There is interest in converting relational query languages (that work both over SQL databases and on local data) into data.table commands, to take advantage of data.table’s superior performance. Obviously if one wants to use data.table it is best to learn data.table. But if we want ...

[Read more...]

Estimating Rates using Probability Theory: Chalk Talk

June 10, 2019 | John Mount

We are sharing a chalk talk rehearsal on applied probability. We use basic notions of probability theory to work through the estimation of sample size needed to reliably estimate event rates. This expands basic calculations, and then moves to the idea... [Read more...]

Technical books are amazing opportunities

June 6, 2019 | John Mount

Nina and I have been sending out drafts of our book Practical Data Science with R 2nd Edition for technical review. A few of the reviews came back from reviewers that described themselves with variations of: Senior Business Analyst for COMPANYNAME. I have been involved in presenting graphs of data ...

[Read more...]

Practical Data Science with R, half off sale!

May 24, 2019 | John Mount

Our publisher, Manning, is running a Memorial Day sale this weekend (May 24-27, 2019), with a new offer every day. Fri: Half off all eBooks; Sat: Half off all MEAPs; Sun: Half off all pBooks and liveVideos; Mon: Half off everything. The discount code is: wm052419au. Many great opportunities to ...

[Read more...]

Free Video Lecture: Vectors for Programmers and Data Scientists

May 20, 2019 | John Mount

We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with what ideas programmers find interesting about vectors, what concepts they consider safe starting points, and how to condense and present the material. Please check the lectures ...

[Read more...]

Timing Working With a Row or a Column from a data.frame

May 15, 2019 | John Mount

In this note we share a quick study timing how long it takes to perform some simple data manipulation tasks with R data.frames. We are interested in the time needed to select a column, alter a column, or select a row. Knowing what is fast and what is slow ...

[Read more...]

What is “Tidy Data”?

May 11, 2019 | John Mount

I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term “tidy data.” For example in an eponymous paper, he wrote: In tidy data: Each variable forms a column. Each observation forms a row. Each type of ... [Read more...]


R-bloggers

R news and tutorials contributed by hundreds of R bloggers
