Articles by John Mount

R Tip: Force Named Arguments

February 22, 2018 | John Mount

R tip: force the use of named arguments when designing function signatures. R’s named function argument binding is a great aid in writing correct programs. It is a good idea, if practical, to force optional arguments to be usable only by name. To do this, declare the additional arguments ... [Read more...]
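A minimal sketch of the idea, assuming the technique the excerpt alludes to is placing optional arguments after `...` in the signature (the excerpt is cut off before it says so). The function name `mean_scaled` and its `scale` argument are hypothetical, and the check on `...` is an added safeguard, not something the excerpt specifies.

```r
# Hypothetical example: `scale` is declared after `...`,
# so positional matching can never reach it; it must be named.
mean_scaled <- function(x, ..., scale = 1) {
  # Assumption: treat anything captured by `...` as a caller mistake
  # instead of silently ignoring it.
  if (length(list(...)) > 0) {
    stop("unexpected arguments: optional arguments must be passed by name")
  }
  mean(x) * scale
}

# Named use works as intended.
named_result <- mean_scaled(c(1, 2, 3), scale = 2)

# Positional use of the optional argument is rejected with an error.
positional_result <- try(mean_scaled(c(1, 2, 3), 2), silent = TRUE)
```

With this layout a caller who writes `mean_scaled(x, 2)` gets an immediate error rather than a silently unscaled mean.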

R Tip: Use [[ ]] Wherever You Can

February 21, 2018 | John Mount

R tip: use [[ ]] wherever you can. In R, [[ ]] is the operator that (when supplied a scalar argument) pulls a single element out of a list (while the [ ] operator pulls out sub-lists). For vectors, [[ ]] and [ ] appear to be synonyms. However, when writing reusable code you may ... [Read more...]
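The difference between the two operators can be seen in a short base R sketch. The variable names here are illustrative only and do not come from the article.

```r
# A small list and a small named vector (illustrative data).
lst <- list(a = 1, b = "two")
v <- c(x = 10, y = 20)

# On a list, [ ] returns a sub-list while [[ ]] returns the element itself.
sub_list <- lst["a"]      # a list of length 1
element  <- lst[["a"]]    # the number 1

# On a vector both return a value, but [[ ]] drops the name
# and insists on selecting exactly one element.
named_value <- v["x"]     # named numeric vector of length 1
bare_value  <- v[["x"]]   # unnamed number 10
```

Because `[[ ]]` signals an error when asked for more than one element of a vector, it also documents the intent "exactly one value here" to later readers.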

R Tip: Use seq_len() to Avoid The Backwards Sequence Bug

February 19, 2018 | John Mount

Another R tip: use seq_len() to avoid the backwards sequence bug. Many R users use the “colon sequence” notation to build sequences. For example:

    for(i in 1:5) { print(paste(i, i*i)) }
    #__ [1] "1 1"
    #__ [1] "2 4"
    #__ [1] "3 9"
    #__ [1] "4 16"
    #__ [1] "5 25"

However, the colon notation can be unsafe as ... [Read more...]
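A small base R sketch of the failure mode the title refers to, assuming the usual case where the upper bound turns out to be zero. The helper `count_iterations` is hypothetical and exists only to count how many times a loop body would run.

```r
# Hypothetical helper: counts how many times a for loop over `index` runs.
count_iterations <- function(index) {
  count <- 0
  for (i in index) {
    count <- count + 1
  }
  count
}

n <- 0  # e.g. the length of an empty input

# 1:n counts backwards when n is 0, giving c(1, 0): the loop runs twice.
colon_count <- count_iterations(1:n)

# seq_len(n) gives an empty integer vector: the loop does not run at all.
seq_len_count <- count_iterations(seq_len(n))
```

The same reasoning applies to `seq_along(x)` when iterating over the positions of a possibly empty vector `x`.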

R Tip: Use qc() For Fast Legible Quoting

February 17, 2018 | John Mount

Here is an R tip. Need to quote a lot of names at once? Use qc(). This is particularly useful in selecting columns from data.frames:

    library("wrapr") # get qc() definition
    head(mtcars[, qc(mpg, cyl, wt)])
    #                mpg cyl    wt
    # Mazda RX4     21.0   6 2.620
    # Mazda RX4 Wag 21.0   6 2.875
    # Datsun 710    22.8 ...

[Read more...]

Is 10,000 Cells Big?

February 12, 2018 | John Mount

Trick question: is a 10,000 cell numeric data.frame big or small? In the era of "big data" 10,000 cells is minuscule. Such data could be fit on fewer than 1,000 punched cards (or less than half a box). The joking answer is: it is small when they are selling you the system, ...

[Read more...]

Why No Exact Permutation Tests at Scale?

February 1, 2018 | John Mount

Here at Win-Vector LLC we like permutation tests. Our team has written on them (for example: How Do You Know if Your Data Has Signal?) and they are used to estimate significances in our sigr and WVPlots R packages. For example permutation methods are used to estimate the significance reported ...

[Read more...]

Supercharge your R code with wrapr

January 27, 2018 | John Mount

I would like to demonstrate some helpful wrapr R notation tools that really neaten up your R code. (Image: Christopher Ziemnowicz.) Named Map Builder: First I will demonstrate wrapr’s "named map builder": :=. The named map builder adds names to vectors and lists by nice "names on the left and ...

[Read more...]

Latest vtreat up on CRAN

January 24, 2018 | John Mount

There is a new version of the R package vtreat now up on CRAN. vtreat is an essential data preparation system for predictive modeling that helps defend your predictive modeling work against real world data issues including: high cardinality categorical variables; rare levels (including new or novel levels during application); ... [Read more...]

Advisory on Multiple Assignment dplyr::mutate() on Databases

January 21, 2018 | John Mount

I currently advise R dplyr users to take care when using multiple assignment dplyr::mutate() commands on databases. (image: Kingroyos, Creative Commons Attribution-Share Alike 3.0 Unported License) In this note I exhibit a troublesome example, and a systematic solution. First let’s set up dplyr, our database, and some example data. ...

[Read more...]

Data Reshaping with cdata

January 17, 2018 | John Mount

I’ve just shared a short webcast on data reshaping in R using the cdata package. (link) We also have two really nifty articles on the theory and methods: "Fluid data reshaping with cdata" and "Coordinatized Data: A Fluid Data Specification". Please give it a try! This is the material I ...

[Read more...]

Base R can be Fast

January 15, 2018 | John Mount

“Base R” (call it “Pure R”, “Good Old R”, just don’t call it “Old R” or late for dinner) can be fast for in-memory tasks. This is despite the commonly repeated claim that: “packages written in C/C++ are faster than R code.” The benchmark results of “rquery: Fast ...

[Read more...]

Setting up RStudio Server quickly on Amazon EC2

January 13, 2018 | John Mount

I have recently been working on projects using Amazon EC2 (elastic compute cloud), and RStudio Server. I thought I would share some of my working notes. Amazon EC2 supplies near instant access to on-demand disposable computing in a variety of sizes (billed in hours). RStudio Server supplies an interactive user ...

[Read more...]

rquery: Fast Data Manipulation in R

January 9, 2018 | John Mount

Win-Vector LLC recently announced the rquery R package, an operator based query generator. In this note I want to share some exciting and favorable initial rquery benchmark timings. Let’s take a look at rquery’s new “ad hoc” mode (made convenient through wrapr’s new “wrapr_applicable” feature). This ...

[Read more...]

New wrapr R pipeline feature: wrapr_applicable

January 6, 2018 | John Mount

The R package wrapr now has a neat new feature: “wrapr_applicable”. This feature allows objects to declare a surrogate function to stand in for the object in wrapr pipelines. It is a powerful technique and allowed us to quickly implement a convenient new ad hoc query mode for rquery. ...

[Read more...]

Big cdata News

January 4, 2018 | John Mount

I have some big news about our R package cdata. We have greatly improved the calling interface and Nina Zumel has just written the definitive introduction to cdata. cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) ...

[Read more...]

Announcing rquery

December 28, 2017 | John Mount

We are excited to announce the rquery R package. rquery is Win-Vector LLC’s currently-in-development big data query tool for R. rquery supplies a set of operators inspired by Edgar F. Codd’s relational algebra (updated to reflect lessons learned from working with R, SQL, and dplyr at big ... [Read more...]

Plotting Deep Learning Model Performance Trajectories

December 23, 2017 | John Mount

I am excited to share a new deep learning model performance trajectory graph. Here is an example produced based on Keras in R using ggplot2: The ideas include: We plot model performance as a function of training epoch, data set (training and validation), and metric. For legibility we facet on ...

[Read more...]

How to Greatly Speed Up Your Spark Queries

December 20, 2017 | John Mount

For some time we have been teaching R users "when working with wide tables on Spark or on databases: narrow to the columns you really want to work with early in your analysis." The idea behind the advice is: working with fewer columns makes for quicker queries. photo: Jacques Henri ...

[Read more...]

More Pipes in R

December 16, 2017 | John Mount

I was enjoying Gabriel’s article "Pipes in R Tutorial For Beginners" and wanted to call attention to a few more pipes in R (not all for beginners). data.table has essentially used the square bracket sequence “][” in a manner equivalent to piping in R since about 2006. Here is an example. The ...

[Read more...]

Getting started with seplyr

December 14, 2017 | John Mount

A big “thank you!!!” to Microsoft for hosting our new introduction to seplyr. If you are working with R and big data, I think the seplyr package can be a valuable tool. For how and why, please check out our new introductory article.

[Read more...]

R-bloggers

R news and tutorials contributed by hundreds of R bloggers
