Articles by John Mount

R Tip: Think in Terms of Values

April 2, 2018 | John Mount

R tip: first organize your tasks in terms of data, values, and desired transformation of values, not initially in terms of concrete functions or code. I know I write a lot about coding in R. But it is in the service of supporting statistics, analysis, predictive analytics, and data science. ... [Read more...]

R Tip: Use Named Vectors to Re-Map Values

March 28, 2018 | John Mount

Here is an R tip. Want to re-map a column of values? Use a named vector as the mapping. Example: library("dplyr") library("wrapr") head(starwars[, qc(name, gender)]) # # A tibble: 6 x 2 # name gender # # 1 Luke Skywalker male # 2 C-3PO NA # 3 R2-D2 NA # 4 Darth Vader … Continue reading R Tip: Use ... [Read more...]
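The re-mapping idea in this excerpt can be sketched in base R alone. The data frame `d` and the mapping `map` below are made-up illustrations, not the post's own example (which uses `starwars` and is truncated above).

```r
# Hypothetical example data (not from the original post).
d <- data.frame(
  name = c("Luke Skywalker", "C-3PO", "Darth Vader"),
  gender = c("male", NA, "male"),
  stringsAsFactors = FALSE
)

# The named vector is the mapping: names are old values, entries are new values.
map <- c(male = "M", female = "F")

# Index the mapping by the column; unname() drops the names carried over from map.
# Values with no entry in the mapping (including NA) come back as NA.
d$gender_code <- unname(map[d$gender])
print(d)
```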

R Tip: Use let() to Re-Map Names

March 26, 2018 | John Mount

Another R tip. Need to replace a name in some R code or make R code re-usable? Use wrapr::let(). Here is an example involving dplyr. Let’s look at some example data: library("dplyr") library("wrapr") starwars %>% select(., name, homeworld, species) %>% head(.) # # A tibble: 6 x 3 # name homeworld species # … Continue ...
[Read more...]

R Tip: Break up Function Nesting for Legibility

March 21, 2018 | John Mount

There are a number of easy ways to avoid illegible code nesting problems in R. In this R tip we will expand upon the above statement with a simple example. At some point it becomes illegible and undesirable to compose operations by nesting them, such as in the following code. ...
[Read more...]
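As a minimal sketch of the point (the post's own example is cut off above, so the computation here is an invented stand-in), the same calculation can be written nested or as named steps:

```r
x <- c(-4, 9, -16)

# Nested form: has to be read from the inside out.
nested <- sqrt(sum(abs(x)))

# Same computation broken into named steps, read top to bottom.
magnitudes <- abs(x)
total <- sum(magnitudes)
stepwise <- sqrt(total)

print(c(nested = nested, stepwise = stepwise))
```

Both forms compute the same value; the second gives each intermediate result a name that can be inspected while debugging.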

R Tip: Use stringsAsFactors = FALSE

March 17, 2018 | John Mount

R tip: use stringsAsFactors = FALSE. R often uses a concept of factors to re-encode strings. This can be too early and too aggressive. Sometimes a string is just a string. Sigmund Freud, it is often claimed, said: “Sometimes a cigar is just a cigar.” To avoid problems delay re-encoding of ...
[Read more...]
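A small illustration of the difference, using an invented column `label`. Note that since R 4.0.0 `stringsAsFactors = FALSE` is the default for `data.frame()`, so the explicit argument matters most on older R versions or when code must behave the same on both.

```r
# Strings kept as strings.
d <- data.frame(label = c("b", "a", "b"), stringsAsFactors = FALSE)
print(class(d$label))

# Strings re-encoded as a factor at construction time.
f <- data.frame(label = c("b", "a", "b"), stringsAsFactors = TRUE)
print(class(f$label))
print(levels(f$label))
```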

Take Care If Trying the RPostgres Package

March 16, 2018 | John Mount

Take care if trying the new RPostgres database connection package. By default it returns some non-standard types that code developed against other database drivers may not expect, and may not be ready to defend against. Danger, Will Robinson! Trying the new package One can try the newer RPostgres as a ... [Read more...]

The Many Faces of R

March 14, 2018 | John Mount

Some days I see R as an eclectic programming language preferred by scientists. “Programming languages as people.” From Leftover Salad (David Marino). Other days I see it more like the following. “Statistical tools as cars.” Shared previously by Darren L. Dhaly and brought to my attention by a post by Dimitri ...
[Read more...]

R Tip: Use the vtreat Package For Data Preparation

March 11, 2018 | John Mount

If you are working with predictive modeling or machine learning in R this is the R tip that is going to save you the most time and deliver the biggest improvement in your results. R Tip: Use the vtreat package for data preparation in predictive analytics and machine learning projects. ...
[Read more...]

R Tip: Use vector(mode = “list”) to Pre-Allocate Lists

March 6, 2018 | John Mount

Another R tip. Use vector(mode = "list") to pre-allocate lists. result [[1]] #> NULL #> #> [[2]] #> NULL #> #> [[3]] #> NULL The above used to be critical for writing performant R code (R seems to have greatly improved incremental list growth over the years). It … Continue reading R Tip: Use vector(mode = "list") to Pre-Allocate Lists [Read more...]
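A minimal sketch of pre-allocating and then filling a list. The length `n` and the squared values are illustrative choices, not taken from the post.

```r
n <- 3

# Pre-allocate a list of n slots; every slot starts out as NULL.
result <- vector(mode = "list", length = n)

# Fill the slots in place instead of growing the list one element at a time.
for (i in seq_len(n)) {
  result[[i]] <- i * i
}

print(result)
```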

R Tip: Get Out of the Habit of Calling View() Directly

March 4, 2018 | John Mount

R tip: get out of the habit of calling View() directly. View() only works correctly in interactive environments, not currently in RMarkdown contexts. It is better to call something else that safely dispatches to View(), or to something else depending on whether you are in an interactive or non-interactive session. The ... [Read more...]
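One way such a dispatching helper might look. The name `safe_view` and its fallback of printing the first rows are assumptions made for this sketch; the post's own recommendation is truncated above.

```r
# Hypothetical helper: use View() only when a person is at the console,
# otherwise fall back to printing the first n rows.
safe_view <- function(x, n = 6L) {
  if (interactive()) {
    View(x)
  } else {
    print(utils::head(x, n))
  }
  invisible(x)
}

safe_view(mtcars)
```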

Speaking on New Tools for R at Big Data Scale

March 3, 2018 | John Mount

I would like to thank LinkedIn for letting me speak with some of their data scientists and analysts. John Mount discussing rquery SQL generation at LinkedIn. If you have a group using R at database or Spark scale, please reach out ( jmount at win-vector.com ). We (Win-Vector LLC) have some ...
[Read more...]

R Tip: Use drop = FALSE with data.frames

February 27, 2018 | John Mount

Another R tip. Get in the habit of using drop = FALSE when indexing (using [ , ] on) data.frames. Prince Rupert’s drops (img: Wikimedia Commons) In R, single column data.frames are often converted to vectors when manipulated. For example: d x #> 1 1 #> 2 2 #> … Continue reading R Tip: Use drop = FALSE with data....
[Read more...]
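A short sketch of the behavior, with an invented two-column data.frame `d`:

```r
d <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Selecting a single column with [ , ] silently drops to a plain vector.
v <- d[, "x"]
print(class(v))

# drop = FALSE keeps the result a (one-column) data.frame.
k <- d[, "x", drop = FALSE]
print(class(k))
```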

Wanted: cdata Test Pilots

February 25, 2018 | John Mount

I need a few volunteers to “test pilot” the development version of the R package cdata, please. Jacqueline Cochran: at the time of her death, no other pilot held more speed, distance, or altitude records in aviation history than Cochran. Our cdata package has an upcoming new feature called “...
[Read more...]

Is R base::subset() really that bad?

February 23, 2018 | John Mount

Is R base::subset() really that bad? Notes discussing subset() often refer to the following text (from help(subset), referred to in examples: 1, 2): Warning This is a convenience function intended for use interactively. For programming it is better to use the standard sub-setting functions like [, and in particular the non-standard ...
[Read more...]

R Tip: Force Named Arguments

February 22, 2018 | John Mount

R tip: force the use of named arguments when designing function signatures. R’s named function argument binding is a great aid in writing correct programs. It is a good idea, if practical, to force optional arguments to only be usable by name. To do this declare the additional arguments ... [Read more...]
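The excerpt is cut off before it says how, so the following is a sketch of one common approach, assumed here: put the optional arguments after `...` so they can only be bound by name, and reject anything that lands in `...`. The function `scale_values` and its `center` argument are hypothetical.

```r
# Hypothetical function: `center` sits after `...`, so it can only be set by name.
scale_values <- function(x, ..., center = TRUE) {
  # Anything caught by `...` is an argument the caller failed to name.
  if (length(list(...)) > 0) {
    stop("unexpected arguments; pass 'center' by name")
  }
  if (center) x - mean(x) else x
}

print(scale_values(c(1, 2, 3)))
print(scale_values(c(1, 2, 3), center = FALSE))

# A positional FALSE is caught instead of being silently bound to `center`.
res <- try(scale_values(c(1, 2, 3), FALSE), silent = TRUE)
print(inherits(res, "try-error"))
```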

R Tip: Use [[ ]] Wherever You Can

February 21, 2018 | John Mount

R tip: use [[ ]] wherever you can. In R the [[ ]] is the operator that (when supplied a scalar argument) pulls a single element out of lists (and the [ ] operator pulls out sub-lists). For vectors [[ ]] and [ ] appear to be synonyms. However, when writing reusable code you may … Continue reading R Tip: Use [[ ]] ... [Read more...]
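A brief sketch of the distinction, using invented values. On a list, `[ ]` returns a sub-list while `[[ ]]` returns the element itself; on a vector, `[[ ]]` additionally refuses an index that selects more than one element.

```r
lst <- list(a = 1, b = "two")

sub <- lst["a"]     # a list of length 1
elt <- lst[["a"]]   # the element itself
print(class(sub))
print(class(elt))

v <- c(10, 20, 30)
print(v[[2]])

# [[ ]] signals an error when asked for more than one element,
# which surfaces mistakes that [ ] would let through.
res <- try(v[[c(1, 2)]], silent = TRUE)
print(inherits(res, "try-error"))
```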

R Tip: Use seq_len() to Avoid The Backwards Sequence Bug

February 19, 2018 | John Mount

Another R tip. Use seq_len() to avoid the backwards sequence bug. Many R users use the “colon sequence” notation to build sequences. For example: for(i in 1:5) { print(paste(i, i*i)) } #> [1] "1 1" #> [1] "2 4" #> [1] "3 9" #> [1] "4 16" #> [1] "5 25" However, the colon notation can be unsafe as … Continue reading R Tip: Use seq_len() to Avoid ... [Read more...]
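A minimal sketch of the bug the title refers to, with an illustrative `n` of zero: `1:n` counts backwards when `n` is 0, while `seq_len(n)` yields an empty sequence.

```r
n <- 0

# 1:0 is the backwards sequence c(1, 0), so a loop over it runs twice.
colon_seq <- 1:n
print(length(colon_seq))

# seq_len(0) is empty, so a loop over it runs zero times.
safe_seq <- seq_len(n)
print(length(safe_seq))

count <- 0
for (i in seq_len(n)) {
  count <- count + 1
}
print(count)
```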