# Articles by The Clerk

### Who named their kid Daenerys or Khaleesi?

May 13, 2019

The Game of Thrones character Daenerys Targaryen suffered a hit to her reputation in the final season of the much-watched HBO series. For the past few years, she had been a crowd favorite, combining brave determination, style, and dragon(s). Many people went a step further in their admiration and named ...

### Why Are Racing Drivers Born on March 23?

August 31, 2017

Wikipedia has pages for each day of the year (e.g., January 1, April 25). Each page contains a list of names of famous people with that birthday along with a short description of each person. I wrote an R script to scrape these lists from each wikipedi...

### Election Results vs. Benford’s Law and the Return of City-States?

December 1, 2016

From Wikipedia: Benford's law, also called the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small... Benford's Law is a ...
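For reference, Benford's law puts the expected share of leading digit d at log10(1 + 1/d), so a 1 leads about 30% of the time and a 9 under 5%. A minimal R sketch of comparing observed leading digits with that expectation follows; the `votes` vector is made up for illustration (it is not real election data), and the helper names are hypothetical.

```r
# Expected Benford share for each leading digit d = 1..9: log10(1 + 1/d)
benford_expected <- function() log10(1 + 1 / (1:9))

# Leading significant digit of each nonzero number
first_digit <- function(x) {
  x <- abs(x[x != 0])
  as.integer(floor(x / 10^floor(log10(x))))
}

# Observed share of each leading digit, in the order 1..9
observed_shares <- function(x) {
  counts <- tabulate(first_digit(x), nbins = 9)
  counts / sum(counts)
}

# Made-up counts for illustration only (not real election results)
votes <- c(1203, 187, 2950, 14, 1100, 365, 98, 1520, 4210, 172)

data.frame(digit    = 1:9,
           observed = observed_shares(votes),
           expected = round(benford_expected(), 3))
```

With only ten values the observed column is noisy; a real test would use many counts and a goodness-of-fit statistic.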

### The Simpsons as a Chart

March 26, 2016

Inspired by this clever image, I thought I would whip it up in R. Results: Below is the R code: # Prepare ----------------------------------------------------------------- rm(list=ls());gc() pkg ...

### Visualizing High Dimensions as DNA Strands

January 10, 2016

For a community project, I needed to research which U.S. cities were most similar to mine. The U.S. census has some wonderful data that covers 1,579 statistical areas, using the Office of Management & Budget's definition. With this data, I selected the relevant attributes and then calculated the root mean ...
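The excerpt breaks off at "root mean"; assuming the measure was a root-mean-square difference, a similarity ranking could look like the sketch below. The area names, attributes, and values are hypothetical stand-ins for the census table, and z-scoring the attributes first is also an assumption, made so that large-unit columns like population don't dominate.

```r
# Hypothetical attribute table: one row per statistical area
areas <- data.frame(
  name        = c("Area A", "Area B", "Area C", "Area D"),
  population  = c(250000, 240000, 1200000, 90000),
  median_age  = c(36.1, 35.4, 33.0, 41.2),
  pct_college = c(31, 29, 42, 18)
)

# Root-mean-square difference between a target row and every row,
# computed on standardized (z-scored) attributes
rms_distance <- function(df, target, cols) {
  z <- scale(as.matrix(df[, cols]))
  diffs <- sweep(z, 2, z[df$name == target, ])  # subtract target's z-scores
  sqrt(rowMeans(diffs^2))
}

cols <- c("population", "median_age", "pct_college")
areas$rms <- rms_distance(areas, "Area A", cols)

# Smallest RMS first: the target itself (0), then its nearest neighbors
areas[order(areas$rms), c("name", "rms")]
```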

### What Does the AVERAGE Brand Logo Look Like?

October 23, 2015

PNG images are essentially a grid of values that represent colors to display. Since each cell in the grid is made up of numbers, I got curious about what it might mean to aggregate multiple PNGs. What would it look like to average two or more images? Median? Mode? Random? ...
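Since each image is an array of numbers, aggregating same-sized images reduces to an element-wise summary across the stack. A base-R sketch, assuming the logos have already been read into arrays of identical dimensions (for instance with `png::readPNG()`, which returns values in [0, 1]); the small random arrays and the `aggregate_images()` helper are illustrative, not the original post's code:

```r
# Aggregate a list of same-sized image arrays (height x width x channels)
# pixel by pixel with any summary function (mean, median, ...)
aggregate_images <- function(imgs, FUN = mean) {
  stopifnot(length(unique(lapply(imgs, dim))) == 1)  # all dims must match
  stacked <- simplify2array(imgs)        # adds a trailing "image" dimension
  keep <- seq_along(dim(imgs[[1]]))      # keep every pixel/channel margin
  apply(stacked, keep, FUN)
}

# Stand-ins for logos: three 2 x 2 RGB "images" with values in [0, 1]
set.seed(1)
imgs <- replicate(3, array(runif(12), dim = c(2, 2, 3)), simplify = FALSE)

avg_logo    <- aggregate_images(imgs, mean)
median_logo <- aggregate_images(imgs, median)
dim(avg_logo)
```

Passing `median` as `FUN` gives the pixel-wise median; other per-pixel summaries can be passed the same way.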

### cuRve stitching

September 15, 2015

Remember curve stitching from grade school? It makes for a nice tutorial on working with some common R functionality. Here's an example of how to create the appearance of a parabola by plotting a series of straight lines: pkg ...
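In the classic construction, step i joins the point i on the x-axis to the point n - i on the y-axis. No single line is curved, but their envelope, the curve √x + √y = √n, is a parabolic arc. A base-R sketch (the `stitch_lines()` helper and n = 20 are illustrative choices, not taken from the post):

```r
# Endpoints for curve stitching: the line for step i joins (i, 0) on the
# x-axis to (0, n - i) on the y-axis
stitch_lines <- function(n = 20) {
  i <- 0:n
  data.frame(x0 = i, y0 = 0, x1 = 0, y1 = n - i)
}

lines_df <- stitch_lines(20)

# Draw every straight line; together they outline a parabolic curve
lim <- c(0, max(lines_df$x0))
plot(NULL, xlim = lim, ylim = lim, asp = 1,
     xlab = "", ylab = "", main = "Curve stitching")
segments(lines_df$x0, lines_df$y0, lines_df$x1, lines_df$y1)
```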

### Top 2 Packages for Newly Hired Data Scientists

July 9, 2015

library(NewCo knowledge) function (X, FUN, ..., ) {FUN ...

### Finding Similar European Soccer Clubs (with R & Shiny)

March 17, 2015

Are you a die-hard supporter of one European soccer (football) team (club)? Having a rough season, or just want to watch more matches with passion? This European Team Finder analyzed 126 attributes of the top-flight teams in the marquee n...

### Tableau 9.0 Connects Directly to R Data Files

March 11, 2015

Tableau 9.0 will be released soon. Tableau 8 already integrates with some R functionality, but 9.0 actually allows direct connection to R data files. Tableau continues to remove friction between itself and R, further justifying its superior Gartner ...

### R’s Tricky == Operator, or "It depends on what the meaning of the word ‘is’ is"

February 11, 2015

One scenario where R can trip up a programmer is when using the == operator or its relatives. The help page notes that "NA values are regarded as non-comparable", which introduces some potentially unexpected behavior. As a toy example, look what happens...
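The toy example is cut off, but the documented behavior is easy to reproduce: a comparison with NA yields NA rather than FALSE, and using that result as an index returns an NA element. A short R sketch of the pitfall and a few common NA-safe alternatives (the vector `x` is made up):

```r
x <- c(1, NA, 3)

x == 1               # TRUE NA FALSE: comparing with NA gives NA, not FALSE
NA == NA             # NA, not TRUE

# An NA in a logical index returns an NA element instead of dropping it
x[x == 1]            # 1 NA

# Alternatives that always return TRUE/FALSE (or clean positions)
x %in% 1             # TRUE FALSE FALSE
which(x == 1)        # 1 (which() skips NA)
identical(NA, NA)    # TRUE
!is.na(x) & x == 1   # TRUE FALSE FALSE
```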

### First Day of the Month, Using R

December 29, 2014

Future-proofing is an important concept when designing automated reports. One thing that can get out of hand over time is when you accumulate so many periods of data that your charts start to look overcrowded. You can solve for this by limiting the num...
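One base-R way to anchor a report on the first day of the month, and to keep a chart to a fixed number of recent periods, is sketched below; the helper names and the 12-month default window are illustrative assumptions, not the post's code.

```r
# First day of the month containing a given date
first_of_month <- function(date = Sys.Date()) {
  as.Date(format(date, "%Y-%m-01"))
}

# Start dates of the last k complete months before the month of `date`,
# oldest first, so a recurring report always shows the same number of periods
recent_month_starts <- function(date = Sys.Date(), k = 12) {
  starts <- seq(first_of_month(date), by = "-1 month", length.out = k + 1)
  rev(starts[-1])  # drop the current (incomplete) month
}

first_of_month(as.Date("2014-12-29"))
recent_month_starts(as.Date("2014-12-29"), k = 3)
```

Filtering the data to dates on or after the first returned start keeps the chart from growing as periods accumulate.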

### FIFA 15 Analysis with R

September 26, 2014

Several months ago, I used R to analyze professional soccer players based on their attributes from the video game FIFA 14. Now that FIFA 15 is upon us, let's take a similar look. FIFA 15 is a video game by EA Sports that mimics the experience of managing and playing for a soccer ...

### A Look at Random Seeds in R… Or: “85, why can’t you be more like 548?”

August 17, 2014

Have you ever wondered whether the set.seed() function in R has any quirks? This analysis was inspired by a Stack Overflow post by Wolfgang, and I incorporate some of his code. For each seed (1-1000, for this analysis), I took the mean and standard deviation of the first 1,000 random ...
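The excerpt stops before naming the distribution, so this sketch assumes uniform draws from `runif()`; it computes the per-seed mean and standard deviation described above for seeds 1 to 1000. The `seed_summary()` helper is hypothetical.

```r
# For each seed, draw the first n uniform random numbers (runif() is an
# assumption) and record their mean and standard deviation
seed_summary <- function(seeds = 1:1000, n = 1000) {
  summary_mat <- t(vapply(seeds, function(s) {
    set.seed(s)
    u <- runif(n)
    c(mean = mean(u), sd = sd(u))
  }, numeric(2)))
  data.frame(seed = seeds, summary_mat)
}

res <- seed_summary()

# Seeds whose first draws stray furthest from the expected mean of 0.5
head(res[order(abs(res$mean - 0.5), decreasing = TRUE), ], 5)
```

Under that assumption, means should sit near 0.5 and standard deviations near sqrt(1/12) ≈ 0.289; the sorted output shows which seeds drift furthest.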

### R is short for SSIS

May 18, 2014

Data scientists often need to join data from different, unlinked servers. One standard tool for accomplishing this is an SSIS package that consolidates the data onto one of the servers. For the analyst who wants to keep everything in one file for simplicity ...

### Assign n Email Addresses to x Cells, Intrinsically (Part II)

March 27, 2014

Part I showed the concept and general technique for assigning n email addresses to x cells pseudo-randomly, without the need to maintain a log of each assignment. The earlier post considered the basic case in which each cell is assigned approximately the same quantity of email addresses. In ...

### Assign n Email Addresses to x Cells, Intrinsically

March 5, 2014

Sample Use Case: Marketing requests that an email address list be divided randomly into a given number of cells so that each cell receives a different version of copy. Below is a technique that ...
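The "intrinsically" in the title suggests each address determines its own cell, so no assignment log is needed. The post's actual technique isn't visible in this excerpt; one way to get that property, sketched here as an assumption, is to hash each normalized address deterministically and take the hash modulo the number of cells. The hash is a simple illustrative polynomial hash (not cryptographic), and `string_hash()` and `assign_cell()` are hypothetical names.

```r
# Deterministic polynomial string hash; values stay far below 2^53,
# so double arithmetic is exact. Illustrative only, not cryptographic.
string_hash <- function(s, base = 31, mod = 1000003) {
  h <- 0
  for (code in utf8ToInt(s)) {
    h <- (h * base + code) %% mod
  }
  h
}

# Assign each email address to one of x cells (1..x). The cell depends
# only on the address, so no log of past assignments is required.
assign_cell <- function(emails, x) {
  key <- tolower(trimws(emails))  # normalize case and stray whitespace
  vapply(key, function(e) string_hash(e) %% x + 1, numeric(1),
         USE.NAMES = FALSE)
}

emails <- c("ann@example.com", "bob@example.com", "cy@example.com",
            "Ann@Example.com ")
assign_cell(emails, x = 3)
```

For the same x, re-running the assignment later puts every address back in the same cell. A stronger, better-mixed hash could be swapped in without changing how `assign_cell()` is called.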