# Articles by kjytay

### Two interesting facts about high-dimensional random projections

April 16, 2019

John Cook recently wrote an interesting blog post on random vectors and random projections. In the post, he states two surprising facts of high-dimensional geometry and gives some intuition for the second fact. In this post, I will provide R …
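
The excerpt does not say which two facts Cook describes, so here is a hedged illustration of one well-known fact of high-dimensional geometry (not necessarily either of his): two independent random Gaussian vectors become nearly orthogonal as the dimension grows. The base-R sketch below estimates the average absolute cosine similarity between such pairs; the helper name mean_abs_cosine and the choice of 200 pairs per dimension are illustrative assumptions.

```r
# Hypothetical helper: average |cosine similarity| between pairs of
# independent standard Gaussian vectors in dimension d
mean_abs_cosine <- function(d, n_pairs = 200) {
  cosines <- replicate(n_pairs, {
    u <- rnorm(d)
    v <- rnorm(d)
    sum(u * v) / sqrt(sum(u^2) * sum(v^2))
  })
  mean(abs(cosines))
}

set.seed(42)
dims <- c(2, 10, 100, 10000)
res <- sapply(dims, mean_abs_cosine)
print(data.frame(dimension = dims, mean_abs_cosine = res))
```

The average shrinks roughly in proportion to 1/sqrt(d), so in very high dimensions a typical pair of random directions is close to perpendicular.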

### The sinh-arcsinh normal distribution

April 15, 2019

This month’s issue of Significance magazine has a very nice summary article on the sinh-arcsinh normal distribution. (Unfortunately, the article seems to be behind a paywall.) This distribution was first introduced by Chris Jones and Arthur Pewsey in 2009 as …
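
As a rough sketch, and assuming the parametrization I understand from Jones and Pewsey (2009), a sinh-arcsinh normal variable can be generated by transforming a standard normal Z as X = sinh((asinh(Z) + epsilon) / delta), where epsilon controls skewness and delta > 0 controls tail weight. The helper rshash is hypothetical and not from any package.

```r
# Hypothetical helper: draw n sinh-arcsinh normal variates by transforming N(0, 1) draws.
# Assumed parametrization: X = sinh((asinh(Z) + epsilon) / delta), Z ~ N(0, 1).
rshash <- function(n, epsilon = 0, delta = 1) {
  stopifnot(delta > 0)
  z <- rnorm(n)
  sinh((asinh(z) + epsilon) / delta)
}

set.seed(2019)
x_sym   <- rshash(1e5)                # epsilon = 0, delta = 1 recovers N(0, 1)
x_skew  <- rshash(1e5, epsilon = 1)   # positive epsilon: right skew
x_heavy <- rshash(1e5, delta = 0.5)   # delta < 1: heavier tails than the normal
summary(x_skew)
```

Setting epsilon = 0 and delta = 1 makes the transformation the identity, which is a convenient sanity check for any implementation.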

### Testing numeric variables for NA/NaN/Inf

April 9, 2019

In R, a numeric variable is either a number (like 0, 42, or -3.14), or one of 4 special values: NA, NaN, Inf or -Inf. It can be hard to remember how the is.x functions treat each of the special …
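
One way to keep these straight is to tabulate the four is.* functions against each special value. The sketch below uses only base R; the object names are illustrative.

```r
# The ordinary number 1 plus the four special numeric values
vals <- c(1, NA, NaN, Inf, -Inf)
checks <- list(is.na = is.na, is.nan = is.nan,
               is.finite = is.finite, is.infinite = is.infinite)

# Apply each check to every value: rows are values, columns are functions
tab <- sapply(checks, function(f) f(vals))
rownames(tab) <- c("1", "NA", "NaN", "Inf", "-Inf")
print(tab)
```

Two asymmetries stand out: is.na() is TRUE for NaN, but is.nan() is FALSE for NA; and none of the special values counts as finite, while only Inf and -Inf count as infinite.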

### Many ways to do the same thing: linear regression

April 7, 2019

One feature of R (could be positive, could be negative) is that there are many ways to do the same thing. In this post, I list out the different ways we can get certain results from a linear regression model. …
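
For a flavor of this, here are a few equivalent base-R accessors for an lm() fit on the built-in mtcars data. This is my own selection for illustration, not necessarily the list in the full post.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Three ways to get the coefficient estimates
b1 <- coef(fit)
b2 <- fit$coefficients
b3 <- summary(fit)$coefficients[, "Estimate"]

# Three ways to get the fitted values
f1 <- fitted(fit)
f2 <- predict(fit)        # with no newdata, predict() returns the fitted values
f3 <- fit$fitted.values

# Three ways to get the residuals
r1 <- residuals(fit)
r2 <- resid(fit)
r3 <- mtcars$mpg - fitted(fit)   # computed by hand from the definition
```

The extractor functions (coef(), fitted(), residuals()) are generally the safer choice, since they also work for many other model classes whose internal list components may be named differently.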

### Plots within plots with ggplot2 and ggmap

February 23, 2019

Once in a while, you might find yourself wanting to embed one plot within another plot. ggplot2 makes this really easy with the annotation_custom function. The following example illustrates how you can achieve this. (For all the code in one …
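
A minimal sketch of the idea, assuming ggplot2 is installed: convert the inset plot to a grob with ggplotGrob() and place it with annotation_custom(), whose xmin/xmax/ymin/ymax arguments are in the main plot's data coordinates. The mtcars plots and the inset position are arbitrary choices for illustration, not the example from the post.

```r
library(ggplot2)

# Main plot: weight vs. fuel economy
main_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Inset plot: distribution of mpg, with a smaller base font
inset_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  theme_bw(base_size = 8)

# Place the inset in the empty upper-right region (data units of main_plot)
combined <- main_plot +
  annotation_custom(ggplotGrob(inset_plot),
                    xmin = 4, xmax = 5.5, ymin = 25, ymax = 34)
print(combined)
```

Because annotation_custom() does not train the scales, the inset coordinates should fall inside the main plot's existing axis ranges.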

### Quantile regression in R

January 31, 2019

Quantile regression: what is it? Let Y be some response variable of interest, and let X be a vector of features or predictors that we want to use to model the response. In linear regression, we are trying to estimate the conditional …
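
To make the contrast concrete: where linear regression minimizes squared error, quantile regression at level tau minimizes the check loss rho_tau(u) = u * (tau - 1{u < 0}). Below is a small base-R sketch (not the post's code) for the simplest case, an intercept-only model, where the minimizer should be the sample tau-quantile. check_loss is a hypothetical helper name.

```r
# Check (pinball) loss for quantile level tau
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(1)
y <- rnorm(101)
tau <- 0.9

# Intercept-only quantile regression: minimize total check loss over a constant q
obj <- function(q) sum(check_loss(y - q, tau))
q_hat <- optimize(obj, interval = range(y))$minimum

# Compare with the empirical 0.9-quantile (type = 1 inverts the empirical CDF)
c(check_loss_min = q_hat, sample_quantile = unname(quantile(y, tau, type = 1)))
```

With predictors, the same loss is minimized over a linear predictor instead of a constant; rq() in the quantreg package is the usual tool for that.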

### pcLasso: a new method for sparse regression

January 13, 2019

I’m excited to announce that my first package has been accepted to CRAN! The package pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will …

### A deep dive into glmnet: offset

January 9, 2019

I’m writing a series of posts on various function options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation. In this post, we will look at the offset …
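
Conceptually, an offset is a known term added to the linear predictor with its coefficient fixed at 1 rather than estimated. As a base-R stand-in (this uses lm(), not glmnet), the sketch below shows on simulated data that, for a Gaussian response, fitting with an offset gives the same coefficients as regressing y minus the offset. The variable o is a hypothetical known offset.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
o <- runif(n)                     # hypothetical known offset
y <- 1 + 2 * x + o + rnorm(n)     # true model includes o with coefficient 1

fit_offset <- lm(y ~ x, offset = o)   # offset enters the linear predictor as-is
fit_shift  <- lm(I(y - o) ~ x)        # equivalent: subtract the offset from y

rbind(offset = coef(fit_offset), shifted = coef(fit_shift))
```

For non-Gaussian families (for example, Poisson regression with a log-exposure offset), the offset lives on the link scale and cannot be folded into the response this way, which is where an explicit offset argument matters most.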

### Using emojis as scatterplot points

December 27, 2018

Recently I wanted to learn how to use emojis as points in a scatterplot. It seems like the emojifont package is a popular way to do it. However, I couldn’t seem to get it to work on my machine …

### All the (NBA) box scores you ever wanted

December 18, 2018

In this previous post, I showed how one can scrape top-level NBA game data from BasketballReference.com. In the post after that, I demonstrated how to scrape play-by-play data for one game. After writing those posts, I thought to myself: why …

### Recreating the NBA lead tracker graphic

December 13, 2018

For each NBA game, nba.com has a really nice graphic which tracks the point differential between the two teams throughout the game. Here is the lead tracker graphic for the game between the LA Clippers and the Phoenix Suns on …

### Scraping NBA game data from basketball-reference.com

December 11, 2018

I’m a casual NBA fan: I don’t have time to watch the games but enjoy viewing the highlights on Instagram/YouTube (especially Shaqtin’ A Fool!); I sometimes read game articles and analyses (e.g. Blogtable). Apart from the game being an amazing …

### A deep dive into glmnet: standardize

November 15, 2018

I’m writing a series of posts on various function options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation. In this post, we will focus on the standardize …
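
As a rough picture of what standardization means here: each column of the predictor matrix is centered to mean 0 and scaled to unit variance before fitting (glmnet then reports coefficients on the original scale). The base-R sketch below assumes the 1/n variance convention, which as far as I know is what glmnet uses but is worth confirming in its documentation, and contrasts it with scale(), which divides by n - 1. standardize_cols is a hypothetical helper, not a glmnet function.

```r
# Hypothetical helper: center each column, then divide by its 1/n standard deviation
standardize_cols <- function(X) {
  n <- nrow(X)
  Xc <- sweep(X, 2, colMeans(X))     # subtract column means
  s <- sqrt(colSums(Xc^2) / n)       # 1/n (population) standard deviations (assumption)
  sweep(Xc, 2, s, "/")
}

X <- as.matrix(mtcars[, -1])
n <- nrow(X)
Z <- standardize_cols(X)

round(colMeans(Z), 12)     # all zero
round(colMeans(Z^2), 12)   # all one under the 1/n convention
# scale() uses n - 1, so it differs from Z by the constant factor sqrt((n - 1) / n)
max(abs(Z * sqrt((n - 1) / n) - scale(X)))
```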

### A deep dive into glmnet: penalty.factor

November 13, 2018

The glmnet function (from the package of the same name) is probably the most used function for fitting the elastic net model in R. (It also fits the lasso and ridge regression, since they are special cases of elastic net.) …
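
For reference, the elastic net penalty in glmnet's documentation is lambda * [(1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1], with alpha = 1 giving the lasso and alpha = 0 ridge regression; penalty.factor multiplies each coefficient's term by its own factor v_j, and a factor of 0 leaves that coefficient unpenalized. The sketch below simply evaluates this penalty in base R. enet_penalty is a hypothetical helper, and it skips one detail noted in the glmnet documentation: penalty factors are internally rescaled to sum to the number of variables.

```r
# Hypothetical helper: elastic net penalty with per-coefficient penalty factors v_j
#   lambda * sum_j v_j * ((1 - alpha) / 2 * beta_j^2 + alpha * |beta_j|)
enet_penalty <- function(beta, lambda, alpha,
                         penalty_factor = rep(1, length(beta))) {
  lambda * sum(penalty_factor * ((1 - alpha) / 2 * beta^2 + alpha * abs(beta)))
}

beta <- c(2, -1, 0.5)
enet_penalty(beta, lambda = 0.1, alpha = 1)   # lasso: 0.1 * 3.5 = 0.35
enet_penalty(beta, lambda = 0.1, alpha = 0)   # ridge: 0.1 * 5.25 / 2 = 0.2625
enet_penalty(beta, lambda = 0.1, alpha = 1,
             penalty_factor = c(0, 1, 1))     # first coefficient unpenalized: 0.15
```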

### Getting started with Stamen maps in ggmap

October 25, 2018

Spatial visualizations really come to life when you have a real map as a background. In R, ggmap is the package that you’ll want to use to get these maps. In what follows, we’ll demonstrate how to use ggmap with …

### Obtaining the number of components from cross validation of principal components regression

October 14, 2018

Principal components (PC) regression is a common dimensionality reduction technique in supervised learning. The R lab for PC regression in James et al.’s Introduction to Statistical Learning is a popular intro for how to perform PC regression in R: it is …
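
For a sense of what cross-validating the number of components involves, here is a hand-rolled base-R sketch (not the ISLR lab's code, which relies on the pls package): within each fold, PCA is fit on the training rows only, the held-out rows are projected onto those components, and the test MSE is recorded for each number of components m. pcr_cv_mse is a hypothetical helper, and using mtcars with mpg as the response is an arbitrary example.

```r
# Hypothetical helper: k-fold CV test MSE of PC regression for m = 1, ..., max_comp.
# The seed fixes the random fold assignment.
pcr_cv_mse <- function(X, y, max_comp, k = 5, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(X)))
  mse <- matrix(NA_real_, nrow = k, ncol = max_comp)
  for (f in seq_len(k)) {
    train <- folds != f
    # Standardize and rotate using the training rows only (no leakage)
    pca <- prcomp(X[train, , drop = FALSE], center = TRUE, scale. = TRUE)
    scores_train <- pca$x
    scores_test <- predict(pca, newdata = X[!train, , drop = FALSE])
    for (m in seq_len(max_comp)) {
      # Least-squares fit of y on an intercept plus the first m PC scores
      fit <- lm.fit(cbind(1, scores_train[, seq_len(m), drop = FALSE]), y[train])
      pred <- cbind(1, scores_test[, seq_len(m), drop = FALSE]) %*% fit$coefficients
      mse[f, m] <- mean((y[!train] - pred)^2)
    }
  }
  colMeans(mse)  # average test MSE across folds, one entry per m
}

X <- as.matrix(mtcars[, -1])
y <- mtcars$mpg
cv_mse <- pcr_cv_mse(X, y, max_comp = ncol(X))
best_m <- which.min(cv_mse)  # number of components with the lowest CV error
best_m
```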

### Subsetting in the presence of NAs

October 6, 2018

In R, we can subset a data frame df easily by putting the conditional in square brackets after df. For example, if I want all the rows in df which have a value equal to 1 in the column colA, all …
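
A small illustration of what can go wrong when the column contains NAs, using a made-up data frame: a logical index containing NA returns a row of NAs, whereas which() and subset() drop rows where the condition is NA.

```r
df <- data.frame(colA = c(1, NA, 2, 1), colB = c("w", "x", "y", "z"))

# The logical index is c(TRUE, NA, FALSE, TRUE): the NA produces a row of all NAs
df[df$colA == 1, ]

# which() keeps only the indices where the condition is TRUE, dropping NA
df[which(df$colA == 1), ]

# subset() also treats an NA condition as FALSE
subset(df, colA == 1)
```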

### Different winners under different criteria

August 21, 2018

A few posts ago, I noted that there was a group of 7 teams in the English Premier League (EPL) that seemed to be a cut above the rest: Arsenal, Chelsea, Everton, Liverpool, Manchester City, Manchester United, Tottenham …

### Exploring point distribution in the English Premier League

August 17, 2018

I recently got a hold of table standings for the English Premier League (EPL) for the past 10 years. In this post, I want to explore the question: How similar are the point distributions across seasons? Code for the figures …