### Detecting outlier samples in PCA

August 21, 2019

In this post, I present something I am currently investigating (feedback welcome!) and implementing in my new package {bigutilsr}. This package can be used to detect outlier samples in Principal Component Analysis (PCA). It can be installed with `remotes::install_github("privefl/bigutilsr")` and loaded with `library(bigutilsr)`. I present three different statistics of outlierness ...
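
To give a feel for the general idea only, here is a minimal base-R sketch of one classical outlierness measure: the Mahalanobis distance of each sample computed on its first K principal component scores. This is an assumption for illustration, not one of the three statistics implemented in {bigutilsr}; the helper `pca_outlier_dist()` and the toy data are hypothetical.

```r
# Hypothetical helper (not part of {bigutilsr}): Mahalanobis distance of each
# sample, computed on its scores along the first K principal components
pca_outlier_dist <- function(X, K = 5) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE, rank. = K)
  U <- pca$x                               # n x K matrix of PC scores
  mahalanobis(U, center = colMeans(U), cov = cov(U))
}

# Toy data: 200 samples, 50 variables, with 3 samples shifted away from the rest
set.seed(42)
X <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)
X[1:3, ] <- X[1:3, ] + 4
d <- pca_outlier_dist(X, K = 5)
head(order(d, decreasing = TRUE), 3)       # samples with the largest distances
```

Because PC scores are uncorrelated, this distance is essentially a sum of squared standardized scores; choosing K and a threshold is left open here.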

### Using clustering to find points in an image

November 26, 2018

In this post, I present my new package {img2coord}. This package can be used to retrieve coordinates from a scatter plot (as an image). It can be installed with `devtools::install_github("privefl/img2coord")`. Have you ever made a plot, saved it as a png and moved on? When you come back to ...
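
As a rough illustration of the clustering idea (not necessarily how {img2coord} works), the sketch below builds a small hypothetical binary "image" in which each point is a 3 x 3 square of ink, takes the positions of all inked pixels, and lets k-means recover the point centres. Reading a real png into such a matrix is assumed and not shown.

```r
# Hypothetical toy image: 0 = background, 1 = ink
img <- matrix(0, 60, 60)
centers_true <- rbind(c(10, 10), c(30, 40), c(50, 20))
for (k in seq_len(nrow(centers_true))) {
  rows <- centers_true[k, 1] + (-1:1)
  cols <- centers_true[k, 2] + (-1:1)
  img[rows, cols] <- 1                     # draw one 3 x 3 "point"
}

ink <- which(img == 1, arr.ind = TRUE)     # (row, col) of every inked pixel
set.seed(1)
km <- kmeans(ink, centers = 3, nstart = 20)
est <- km$centers[order(km$centers[, 1]), ]  # estimated centres, sorted by row
est
```

Here the number of points is given to `kmeans()`; in practice it would itself have to be estimated.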

### Choosing hyper-parameters in penalized regression

November 22, 2018

In this post, I evaluate some ways of choosing hyper-parameters ($$\alpha$$ and $$\lambda$$) in penalized linear regression. The same principles can be applied to other types of penalized regressions (e.g. logistic). Model: in penalized linear regression, we find regression coefficients $$\hat{\beta}_0$$ and $$\hat{\beta}$$ that minimize the ...
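
For reference, the elastic-net objective as parametrized in {glmnet} is written below; that the post uses exactly this form is an assumption.

$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - \beta_0 - x_i^T \beta\right)^2 + \lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]
$$

With this parametrization, $$\alpha = 1$$ gives the lasso, $$\alpha = 0$$ gives ridge regression, and $$\lambda$$ controls the overall strength of the penalty.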

### Predicting height based on DNA mutations

October 7, 2018

In this post, I show some results of predicting height based on DNA mutations. This analysis aims at reproducing the analysis of this paper using my own analysis tools. I use a new dataset composed of 500,000 adults from the UK, genotyped over hund...

### Fast R functions to get first principal components

August 29, 2018

In this post, I compare different approaches to get the first principal components of large matrices in R. The comparison uses `library(bigstatsr)` and `library(tidyverse)`, and the data part starts by creating two matrices, one with some structure, one without ...
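
As a base-R baseline only (the post's own comparison relies on {bigstatsr}, which is not used here), the sketch below obtains the first K principal components in two equivalent ways: `prcomp()` restricted to K components, and an SVD of the centred matrix. The toy sizes are arbitrary assumptions.

```r
set.seed(1)
n <- 300; p <- 100; K <- 5
X <- matrix(rnorm(n * p), n, p)

# 1) prcomp(), keeping only the first K components
pca <- prcomp(X, center = TRUE, rank. = K)

# 2) SVD of the centred matrix, returning only K left/right singular vectors
Xc <- scale(X, center = TRUE, scale = FALSE)
s <- svd(Xc, nu = K, nv = K)
scores_svd <- sweep(s$u, 2, s$d[1:K], "*")  # U * D, i.e. the PC scores

# Singular vectors are defined up to sign, so compare absolute values
all.equal(abs(unname(pca$x)), abs(scores_svd))
```

Both routes still compute a full decomposition internally, which is why dedicated partial-SVD implementations become interesting for large matrices.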

### Whether to use a data frame in R?

July 19, 2018

In this post, I try to show you in which situations using a data frame is appropriate, and in which it’s not. Learn more with the Advanced R book. What is a data frame? A data frame is just a list of vectors of the same length, each vector ...
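
A minimal sketch of the "list of vectors" view: a data frame answers `TRUE` to `is.list()`, its length is its number of columns, and extracting a column is plain list extraction, whereas extracting a row has to assemble a new one-row data frame from every column.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

is.list(df)            # TRUE
length(df)             # 2, the number of columns, as for any list
lengths(unclass(df))   # every column (list element) has the same length

df[["x"]]              # column access: simple list extraction
df[2, ]                # row access: builds a new one-row data frame
```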

### Why I rarely use apply

July 13, 2018

In this short post, I talk about why I’m moving away from using the function apply. With matrices: it’s okay to use apply with a dense matrix, although you can often use an equivalent that is faster ...
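
As an example of such an equivalent, row sums of a dense matrix can be computed with `apply()`, which calls an R function once per row, or with `rowSums()`, which does the whole computation in compiled code. The matrix size below is an arbitrary assumption.

```r
set.seed(1)
X <- matrix(rnorm(1000 * 200), 1000, 200)

s1 <- apply(X, 1, sum)   # one R-level call to sum() per row
s2 <- rowSums(X)         # single call to compiled code
all.equal(s1, s2)

# Timings depend on your machine:
# system.time(apply(X, 1, sum)); system.time(rowSums(X))
```

The same applies to `colSums()`, `rowMeans()` and `colMeans()`.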

### One year as a subscriber to Stack Overflow

July 1, 2018

In this post, I follow up on a previous post describing how, last year in July, I spent one month mostly procrastinating on Stack Overflow (SO). We’re already in July, so it’s time to look back on one year of activity on Stack Overflow. Am I still as ...

### Why loops are slow in R

June 10, 2018

In this post, I talk about loops in R, why they can be slow and when it is okay to use them. Don’t grow objects: let us generate a matrix of uniform values (with the maximum changing for every column). `gen_grow` ...
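
A minimal sketch of the "don't grow objects" advice, using two hypothetical functions (not the post's truncated `gen_grow`): both fill an n x m matrix whose column j contains uniform values between 0 and j, but the first grows the matrix with `cbind()` (copying it at every iteration) while the second allocates it once and fills it in place.

```r
# Hypothetical: grows the matrix column by column (copies it at each step)
gen_grow_sketch <- function(n, m) {
  mat <- NULL
  for (j in seq_len(m)) {
    mat <- cbind(mat, runif(n, max = j))
  }
  mat
}

# Hypothetical: same values, but the matrix is allocated once up front
gen_prealloc_sketch <- function(n, m) {
  mat <- matrix(0, n, m)
  for (j in seq_len(m)) {
    mat[, j] <- runif(n, max = j)
  }
  mat
}

set.seed(1); A <- gen_grow_sketch(100, 50)
set.seed(1); B <- gen_prealloc_sketch(100, 50)
```

With the same seed both functions draw the same numbers, so only the memory behaviour differs; comparing `system.time()` for larger n and m shows the cost of growing.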

### Performance: when algorithmics meets mathematics

April 18, 2018

In this post, I talk about performance through an efficient algorithm I developed for finding closest points on a map. This algorithm uses concepts from both mathematics and algorithmics. Problem to solve: this problem comes from a recent question on Stack Overflow. I have two matrices, one is 200K rows long, ...
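
For context, here is a naive baseline, explicitly not the algorithm from the post: it assumes plain Euclidean distance on 2D coordinates and uses the expansion $$\lVert a - b \rVert^2 = \lVert a \rVert^2 - 2 a \cdot b + \lVert b \rVert^2$$ to vectorise the search. The helper `closest_naive()` is hypothetical. It builds a full distance matrix with one row per point of `A` and one column per point of `B`, which is what makes a brute-force approach impractical at 200K rows.

```r
# Hypothetical baseline: for each row of A, index of the closest row of B
closest_naive <- function(A, B) {
  # Squared distances up to ||a||^2, which is constant within each row of D
  # and therefore does not change which column is the minimum
  D <- -2 * tcrossprod(A, B) + rep(rowSums(B^2), each = nrow(A))
  max.col(-D, ties.method = "first")   # argmin of each row of D
}

set.seed(1)
A <- matrix(runif(20), 10, 2)
B <- matrix(runif(30), 15, 2)
idx <- closest_naive(A, B)

# Brute-force check, one point at a time
brute <- apply(A, 1, function(a) which.min(colSums((t(B) - a)^2)))
identical(idx, brute)
```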

### Teaching an advanced R course

March 28, 2018

In this post, I look back on my first experience teaching an advanced R course over the past month. Content: this course was planned for 10 sessions (3 hours each) and I initially wanted to talk about the following subjects: R programming and g...

### Shiny App for making Pixel Art Models

November 15, 2017

Last weekend, I discovered pixel art. The goal is to reproduce a pixelated drawing. Anyone can do this without any drawing skills because you just have to reproduce the pixels one by one (on squared paper). Kids and big kids can quickly become addicted to this. Example: For ...

### Grenoble RUG: 2nd working session, ggplot2

October 24, 2017

The slides are available there. For example, you’ll learn ...

### Grenoble RUG: first working session

October 1, 2017

In this post, I will talk about the organisation of our R User Group (RUG) in Grenoble and our first working session. Organisation: each month, we have a 2-hour working session. The first hour is dedicated to a presentation/tutorial (you can see t...

### Scraping some French medical school rankings

September 9, 2017

In this post, I will analyze the results of the “épreuves classantes nationales (ECN)” (national ranking examinations), a competitive examination at the end of the 6th year of medical school in France. The top-ranked students are the first to choose where they continue their medical training. A very clean dataset: the ...

### A guide to parallelism in R

September 4, 2017

In this post, I will talk about parallelism in R. This post will likely be biased towards the solutions I use: for example, I never use mclapply or clusterApply, and I always prefer foreach. We will focus on how to parallelize R code on your own computer ...
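
A minimal sketch of the foreach pattern on a local machine, assuming the {foreach} and {doParallel} packages are installed; the number of workers (2) and the toy computation are arbitrary choices for illustration.

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)       # start two local worker processes
registerDoParallel(cl)               # make %dopar% use these workers

res <- foreach(i = 1:4, .combine = "c") %dopar% {
  sqrt(i)                            # each iteration runs on one of the workers
}

parallel::stopCluster(cl)            # always release the workers
res
```

By default, foreach returns the results in the order of the iterations, and `.combine = "c"` concatenates them into a single vector.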

### One month as a procrastinator on Stack Overflow

July 26, 2017

Hello everyone, I’m 6103040 aka F. Privé. In this post, I will give some insights about answering questions on Stack Overflow (SO) for a month. One of the reasons I began frenetically answering questions on Stack Overflow was to procrastinate while finishing a scientific manuscript. My activity on Stack ...

### Package bigstatsr: Statistics with matrices on disk (useR 2017)

July 20, 2017

In this post, I will talk about my package bigstatsr, which I’ve just presented in a 5-minute lightning talk at useR!2017. You can listen to me in action there. I should have chosen a longer talk to explain more about this package; maybe next time. I will use ...

### (Linear Algebra) Do not scale your matrix

June 2, 2017

In this post, I will show you that you generally don’t need to explicitly scale a matrix. Maybe you wanted to know more about WHY matrices should be scaled when doing linear algebra. I will briefly recall that at the beginning, but the rest will focus on HOW to ...
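
One way to illustrate the idea, as a sketch that is not necessarily the post's exact approach: write the scaled matrix as $$X_s = (X - \mathbf{1}\mu^T)\,\mathrm{diag}(1/\sigma)$$, with column means $$\mu$$ and standard deviations $$\sigma$$. Then $$X_s^T y = (X^T y - \mu \sum_i y_i) / \sigma$$ and $$X_s b = X (b / \sigma) - \sum_j \mu_j b_j / \sigma_j$$ (divisions element-wise), so both products can be computed from the unscaled X. The toy data are arbitrary.

```r
set.seed(1)
X <- matrix(rnorm(100 * 10, mean = 5, sd = 3), 100, 10)
y <- rnorm(100)
b <- rnorm(10)

center <- colMeans(X)
scl    <- apply(X, 2, sd)              # same scaling as scale(X)

# t(scale(X)) %*% y without forming scale(X)
cp_implicit <- (crossprod(X, y)[, 1] - center * sum(y)) / scl
# scale(X) %*% b without forming scale(X)
mv_implicit <- (X %*% (b / scl))[, 1] - sum(center * b / scl)

Xs <- scale(X)                         # explicit scaling, only for checking
all.equal(cp_implicit, crossprod(Xs, y)[, 1])
all.equal(mv_implicit, (Xs %*% b)[, 1])
```

Only vectors of length `ncol(X)` are needed for the correction, so a large (possibly on-disk) matrix never has to be copied into a scaled version.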

### Tip: Optimize your Rcpp loops

December 28, 2016

In this post, I will show you how to optimize your Rcpp loops so that they are 2 to 3 times faster than a standard implementation. Context, real data example: for this post, I will use a big.matrix representing the genotypes of 15,283 individuals, i.e. the number of mutations (0, 1 or 2) at 287,155 ...
