Articles by Florian Privé

Detecting outlier samples in PCA

August 21, 2019 | Florian Privé

In this post, I present something I am currently investigating (feedback welcome!) and that I am implementing in my new package {bigutilsr}. This package can be used to detect outlier samples in Principal Component Analysis (PCA). remotes::install_github("privefl/bigutilsr") library(bigutilsr) I present three different statistics of outlierness ...
[Read more...]

Using clustering to find points in an image

November 26, 2018 | Florian Privé

In this post, I present my new package {img2coord}. This package can be used to retrieve coordinates from a scatter plot (as an image). devtools::install_github("privefl/img2coord") Have you ever made a plot, saved it as a png and moved on? When you come back to ...
[Read more...]

Choosing hyper-parameters in penalized regression

November 22, 2018 | Florian Privé

In this post, I’m evaluating some ways of choosing hyper-parameters (\(\alpha\) and \(\lambda\)) in penalized linear regression. The same principles can be applied to other types of penalized regressions (e.g. logistic). Model In penalized linear regression, we find regression coefficients \(\hat{\beta}_0\) and \(\hat{\beta}\) that minimize the ...
[Read more...]

Predicting height based on DNA mutations

October 7, 2018 | Florian Privé

In this post, I show some results of predicting height based on DNA mutations. This analysis aims at reproducing the analysis of this paper using my own analysis tools in. I use a new dataset composed of 500,000 adults from the UK, and genotyped over hund...
[Read more...]

Whether to use a data frame in R?

July 19, 2018 | Florian Privé

In this post, I try to show you in which situations using a data frame is appropriate, and in which it’s not. Learn more with the Advanced R book. What is a data frame? A data frame is just a list of vectors of the same length, each vector ...
[Read more...]

Why I rarely use apply

July 13, 2018 | Florian Privé

In this short post, I talk about why I’m moving away from using the apply function. With matrices It’s okay to use apply with a dense matrix, although you can often use an equivalent that is faster. N
[Read more...]

Why loops are slow in R

June 10, 2018 | Florian Privé

In this post, I talk about loops in R, why they can be slow and when it is okay to use them. Don’t grow objects Let us generate a matrix of uniform values (max changing for every column). gen_grow
[Read more...]

Performance: when algorithmics meets mathematics

April 18, 2018 | Florian Privé

In this post, I talk about performance through an efficient algorithm I developed for finding closest points on a map. This algorithm uses concepts from both mathematics and algorithmics. Problem to solve This problem comes from a recent question on StackOverflow. I have two matrices, one is 200K rows long, ...
[Read more...]

Teaching an advanced R course

March 28, 2018 | Florian Privé

In this post, I come back to my first experience teaching an advanced R course over the past month. Content This course was planned for 10 sessions (3 hours each) and I initially wanted to talk about the following subjects: R programming and g...
[Read more...]

Shiny App for making Pixel Art Models

November 15, 2017 | Florian Privé

Last weekend, I discovered pixel art. The goal is to reproduce a pixelated drawing. Anyone can do this without any drawing skills because you just have to reproduce the pixels one by one (on squared paper). Kids and big kids can quickly become addicted to this. Example For ...
[Read more...]

Grenoble RUG: first working session

October 1, 2017 | Florian Privé

In this post, I will talk about the organisation of our R User Group (RUG) in Grenoble and our first working session. Organisation Each month, we have a working session of 2 hours. The first hour is dedicated to a presentation/tutorial (you can see t...
[Read more...]

Scraping some French medical school rankings

September 9, 2017 | Florian Privé

In this post, I will analyze the results of the “épreuves classantes nationales (ECN)”, which is a competitive examination at the end of the 6th year of medical school in France. The top-ranked students get to choose first where they want to continue their medical training. A very clean dataset The ...
[Read more...]

A guide to parallelism in R

September 4, 2017 | Florian Privé

In this post, I will talk about parallelism in R. This post will likely be biased towards the solutions I use. For example, I never use mclapply nor clusterApply. I prefer to always use foreach. In this post, we will focus on how to parallelize R code on your computer. ...
[Read more...]

One month as a procrastinator on Stack Overflow

July 26, 2017 | Florian Privé

Hello everyone, I’m 6103040 aka F. Privé. In this post, I will give some insights about answering questions on Stack Overflow (SO) for a month. One of the reasons I began frenetically answering questions on Stack Overflow was to procrastinate while finishing a scientific manuscript. My activity on Stack ...
[Read more...]

(Linear Algebra) Do not scale your matrix

June 2, 2017 | Florian Privé

In this post, I will show you that you generally don’t need to explicitly scale a matrix. Maybe you wanted to know more about WHY matrices should be scaled when doing linear algebra. I will recall that at the beginning, but the rest will focus on HOW to ...
[Read more...]

Tip: Optimize your Rcpp loops

December 28, 2016 | Florian Privé

In this post, I will show you how to optimize your Rcpp loops so that they are 2 to 3 times faster than a standard implementation. Context Real data example For this post, I will use a big.matrix which represents genotypes for 15,283 individuals, corresponding to the number of mutations (0, 1 or 2) at 287,155 ...
[Read more...]