Blog Archives

How not to reveal your MySQL DB login/password when sharing code on GitHub or BitBucket?

March 19, 2013

Solution: use your ~/.my.cnf. Inside your ~/.my.cnf file, define the connection parameters to your databases. For example, here I define two groups called local and toto: user = root, password = ultra_secret, host = localhost, user = capitaine_flamp...
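To illustrate the idea of keeping credentials in an option file instead of in shared code, here is a minimal sketch in Python (used for illustration only; the post itself is about R). The helper name `read_mysql_group` and the temporary demo file are assumptions, and only the `local` group from the excerpt is reproduced. It relies on the fact that `~/.my.cnf` uses an INI-like layout of `[group]` headers followed by `key = value` lines.

```python
import configparser
import os
import tempfile


def read_mysql_group(path, group):
    """Return the options of one [group] from a my.cnf-style option file."""
    # allow_no_value tolerates bare options; interpolation is disabled so that
    # a '%' inside a password is read literally.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read(path)
    if not parser.has_section(group):
        raise KeyError(f"no [{group}] group in {path}")
    return dict(parser[group])


# Demo on a throwaway file; a real script would point at ~/.my.cnf instead.
sample = "[local]\nuser = root\npassword = ultra_secret\nhost = localhost\n"
with tempfile.NamedTemporaryFile("w", suffix=".cnf", delete=False) as handle:
    handle.write(sample)
    demo_path = handle.name

local_options = read_mysql_group(demo_path, "local")
os.remove(demo_path)

# Only the group name ever appears in the shared code, never the password.
print(sorted(local_options))  # ['host', 'password', 'user']
```

The code pushed to GitHub or BitBucket then mentions only the group name, while the option file stays on the local machine.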

Read more »

Large correlation in parallel

February 24, 2013

A small improvement to the bigcor function proposed on Rmazing for computing huge correlation matrices in R: I made the function work in parallel, using all the CPU cores available on the machine. The code is here. Here is a benchmark of the 2 func...
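The general idea of filling a large correlation matrix block by block, with one task per pair of blocks, can be sketched as follows. This is a hedged illustration in Python, not the bigcor code or the parallel version from the post; the names `standardize` and `block_correlation` and the tiny demo data are assumptions. A thread pool keeps the sketch portable; a process pool would be needed for real multi-core speed-ups in pure Python.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement
from math import sqrt


def standardize(column):
    """Center a column and scale it to unit length.

    After this step the dot product of two columns is their Pearson
    correlation. A constant column (zero variance) would divide by zero.
    """
    mean = sum(column) / len(column)
    centered = [value - mean for value in column]
    norm = sqrt(sum(value * value for value in centered))
    return [value / norm for value in centered]


def block_correlation(columns, block_size=2, workers=4):
    """Compute the full correlation matrix, one task per pair of column blocks."""
    z = [standardize(column) for column in columns]
    n = len(z)
    starts = range(0, n, block_size)
    result = [[0.0] * n for _ in range(n)]

    def correlate(pair):
        # Correlate every column of block a with every column of block b.
        a, b = pair
        rows = range(a, min(a + block_size, n))
        cols = range(b, min(b + block_size, n))
        return [(i, j, sum(x * y for x, y in zip(z[i], z[j])))
                for i in rows for j in cols]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Only the upper triangle of block pairs is computed; symmetry fills the rest.
        for cells in pool.map(correlate, combinations_with_replacement(starts, 2)):
            for i, j, r in cells:
                result[i][j] = result[j][i] = r
    return result


columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
correlations = block_correlation(columns)
print([round(value, 6) for value in correlations[0]])
```

Because each block pair is independent, the tasks can be handed to as many workers as there are cores without any coordination beyond writing results back.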

Read more »

Air quality analysis from Beijing Twitter feed.

January 14, 2013

As air pollution in Beijing reaches new highs, I re-ran the analysis I put online a few months ago. "Crazy bad" is a good description when it reaches those levels. But I am sure there are other places, like Mexico City, LA, etc., that also look as dramatic as...

Read more »

Computing an empirical pFDR in R

December 21, 2012

The positive false discovery rate (pFDR) has become a classical procedure for controlling false positives. It is one of my favourites because it relies on a re-sampling approach. I base my implementation on John Storey's PNAS paper and the technical report he published with Rob Tibshirani while at Stanford (I find the technical report...
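To give a feel for what a re-sampling estimate looks like, here is a deliberately simplified sketch in Python (for illustration only; it is neither the R implementation from the post nor the full procedure from the papers). It estimates a plain FDR at a fixed threshold by comparing the rejections in the observed statistics with the average number of rejections in resampled (null) statistics. It assumes that every feature is null when counting false positives, which is conservative, and it skips the conditioning on at least one rejection that distinguishes the pFDR. The function name `permutation_fdr` and the numbers are made up.

```python
def permutation_fdr(observed, null_sets, threshold):
    """Estimate the FDR at `threshold` from observed and resampled statistics.

    observed  -- one test statistic per feature
    null_sets -- a list of resampled statistic vectors (e.g. from permuted labels)
    Assumption: all features are treated as null when counting false positives.
    """
    rejected = sum(1 for stat in observed if abs(stat) >= threshold)
    if rejected == 0:
        return 0.0
    # Average number of rejections seen under the resampled null.
    expected_false = sum(
        sum(1 for stat in null if abs(stat) >= threshold) for null in null_sets
    ) / len(null_sets)
    return min(1.0, expected_false / rejected)


# Made-up statistics: 6 features, 2 resampled data sets.
observed = [5.0, 4.2, 0.3, -0.1, 3.9, 0.8]
null_sets = [
    [0.2, -3.5, 0.1, 0.4, -0.6, 1.0],
    [0.5, 0.3, -0.2, 0.9, -1.1, 0.7],
]
fdr_estimate = permutation_fdr(observed, null_sets, threshold=3.0)
print(round(fdr_estimate, 4))
```

Here three observed statistics pass the threshold while the resampled sets pass it 0.5 times on average, so the estimate is 0.5 / 3.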

Read more »

Religious restrictions index: how do countries compare?

September 21, 2012

Yesterday the Guardian DataBlog published an interesting article that graphically explores religious intolerance across the world. The data come from a report published by the Pew Research Center's Forum on Religion and Public Life. I like the DataBlog philosophy a lot: it provides the raw data for everyone to look at. However, I felt that the visualization could be...

Read more »

Twitter analysis of air pollution in Beijing

July 31, 2012

One of the air pollution detection machines in Beijing (at the American Embassy) is connected to Twitter and tweets about the air quality in real time. By default, the machine outputs the 24-hour summary of PM2.5 air pollution. PM2.5 is defined here. The next step will be to compare the...

Read more »

Rcpp vs. R implementation of cosine similarity

June 9, 2012

While speeding up some code the other day for a project with a colleague, I ended up trying Rcpp for the first time. I re-implemented the cosine distance function using RcppArmadillo relatively easily, using bits and pieces of code I found scattered around the web. But the speed increase was smaller than I expected when comparing the...
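For reference, the quantity being computed is the dot product of two vectors divided by the product of their lengths. A minimal sketch in Python (for illustration only; this is neither the R nor the RcppArmadillo version from the post, and the function name is an assumption):

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


parallel = cosine_similarity([1, 2, 3], [2, 4, 6])    # same direction, close to 1
orthogonal = cosine_similarity([1, 0], [0, 1])        # perpendicular vectors
print(round(parallel, 6), orthogonal)
```

A cosine distance is commonly taken as one minus this similarity.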

Read more »

Obtaining a protein-protein interaction network for a gene list in R

June 3, 2012

Building a network of interactions between a set of genes can help a great deal in understanding the relationships between the seemingly disparate elements of your list. It can seem challenging at first to build such a network, but it's less complicat...

Read more »

Another look at over-representation analysis interpretation

May 21, 2012

Interpreting a list of differentially regulated genes can take many forms. One of the most widely used methods is to look for enrichment of functional groups of genes compared to a random sampling of genes from the same universe, namely an over-representation analysis (ORA). The point I want to explore today is what is the best way to interpret the results...
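The comparison with a random sampling from the same universe is commonly formalized as a hypergeometric tail probability. The sketch below, in Python for illustration only, computes that probability; it is an assumption that this is the test meant in the post, and the function name and the small numbers are made up.

```python
from math import comb


def ora_p_value(universe_size, set_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe in which set_size genes belong to the functional group."""
    total = comb(universe_size, list_size)
    upper = min(set_size, list_size)
    # Sum the hypergeometric probabilities of every overlap at least as large.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up example: universe of 10 genes, 4 in the group, list of 3 genes, 2 of them in the group.
p_value = ora_p_value(universe_size=10, set_size=4, list_size=3, overlap=2)
print(round(p_value, 4))
```

In this toy case the tail is (36 + 4) / 120, so an overlap of 2 or more happens by chance one time in three.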

Read more »

Using R to graph a subject trend in PubMed

May 15, 2012

The traditional way to show an audience that your topic is worth studying is to present the state of the field based on a literature review. This is especially true if your subject is obscure to all but a handful of scientists in the world. I was confronted with this problem more than once, and the last time...

Read more »