Monthly Archives: September 2011

Data Science: a literature review

September 28, 2011

Just what is Data Science, anyway? Here's one take: Ever since the term "Data Scientist" was coined by DJ Patil and Jeff Hammerbacher in 2009, there's been a vigorous debate on what the term actually means. More than 80% of statisticians consider themselves data scientists, but Data Science is more than just Statistics. (My own take is that Data...

Read more »

Polyploidy in sugarcane

September 28, 2011

While reading UseR conference abstracts I came across this sentence: "Sugarcane is polyploid, i.e., has 8 to 14 copies of every chromosome, with individual alleles in varying numbers." Wow! This generates a really complex genotype system. Say we have a biallelic gene with alleles A and B. In diploids the possible genotypes are AA, AB, and BB. Given the...
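
As a rough illustration of how quickly the genotype count grows with ploidy, here is a minimal Python sketch. It assumes that only allele dosage matters (the order of alleles within a genotype is ignored) and that each chromosome copy carries exactly one of the two alleles; the helper name `genotypes` is made up for this example and does not come from the post.

```python
from itertools import combinations_with_replacement


def genotypes(alleles, ploidy):
    """Return the distinct unordered genotypes for one locus.

    Assumption: genotypes differ only by allele dosage, so each genotype
    is a multiset of `ploidy` alleles drawn from `alleles`.
    """
    return ["".join(g) for g in combinations_with_replacement(alleles, ploidy)]


# Diploid case from the text: AA, AB, BB.
print(genotypes("AB", 2))

# The two ends of the 8-to-14 copy range quoted for sugarcane.
for ploidy in (8, 14):
    print(ploidy, len(genotypes("AB", ploidy)))
```

Under these assumptions a biallelic locus with k chromosome copies has k + 1 possible genotypes, so the range quoted above gives between 9 and 15 genotypes per locus instead of 3.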

Read more »

Bessel integral

September 28, 2011

Pierre Pudlo and I worked this morning on a distribution related to phylogenetic trees and got stuck on the following Bessel integral, where I_n is the modified Bessel function of the first kind. We could not find better than formula 6.611(4) in Gradshteyn and Ryzhik, which is for a=0… Anyone in for a closed form

Read more »

Using transparency for data count intuition

September 27, 2011

This is an illustration of representing point count in a graphic using transparency. This is easy to do in ggplot2 if you use one of the bar-chart-type geoms. However, I think there are other situations where it would be useful to apply aesthetics based on point count. Since Hadley did a lot of

Read more »

World Tourism Day, and Google Public Data Explorer

September 27, 2011

Today is World Tourism Day! So let's speak about some tourism-related datasets, and others. Among other nice functions, Google offers a beta version of its Public Data Explorer, which provides a collection of datasets from OECD, IMF, Eurostat, …

Read more »

Five new local R user groups

September 27, 2011

Looks like there's been a lot of activity in the R user community in the Northern Hemisphere now that the summer break is over. I've just added several new groups to the Local R User Group Directory: Tokyo, Japan: The Tokyo.R R study group has already had 17 meetings, but has just been added to the directory. Shanghai/East China:...

Read more »

Tikz Introduction

September 27, 2011

The pgf drawing package for LaTeX provides facilities for drawing simple or complicated pictures within a LaTeX document. There are many options available within the package and in this post we consider some of the basics to get up and running. As with all LaTeX documents we need to select a

Read more »

Basic line chart with ggplot2

September 27, 2011

ggplot2 is an R package that makes it easy to draw plots that are easier on the eyes than those from R's built-in plotting functions, though its grammar is different from what is commonly used in R. This code demonstrates how to prepare a …

Read more »

Ghastly R code

September 27, 2011

My R package, R/qtl, contains about 33k lines of R code (and 21k lines of C code). Some of it is quite good; some of it is terrible. Here’s another example of the terrible. I’ve long needed to revise the function scantwo, for performing a two-dimensional genome scan for pairs of loci. I was looking

Read more »

Project Euler: problem 6

September 27, 2011

The sum of the squares of the first ten natural numbers is 1² + 2² + ... + 10² = 385. The square of the sum of the first ten natural numbers is (1 + 2 + ... + 10)² = 55² = 3025. Hence the difference between the sum of the squares o...
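
The arithmetic quoted above can be checked with a few lines of Python. This is only a sketch for the ten-number case stated in the excerpt; the function names `sum_of_squares` and `square_of_sum` are chosen for this example and are not taken from the post.

```python
def sum_of_squares(n):
    """Sum of k^2 for k = 1..n."""
    return sum(k * k for k in range(1, n + 1))


def square_of_sum(n):
    """Square of (1 + 2 + ... + n)."""
    return sum(range(1, n + 1)) ** 2


n = 10
print(sum_of_squares(n))                     # 385
print(square_of_sum(n))                      # 3025
print(square_of_sum(n) - sum_of_squares(n))  # 2640
```

Keeping `n` as a parameter means the same two functions can be reused for other ranges without changing the logic.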

Read more »