### ggplot2 Version of Figures in “25 Recipes for Getting Started with R”

August 16, 2011

To make it easy to compare graphs produced by R's base plotting functions with those produced by ggplot2, I recreated the figures in the book *25 Recipes for Getting Started with R* using ggplot2. The code for each image is given in a separate paragraph, so the two versions can be compared side by side.

### the batman equation

August 13, 2011

HardOCP has an image with an equation which apparently draws the Batman logo.

### [Project Euler] – Problem 46

June 21, 2011

It was proposed by Christian Goldbach that every odd composite number can be written as the sum of a prime and twice a square: $9 = 7 + 2\times1^2$, $15 = 7 + 2\times2^2$
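The search this problem calls for can be sketched in a few lines of Python. This is a minimal sketch, not the code from the post: it assumes plain trial division is fast enough at this scale, and the function names are hypothetical. It walks the odd composites from 9 upward and returns the first one that has no prime + $2k^2$ split.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for the small numbers searched here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def goldbach_decomposition(n: int):
    """Return (p, k) with n == p + 2*k**2 and p prime, or None if no split exists."""
    k = 1
    while 2 * k * k < n:
        p = n - 2 * k * k
        if is_prime(p):
            return p, k
        k += 1
    return None


def first_counterexample() -> int:
    """Smallest odd composite that is NOT a prime plus twice a square."""
    n = 9
    while True:
        if not is_prime(n) and goldbach_decomposition(n) is None:
            return n
        n += 2
```

`goldbach_decomposition(15)` returns `(13, 1)` rather than the $7 + 2\times2^2$ split above, because it tries $k = 1$ first; any valid split is enough to rule a number out.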

### [Project Euler] – Problem 58

May 21, 2011

Starting with 1 and spiralling anticlockwise in the following way, a square spiral with side length 7 is formed:

    37 36 35 34 33 32 31
    38 17 16 15 14 13 30
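The numbers on the diagonals follow a simple pattern: for a layer of odd side length $s$, the four corners are $s^2$, $s^2-(s-1)$, $s^2-2(s-1)$ and $s^2-3(s-1)$. Below is a minimal Python sketch built on that pattern, aimed at the problem's question of the first side length at which the share of primes along both diagonals falls below 10%. The Miller-Rabin test with bases 2, 3, 5, 7 is my own choice (it is deterministic for numbers below 3,215,031,751, which is assumed to cover the corner values reached here), and the function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Miller-Rabin with bases 2, 3, 5, 7: deterministic for n < 3,215,031,751."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7)
    for p in bases:
        if n % p == 0:
            return n == p
    # write n - 1 as d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True


def corner_primes(side: int) -> int:
    """Number of primes among the corners of the layer with odd side length `side`."""
    step = side - 1
    # the fourth corner, side**2, is a perfect square and never prime
    return sum(is_prime(side * side - k * step) for k in (1, 2, 3))


def diagonal_prime_stats(side: int) -> tuple:
    """(primes, total numbers) on both diagonals of the spiral with the given odd side."""
    primes, total = 0, 1  # the centre 1 lies on the diagonals but is not prime
    for s in range(3, side + 1, 2):
        primes += corner_primes(s)
        total += 4
    return primes, total


def first_side_below(ratio: float = 0.10) -> int:
    """First side length at which the share of primes on the diagonals drops below `ratio`."""
    primes, total, side = 0, 1, 1
    while True:
        side += 2
        primes += corner_primes(side)
        total += 4
        if primes / total < ratio:
            return side
```

For the side-7 spiral, `diagonal_prime_stats(7)` gives 8 primes among the 13 diagonal numbers.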

### [Project Euler] – Problem 57

May 19, 2011

It is possible to show that the square root of two can be expressed as an infinite continued fraction:

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}} = 1.414213\ldots$$

By expanding this for the first four iterations, we get:
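Each expansion can be obtained from the previous one: if $x$ is the current expansion, the next is $1 + \frac{1}{1 + x}$, so a fraction $n/d$ becomes $(n + 2d)/(n + d)$. Below is a minimal Python sketch using exact `fractions.Fraction` arithmetic (my choice, to avoid floating-point rounding); the helper names are hypothetical. `count_longer_numerators` counts the expansions whose numerator has more digits than the denominator, which is what the problem asks about for the first 1000 expansions.

```python
from fractions import Fraction


def sqrt2_expansions(count: int) -> list:
    """First `count` expansions of the continued fraction for sqrt(2).

    Each expansion follows from the previous one via x -> 1 + 1/(1 + x),
    i.e. n/d -> (n + 2d)/(n + d), starting from 1 + 1/2 = 3/2.
    """
    x = Fraction(3, 2)
    expansions = []
    for _ in range(count):
        expansions.append(x)
        x = 1 + 1 / (1 + x)
    return expansions


def count_longer_numerators(count: int = 1000) -> int:
    """How many of the first `count` expansions have more digits in the numerator."""
    return sum(
        len(str(f.numerator)) > len(str(f.denominator))
        for f in sqrt2_expansions(count)
    )
```

The first four expansions come out as 3/2, 7/5, 17/12 and 41/29; the eighth, 1393/985, is the first whose numerator has more digits than its denominator.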

### Machine Learning Ex3 – Multivariate Linear Regression

March 29, 2011

Part 1. Finding alpha. The first question to resolve in Exercise 3 is how to pick a good learning rate alpha. This requires making an initial selection, running gradient descent, and observing how the cost function changes.
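The alpha-selection loop can be sketched in plain Python. This is a minimal illustration under my own assumptions, not the exercise's solution: features are scaled to zero mean and unit standard deviation before descent, the data are a hypothetical noiseless toy set rather than the exercise's data, and the candidate alphas are arbitrary. A good alpha makes $J(\theta)$ fall quickly and steadily; too large an alpha makes it grow.

```python
from statistics import mean, pstdev


def normalize(rows):
    """Scale each feature column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sigma = [pstdev(c) for c in cols]
    scaled = [[(v - m) / s for v, m, s in zip(row, mu, sigma)] for row in rows]
    return scaled, mu, sigma


def predict(theta, row):
    return sum(t * x for t, x in zip(theta, row))


def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    return sum((predict(theta, row) - yi) ** 2 for row, yi in zip(X, y)) / (2 * len(y))


def gradient_descent(X, y, alpha, iters):
    """Batch gradient descent; returns the final theta and J(theta) after every step."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    history = []
    for _ in range(iters):
        errors = [predict(theta, row) - yi for row, yi in zip(X, y)]
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
        history.append(cost(X, y, theta))
    return theta, history


if __name__ == "__main__":
    # Hypothetical toy data (not the exercise's data set): y = 3 + 2*x1 - x2 exactly
    raw = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 6], [6, 3]]
    y = [0, 6, 5, 9, 7, 12]
    scaled, mu, sigma = normalize(raw)
    X = [[1.0] + row for row in scaled]  # prepend the intercept column
    for alpha in (0.01, 0.03, 0.1, 0.3, 1.0, 2.5):
        _, history = gradient_descent(X, y, alpha, 50)
        print(f"alpha={alpha:<5} J after 50 steps = {history[-1]:.4g}")
```

Printing `history` for each candidate (or plotting it against the iteration number) is one way to observe the cost function; the largest alpha whose cost still decreases on every step is a reasonable pick.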

### clusterProfiler in Bioconductor 2.8

March 26, 2011

In recent years, high-throughput experimental techniques such as microarrays and mass spectrometry have made it possible to identify long lists of genes and gene products. The most widely used strategy for analyzing such high-throughput data is to group genes into clusters based on their expression profiles. Another commonly used approach is to annotate these genes ...

### Machine Learning Ex2 – Linear Regression

March 22, 2011

Thanks to this post, I found OpenClassroom, and thanks to Andrew Ng and his lectures, I took my first course in machine learning. The videos are quite easy to follow. Exercise 2 requires implementing the gradient descent algorithm to fit a linear regression model to data.
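The core of gradient descent for a single feature is the simultaneous update of $\theta_0$ and $\theta_1$ from the same residuals. A minimal Python sketch, assuming the squared-error cost $J(\theta)=\frac{1}{2m}\sum_i(\theta_0+\theta_1x_i-y_i)^2$; the function name, toy data and learning rate below are illustrative, not taken from the exercise.

```python
def gradient_descent_1d(x, y, alpha, iters):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent on the squared-error cost."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m  # dJ/dtheta1
        # update both parameters from the same errors (simultaneous update)
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1
```

On the noiseless toy data x = 0, 1, 2, 3, 4 and y = 1 + 2x, `gradient_descent_1d(x, y, alpha=0.07, iters=1500)` converges to $\theta_0 \approx 1$, $\theta_1 \approx 2$.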

### The easiest way to get UTR sequence

March 2, 2011

I just figured out how to query UTR sequences from Ensembl with the BioMart tool. It is much simpler than using BioPerl to parse GenBank (gbk) files and extract the UTR sequences.

### Estimate Probability and Quantile

January 25, 2011

Simple root-finding and one-dimensional integration algorithms were implemented in previous posts. These algorithms can be used to estimate cumulative probabilities and quantiles. Here, the normal distribution is taken as an example.
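The two ideas combine directly: the cumulative probability is an integral of the density, and the quantile is the root of $F(x) - p = 0$. A minimal Python sketch, not the code from the earlier posts: it swaps in composite Simpson's rule for the integral and bisection for the root, uses the symmetry of the normal density so that only the range from $\mu$ to $q$ has to be integrated, and assumes the quantile lies within 10 standard deviations of the mean. The names mimic R's `pnorm`/`qnorm` only for readability.

```python
import math


def dnorm(x, mu=0.0, sigma=1.0):
    """Normal probability density function."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) sub-intervals; also works for b < a."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return total * h / 3


def pnorm(q, mu=0.0, sigma=1.0):
    """P(X <= q): half the mass lies below mu, so integrate the density from mu to q."""
    return 0.5 + simpson(lambda x: dnorm(x, mu, sigma), mu, q)


def qnorm(p, mu=0.0, sigma=1.0, tol=1e-10):
    """Quantile by bisection on pnorm(x) - p, which is increasing in x."""
    # bracket assumes p is not extremely close to 0 or 1
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pnorm(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

`qnorm(0.975)` gives about 1.95996, and `pnorm` agrees with the closed form $\frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)$ to many decimal places.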

### Single variable optimization

January 1, 2011

Optimization means seeking the minima or maxima of a function within a given domain. If a differentiable function reaches a maximum or minimum at an interior point, its derivative at that point is 0. So if we apply the Newton-Raphson root-finding method to f', we can find the points that optimize f.
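Applying Newton-Raphson to $f'$ gives the iteration $x_{k+1} = x_k - f'(x_k)/f''(x_k)$. A minimal Python sketch, assuming the caller supplies $f'$ and $f''$ analytically (the function name is hypothetical). Because $f'(x) = 0$ holds at maxima, minima and saddle points alike, the sign of $f''$ at the result is checked afterwards to tell which one was found.

```python
import math


def newton_optimize(df, d2f, x0, tol=1e-10, max_iter=100):
    """Locate a stationary point of f by running Newton-Raphson on f'.

    df and d2f are the first and second derivatives of f (supplied by the caller).
    """
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)  # Newton step for the equation f'(x) = 0
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # f(x) = x * exp(-x): f'(x) = (1 - x) exp(-x), f''(x) = (x - 2) exp(-x)
    def df(x):
        return (1 - x) * math.exp(-x)

    def d2f(x):
        return (x - 2) * math.exp(-x)

    x_star = newton_optimize(df, d2f, 0.5)
    curvature = d2f(x_star)
    kind = "maximum" if curvature < 0 else "minimum" if curvature > 0 else "inconclusive"
    print(x_star, kind)  # x_star is close to 1, where f'' < 0, so kind is "maximum"
```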

### one-dimensional integrals

December 25, 2010

The fundamental idea of numerical integration is to estimate the area of the region in the xy-plane bounded by the graph of a function f(x) and the x-axis. The integral is estimated by dividing the x range into small intervals, approximating the area over each interval, and then adding these small approximations to give the total.
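The "divide into small intervals and add up" idea can be sketched with two of the simplest rules. A minimal Python sketch (the choice of rules and the default of 1000 sub-intervals are my assumptions, not necessarily what the post uses): the midpoint rule approximates each slice by a rectangle, the trapezoid rule by a trapezoid.

```python
import math


def midpoint_rule(f, a, b, n=1000):
    """Sum of rectangles whose heights are f at the centre of each of n equal sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def trapezoid_rule(f, a, b, n=1000):
    """Sum of trapezoids joining f at the ends of each of n equal sub-intervals."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


if __name__ == "__main__":
    # both results should be close to the exact value, 2
    print(midpoint_rule(math.sin, 0, math.pi), trapezoid_rule(math.sin, 0, math.pi))
```

For smooth f, halving the interval width cuts the error of both rules by roughly a factor of four.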

### [Project Euler] – Problem 187

December 23, 2010

http://projecteuler.net/index.php?section=problems&id=187 A composite is a number containing at least two prime factors. For example, 15 = 3 × 5; 9 = 3 × 3; 12 = 2 × 2 × 3. There are ten composites below thirty containing precisely two, not necessarily distinct, prime factors: 4, 6, 9, 10, 14, 15, 21, 22, 25, 26.
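One way to count these: every such number is $p \times q$ with primes $p \le q$, so for each prime $p$ with $p^2$ below the limit, count the primes $q$ in $[p, \lfloor(\text{limit}-1)/p\rfloor]$ with a binary search over a sieved prime list. A minimal Python sketch of that approach, not necessarily the method used in the post; the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import compress
from math import isqrt


def primes_below(n: int) -> list:
    """Sieve of Eratosthenes: all primes p < n."""
    if n < 3:
        return []
    sieve = bytearray([1]) * n
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n - 1) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytes(len(range(i * i, n, i)))
    return list(compress(range(n), sieve))


def count_semiprimes_below(limit: int) -> int:
    """Count n < limit with exactly two prime factors (n = p * q with p <= q)."""
    primes = primes_below(limit // 2 + 1)  # q never exceeds (limit - 1) // 2
    count = 0
    for i, p in enumerate(primes):
        if p * p >= limit:
            break
        # primes q with p <= q <= (limit - 1) // p occupy indices i .. bisect_right - 1
        count += bisect_right(primes, (limit - 1) // p) - i
    return count
```

`count_semiprimes_below(30)` reproduces the ten composites listed above. For the full problem, pass the problem's limit; the sieve then has to reach half that limit, which for a limit of $10^8$ means a 50 MB bytearray and roughly 3 million primes in memory.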

### Root finding

December 4, 2010

Numerical root-finding methods use iteration, producing a sequence of numbers that, ideally, converges to a limit which is a root. This post focuses on four basic root-finding algorithms: the bisection method, the fixed-point method, the Newton-Raphson method, and the secant method.
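A compact Python sketch of the four methods, each returning an approximate root. These are my own minimal versions with illustrative tolerances, not the post's code: bisection halves a bracketing interval, fixed-point iteration repeats $x \leftarrow g(x)$, Newton-Raphson follows the tangent line, and the secant method replaces the derivative with the slope through the last two iterates.

```python
import math


def bisection(f, a, b, tol=1e-12, max_iter=200):
    """Halve a bracketing interval [a, b] (f(a), f(b) of opposite sign) until it is tiny."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0 or (b - a) / 2 < tol:
            return m
        if fa * fm < 0:
            b = m          # root lies in [a, m]
        else:
            a, fa = m, fm  # root lies in [m, b]
    return (a + b) / 2


def fixed_point(g, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- g(x); converges when |g'| < 1 near the fixed point."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


def newton_raphson(f, df, x0, tol=1e-12, max_iter=100):
    """Follow the tangent line: x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Like Newton-Raphson, but with the slope through the last two iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    raise RuntimeError("secant method did not converge")


if __name__ == "__main__":
    def f(x):
        return x * x - 2  # root at sqrt(2)

    print(bisection(f, 1, 2))
    print(fixed_point(math.cos, 1.0))  # solves x = cos(x)
    print(newton_raphson(f, lambda x: 2 * x, 1.0))
    print(secant(f, 1.0, 2.0))
```

Bisection is the only one of the four guaranteed to converge once a sign change is bracketed; the other three are usually faster near a root but can diverge from a poor starting point.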

### bubble chart using ggplot2

December 1, 2010

The visualization presented in Hans Rosling's TED talk was very impressive. FlowingData provides a tutorial on making bubble charts in R, but I prefer ggplot2 for graphics.

### The avalanche of publications mentioning GO

November 30, 2010

Gene Ontology is the de facto standard for annotating gene products. It has been widely used in biological data mining, and I believe it will play an even more central role in the future. Publications mentioning GO were collected and deposited on the GO FTP site, and can be accessed (ftp://ftp.geneontology....

November 30, 2010

I started developing the GOSemSim package two years ago, when I was not yet very familiar with R. I am very happy to see that people use it and find it helpful. Over the past two weeks I have been learning S4 and redesigning GOSemSim with S4 classes and methods, and ...

### upgrade R – F77 causes compilation error

October 20, 2010

I tried to compile the source code of R 2.12 on CentOS, but it threw an error while installing the *cluster* package: `* installing *source* package 'cluster' ...`

### Listing gene IDs from hyperGTest

October 19, 2010

`hyperGTest` computes hypergeometric p-values for over- or under-representation of each GO term in the specified category among the specified gene set. *geneSample* is used as an example.
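The p-value for a single GO term is a one-sided tail probability of the hypergeometric distribution. A minimal Python sketch of that tail computation, my own illustration rather than the Bioconductor implementation; the parameter names `N`, `M`, `n`, `k` are hypothetical. Over-representation uses the upper tail $P(X \ge k)$, under-representation the lower tail $P(X \le k)$.

```python
from math import comb


def hypergeom_over(k: int, N: int, M: int, n: int) -> float:
    """P(X >= k): chance of seeing k or more term-annotated genes in the selected set.

    N: genes in the universe, M: universe genes annotated with the GO term,
    n: size of the selected gene set, k: selected genes annotated with the term.
    """
    top = min(M, n)
    return sum(comb(M, i) * comb(N - M, n - i) for i in range(k, top + 1)) / comb(N, n)


def hypergeom_under(k: int, N: int, M: int, n: int) -> float:
    """P(X <= k): the lower tail, used to test for under-representation."""
    return sum(comb(M, i) * comb(N - M, n - i) for i in range(0, min(k, n) + 1)) / comb(N, n)
```

For a toy universe of 10 genes, 4 of them annotated with the term, and a selected set of 3, seeing 2 or more annotated genes has probability 1/3.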

### The S3 OOP system

October 15, 2010

R currently supports two internal OOP systems (S3 and S4), and several others are available as add-on packages, such as R.oo and OOP. S3 is easy to use but not robust enough for large software projects. The S3 system emphasizes generic functions and polymorphism. It's a ...
