# Articles by Michael Kao

### 3D Sine Wave

September 16, 2014 |

Had a headache last night, so I decided to take things easy and just read posts on Google+. Then I came across this post, which seemed interesting, so I thought I would play around before heading to bed. First of all, I thought generating a square base would be much easier ... [Read more...]

### Spline interpolation of temporal resolution for satellite images.

August 25, 2014 |

This week, I had a discussion with a few of my colleagues about the possibility of using remote sensing data or satellite images to improve our statistical estimation, such as imputation. One source of interest is the Normalized Difference Vegetation Index, which quantifies the concentration of green leaf vegetation around ... [Read more...]

### Why multiple imputation?

March 20, 2014 |

Background: In the forthcoming week, I will be giving a presentation on the fundamentals of imputation to my colleagues. One of the most important ideas I would like to present is multiple imputation. In my last post, I gave a small example of multiple imputation, but it does ... [Read more...]

### Accurate imputation and valid statistical inference with ensemble

January 25, 2014 |

Imputation is predictive inference and not causal inference! I have met many people who consider the two equivalent. Their reasoning is based on the belief that if you can produce a model which replicates the data generating mechanism, it will give you the best prediction. Which may or may ... [Read more...]

### First day of State of Food Insecurity (SOFI) 2013

October 2, 2013 |

The FAO flagship publication SOFI 2013 was released yesterday, on the 1st of October. It is the most important report for monitoring progress towards the 2015 Millennium Development Goal and, ultimately, the elimination of hunger. I was interested in how people responded, so I scraped some data from Twitter and previous ... [Read more...]

### Tupper’s self-referential formula

March 24, 2013 |

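I can't remember where I first came across it, but Tupper's self-referential formula is a very interesting formula that, when graphed in a two-dimensional plane, reproduces the formula itself.

\[ \frac{1}{2} < \left\lfloor \operatorname{mod}\left( \left\lfloor \frac{y}{17} \right\rfloor 2^{-17 \lfloor x \rfloor - \operatorname{mod}(\lfloor y \rfloor, 17)}, 2 \right) \right\rfloor \]

I first thought this would be a quick 5-minute exercise, but it turned into 3 hours of work; the obstacle ... [Read more...]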

### Violin plots and regional income distribution

March 20, 2013 |

While preparing my slides for statistical graphics, a plot really caught my eye when I was playing around with the data. I started off by plotting the time series of GNI per capita by country, and as expected it got quite messy and incomprehensible.

    ## Download and manipulate the data
    library(FAOSTAT)
    library(ggplot2)  # provides ggplot(), geom_line() and labs()
    raw.lst = getWDItoSYB(indicator = c("NY.GNP.PCAP.CD", "SP.POP.TOTL"))
    raw.df = raw.lst[["entity"]]
    traw.df = translateCountryCode(raw.df, from = "ISO2_WB_CODE", to = "UN_CODE")
    ## Attach the UNSD macro region to each country
    mraw.df = merge(traw.df, FAOregionProfile[, c("UN_CODE", "UNSD_MACRO_REG")])
    ## Keep only the countries that belong to a macro region
    final.df = mraw.df[!is.na(mraw.df$UNSD_MACRO_REG), ]

    ## Simple ugly time series plot
    ggplot(data = final.df, aes(x = Year, y = NY.GNP.PCAP.CD)) +
        geom_line(aes(col = Country)) +
        labs(x = NULL, y = "GNI per capita")

So I decided to compute the ... [Read more...]

### R package building automation

February 11, 2013 |

Inspired by the post at http://giventhedata.blogspot.tw/2013/02/my-r-package-development-cheat-sheet.html, I have decided to publish my cheat script for package development as well. Building a package used to be a nightmare; filling in all those Rd files manually can cause some serious brain damage. Thanks to the ... [Read more...]

### Relearn boxplot and label the outliers

February 5, 2013 |

Despite the fact that the box plot is used almost everywhere and taught in undergraduate statistics classes, I recently had to re-learn the box plot in order to know how to label the outliers. This stackoverflow post was where I found how the outliers and whiskers of the Tukey box ... [Read more...]
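As a rough illustration of labelling outliers, here is a minimal sketch in base R. It assumes the default Tukey rule used by `boxplot.stats()` (points more than 1.5 times the hinge spread beyond the hinges); the data vector `x` is made up for the example and the approach is not necessarily the one taken in the post.

```r
# Hypothetical sample: five typical values plus one extreme value.
x <- c(2, 3, 4, 5, 6, 30)

# boxplot.stats() returns the whisker/hinge/median summary ($stats) and
# the points lying beyond the whiskers ($out), with the default coef = 1.5.
bs <- boxplot.stats(x)
outliers <- bs$out

# Draw the box plot and write each outlier's value next to its point.
boxplot(x, main = "Box plot with labelled outliers")
if (length(outliers) > 0) {
    text(x = 1.1, y = outliers, labels = outliers, pos = 4)
}
```

Here the hinges are 3 and 6, so the upper fence sits at 10.5 and only the value 30 is flagged.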

### A package for agricultural statistics: FAOSTAT

February 3, 2013 |

After 8 years of using R, today I finally became a contributor to the community and released my first package, FAOSTAT. The package is designed to provide users with direct access to the FAOSTAT database via R and to support the open data and methodology philosophy used in the Statistical ... [Read more...]

### Maize trade Part II: Comparison and analysis

February 3, 2013 |

My last post about the maize network, although interesting, was not very informative. What we are going to do today is contrast the maize network with the wine trade network. The reason we chose wine will become clear after the network and the analysis. Let's ... [Read more...]

### Maize trade Part I: Generate the network diagram

January 17, 2013 |

It has been several months since my last post, partly because my laptop was lost and several deadlines were approaching. Fortunately, I will be returning to Taiwan and getting a new laptop within a week, and will be updating regularly again. This post will provide a ... [Read more...]

### Peculiar behaviour of the sum function

October 3, 2012 |

The sum function in R is a special one in contrast to other summary statistics functions such as mean and median. The first distinction is that it is a Primitive function, whereas the others are not (although you can call mean using .Internal). This ... [Read more...]
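The primitive/closure distinction can be checked directly with `is.primitive()`. The second half of this sketch shows one visible difference in how the two functions treat multiple arguments; that part is an illustration of my own and is not necessarily the behaviour the post goes on to discuss.

```r
# sum is a primitive function; mean and median are ordinary R closures.
sum_is_primitive    <- is.primitive(sum)
mean_is_primitive   <- is.primitive(mean)
median_is_primitive <- is.primitive(median)

# sum(...) adds up every argument it is given, whereas mean(x, ...) only
# averages its first argument: in mean(1, 2, 3) the values 2 and 3 are
# matched to the trim and na.rm parameters of mean.default().
total   <- sum(1, 2, 3)
average <- mean(1, 2, 3)

print(c(total = total, average = average))
```

To average several numbers, pass them as a single vector, for example `mean(c(1, 2, 3))`.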

September 29, 2012 |

I got very excited about making a network diagram of my Facebook network using Gephi (https://gephi.org/) and submitted my first assignment for the Social Network Analysis course on https://www.coursera.org/. It's the middle of the night, so I will ... [Read more...]

September 22, 2012 |

This week, I got my hands on some agricultural trade data. Trade data are typically extremely dirty, so treat them with care when you get your hands on them. Lab-standard equipment is required. So I decided to look at how countries trade by plotting the ... [Read more...]

### Preferential attachment for network

September 15, 2012 |

I am currently taking the Networked Life course on Coursera.org offered by Professor Michael Kearns from the University of Pennsylvania. I have taken several courses, including machine learning and natural language processing, since the platf... [Read more...]

### Imputation by mean?

September 13, 2012 |

Today, I was briefed that when computing regional aggregates, such as those defined by the M49 country standard of the United Nations (http://unstats.un.org/unsd/methods/m49/m49regin.htm), I should use the regional mean to replace missing values. ... [Read more...]
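The rule described in the briefing can be sketched in a few lines of base R. The data frame, its column names (`region`, `value`) and the helper `impute_mean()` are hypothetical and made up for this example; the sketch only shows the mechanics of regional mean imputation and says nothing about whether the method is appropriate.

```r
# Hypothetical data: two regions, each with one missing value.
df <- data.frame(
    region = c("A", "A", "A", "B", "B", "B"),
    value  = c(1, NA, 3, 10, 20, NA)
)

# Replace the missing entries of a vector with the mean of its observed values.
impute_mean <- function(v) {
    v[is.na(v)] <- mean(v, na.rm = TRUE)
    v
}

# ave() applies impute_mean() within each region and keeps the row order.
df$imputed <- ave(df$value, df$region, FUN = impute_mean)

# Regional aggregates computed from the imputed series.
regional_total <- tapply(df$imputed, df$region, sum)
print(df)
print(regional_total)
```

The missing value in region A becomes 2 and the one in region B becomes 15, so every imputed point sits exactly at its regional mean.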