Articles by Michael kao

3D Sine Wave

September 16, 2014 | Michael kao

I had a headache last night, so I decided to take things easy and just read posts on Google+. Then I came across this post, which seemed interesting, so I thought I would play around before heading to bed. First of all, I thought generating a square base would be much easier ... [Read more...]

Why multiple imputation?

March 20, 2014 | Michael kao

Background: In the forthcoming week, I will be giving a presentation on the fundamentals of imputation to my colleagues. One of the most important ideas I would like to present is multiple imputation. In my last post, I gave a small example of multiple imputation, but it does ... [Read more...]

First day of State of Food Insecurity (SOFI) 2013

October 2, 2013 | Michael kao

The FAO flagship publication SOFI 2013 was released yesterday, on the 1st of October. The publication is the most important report for monitoring progress towards the 2015 Millennium Development Goal and ultimately eliminating hunger. I was interested in how people responded, so I scraped some data from Twitter and previous ... [Read more...]

Tupper’s self-referential formula

March 24, 2013 | Michael kao

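I can't remember where I first came across it, but Tupper's self-referential formula is a very interesting formula that, when graphed in the two-dimensional plane, reproduces the formula itself. \[ \frac{1}{2} < \left\lfloor \mathrm{mod}\left( \left\lfloor \frac{y}{17} \right\rfloor 2^{-17 \lfloor x \rfloor - \mathrm{mod}(\lfloor y \rfloor, 17)}, 2 \right) \right\rfloor \] I first thought this would be a quick 5-minute exercise, which turned into 3 hours of work; the obstacle ... [Read more...]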

Violin plots and regional income distribution

March 20, 2013 | Michael kao

While preparing my slides for statistical graphics, a plot really caught my eye when I was playing around with the data. I started off by plotting the time series of GNI per capita by country, and as expected it got quite messy and incomprehensible.
## Download and manipulate the data
library(FAOSTAT)
library(ggplot2)  # provides ggplot(), geom_line() and labs() used below
raw.lst = getWDItoSYB(indicator = c("NY.GNP.PCAP.CD", "SP.POP.TOTL"))
raw.df = raw.lst[["entity"]]
## Translate World Bank ISO2 codes to UN codes, then attach the UNSD macro region
traw.df = translateCountryCode(raw.df, from = "ISO2_WB_CODE", to = "UN_CODE")
mraw.df = merge(traw.df, FAOregionProfile[, c("UN_CODE", "UNSD_MACRO_REG")])
## Keep only countries that belong to a macro region
final.df = mraw.df[!is.na(mraw.df$UNSD_MACRO_REG), ]

## Simple ugly time series plot
ggplot(data = final.df, aes(x = Year, y = NY.GNP.PCAP.CD)) +
    geom_line(aes(col = Country)) +
    labs(x = NULL, y = "GNI per capita")
So I decided to compute the ... [Read more...]

R package building automation

February 11, 2013 | Michael kao

Inspired by the post at http://giventhedata.blogspot.tw/2013/02/my-r-package-development-cheat-sheet.html, I have decided to publish my cheat script for package development as well. Building packages used to be a nightmare; filling in all those Rd files manually can cause some serious brain damage. Thanks to the ... [Read more...]

Relearn boxplot and label the outliers

February 5, 2013 | Michael kao

Despite the fact that the box plot is used almost everywhere and taught in undergraduate statistics classes, I recently had to re-learn the box plot in order to know how to label the outliers. This Stack Overflow post was where I found how the outliers and whiskers of the Tukey box ... [Read more...]
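As a rough illustration of the Tukey rule mentioned in this excerpt, the base R sketch below flags points lying more than 1.5 times the box width beyond the hinges and writes their names next to them on the plot. The data vector `x` and its names are made up for illustration; this is not the code from the original post.

```r
# Made-up data for illustration: five ordinary values and one extreme value.
x <- c(a = 1, b = 2, c = 3, d = 4, e = 5, f = 100)

# boxplot.stats() applies the Tukey rule: points more than coef times the
# box width (distance between the hinges) beyond a hinge are returned in $out.
stats <- boxplot.stats(x, coef = 1.5)
is_out <- x %in% stats$out
outlier_labels <- names(x)[is_out]

# Draw the box plot and label each outlier to the right of its point.
boxplot(x)
if (any(is_out)) {
  text(x = 1, y = x[is_out], labels = outlier_labels, pos = 4)
}
```

The whisker ends are available in `stats$stats[c(1, 5)]`, so the same object can be used to check which values the whiskers actually reach.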

A package for agricultural statistics: FAOSTAT

February 3, 2013 | Michael kao

After 8 years of using R, today I finally became a contributor to the community and released my first package, FAOSTAT. The package is designed to provide users with direct access to the FAOSTAT database via R and to support the open data and methodology philosophy used in the Statistical ... [Read more...]

Maize trade Part II: Comparison and analysis

February 3, 2013 | Michael kao

My last post about the maize network, although interesting, was not very informative. Today we are going to contrast the maize network with the wine trade network. The reason we chose wine will become clear after the network and the analysis. Let's ... [Read more...]

Maize trade Part I: Generate the network diagram

January 17, 2013 | Michael kao

It has been several months since my last post, partially because my laptop was lost and several deadlines were approaching. Fortunately, I will be returning to Taiwan and getting a new laptop within a week, and will be updating regularly again. This post will provide a ... [Read more...]

Peculiar behaviour of the sum function

October 3, 2012 | Michael kao

The sum function in R is a special one in contrast to other summary statistics functions such as mean and median. The first distinction is that it is a primitive function whereas the others are not (although you can call mean using .Internal). This ... [Read more...]
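The primitive versus non-primitive distinction can be checked directly in base R. The short sketch below only inspects the three functions named in the excerpt; the variable names are illustrative.

```r
# sum is implemented as a primitive, while mean and median are ordinary
# R closures (functions written in R that dispatch to methods).
funs <- list(sum = sum, mean = mean, median = median)

# is.primitive() reports whether a function is a primitive;
# typeof() distinguishes "builtin"/"special" primitives from "closure".
primitive_flags <- vapply(funs, is.primitive, logical(1))
function_types <- vapply(funs, typeof, character(1))

print(primitive_flags)
print(function_types)
```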

my Facebook social network

September 29, 2012 | Michael kao

I got very excited about making a network diagram of my Facebook network using Gephi (https://gephi.org/) and submitted my first assignment for the Social Network Analysis course on https://www.coursera.org/. It's the middle of the night, so I will ... [Read more...]

Network of trade

September 22, 2012 | Michael kao

This week, I got my hands on some agricultural trade data. Trade data are typically extremely dirty, so treat them with care when you get your hands on them. Lab-standard equipment is required. So I decided to look at how countries trade by plotting the ... [Read more...]

Preferential attachment for network

September 15, 2012 | Michael kao

I am currently taking the Networked Life course on Coursera.org offered by Professor Michael Kearns from the University of Pennsylvania. I have taken several courses, including machine learning and natural language processing, since the platf... [Read more...]

Imputation by mean?

September 13, 2012 | Michael kao

Today, I was briefed that when computing regional aggregates, such as those defined by the M49 country standard of the United Nations (http://unstats.un.org/unsd/methods/m49/m49regin.htm), I should use the regional mean to replace missing values... [Read more...]
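To make the procedure concrete, here is a minimal base R sketch of replacing missing values with the mean of the observed values in the same region. The data frame, region labels, and the helper `impute_mean` are made up for illustration; the sketch shows only the mechanics, not whether this is a sound way to build aggregates.

```r
# Made-up data: two regions, each with one missing value.
df <- data.frame(
  region = c("A", "A", "A", "B", "B", "B"),
  value  = c(1, NA, 3, 10, 20, NA)
)

# Hypothetical helper: replace each NA with the mean of the observed values.
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# ave() applies the helper within each region and returns the results
# in the original row order.
df$imputed <- ave(df$value, df$region, FUN = impute_mean)

print(df)
```

Note that every imputed value sits exactly at its regional mean, so the regional mean is unchanged while the spread within the region shrinks.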

Units and metadata

August 2, 2012 | Michael kao

Handling metadata is not natural in R, or in any traditional rectangular data storage system. There are several tricks and packages which attempt to solve this problem, with Hmisc using the attribute feature and the IRanges package having its o... [Read more...]
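The attribute approach can be sketched in base R as follows. This uses only `attr()` rather than any Hmisc function, and the vector name and unit are made up for illustration.

```r
# Attach a unit to a numeric vector as an attribute (illustrative names).
production <- c(120, 95, 143)
attr(production, "units") <- "tonnes"

# The attribute is available on the whole object...
units_full <- attr(production, "units")

# ...but ordinary subsetting with [ drops it, which is one reason metadata
# handling feels unnatural with plain vectors.
units_subset <- attr(production[1:2], "units")

print(units_full)
print(units_subset)
```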
