Bayesian Inference Using OpenBUGS

July 22, 2012

In our previous statistics tutorials, we have treated population parameters as fixed values, and provided point estimates and confidence intervals for them. An alternative approach is Bayesian statistics. It treats population parameters as random...

R for Ecologists: Phylogenies in R

July 22, 2012

I’ve only recently begun working from an evolutionary perspective, and I can’t imagine why I haven’t thought about it much before. After all, it comes up in just about everything that we do in ecology. For example, I’m currently feeding …

London Olympics and a prediction for the 100m final

July 22, 2012

It is less than a week until the 2012 Olympic Games start in London. No surprise, therefore, that the papers are all over it, including a lot of data and statistics around the games. The Economist investigated the potential financial impact on spons...

Project Euler — problem 16

July 21, 2012

The 16th problem is another big-number problem: 2^15 = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26. What is the sum of the digits of the number 2^1000? This …
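
A minimal sketch of one way to approach this in R, not necessarily the one used in the full post. Rather than relying on how R prints very large doubles, it keeps the number as a vector of decimal digits (least significant first) and doubles it n times. The function name digit_sum_pow2 is made up for illustration.

```r
# Sum of the decimal digits of 2^n, computed by repeated doubling
# of a digit vector (least-significant digit first).
digit_sum_pow2 <- function(n) {
  digits <- 1L                       # 2^0 = 1
  for (i in seq_len(n)) {
    doubled <- digits * 2L           # each entry is now between 0 and 18
    carry   <- doubled %/% 10L       # 0 or 1 per position
    digits  <- doubled %% 10L
    # Move each carry one position up; the result never exceeds 9.
    digits  <- c(digits, 0L) + c(0L, carry)
    # Drop the unused top position if no carry reached it.
    if (digits[length(digits)] == 0L) {
      digits <- digits[-length(digits)]
    }
  }
  sum(digits)
}

digit_sum_pow2(15)    # 26, matching the worked example in the problem
digit_sum_pow2(1000)  # the quantity the problem asks for
```

Checking the n = 15 case against the value given in the problem statement is a cheap way to gain confidence before running the full n = 1000 case.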

Base versus grid graphics

In a comment in response to my latest post, Robert Young took issue with my characterization of grid as an R graphics package. Perhaps grid is better described as a “graphics support package,” but my primary point – and the main point of this post – is that to generate the display you want, it is sometimes necessary to use commands...

The magic of the year 1901

July 21, 2012

The year 1901 is rather magical. Well, it is for R, provided you run it under Linux. Let me show you why. I have four data points: one from 1900, two from 1901, and one from 1902. I convert them in two different ways: as a Date, and as a POSIXct. For...

Emulating dynamic scoping in GNU R

July 21, 2012

By design, GNU R uses lexical scoping. Fortunately, it allows for at least two ways to simulate dynamic scoping. Let us start with the example code and then analyze it:

x <- "global"
f1 <- function() cat("f1:", x, "\n")
f2 <- function() cat("f2:", e...
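
Since the excerpt is cut off, here is a small self-contained sketch of the difference, under my own assumptions. It shows only one possible way to emulate dynamic lookup (reading the variable from the caller's frame via parent.frame()), which may or may not be one of the two ways the full post describes. The function names are hypothetical.

```r
x <- "global"

# Lexical scoping: x is resolved where the function was defined.
show_lexical <- function() x

# Emulated dynamic scoping: x is looked up starting in the caller's frame.
show_dynamic <- function() get("x", envir = parent.frame())

caller <- function() {
  x <- "local to caller"
  c(lexical = show_lexical(), dynamic = show_dynamic())
}

result <- caller()
print(result)
```

Note that this lookup starts one level up the call stack only; if the caller has no x of its own, get() falls back to the caller's enclosing environments rather than walking further up the chain of calls.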

googleVis — where did SYTYCD dancers come from?

July 21, 2012

After watching the 20 wonderful dancers of the 9th season of So You Think You Can Dance, I presented a geomap of the states they come from (click here). Now I am interested in the show’s history. I’d like to re-draw the …

Automatic Hyperparameter Tuning Methods

July 20, 2012

At MSR this week, we had two very good talks on algorithmic methods for tuning the hyperparameters of machine learning models. Selecting appropriate settings for hyperparameters is a constant problem in machine learning, which is somewhat surprising given how much expertise the machine learning community has in optimization theory. I suspect there’s interesting psychological and

Le Monde puzzle [#783]

July 20, 2012

In a political party, there are as many cells as there are members and each member belongs to at least one cell. Each cell has five members and an arbitrary pair of cells only shares one member. How many members are there in this political party? Back to the mathematical puzzles of Le Monde (science

R Journal, June 2012

July 20, 2012

The June 2012 issue of the R Journal, the peer-reviewed, open-access journal about R packages and applications of R, is now available. This issue includes articles about: efficiently calling C functions from R without the need for wrapper code; using clusters of Macs running Apple Xgrid for parallel distributed processing with R; semi-automated text classification with the 'maxent' package; two...

Modeling Permanent and Gradual Process Changes with CDFs

July 20, 2012

Spencer Herath
Special thanks to Ben Ogorek

Background

I recently faced a process with a structural change resulting in an increase in the process mean. The jump to the new mean was not immediate; rather, there was a gradual increase in values over time. I had previously benefited from multi-staged process-behavior charts when encountering immediate process shifts, but now I needed a...

Look ma! No typing! Autorunning code on R startup

July 20, 2012

Regular readers may know that I often make R-based GUIs. They’re great for giving non-technical users safe and easy access to statistical models. The safety comes from the restrictions of a GUI: you can limit what the user does more easily than with a command line, helping to reduce the number of opportunities for bad

Time-based internet advertising

July 20, 2012

Last week it was announced that Facebook is rotating its ads after a certain time of exposure. Sid Suri, Preston McAfee, and Dan Goldstein's research may have been the source of this idea. In 2011 and 2012 the trio published a couple of papers putting forth and improving the idea.

Community Detection in Networks with R

I mainly post this visualization because I think it’s pretty. It reminds me a little of the work of the famous Dutch painter Mondrian. The complete matrix can be found here. The plot is a heatmap of an adjacency matrix generated by a weighted dir...

Volatility and Correlation

July 19, 2012

The implied option volatility reflects the price premium an option commands. A trader’s profit and loss (P&L) from hedging option positions is driven to a large extent by the actual historical volatility of the underlying assets. Thus as option premiums …

Coke vs Soda vs Pop : Linguistic trends analyzed with Twitter and R

July 19, 2012

Growing up in Australia, for me a carbonated drink like Pepsi or Fanta or lemonade was always just a "soft drink". (Also, 'lemonade' in Australia was something different to 'lemonade' in the US; it's something close to 7-Up.) So when I moved to Seattle, it was surprising to me that all such things were called "pop". And then I...

Course at Monash (#2)

July 19, 2012

Here are the slides for the second day of my course at Monash University, Melbourne, in the Special Lectures in Econometrics, with a strong similarity to the slides of my course in Roma this spring. (Ah, sunny Roma…) The first day lecture was very well attended and I hope this remains true for the

Hierarchical Cluster Analysis (ChemoSpec) – 02

July 19, 2012

These are the second-derivative spectra of the raw spectra we saw in the post "Hierarchical Cluster Analysis (ChemoSpec) - 01". In that post we saw some clusters, but the distance between the clusters was not high, so it was clear that some m...

Best of Axys, R, d3.js, and HTML5

July 19, 2012

Axys, R, d3.js, and HTML5 all offer incredibly powerful tools for investment management and reporting, but they are not set up to synergistically interact to fill each other’s gaps and leverage each other’s strengths.  In my ideal scenario, Ax...

Outer Product of Character Vectors in R

July 19, 2012

What follows is like a kata to strengthen your R fundamentals. The lovely stats in the wild recently posted some hott data analysis of Olympians’ ages and sexes. Because I’m annoyingly picky about graphics, I asked for his code so I could ...
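
For readers who have not met the idea in the title, a minimal sketch of an "outer product" of two character vectors in base R follows. It uses outer() with paste() as the combining function. The vectors are invented for illustration and are not the Olympians data mentioned above, and this may differ from how the full post builds it.

```r
# Hypothetical example vectors, not data from the post.
sexes  <- c("F", "M")
groups <- c("under30", "over30")

# outer() applies a vectorized function to every pair of elements;
# with paste() the "product" of two strings is their concatenation.
labels <- outer(sexes, groups, paste, sep = "_")

print(labels)
```

The result is a character matrix with one row per element of the first vector and one column per element of the second, so labels[1, 2] combines "F" with "over30".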

Health Care Costs – Part 3, "Why You Are Paying More"

July 19, 2012

Malpractice - A Booming Industry? Perhaps authors Frank Sloan, Randall Bovbjerg and Penny Githens capture it best from their book Insuring Medical Malpractice: "If aging Doctor Kildare were to return to medical practice today, having been...

self-organizing map in R

July 19, 2012

This is my first SOM figure :) Thanks to the som package and example code from Jun Yan. Here is my code for the figure:

require(som)
rpkm <- Tx_rpkm
rpkm.f <- filtering(rpkm, lt=10, ut=30000, mmr=2, mmd=10)
# rpkm.f=log(rpkm.f+0.1) # t...

Random Forest Variable Importance

July 19, 2012

Random forests™ are great. They are one of the best "black-box" supervised learning methods. If you have lots of data and lots of predictor variables, you can do worse than random forests. They can deal with messy, real data. If there are lots of extraneous predictors, they have no problem. They automatically do a good job...

A weighting function for ‘nls’ / ‘nlsLM’

July 19, 2012

Standard nonlinear regression assumes homoscedastic data, that is, all response values are assumed to have the same variance. In the case of heteroscedastic data (i.e. when the variance depends on the magnitude of the data), weighting the fit is essential. In nls (or nlsLM of the minpack.lm package), weighting can be conducted by two different methods: 1) by supplying

Video: knitr, R Markdown, and R Studio: Introduction to Reproducible Analysis

July 19, 2012

This post presents the video of a talk that I presented in July 2012 at Melbourne R Users on using knitr, R Markdown, and R Studio to perform reproducible analysis. I also provide links to a github repository where the R markdown examples can be examin...

Universal portfolio, part 8

July 18, 2012

We extend the analysis of part 7 by calculating the final wealth for all tuples of 3 and 4 stocks. This is a simple extension, but it also shows the most important problem of the Universal portfolio algorithm: its exponential complexity in the number of...

Mapping Public Opinion: A Tutorial

July 18, 2012

At the upcoming 2012 summer meeting of the Society of Political Methodology, I will be presenting a poster on Isarithmic Maps of Public Opinion. Since last posting on the topic, I have made major improvements to the code and robustness of the modeling approach, and written a tutorial that illustrates the production of such maps. This …

Time zones

July 18, 2012

Say we have the following raw data. It consists of a timestamp and a corresponding value. There is a peak at exactly midnight (00:00:00). Each timestamp is fully specified: it contains a date, a time of day, and a time zone offset. In this case the offset is +0000, meaning the data is 0 hours away from UTC.

"timestamp","value"
"25-04-2012...
