Here are a couple of tutorials I’ve written to help anyone who’s interested in learning how to produce simple bar charts or simple segmented bar charts in R, given that you have

Have you ever wanted an easy way to generate continuous color palettes for a discrete factor? I came across a question over on Stack Overflow about how to add color to a ggplot figure. I often find myself with lots of discrete categories when I want a continuous color plot. This can be achieved by writing a quick...
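As a rough illustration of the idea (not the code from the original post, which is cut off above), base R's colorRampPalette() can interpolate one color per factor level. The data, the three anchor colors, and the names groups, make_palette, level_colors and obs_colors are all assumptions made for this sketch.

```r
# Hypothetical example data: a factor with 12 discrete levels.
set.seed(1)
groups <- factor(sample(letters[1:12], 100, replace = TRUE),
                 levels = letters[1:12])

# colorRampPalette() returns a function that interpolates between
# the anchor colors it is given (the anchors here are arbitrary).
make_palette <- colorRampPalette(c("navy", "white", "firebrick"))

# One interpolated color per factor level, named by level.
level_colors <- make_palette(nlevels(groups))
names(level_colors) <- levels(groups)

# Look up the color of each observation by its level name.
obs_colors <- level_colors[as.character(groups)]
```

In a ggplot2 figure, a named vector like level_colors could be passed to scale_fill_manual(values = level_colors) or scale_colour_manual(values = level_colors).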

Another Google Summer of Code (GSoC) project this summer focused on creating functions for doing returns-based performance attribution. I’ve always been a little puzzled about why this functionality wasn’t covered already, but I think that most analysts do this kind of work in Excel. That, of course, has its own perils. But beyond the workflow

For some reason R is not happy with its 64-bit cousin when installing source packages:
  * installing *source* package ‘XMLSchema’ ...
  ** R
  ** inst
  ** preparing package for lazy loading
  ** help
  *** installing help indices
  ** building package indices
  ** installing vignettes
  ** testing if installed package can be loaded
  *** arch - i386
  *** arch - x86_64
  /Library/Frameworks/R.framework/Resources/bin/R: line 259: /Library/Frameworks/R.framework/Resources/bin/exec/x86_64/R: Bad CPU type in executable
  /Library/Frameworks/R.framework/Resources/bin/R: line 259:...

For the last decade or so, the go-to software for Bayesian statisticians has been BUGS (and later its open-source incarnation, OpenBUGS, or JAGS). BUGS is used for multi-level modeling: using a specialized notation, you can define random variables of various distributions, set Bayesian priors for their parameters, and create the network of relationships that describe how the random variables...

Why do I always forget how to do this?
  R CMD INSTALL rgdal --configure-args="--with-gdal-config=/Library/Frameworks/GDAL.framework/Versions/Current/unix/bin/gdal-config --with-proj-include=/Library/Frameworks/PROJ.framework/Versions/4/Headers/ --with-proj-lib=/Library/Frameworks/PROJ.framework/Versions/Current/unix/lib/"
You will need to adjust the paths based on your version of the GDAL and Proj4 frameworks.

With Ewen (aka @3wen), not only have we been playing on Twitter this month, we have also been working on kernel estimation for densities of spatial processes. Actually, it is only a part of what he was working on, but that part on kernel estimation was the opportunity to write a short paper, which can now be downloaded from HAL. The problem...

When I learned about principal component analysis (PCA), I thought it would be really useful in big data analysis, but that's not true if you want to do prediction. I tried PCA in my first competition on Kaggle, but it delivered bad results. This post illustrates how PCA can pollute good predictors. When I started examining this problem,...
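One way this can happen, sketched here under invented assumptions rather than taken from the post: PCA ranks directions by variance, not by relevance to the response, so a low-variance predictor that carries the signal can be pushed out of the leading component. The simulated variables x1, x2 and y below are hypothetical.

```r
# Simulated data (an assumption for illustration only).
set.seed(42)
n  <- 500
x1 <- rnorm(n, sd = 10)            # high variance, unrelated to y
x2 <- rnorm(n, sd = 1)             # low variance, drives y
y  <- 3 * x2 + rnorm(n, sd = 0.5)

# PCA on the unscaled predictors: PC1 follows the high-variance x1.
pca <- prcomp(cbind(x1, x2), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Compare how well the first component and the raw predictor explain y.
r2_pc1 <- summary(lm(y ~ pc1))$r.squared
r2_x2  <- summary(lm(y ~ x2))$r.squared
```

In this setup r2_pc1 should come out far below r2_x2, because the first component is dominated by x1. Scaling the inputs (scale. = TRUE) changes which directions dominate but still ignores y when choosing components.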

30.08.2012 With Bio7 1.6, multiple images can be sent from ImageJ to R without opening them in the ImageJ graphical user interface, which speeds up the transfer. With a simple script written in Java, Groovy or BeanShell a new Bio7 API command can be used (see below) to transfer images and

So I want to mine some #altmetrics data for some research I'm thinking about doing. The steps would be: Get journal titles for ecology and evolution journals. Get DOIs for all papers in all the above journal titles. Get altmetrics data on each DO...

Here's a short follow-up on how to produce a word cloud for a search result from GScholarScraper_3.1:
  # File-Name: GScholarScraper_3.1.R
  # Date: 2012-08-22
  # Author: Kay Cichini
  # Email: [email protected]
  # Purpose: Scrape Google Scholar search result
  # ...

Here is the list of courses I wish to teach next year at Chiang Mai School of Economics, not so sure about the demand there! Undergraduate (B.Econ.) ECON 304: Economics Statistics (with R) ECON 408: Research Design in Economics ECON 417: Managerial Economics ECON 419: Economic Theory and Entrepreneurship ECON 443: Industrial Economics ECON 4xx: Introduction to

10,000 iterations for 4 chains on the (precompiled) efficiently-parameterized 8-schools model:
  > date ()
  "Thu Aug 30 22:12:53 2012"
  > fit3 date ()
  "Thu Aug 30 22:12:55 2012"
  > print (fit3)
  Inference for Stan model: anon_model.
  4 chains: each with iter=10000; warmup=5000; thin=1; 10000 iterations saved.
  mean se_mean sd 2.5% 25% 50% 75%

Spearman’s Rho Rank Correlation
There are generally three types of correlation that a researcher may encounter: Pearson’s r, Kendall’s Tau, and Spearman’s Rho. They each have their own uses and applications depending on the da...
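For readers who want to try it, here is a minimal sketch using base R's cor() with method = "spearman". The simulated x and y are assumptions for illustration. Spearman's rho is Pearson's r computed on the ranks, which the last line checks by hand.

```r
# Simulated data (hypothetical): a monotone but non-linear relationship.
set.seed(1)
x <- rnorm(50)
y <- exp(x) + rnorm(50, sd = 0.1)

# The three coefficients named above, all via stats::cor().
r_pearson <- cor(x, y, method = "pearson")
tau       <- cor(x, y, method = "kendall")
rho       <- cor(x, y, method = "spearman")

# Spearman's rho equals Pearson's r applied to the ranks of the data.
rho_by_hand <- cor(rank(x), rank(y))
```

cor.test(x, y, method = "spearman") additionally reports a p-value for the rank correlation.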

It's a wonderful thing when people make interesting data sets available to the public. When Thomas Jones wrote a paper in Econometrics about the growth of US retail giant Walmart, he made the data he collected about every Walmart store opening in history (location and date) available to the public. Since then, several people have used different techniques to...

Stan 1.0.0 and RStan 1.0.0
It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan.
What is (R)Stan?
Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language

So I was trying to figure out a fast way to make matrices with randomly allocated 0 or 1 in each cell of the matrix. I reached out on Twitter, and got many responses (thanks tweeps!). Here is the solution I came up with. See if you can tell why it...
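The author's own solution is cut off above, so the following is only one possible approach, not the one from the post: draw every cell from a Bernoulli distribution with rbinom() and reshape the result. The function name random_binary_matrix and its prob argument are made up for this sketch.

```r
# Hypothetical helper: an nrow x ncol matrix whose cells are
# independently 1 with probability `prob` and 0 otherwise.
random_binary_matrix <- function(nrow, ncol, prob = 0.5) {
  matrix(rbinom(nrow * ncol, size = 1, prob = prob),
         nrow = nrow, ncol = ncol)
}

set.seed(123)
m <- random_binary_matrix(4, 5)
```

Because rbinom() is vectorised, all cells are generated in a single call with no loop over them. sample(0:1, nrow * ncol, replace = TRUE) would be an equivalent choice for the 50/50 case.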

Rather belatedly, I got around to posting a series of posts summarising the Formula One season to date: F1 2012 Mid-Season Review – Grid/Classification Analysis: for example, how do the drivers’ grid and final classifications compare? F1 2012 Mid-Season Review – Pit Stops: for example, how does pit stop performance across the teams compare? F1

I have resisted learning the popular R graphics package, ggplot2. I dismissed ggplot2 as primarily useful for exploratory graphics and rationalized my avoidance of ggplot2 by assuming that it would require just as many (or more) lines of code as the R base package to whip the default plots into publication-quality figures. The few times