A while ago I needed to produce some posters that included lots of data (scientific style). Having recently got back into R and started learning LaTeX, I googled for a way to do this using R. Here's what I found and ended up with, using R, LaT...

Today's guest post comes to us from Andrew Winterman, Data Designer at data visualization company Periscopic. He shares with us the process of using the R language and other tools to create an interactive data application for a client — ed. The Hewlett Foundation contacted us a few months ago because they were interested in exploring ways to visualize...

ggplot2 is one of the most elegant R packages for data analysis and visualization. Recently I gave a tutorial on the ggplot2 package. You can find my ggplot2 notes here (click the image below). You can find my presentation slides below. The … Continue reading → The post R Graphics with ggplot2 appeared first on Fiddling with data and...

MathJax allows you to customize how \( \LaTeX \) is displayed. Simply right-click on the math you’d like to see to access the display menu. Under “math settings” you can see zoom trigger and factor options. Given how small the text ...

The classic Pythagorean identity is: \( \sin^2(\theta) + \cos^2(\theta) = 1 \). The binomial formula, which calculates the probability of obtaining k tails when flipping a coin n times, with an assumed probability p for each trial, is: \( P(E) = {n \choos...
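
The binomial probability described above can be sketched in Python. This is a minimal illustration that assumes the standard textbook form, "n choose k" times p^k times (1 - p)^(n - k). The function name `binomial_probability` and the example values are hypothetical and do not come from the post.

```python
from math import comb


def binomial_probability(k, n, p):
    """Probability of exactly k tails in n independent flips.

    Assumes the standard binomial formula; p is the probability
    of tails on each individual flip.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    # comb(n, k) counts the ways to choose which k of the n flips are tails.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative values: exactly 2 tails in 4 flips of a fair coin.
prob_two_tails = binomial_probability(k=2, n=4, p=0.5)
print(prob_two_tails)
```

With a fair coin and four flips there are 6 ways to place 2 tails, each with probability 1/16, so the result is 0.375.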

Forecasters are often met with skepticism. Almost every time I tell someone that I work in forecasting, they say something about forecasting the stock market, or forecasting the weather, usually suggesting that such forecasts are hopelessly inaccurate. In fact, forecasts of the weather are amazingly accurate given the complexity of the system, while anyone claiming to forecast the stock...

This links back to previous posts here and here. Earlier today, I had a quick chat with Michela (by email, actually) on this topic. In particular, she was trying to use the function I've written to compute summaries from the posterior distrib...

In this podcast interview with Michael Kane, Data Scientist and Associate Researcher at Yale University, Michael discusses the R statistical programming language, computational challenges associated with big data, and two projects involving data analysis he conducted on the stock market "flash crash" of May 6, 2010, and the tracking of transportation routes of bird flu H5N1. Michael also...

I have been exploring how to speed up some of my R scripts and have started reading about some amazing corners of R. My first weapons were the Rcpp and RcppArmadillo packages. These are wonderful tools, and even for someone who has never written C++ before, there are enough examples and documentation to get started. I...

Today's guest post is from Ron Fredericks, videographer and co-founder of LectureMaker, LLC — ed. I was initially surprised to find R user groups (RUGs) so popular. I filmed my first R session during the 2009 Predictive Analytics World in San Francisco. I filmed several more R user sessions over the past three years along with business/science clients and...

I first experimented with word clouds several years ago and used them to visualise the speeches of Kevin Rudd and Malcolm Turnbull. I have now learned from the Fell Stats blog (via R-Bloggers) that there is an R package for generating word clouds. The package makes use of tm, a text mining package for R, which I have been

Hello, world! Back in July we read Markus Gesmann’s great blog post about a prediction for the 100m final in London. Soon we decided to create similar estimates about the forthcoming events and started to post our results on Facebook. We would like to emphasise again that these kinds of extrapolated estimates are rather just for fun and we also think...

Not exactly pin-point accuracy. Previously: Two related posts are "A practical introduction to garch modeling" and "garch and long tails". Experiment: 1000 simulated return series were generated. The garch(1,1) parameters were alpha=.07, beta=.925, omega=.01. The asymptotic variance for this model is 2. The half-life is about 138 days. The simulated series used a Student’s t distribution … Continue reading...
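
The asymptotic variance and half-life quoted above can be checked from the stated parameters. The sketch below assumes the usual garch(1,1) relationships: asymptotic variance equals omega / (1 - alpha - beta), and half-life equals log(0.5) / log(alpha + beta). The function names are hypothetical, chosen for illustration only.

```python
import math


def garch11_asymptotic_variance(omega, alpha, beta):
    """Long-run (unconditional) variance of a garch(1,1) process.

    Assumes the standard formula omega / (1 - alpha - beta),
    which is only defined when alpha + beta < 1.
    """
    persistence = alpha + beta
    if persistence >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite variance")
    return omega / (1.0 - persistence)


def garch11_half_life(alpha, beta):
    """Periods needed for a variance shock to decay by half.

    Assumes the standard formula log(0.5) / log(alpha + beta).
    """
    return math.log(0.5) / math.log(alpha + beta)


# Parameters as given in the excerpt.
variance = garch11_asymptotic_variance(omega=0.01, alpha=0.07, beta=0.925)
half_life = garch11_half_life(alpha=0.07, beta=0.925)
print(round(variance, 6), round(half_life, 1))
```

Under these assumptions the variance comes out at 2 and the half-life at roughly 138 days, matching the figures in the excerpt.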

Setting up a beamer slideshow is tedious. Creating new slideshows with the same header/footer/style files every week for your course lectures is very very tedious. To solve this problem I created a simple bash shell script. When you run the script in...

Metadata! Metadata is very cool. It's super hot right now - everybody is talking about it. Okay, maybe not everyone, but it's an important part of archiving scholarly work. We are working on a repo on GitHub, rmetadata, to be a one-stop shop for quer...

Here's how I did it in 3 easy steps: (1) Set up a form in Google Docs/Drive. (2) Choose "Actions" and "Embed in Website" to get the URL for the iframe and put it in a post, like below. Then, go to the spreadsheet view of the form on Google Docs/Drive a...

I’m happy to present episode 10 of the R-Podcast! Season 1 of the R-Podcast concludes with part 2 of my series on data munging, in which I discuss issues surrounding importing data sets contained in HTML tables. I share how I used the XML and RCurl packages to validate and import data from hockey-reference.com for