## Visualizing Risky Words — Part 2

March 9, 2013
This is a follow-up to my Visualizing Risky Words post. You’ll need to read that for context if you’re just jumping in now. Full R code for the generated images (which are pretty large) is at the end. Aesthetics are the primary reason for using a word cloud, though one can pretty quickly recognize what

## Analyzing SimplyStatistics visits info

March 9, 2013
Recently we had to analyze the data of the number of visits per day to SimplyStatistics.org. There were two goals: (1) estimate the fraction of visitors retained after a spike in the number of visitors, and (2) identify any factors that influence the fraction estimated in (1). For me it was a fun project in part because I like SimplyStatistics but also...

## A bit more on sample size

March 8, 2013
In our article What is a large enough random sample? we pointed out that if you wanted to measure a proportion to an accuracy “a” with chance of being wrong of “d” then one idea was to guarantee you had a sample size of at least: This is the central question in designing opinion polls

## R vs. Perl/mySQL – an applied genomics showdown

March 8, 2013
Recently I was given an assignment for a class I'm taking that got me thinking about speed in R. This isn't something I'm usually concerned with, but the first time I tried to run my solution (using plyr's ddply()) it was going to take all night to compute. I consulted the professor that taught...

## Quandl package released to CRAN

March 8, 2013
In a guest post here on February 20, Tammer Kamel introduced us to Quandl, a kind of "wikipedia" of time series data. In the post, Tammer (the founder of Quandl) noted that they were working on an R package to give R users access to Quandl as a data source. That package is now available. It includes the Quandl...

## Comparing quantiles for two samples

March 8, 2013
Recently, for a research paper, I had some samples, and I wanted to compare them. Not to compare their means (by construction, all of them were centered) but their dispersion. And not their variances, but rather their quantiles. Consider the following boxplot-type function, where everything is quantile related (which is not the case for the standard boxplot, see http://freakonometrics.hypotheses.org/4138,...
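
As a rough illustration of the idea in this teaser (not the author's function, which is not shown here), the sketch below compares two centered samples through their quantiles rather than their variances. The simulated data, the quantile levels in `probs`, and the name `spread90` are all assumptions made for this example.

```r
# Two illustrative samples, centered by construction (simulated, not the author's data)
set.seed(1)
x <- rnorm(200, sd = 1)
y <- rnorm(200, sd = 2)
x <- x - mean(x)
y <- y - mean(y)

# Quantile levels to compare (an arbitrary choice for this sketch)
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)

# One row per sample, one column per quantile level
q <- rbind(x = quantile(x, probs), y = quantile(y, probs))
print(round(q, 2))

# Width of the central 90% interval as a quantile-based measure of dispersion
spread90 <- q[, "95%"] - q[, "5%"]
print(round(spread90, 2))
```

Reading the two rows of `q` side by side shows where the samples differ in the tails, which a single variance figure would hide.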

## Data Visualization: Shiny Democratization

March 8, 2013
In organizing Data Visualization DC we focus on three themes: The Message, The Process, The Psychology. In other words, ideas and examples of what can be communicated, the tools and know-how to get it done, and how best to communicate. The post Data Visualization: Shiny Democratization appeared first on Data Community DC.

## Publishing Stats for Analytic Reuse – FAOStat Website and R Package

March 8, 2013
How can stats and data publishers, from NGOs and (inter)national statistics agencies to scientific researchers, publish their data in a way that supports its analysis directly, as well as in combination with other datasets? Here’s one approach I learned about from Michael Kao of the UN Food and Agriculture Organisation statistics division, FAOStat. At first

## Cool GSS training video! And cumulative file 1972-2012!

March 8, 2013
Felipe Osorio made the above video to help people use the General Social Survey and R to answer research questions in social science. Go for it! Meanwhile, Tom Smith reports: The initial release of the General Social Survey (GSS), cumulative file for 1972-2012 is now on our website. Codebooks and copies of questionnaires will be...

## Visualizing rOpenSci collaboration

March 8, 2013
We (rOpenSci) have been writing code for R packages for a couple of years, so it is time to take a look back at the data. What data, you ask? The commits data from GitHub ~ data that records who did what and when. Using the GitHub commits API we can gather data on who committed code to a...

## From OpenOffice noob to control freak: A love story with R, LaTeX and knitr

March 8, 2013
Lately I had to write a seminar paper for a class and I decided to overdo it. But let's start at the very beginning. Here is my evolution of how I used to write stuff and how I got from this: to that: School: OpenOffice - I guess everyone has some...

## ddply in action

March 7, 2013
Top Batting Averages Over Time. Reference: http://www.baseball-databank.org/ I'm going to use plyr and ggplot2 to look at how top batting averages have changed over time. First load the data: `options(width = 100); library(ggplot2); library(plyr); data(baseball); head(baseball)` (loading ggplot2 printed the warning "package 'ggplot2' was built under R version 2.14.2") ...
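
The excerpt stops before the analysis, so here is a minimal sketch of how `ddply()` could produce a top batting average per year from the `baseball` data shipped with plyr. The 100 at-bat cutoff and the names `bb`, `avg`, and `top_avg` are assumptions for this illustration, not choices taken from the original post.

```r
library(plyr)
data(baseball)

# Keep player-seasons with a reasonable number of at-bats
# (the cutoff of 100 is an assumption for this sketch)
bb <- subset(baseball, ab >= 100)

# Batting average: hits divided by at-bats
bb$avg <- bb$h / bb$ab

# Split by year, apply summarise to each piece, combine into one data frame
top_avg <- ddply(bb, "year", summarise, top = max(avg, na.rm = TRUE))
head(top_avg)
```

The result has one row per year, which is the shape ggplot2 expects for a line plot of `top` against `year`.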

## geom_point Legend with Custom Colors in ggplot

March 7, 2013
Previously, I showed how to make line segments using ggplot. Working from that previous example, there are only a few things we need to change to add custom colors to our plot and legend in ggplot. First, we'll add the colors of our choice. I'll do th...

## ggplot ggoldy

March 7, 2013
One of my graduate students worked some ggplot magic and created an almost Lite-Brite-esque plot of our very own Goldy Gopher. She also, thoughtfully, published a tutorial on her blog. Read and enjoy!

## Downloading CBS Fantasy Football Projections in R

March 7, 2013
In this post, I will show how to download CBS fantasy football projections using R. The R Script The R Script for downloading fantasy football projections from CBS is located The post Downloading CBS Fantasy Football Projections in R appeared first on Fantasy Football Analytics.

## Veterinary Epidemiologic Research: Linear Regression Part 2 – Checking assumptions

March 6, 2013
We continue with the linear regression chapter of the book Veterinary Epidemiologic Research. Using the same data as in the last post and running example 14.12: Now we can create some plots to assess the major assumptions of linear regression. First, let’s have a look at homoscedasticity, or constant variance of residuals. You can run a statistical test, the
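
The data and model from example 14.12 are not reproduced in this excerpt, so the sketch below uses R's built-in `cars` data as a stand-in to show the kind of diagnostic plots the post describes. The model `dist ~ speed` and the names `fit`, `res`, and `fit_vals` are assumptions for illustration only.

```r
# Stand-in model on a built-in dataset (not the book's data)
fit <- lm(dist ~ speed, data = cars)

res <- residuals(fit)
fit_vals <- fitted(fit)

# Homoscedasticity: residuals should scatter evenly around zero,
# with no funnel shape as the fitted values grow
plot(fit_vals, res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normality of residuals: points should lie close to the reference line
qqnorm(res)
qqline(res)
```

A visual check like this is usually the first step; a formal test can then confirm or temper what the plots suggest.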

## Stan 1.2.0 and RStan 1.2.0

March 6, 2013
Stan 1.2.0 and RStan 1.2.0 are now available for download. See: http://mc-stan.org/ Here are the highlights. Full Mass Matrix Estimation during Warmup: Yuanjun Gao, a first-year grad student here at Columbia (!), built a regularized mass-matrix estimator. This helps for posteriors with high correlation among parameters and varying scales. We’re still testing this ourselves, so...

## Let’s Do Some Hierarchical Bayes Choice Modeling in R!

March 6, 2013
It can be difficult to work your way through hierarchical Bayes choice modeling. There is just too much new to learn. If nothing else, one gets lost in all the ways that choice data can be collected and analyzed. Then there is all this ou...

## Lambda.r 1.1.1 released (and introducing the EMPTY keyword)

March 6, 2013
I’m pleased to announce that lambda.r 1.1.1 is now available on CRAN. This release is mostly a bug fix release, …

## A volatility filter using historical vol

March 6, 2013
We have been looking at a way to improve risk adjusted returns by using a volatility filter. Although we could use VIX or equivalent, it turns out that historical volatility will work just as well, if not a little better. You can see part 1 here: Digging into the VIX, and part 2 here: What can we use...

## Barycentric interpolation: fast interpolation on arbitrary grids

March 6, 2013
Barycentric interpolation generalises linear interpolation to arbitrary dimensions. It is very fast although suboptimal if the function is smooth. You might know it as algorithm 21.7.1 in Numerical Recipes (Two-dimensional Interpolation on an Irregular Grid). Using package geometry it can be implemented in a few lines of code in R. Here’s a quick explanation of what
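
To make the idea concrete, here is a small base-R sketch of barycentric interpolation inside a single triangle. It does not use the geometry package mentioned in the post and skips the step of finding which triangle contains the query point; the helper names `bary_weights` and `bary_interp` are hypothetical.

```r
# Barycentric weights of point p with respect to a triangle.
# tri: 3 x 2 matrix, one vertex per row; p: numeric vector of length 2.
# The weights w solve sum(w) = 1 and sum(w * vertex) = p.
bary_weights <- function(tri, p) {
  A <- rbind(t(tri), 1)  # rows: x coordinates, y coordinates, ones
  solve(A, c(p, 1))
}

# Interpolated value at p: weighted average of the values at the vertices
bary_interp <- function(tri, f, p) {
  sum(bary_weights(tri, p) * f)
}

tri <- rbind(c(0, 0), c(1, 0), c(0, 1))
f <- c(1, 2, 3)  # values of f(x, y) = 1 + x + 2 * y at the three vertices

bary_interp(tri, f, c(0.25, 0.25))
# → 1.75
```

Because the test function is linear, the interpolated value matches it exactly; for a curved function the result is only a piecewise linear approximation, which is the sense in which the method is suboptimal for smooth functions.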

## Exporting plain, lattice, or ggplot graphics

March 6, 2013
A blend between a basic scatterplot, lattice scatterplot and a ggplot. In a recent post I compared the Cairo packages with the base package for exporting graphs. Matt Neilson was kind enough to share in...

## Times per second benchmark

March 5, 2013
In GNU R the simplest way to measure the execution time of a piece of code is to use system.time. However, sometimes I want to find out how many times some function can be executed in one second. This is especially useful when we want to compare function...
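
The post's own implementation is cut off here, so the following is only a sketch of one way to count executions per second in base R. The helper name `times_per_second` and its `seconds` argument are assumptions for this example.

```r
# Hypothetical helper: call f() repeatedly for about `seconds` of elapsed
# time and report the average number of completed calls per second.
times_per_second <- function(f, seconds = 1) {
  stopifnot(is.function(f), seconds > 0)
  n <- 0
  start <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - start < seconds) {
    f()
    n <- n + 1
  }
  n / seconds
}

# Compare two ways of computing a square root
x <- runif(1000)
times_per_second(function() sqrt(x), seconds = 0.5)
times_per_second(function() x^0.5, seconds = 0.5)
```

The loop itself adds overhead from the repeated `proc.time()` calls, so the numbers are best used to compare functions against each other rather than as absolute timings.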

## Le Monde puzzle [#810]

March 5, 2013
The current puzzle is as follows: Take a board with seven holes and seeds. The game starts with one player putting the seeds on the holes as he or she wishes. The other player picks a seed wherever. Then, alternately, each player picks a seed in a hole contiguous to the previous one. The loser

## Predicted correlations and portfolio optimization

March 5, 2013
What effect do predicted correlations have when optimizing trades? Background: A concern about optimization that is not one of “The top 7 portfolio optimization problems” is that correlations spike during a crisis, which is when you most want optimization to work. This post looks at a small piece of that question. It wonders if increasing predicted …

## Easily plotting grouped bars with ggplot #rstats

March 5, 2013
Summary: This tutorial shows how to create diagrams with grouped bar charts or dot plots with ggplot. The groups can also be displayed as facet grids. Importing the data from SPSS: All of the following examples are based on an imported SPSS …

## Load Balanced Parallelization with snowfall

March 5, 2013
For some reason, I didn't notice a few months ago the best way to perform a parallelized version of lapply with package snowfall. We implemented the parallel version of function lapply with the function sfLapply, in the development of our pipeline p...

## Updating R from R (on Windows) – using the {installr} package

March 5, 2013
Upgrading R on Windows is not easy. While the R FAQ offers guidelines, some users may prefer to simply run a command in order to upgrade their R to the latest version. That is what the new {installr} package is …

## Create an R package from a single R file with roxyPackage

March 5, 2013
Documenting code can be a bit of a pain. Yet, the older (and wiser?) I get, the more I realise how important it is. When I was younger I said 'documentation is for people without talent'. Well, I am clearly losing my talent, as I sometimes struggle to understand what I programmed years ago. Thus, anything that soothes the...