Attractive but hard-to-read graph could be made much much better

April 20, 2011

Matthew Yglesias shares this graph from the Economist: I hate this graph. OK, sure, I don't hate hate hate hate it: it's not a 3-d exploding pie chart or anything. It's not misleading, it's just extremely difficult to read. Basically,...

Read more »

The R code for those time-use graphs

April 20, 2011

By popular demand, here's my R script for the time-use graphs:...

Read more »

Day #27 A lot of graphics in one place

April 20, 2011

Today my internship supervisor gave me the assignment to create this chart in R. This means: I get a lot of data and plot a certain column as a bar chart for each plate. On top of the bars, you place two error bars. At first I thought, piece ...

Read more »
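
The bar-chart-with-error-bars task described in this teaser can be sketched in base R roughly as follows. This is only a minimal illustration, not the post author's code: the plates data frame, its column names, and the use of the standard deviation for the bar length are all assumptions.

```r
# Hypothetical data: mean and standard deviation of one measurement per plate.
plates <- data.frame(
  plate = c("P1", "P2", "P3"),
  mean  = c(4.2, 5.1, 3.8),
  sd    = c(0.4, 0.6, 0.3)
)

pdf(NULL)  # null graphics device (no file written); remove to draw on screen

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(plates$mean,
                names.arg = plates$plate,
                ylim = c(0, max(plates$mean + plates$sd) * 1.1),
                ylab = "Measurement")

# One arrows() call draws both the lower and the upper error bar:
# code = 3 puts a head at each end, angle = 90 flattens the heads into caps.
arrows(x0 = mids, y0 = plates$mean - plates$sd,
       x1 = mids, y1 = plates$mean + plates$sd,
       angle = 90, code = 3, length = 0.05)

invisible(dev.off())
```

Reusing the midpoints returned by barplot() keeps the error bars centred on the bars regardless of bar width or spacing.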

Spreadsheet errors

April 20, 2011

For my sins, I have done more than my fair share of analysis in Excel. I am quite capable of building and maintaining 130 MB spreadsheets (I had a dozen of them for one client). Excel is installed pretty much everywhere, so it is sometimes the only way to get started getting commercial value out of the data in the...

Read more »

Transaction cost analysis and pre-trade analysis

April 20, 2011

Transaction cost analysis (TCA) is the framework for achieving best execution in a trading context. TCA can be split into three groups: pre-trade analysis, intraday analysis, and post-trade measurement. Pre-trade analysis gives us insight into the future volatility of the price and lets us forecast intraday and daily volumes and market impact. It evaluates all strategies and advises

Read more »

Custom Labels for Ordination Diagram

April 20, 2011

Here is how to add custom labels, hulls, and spiders to a vegan ordination diagram:

Read more »

Aggregate Function in R: Making your life easier, one mean at a time

April 20, 2011

I previously posted about calculating medians using R. I used tapply to do it, but I’ve since found something that feels easier to use (at least to me).

aggregated_output = aggregate(DV ~ IV1 * IV2, data=data_to_aggregate, FUN=median)
aggregated_output

The above code saves an aggregated dataset to aggregated_output and gives you the

Read more »
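
To make the aggregate() call in the teaser above concrete, here is a minimal sketch on a made-up data frame. The values in data_to_aggregate are invented for illustration and are not from the original post; the tapply() version is included only for comparison with the approach the author says they used before.

```r
# Hypothetical toy data: one dependent variable (DV) and two grouping factors.
data_to_aggregate <- data.frame(
  IV1 = c("a", "a", "a", "b", "b", "b", "a", "b"),
  IV2 = c("x", "x", "y", "y", "x", "x", "y", "y"),
  DV  = c(1, 3, 10, 20, 5, 7, 30, 40)
)

# Median of DV for every combination of IV1 and IV2, returned as a data frame
# with one row per combination.
aggregated_output <- aggregate(DV ~ IV1 * IV2,
                               data = data_to_aggregate,
                               FUN = median)
aggregated_output

# The same medians via tapply(), returned as an IV1-by-IV2 matrix instead.
medians_matrix <- with(data_to_aggregate,
                       tapply(DV, list(IV1, IV2), median))
medians_matrix
```

The main practical difference is the shape of the result: aggregate() gives a long data frame that is easy to merge or plot, while tapply() gives a cross-tabulated matrix.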

Common Data Creation Commands

April 19, 2011

Here is a video tutorial where I go through some of the most commonly used commands in creating and manipulating data. As soon as I want to do more than just running a single regression, I use these commands more than any other set of commands (in som...

Read more »

Simplifying polygon shapefiles in R

April 19, 2011

Recently I downloaded the Crosby Code shapefile from Landcare Research's LRIS server for use in some publications I'm preparing. This shapefile is incredibly detailed, far more so than what I require. This detail means that it takes a while for the map to be plotted each time. As detail is less important for me than speed of...

Read more »

250 years of Bayes’ Theorem

April 19, 2011

The Reverend Thomas Bayes died 250 years ago this month. His grave, located near the epidemiological centre of excellence St Mary's College, remains a point of pilgrimage for statisticians (of both Bayesian and Frequentist stripes) visiting London to this day. Since then, Bayes' Theorem has underpinned predictive analytics applications from spam detection to medical alerts. There...

Read more »

How Kaggle competitors use R

April 19, 2011

The data prediction competitions hosted by Kaggle require data scientists to bring their A game: the competition is intense, and competitors know in real time from the daily leaderboards how their predictions compare in accuracy to those of their rivals. So it's no surprise that open-source R, the most powerful statistics language, is a common tool of choice...

Read more »

Barron’s Spring 2008 Big Money Poll

April 19, 2011

Barron's April 28, 2008, Cover Story "Back in the Pool" offers a great hindsight look at our wonderful foresight: “AND NOW, FOR SOME GOOD NEWS: THE OTHER SHOE isn't going to drop. After a winter of discontent marked by massive write-offs on Wall Str...

Read more »

Example 8.35: Grab true (not pseudo) random numbers; passing API URLs to functions or macros

April 19, 2011

Usually, we're content to use a pseudo-random number generator. But sometimes we may want numbers that are actually random; an example might be for randomizing treatment status in a randomized controlled trial. The site Random.org provides truly rando...

Read more »

NBA, Logistic Regression, and Mean Substitution

April 19, 2011

I’m currently sitting at about 32K feet above sea level on my way from Tampa International to DIA and my options …

Read more »

RStudio, Revolution Analytics and Deducer: A Tale of Three GUIs

April 19, 2011

I’m in the process of moving from SPSS to R at the moment. It’s not been the easiest of rides, but then learning how to do a core part of your job never really should be. It’s been fun, though – don’t get me wrong – it’s definitely been an adventure!! Here I’m going to

Read more »

Day #25-26 R is soo static!

April 19, 2011

Today I stumbled upon a very nice package called “rgl”. For documentation and demos, take a look at its website. As the rgl site itself puts it: The rgl package is a visualization device system for R, using OpenGL as the rendering...

Read more »

Flu Trends

April 18, 2011

Not a model, but certainly Mickey Mousey: here’s some R code that plots Google’s US flu data:

df <- read.csv(url("http://www.google.org/flutrends/us/data.txt"), skip=11)
df$Date <- as.Date(df$Date)
dev.new(height=8, width=12)
# Leave a thin outer...

Read more »

Mickey Mouse Models

April 18, 2011

My statistics professor once drew a little Markov chain on the board and called it “just a Mickey Mouse model,” because it was too simple to represent anything serious.

Read more »

pre-generate pictures of your knitting

April 18, 2011

This was a birthday present for my spouse. (Don't worry: I also covered a lot of things, such as fruit, nuts, and cocoa puffs, in chocolate. But I think both were appreciated!) Sometimes p...

Read more »

A Population Regression

April 18, 2011

Here's a video on some of the theory behind simple linear regression. There's no R involved with this video, but the video provides some theory behind what it is that R's lm() command estimates.

Read more »
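
As a small companion to the teaser above, the sketch below simulates data from an assumed population regression and shows that lm() estimates its intercept and slope. The parameter values (beta0 = 2, beta1 = 3), the sample size, and the error distribution are arbitrary assumptions chosen for illustration; none of this comes from the video.

```r
set.seed(42)  # arbitrary seed, only so the simulation is reproducible

# Assumed population regression: y = beta0 + beta1 * x + u
beta0 <- 2
beta1 <- 3
n     <- 1000

x <- runif(n, min = 0, max = 10)   # regressor
u <- rnorm(n, mean = 0, sd = 1)    # error term, independent of x
y <- beta0 + beta1 * x + u

# lm() estimates beta0 and beta1 by ordinary least squares.
fit <- lm(y ~ x)
coef(fit)
```

With a sample this large the estimates should land close to the assumed population values, though they will not match them exactly because each sample contains noise.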

Details of two-way sync between two Ubuntu machines

April 18, 2011

In a previous post I discussed my frustrations with trying to get Dropbox or Spideroak to perform both encrypted remote backup and fast two-way file syncing. Here are the details of how I set up two machines, both running Ubuntu 10.10, to perform two-way sync where a file change on either machine

Read more »

GEOSTAT 2011 — Canberra

April 18, 2011

Just got back from the 2011 GEOSTAT summer school that recently took place in Canberra, Australia. Thanks to Tom Hengl for the invitation to co-teach the course, to the great folks at ANU who made it possible, and to all of the students who participat...

Read more »