46 search results for "ecdf"

The odds of a cluster of airplane accidents

August 2, 2014

Recently, there have been a lot of airplane accidents:

July 17th, 2014, Hrabove, Ukraine, Malaysia Airlines, Boeing 777, fatalities 298 (/298)
July 23rd, 2014, Magong, Taiwan, TransAsia Airways, ATR 72-500, fatalities 47 (/58)
July 24th, 2014, Aguelhok, Mali, Air Algerie, McDonnell Douglas MD-83, fatalities 116 (/116)

It is simple to find a lot of datasets about airplane crashes....

Gender gap and visualisation challenge @ useR!2014

June 20, 2014

7 days to go for submissions in the DataVis contest at useR!2014 (see contest webpage). Note that the contest is open to all R users, not only conference participants. Submit your solution soon! The PISA dataset lets us challenge some “common opinions”, such as whether boys or girls are better at math / reading. But, how to compare

Take a look, it’s in a book: distribution of kindle e-book highlights

If you've ever started a book and not finished it, it may comfort you to know that you are not alone. It's hard to get accurate estimates of the percentage of books that are discontinued, but the rise of e-reading (and resulting circumvention of privacy) affords us the opportunity to answer related questions. The kindle e-reading...

Author inflation in academic literature

April 6, 2014

There seems to be a general consensus that author lists in academic articles are growing. Wikipedia says so, and I’ve also come across a published letter and short Nature article which accept this is the case and discuss ways of …

Regression with multiple predictors

February 18, 2014

(This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers) Now that I'm ridiculously behind in the Stanford Online Statistical Learning class, I thought it would be fun to try to reproduce the figure on page 36 of the slides from chapter 3 or page 81 of the book. The result is a curvaceous surface...

ggplot2: Cheatsheet for Visualizing Distributions

February 18, 2014

In the third and last of the ggplot series, this post will go over interesting ways to visualize the distribution of your data.

From spreadsheet thinking to R thinking

January 7, 2014

Towards the basic R mindset. Previously: the post “A first step towards R from spreadsheets” provides an introduction to switching from spreadsheets to R. It also includes a list of additional posts (like this one) on the transition. Add two columns: Figure 1 shows some numbers in two columns and the start of adding those...

2013 Summary

January 6, 2014

2013 was a tough year. Trading was tough, with one of my strategies experiencing a significant drawdown. Research was tough – wasted a lot of time on machine learning techniques, without much to show for it. Also made some expensive mistakes, so all in all – it was a year I’d prefer I had avoided.

Using R to replicate common SPSS multiple regression output

December 4, 2013

(This article was first published on Jeromy Anglim's Blog: Psychology and Statistics, and kindly contributed to R-bloggers) The following post replicates some of the standard output you might get from a multiple regression analysis in SPSS. A copy of the code in RMarkdown format is available on GitHub. The post was motivated by this previous post that discussed using...

Maximum Likelihood versus Goodness of Fit

November 8, 2013

Thursday, I got an interesting question from a colleague of mine (JP). I mean, the way I understood the question turned out to be a nice puzzle (but I have to confess I might have misunderstood). The question is the following: consider an i.i.d. sample of continuous variables. We would like to choose between two (parametric) families for...
