Blog Archives

Estimating the Size of a Demonstration

Abstract Inspired by the recent March for Science, we look into methods for statistically estimating the number of people participating in a demonstration organized as a march. In particular, we provide R code to reproduce the analysis of Yip et al. (2010), which applies two on-the-spot counting methods to the data of the July 1 March in Hong Kong...

Read more »

On a First Name Basis with Statistics Sweden

March 24, 2017

Abstract Judging from recent R-Bloggers posts, many data scientists appear to be concerned with scraping data from various media sources (Wikipedia, Twitter, etc.). However, one should be aware that well-structured, high-quality datasets are available from state and national bureaus of statistics. Increasingly, these are offered to the public through direct database access, e.g., using a REST...

Read more »

Did Mary and John go West?

March 5, 2017

Abstract As a final post in the baby-names-the-data-scientist's-way series, we use the US Social Security Administration 1910-2015 data to visualize, in space and time, the most popular baby names for girls and boys, respectively. Parts of the code use the new simple features package (sf) in order to get some first experience with the package. Creative Commons...

Read more »

US Babyname Collisions 1880-2014

February 28, 2017

Abstract We use US Social Security Administration data to compute, for each birth year from 1880 to 2014, the probability of a name clash in a class of kids born in that year. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a...

Read more »

Happy pbirthday class of 2016

February 12, 2017

Abstract Continuing the analysis of first names given to newborns in Berlin in 2016, we solve the following problem: what is the probability that, in a school class of size \(n\) made up of these kids, at least two kids have the same first name? For classes of size 26 the answer is 34%, and...
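The core probability can be sketched as follows. This is a minimal Python sketch, not the post's R code: for equally likely categories it uses the classical closed form of the birthday problem (the setting of R's `pbirthday`), and for unequal name frequencies it simply estimates the clash probability by Monte Carlo simulation. The function names and the `weights` argument are hypothetical; in practice `weights` would hold the observed name counts.

```python
import random
from math import prod


def p_collision_uniform(n, d):
    """Exact P(at least two of n kids share a category) when all d categories are equally likely."""
    if n > d:
        return 1.0  # pigeonhole principle: a clash is certain
    # P(no clash) = d/d * (d-1)/d * ... * (d-n+1)/d
    return 1.0 - prod((d - i) / d for i in range(n))


def p_collision_mc(n, weights, reps=20000, seed=1):
    """Monte Carlo estimate of P(at least one shared name) in a class of size n.

    weights: relative name frequencies (hypothetical input, e.g. counts per name).
    """
    rng = random.Random(seed)
    names = range(len(weights))
    clashes = 0
    for _ in range(reps):
        # Draw n names independently with probabilities proportional to weights.
        cls = rng.choices(names, weights=weights, k=n)
        if len(set(cls)) < n:  # some name occurs at least twice
            clashes += 1
    return clashes / reps
```

For example, `p_collision_uniform(23, 365)` gives about 0.507, the classical birthday-problem answer. Skewed name frequencies make clashes more likely than the uniform formula suggests, which is why the simulation (or an exact method for unequal probabilities) is needed for real name data.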

Read more »

Naming Uncertainty by the Bootstrap

February 5, 2017

Abstract Data on the names of all newborn babies in Berlin 2016 are used to illustrate how a scientific treatment of chance could enhance rank statements in, e.g., onomastics investigations. For this purpose, we first identify different stages of the naming-your-baby process, which are influenced by chance. Second, we compute confidence intervals for the ranks based on a bootstrap procedure...
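The bootstrap idea for rank uncertainty can be sketched as below. This is a minimal Python sketch under simplifying assumptions, not the post's R implementation: it treats only the sampling of names as random, resamples all births with replacement (a multinomial draw with the observed name proportions), recomputes the ranks in each replicate and reports percentile intervals. Tied names share the smallest rank; the function names and the counts in the usage note are hypothetical.

```python
import random
from collections import Counter


def ranks(counts):
    """Rank names by count (1 = most frequent); tied names share the smallest rank."""
    return {n: 1 + sum(1 for m in counts if counts[m] > counts[n]) for n in counts}


def bootstrap_rank_ci(counts, reps=500, level=0.95, seed=1):
    """Percentile bootstrap intervals for the rank of each name."""
    rng = random.Random(seed)
    names = list(counts)
    weights = [counts[n] for n in names]
    total = sum(weights)
    boot_ranks = {n: [] for n in names}
    for _ in range(reps):
        # Resample all births with replacement, i.e. multinomial with observed proportions.
        draw = Counter(rng.choices(names, weights=weights, k=total))
        r = ranks({n: draw.get(n, 0) for n in names})
        for n in names:
            boot_ranks[n].append(r[n])
    a = 1.0 - level
    lo_idx = int(round(a / 2 * (reps - 1)))
    hi_idx = int(round((1 - a / 2) * (reps - 1)))
    ci = {}
    for n in names:
        s = sorted(boot_ranks[n])
        ci[n] = (s[lo_idx], s[hi_idx])
    return ci
```

With made-up counts `{"A": 500, "B": 300, "C": 290, "D": 10}`, the clearly separated names A and D get degenerate intervals at ranks 1 and 4, while the close pair B and C both get intervals spanning ranks 2 to 3, which is exactly the kind of statement a plain top-list hides.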

Read more »

suRprise! – Classifying Kinder Eggs by Boosting

December 22, 2016

Abstract Carrying the Danish tradition of Juleforsøg to the realm of statistics, we use R to classify the figure content of Kinder Eggs using boosted regression trees based on the egg's weight and possible rattling noises. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code...

Read more »

4×3 R-Hackathoning – The Finisher’s Guide

December 11, 2016

Abstract We present experiences from organizing a small R hackathon aimed at advancing knowledge and documentation of the R package surveillance. The hackathon was piggybacked onto the ESCAIDE2016 conference, which is attended by current and potential package users in the area of infectious disease epidemiology. The output of the hackathon is available at https://surveillancer.github.io/tutorials/. Creative Commons License

Read more »

Better Confidence Intervals for Quantiles

October 22, 2016

Abstract We discuss the computation of confidence intervals for the median or any other quantile in R. In particular, we are interested in the interpolated order statistic approach suggested by Hettmansperger and Sheather (1986) and Nyblom (1992). To make these methods available to a wider audience, we provide an implementation of them in...
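As background, the classical exact order-statistic interval, which the interpolated approach refines, fits in a few lines. This Python sketch is not the interpolation method from the post: it uses the fact that, for continuous data, the number of observations below the p-quantile is Binomial(n, p), and picks order statistics \(x_{(l)}\) and \(x_{(u)}\) such that \(P(l \le B \le u-1) \ge 1-\alpha\) with equal-tailed error. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def quantile_ci(x, p=0.5, conf=0.95):
    """Exact (non-interpolated) order-statistic CI for the p-quantile.

    Returns (x_(l), x_(u), coverage) with 1-based order statistics, or None
    if the sample is too small to reach the requested level.
    """
    xs = sorted(x)
    n = len(xs)
    half_alpha = (1 - conf) / 2
    # Lower index: largest l with P(B <= l - 1) <= alpha/2.
    l = 0
    for cand in range(1, n + 1):
        if binom_cdf(cand - 1, n, p) <= half_alpha:
            l = cand
        else:
            break
    # Upper index: smallest u with P(B >= u) <= alpha/2.
    u = n + 1
    for cand in range(n, 0, -1):
        if 1 - binom_cdf(cand - 1, n, p) <= half_alpha:
            u = cand
        else:
            break
    if l == 0 or u > n or l >= u:
        return None
    # Coverage P(x_(l) <= xi_p < x_(u)) = P(l <= B <= u - 1) for continuous data.
    coverage = binom_cdf(u - 1, n, p) - binom_cdf(l - 1, n, p)
    return xs[l - 1], xs[u - 1], coverage
```

For a sample of size 20 and the median, this returns the familiar interval \([x_{(6)}, x_{(15)}]\) with coverage of about 0.959 rather than exactly 0.95. Because the binomial distribution is discrete, the achieved coverage generally overshoots the nominal level, and interpolating between neighbouring order statistics is what the Hettmansperger and Sheather approach uses to get closer to it.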

Read more »

Cartograms with R

October 9, 2016

Abstract We show how to create cartograms with R by illustrating the population and age distribution of the planning regions of Berlin with static plots and animations. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a...

Read more »
