Blog Archives

Consecutive Numbers in Lottery Draws

March 2, 2014
By

A historian, a data scientist, a programmer, a mathematician, and a philosopher discuss the question, how likely it is that a lottery draw (6 out of 49) contains two consecutive numbers. The historian The historian argues that from 1955 up to 2011, there were 5026 lottery draws in Germany, every Saturday, and from 2000 on, two draws every...

Read more »

Unit conversion in R

May 17, 2013
By

Last weekend I submitted an update of my R package datamart to CRAN. It has been more than a half year since the last update, however there are only minor advances. The package is still in its early stages, and very experimental.One new feature is the function uconv. Think iconv, but instead of converting character vectors between different encodings,...

Read more »

Some of Excel’s Finance Functions in R

February 16, 2013
By

Last year I took a free online class on finance by Gautam Kaul. I recommend it, although there are other classes I can not compare it to. The instructor took great efforts in motivating the concepts, structuring the material, and enable critical thinking / intuition. I believe this is an advantage of video...

Read more »

ScraperWiki in R

July 29, 2012
By

ScraperWiki describes itself as an online tool for gathering, cleaning and analysing data from the web. It is a programming oriented approach, users can implement ETL processes in Python, PHP or Ruby, share these processes among the community (or pay for privacy) and schedule automated runs. The software behind the service is open source, and there is...

Read more »

Convenient access to Gapminder’s datasets from R

July 16, 2012
By
Convenient access to Gapminder’s datasets from R

In April, Hans Rosling examined the influence of religion on fertility. I used R to replicate a graphic of his talk:> library(datamart) > gm <- gapminder() > #queries(gm) > # > # babies per woman > tmp <- query(gm, "TotalFertilityRate") > babies <- as.vector(tmp) > names(babies) <- names(tmp) > babies <- babies > countries <- names(babies) > # > # income per capita, PPP adjusted > tmp <- query(gm, "IncomePerCapita") >...

Read more »

Querying DBpedia from R

June 24, 2012
By

DBpedia is an extract of structured information from wikipedia. The structured data can be retrieved using an SQL-like query language for RDF called SPARQL. There is already an R package for this kind of queries named SPARQL.There is an S4 class Dbpedia part of my datamart package that aims to support the creation of predefined parameterized queries. Here is...

Read more »

A wrapper for R’s data() function

June 19, 2012
By

The workflow for statistical analyses is discussed at several places. Often, it is recommended:never change the raw data, but transform it, keep your analysis reproducible, separate functions and data, use R package system as organizing structure. In some recent projects I tried an S4 class approach for this workflow, which I want to present and discuss. It makes use of...

Read more »

Working with strings

April 10, 2012
By

R has a lot of string functions, many of them can be found with ls("package:base", pattern="str"). Additionally, there are add-on packages such as stringr, gsubfn and brew that enhance R string processing capabilities. As a statistical language and environment, R has an edge compared to other programming languages when it comes to text mining algorithms or natural language processing....

Read more »

Berlin’s children

February 4, 2012
By
Berlin’s children

Few years ago, a newspaper claimed the block I live in — Prenzlauer Berg in Berlin — is the most fertile region in Europe. It was a hoax, as this (German) newspaper article points out. (The article has become quite famous because it coined the term Bionade Biedermeier to describe the life style in this area.)However,...

Read more »

Categorizing my expenses

January 28, 2012
By
Categorizing my expenses

In order to analyse my expenses, a classification scheme is necessary. I need to identify categories that are meaningful to me. I decided to go with the “Classification of Individual Consumption by Purpose” (COICOP), for three reasons:It is made by people who have thought more about consumption classification than I ever will. It is feasible to assign bank transactions...

Read more »