What to expect from Strata Conference 2015? An empirical outlook.

February 9, 2015

In one week, the 2015 edition of Strata Conference (or rather: Strata + Hadoop World) will open its doors to data scientists and big data practitioners from all over the world. What will be the most important big data technology trends for this year? As I did last year, I ran an analysis on ...

Anomaly Detection with Wikipedia Page View Data

January 8, 2015

Today, the Twitter engineering team released another very interesting open-source R package for working with time series data: “AnomalyDetection”. This package uses the Seasonal Hybrid ESD (S-H-ESD) algorithm to identify local anomalies (variations inside seasonal patterns) and global anomalies (variations that cannot be explained by seasonal patterns). As a [...]

The Top 7 Beautiful Data Blog Posts in 2014

January 3, 2015

2014 was a great year in data science, and also an exciting year for me personally: from a very inspirational Strata Conference in Santa Clara, to a wonderful experience of speaking at PyData Berlin, to founding the data visualization company DataLion. But it was also a great year for blogging about data ...

Querying the Bitcoin blockchain with R

January 2, 2015

The cryptocurrency Bitcoin, and the way it generates “trustless trust”, is one of the hottest topics in technological innovation right now. The way Bitcoin transactions always trace back through the whole transaction list to the first discovered block (the Genesis block) works not only for finance. The first ...

2014 highlight: Statistical Learning course by Hastie & Tibshirani

January 1, 2015

What I like most about the R and Python developer and user communities is their incredible openness and generosity. One of the finest examples in the past year was the online course “Statistical Learning” taught by Stanford professors Trevor Hastie and Rob Tibshirani. In this MOOC they explain very clearly (...

Analyzing VC investment strategies with Crunchbase data

April 5, 2014

If you look at the investments in Big Data companies in the last few years, one thing is obvious: This is a very dynamic and fast-growing market. I produce regular updates of this network map of Big Data investments with a Python program (actually an IPython Notebook). But ...

Animated Twitter Networks

November 11, 2013

In this blogpost I presented a visualization made with R that shows how almost the whole world expresses its attention to political crises abroad. Here’s another visualization with Tweets in October 2013 that referred to the Lampedusa tragedy in the Mediterranean. But this transnational public space isn’t quite as ...

Mining Research Interests – or: What Would Google Want to Know?

October 25, 2013

I am a regular visitor to Google’s research page, where they post all of their latest and upcoming scientific papers. Lately I have wondered whether it would be possible to statistically extract some of the meta-information from the papers. Here’s the result of the analysis of the papers’ ...

Cosmopolitan Public Spaces

June 2, 2013

In my PhD and post-doc research projects at the university, I did a lot of research on the new cosmopolitanism together with Ulrich Beck. Our main goal was to test the hypothesis of an “empirical cosmopolitanization”. Maybe the term is confusing and too abstract, but what we were looking for ...

Mapping a Revolution

June 1, 2013

Twitter has become an important communications tool for political protests. While mass media are often censored during large-scale political protests, Social Media channels remain relatively open and can be used to tell the world what is happening and to mobilize support all over the world. From an analytic perspective tweets ...

Color analysis of Flickr images

May 1, 2013

Ever since I saw this beautiful color wheel visualizing the colors of Flickr images, I’ve been fascinated with large-scale automated image analysis. At the German Market Research association’s conference in late April, I presented some analyses that went in the same direction: On the ...

Why the 2012 US elections are more exciting than 2008

November 4, 2012

Here’s an addition to my last post on using Wikipedia data to analyse attention to the 2012 US presidential elections. This time, the focus is not on interest in the candidates’ Wikipedia pages but in the general pages for the 2008 and 2012 elections. Compared to the candidates’ pages, the attention for the ...

Wikipedia Attention and the US elections

November 3, 2012

One of the most interesting challenges in data science is predicting important events such as national elections. With all those data streams of billions of posts, comments, likes, clicks, etc., there should be a way to identify the most important correlations to make predictions about real-world behavior such as: ...

