Blog Archives

FizzBuzz in R and Python

April 21, 2019

In this post, we will solve a simple problem (called "FizzBuzz") that some employers ask in data scientist job interviews. The question seeks to ascertain the applicant's familiarity with basic programming concepts. We will see two different ways to solve the problem in two different statistical programming languages: R and Python. The FizzBuzz Question: I came across the FizzBuzz...
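For readers who have not met the problem, here is a minimal Python sketch. It assumes the usual formulation of FizzBuzz (multiples of 3 become "Fizz", multiples of 5 become "Buzz", multiples of both become "FizzBuzz"), which the excerpt above does not spell out. The function name `fizzbuzz` is my own choice for illustration and is not necessarily either of the solutions given in the full post.

```python
def fizzbuzz(n):
    """Return the FizzBuzz label for a positive integer n.

    Assumes the usual rules: multiples of 3 -> "Fizz",
    multiples of 5 -> "Buzz", multiples of both -> "FizzBuzz".
    """
    # Check divisibility by 15 first, so numbers divisible by both
    # 3 and 5 are not caught by one of the single-divisor branches.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    # Print the labels for 1 through 15 as a quick check.
    for i in range(1, 16):
        print(fizzbuzz(i))
```

The order of the checks is the part interviewers tend to look at: testing for 3 or 5 before testing for both would never produce "FizzBuzz".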

Read more »

A Tale of Two (Small Belgian) Cities with Open Data: Official Crime Statistics and Self-Reported Feelings of Safety in Leuven and Vilvoorde

February 25, 2019

In this post, we will analyze government data from the Flemish region in Belgium on A) official crime statistics and B) self-reported feelings of safety among residents of Flanders. We will focus our analysis on two cities in the province of Flemish Brabant: Leuven and Vilvoorde. A key question of this analysis is: do the residents of the safer...

Read more »

Linguistic Signals of Album Quality: A Predictive Analysis of Pitchfork Review Scores Using Quanteda

January 10, 2019

In this post we will return to the Pitchfork music review data, parts of which I've analyzed in previous posts. Our goal here will be to use text mining and natural language processing (NLP) to understand linguistic signals of album quality. This type of analysis helps us understand what Pitchfork reviewers appreciate or dislike, and gives us a sense...

Read more »

Multilevel Modeling Solves the Multiple Comparison Problem: An Example with R

October 31, 2018

Multiple comparison of group-level means is a tricky problem in statistical inference. A standard practice is to adjust the threshold for statistical significance according to the number of pairwise tests performed. For example, according to the widely known Bonferroni method, if we have 3 different groups for which we want to compare the means of a given variable, we would...
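As a rough illustration of the Bonferroni adjustment mentioned above, the sketch below counts the pairwise comparisons among a set of groups and divides the significance level by that count. The starting level of 0.05 and the helper name `bonferroni_alpha` are assumptions made for this example and do not come from the post itself.

```python
from math import comb


def bonferroni_alpha(n_groups, alpha=0.05):
    """Return (adjusted_alpha, n_tests) for all pairwise comparisons.

    alpha=0.05 is an assumed conventional starting threshold,
    not a value taken from the post.
    """
    # Number of distinct pairs among n_groups groups.
    n_tests = comb(n_groups, 2)
    # Bonferroni: divide the overall threshold by the number of tests.
    return alpha / n_tests, n_tests


if __name__ == "__main__":
    adjusted, n_tests = bonferroni_alpha(3)
    print(f"{n_tests} pairwise tests, per-test threshold {adjusted:.4f}")
```

With 3 groups there are 3 pairwise tests, so each test would be judged against 0.05 / 3 rather than 0.05.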

Read more »

Differences in Word Use Across Music Genres in Pitchfork Album Reviews

September 22, 2018

In this post we will return to the data on Pitchfork music reviews, parts of which I've analyzed previously. The goal of this post will be to gain an understanding of distinctive words in the reviews of albums of different musical genres. This type of analysis helps us understand the musical aspects that distinguish written descriptions of the music...

Read more »

Sentiment Use Across the Course of Pitchfork Music Reviews: A Tidy Text Analysis with R

June 6, 2018

In this post, we'll return to the Kaggle data containing information on Pitchfork music reviews. In a previous post, I used this dataset to cluster music genres. In the current post, I will use R and the tidytext package (and philosophy) to examine the text of the music reviews. Specifically, the goal of the analysis described in this post...

Read more »

Anscombe’s Quartet: 1980’s Edition

January 7, 2018

In this post, I'll describe a fun visualization of Anscombe's quartet I whipped up recently. If you aren't familiar with Anscombe's quartet, here's a brief description from its Wikipedia entry: "Anscombe's quartet comprises four datasets that have nearly identical simple descriptive statistics, yet appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973...

Read more »

Clustering Music Genres with R

December 7, 2017

In a number of upcoming posts, I'll be analyzing an interesting dataset I found on Kaggle. The dataset contains information on 18,393 music reviews from the Pitchfork website. The data cover reviews posted between January 1999 and January 2016. I downloaded the data and did an extensive data munging exercise to turn the data into a tidy dataset for...

Read more »

Sensographics and Mapping Consumer Perceptions Using PCA and FactoMineR

September 10, 2017

In the last post, we focused on the preparation of a tidy dataset describing consumer perceptions of beverages. In this post, I'll describe some analyses I've been doing of these data, in order to better understand how consumers perceive the beverage category. This type of analysis is often used in sensographics: companies that produce food products (chocolate, sauces, etc.)...

Read more »

Showing Some Respect for Data Munging

August 1, 2017

In this post, I'd like to focus on data munging, i.e., the process of acquiring and arranging data (typically in a tidy manner) prior to data analysis. It's common knowledge that data scientists spend an enormous amount of time munging data, but data analysis, modeling, and visualization get most of the attention at presentations, on blogs and in the...

Read more »
