# 557 search results for "boxplot"

## Examining Data Exercises

September 6, 2016

One of the first steps of data analysis is descriptive analysis, which helps you understand how the data are distributed and provides important information for later steps. This set of exercises covers functions useful for one-variable descriptive analysis, including graphs. Before proceeding, it might be helpful to look over the help pages

## ggplot2 (ggplot) Introduction

August 26, 2016

In this post I’ll briefly introduce how to use ggplot2 (ggplot), which by default makes nicer-looking plots than the standard R plotting functions. The first thing to know is that ggplot requires data frames to work properly. It is an entirely different framework from the standard plotting functions in R. Let’s grab a default data

## Gotta catch them all

August 21, 2016

Introduction: When data becomes high-dimensional, the inherent relational structure between the variables can sometimes become unclear or indistinct. One might want to find clusters for any number of reasons; me, I want to use it to better unde...

August 2, 2016

## Kernels for everyone!

August 1, 2016

During my dissertation, I spent a lot of time working on spatial kernel estimates, which are defined as a convolution of a spatial support. A simple example of this estimate is a Gaussian filter, or blur in more common parlance. In the Gaussian filter, the kernel is the normal density function, with...

## Does sentiment analysis work? A tidy analysis of Yelp reviews

July 21, 2016

This year Julia Silge and I released the tidytext package for text mining using tidy tools such as dplyr, tidyr, ggplot2 and broom. One of the canonical examples of tidy text mining this package makes possible is sentiment analysis. Sentiment analysis is often used by companies to quantify general social media opinion (for...

## Monte Carlo Analysis of Manning’s Equation: A Shiny App

July 20, 2016

Monte Carlo analysis is a great way to explore the impact of input variable uncertainty on the results of engineering equations, and with vector variables and distribution and sampling functions at its core, R is a natural platform for this analysis. During a recent rainy vacation, I built a Shiny app that applies...

## Escalating Life Expectancy

July 18, 2016

I’ve added mortality data to the lifespan package. A result that immediately emerges from these data is that average life expectancy is steadily climbing. The effect is more pronounced for men, rising from around 66.5 in 1994 to 70.0 in 2014. The corresponding values for women are 74.6 and 76.5 respectively. Good news for everyone.

## Birth Month by Gender

July 16, 2016

Based on some feedback to a previous post, I normalised the birth counts by the (average) number of days in each month. As pointed out by a reader, the results indicate a gradual increase in the number of conceptions during (northern hemisphere) Autumn and Winter, roughly up to the end of December. Normalising the data

## Most Probable Birth Month

July 14, 2016

In a previous post I showed that the data from www.baseball-reference.com support Malcolm Gladwell’s contention that more professional baseball players are born in August than any other month. Although this might be explained by the 31 July cutoff for admission to baseball leagues, it was suggested that it could also be linked to a larger