380 search results for "boxplot"

Labeled outliers in R boxplot

Boxplots are a good way to get some insight into your data, and while R provides a fine ‘boxplot’ function, it doesn’t label the outliers in the graph. However, with a little code you can add labels yourself: The numbers plotted next to ...

Read more »
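As a rough illustration of the idea in the excerpt above (not the post's own code, which is truncated here), one way to label outliers in base R is to take the outlying values that `boxplot()` returns and annotate them with `text()`. The data and the `obs…` names below are made up for the example.

```r
# Minimal sketch: label boxplot outliers with their names (base R only).
# The data are illustrative and do not come from the original post.
set.seed(42)
values <- c(rnorm(30), 6, -5)                 # two deliberately extreme points
names(values) <- paste0("obs", seq_along(values))

# boxplot() draws the plot and invisibly returns its statistics;
# the outlying values are stored in the `out` component.
bp <- boxplot(values, main = "Boxplot with labeled outliers")

# Match the outlying values back to their positions in the data.
outlier_idx <- which(values %in% bp$out)

if (length(outlier_idx) > 0) {
  text(x = 1, y = values[outlier_idx],
       labels = names(values)[outlier_idx],
       pos = 4, cex = 0.8)                    # pos = 4 puts labels to the right
}
```

Matching on values is a simplification: if a non-outlying point shared a value with an outlier it would be labeled too, so real code may prefer to recompute the fences explicitly.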

Use box plots to assess the distribution and to identify the outliers in your dataset

August 14, 2015

After you check the distribution of the data by plotting a histogram, the second thing to do is to look for outliers. Identifying the outliers is important because an association you find in your analysis might be explained by the presence of outliers. The best tool to identify the outliers is

Read more »
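The excerpt above is cut off before it names its tool, so the following is only a sketch of the usual box-plot convention: flag points lying more than 1.5 times the interquartile range beyond the quartiles. The vector `x` is invented for illustration, and the result is compared with base R's `boxplot.stats()`.

```r
# Minimal sketch of the 1.5 * IQR rule; `x` is illustrative data.
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 21, 100)

# Quartiles and interquartile range (quantile() default, type 7).
q   <- quantile(x, probs = c(0.25, 0.75))
iqr <- q[2] - q[1]

# Fences: anything outside them is flagged as an outlier.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
outliers <- x[x < lower | x > upper]

# boxplot.stats() applies the same idea using Tukey's hinges,
# so its fences can differ slightly from the quantile-based ones.
outliers_bp <- boxplot.stats(x)$out

print(outliers)
print(outliers_bp)
```

On small samples the two approaches can disagree at the margins because hinges and type-7 quartiles are computed differently.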

15 Questions All R Users Have About Plots

July 30, 2015

R allows you to create different plot types, ranging from basic graph types such as density plots, dot plots, bar charts, line charts, pie charts, boxplots and scatter plots, to more statistically complex types such as probability plots, mosaic plots and correlograms. In addition, R is well known for its data visualization

Read more »

Computing AIC on a Validation Sample

July 29, 2015

This afternoon, we saw in the training on data science that it is possible to use the AIC criterion for model selection.

    > library(splines)
    > AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="log")))
    438.6314
    > AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="identity")))
    436.3997
    > AIC(glm(dist ~ bs(speed), data=train_cars, family=poisson(link="log")))
    425.6434
    > AIC(glm(dist ~ bs(speed), data=train_cars, family=poisson(link="identity")))
    428.7195

And I’ve been asked...

Read more »
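The excerpt above stops before explaining how the post computes AIC on a validation sample, so what follows is one plausible reading rather than the author's method: fit the Poisson model on a training subset, evaluate the Poisson log-likelihood on held-out rows, and apply the usual penalty of twice the number of coefficients. The split of the built-in `cars` data and the names `valid_cars`, `loglik_valid` and `aic_valid` are assumptions made for this sketch.

```r
# Sketch under stated assumptions: an AIC-style score on held-out data.
# The 35/15 split of `cars` is arbitrary and not taken from the post.
set.seed(1)
idx        <- sample(nrow(cars), size = 35)
train_cars <- cars[idx, ]
valid_cars <- cars[-idx, ]

# Same model form as the first call in the excerpt.
fit <- glm(dist ~ speed, data = train_cars, family = poisson(link = "log"))

# Predicted Poisson means for the validation rows.
mu_valid <- predict(fit, newdata = valid_cars, type = "response")

# Poisson log-likelihood of the validation responses under the fitted model.
loglik_valid <- sum(dpois(valid_cars$dist, lambda = mu_valid, log = TRUE))

# AIC-style score: -2 * log-likelihood + 2 * number of parameters.
k         <- length(coef(fit))
aic_valid <- -2 * loglik_valid + 2 * k

print(AIC(fit))     # in-sample AIC, as in the excerpt
print(aic_valid)    # held-out analogue
```

Because the two samples differ in size, the in-sample and held-out scores are not directly comparable with each other; the held-out score is meant for ranking candidate models evaluated on the same validation rows.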

Why I use Panel/Multilevel Methods

July 24, 2015

I don’t understand why any researcher would choose not to use panel/multilevel methods on panel/hierarchical data. Let’s take the following linear regression as an example: , where is a random effect for the i-th group. A pooled OLS regression model for the above is unbiased and consistent. However, it will be inefficient, unless for all

Read more »

R 101 – Aggregate By Quarter

July 14, 2015

We were asked how to aggregate data by quarter in R, starting from what I believe was a daily time series. This is a pretty common task and there are many ways to do it in R, but we’ll focus on one method using the zoo and dplyr packages. Let’s get those imports out of the way:

    library(dplyr)
    library(zoo)
    library(ggplot2)

Now, we need...

Read more »
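The post's own code is truncated above, so this is only a sketch of one way the zoo and dplyr approach might look: convert each date to a quarter with `zoo::as.yearqtr()`, then group and summarise with dplyr. The `daily` data frame and its column names are invented for the example.

```r
# Minimal sketch: aggregate a daily series by quarter with zoo + dplyr.
# The `daily` data frame is illustrative, not the data from the post.
library(dplyr)
library(zoo)

daily <- data.frame(
  date  = seq(as.Date("2015-01-01"), as.Date("2015-06-30"), by = "day"),
  value = 1                                   # one unit per day, easy to check
)

quarterly <- daily %>%
  mutate(quarter = as.yearqtr(date)) %>%      # e.g. "2015 Q1"
  group_by(quarter) %>%
  summarise(total = sum(value),
            mean_value = mean(value))

print(quarterly)
```

With one unit per day, each quarterly total is simply the number of days in that quarter, which makes the grouping easy to verify by eye.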

R Package to access the Open Movie Database (OMDB) API

July 10, 2015

It’s not on CRAN yet, but there’s a devtools-installable R package for getting data from the OMDB API. It covers all of the public API endpoints:

find_by_id: Retrieve OMDB info by IMDB ID search
find_by_title: Retrieve OMDB info by title search
get_actors: Get actors from an omdb object as a vector
get_countries: Get countries from

Read more »

From cats to zombies, Wednesday at useR2015

July 1, 2015

The morning opened with a speaker I was too bleary-eyed to identify. Possibly the dean of the University of Aalborg. Anyway, he said that this is the largest ever useR conference, and the first ever in a Nordic country. Take that, Norway! Also, considering that there are now quite a

Read more »

Stop the madness – no more pie charts

July 1, 2015

There has been a trend in the last few years to put interesting-looking but non-informative figures in papers; the pie chart is the worst recurrent offender. I have no idea how they keep getting included, as they’re famously misleading and awful. I want my work to look as much like the cockpit of a mecha or …

Read more »

Visualization and Analysis of Reddit’s "The Button" Data

June 15, 2015

Introduction: People are weird. And if there's anything that's greater collective proof of this fact than Reddit, you'd be hard pressed to find it. I tend to put Reddit in the same bucket as companies like Google, Amazon and Netflix, where they have enoug...

Read more »