Visualizing a One-Way ANOVA using D3.js

May 31, 2013

(This article was first published on R Psychologist, and kindly contributed to R-bloggers)

A while ago I was playing around with the JavaScript library D3.js,
and I began with this visualization of how a one-way ANOVA is
calculated, which I never really finished. I wanted to make the
visualization interactive, and I did integrate some interactive elements.
For instance, if you hover over a data point, it will show the residual,
and its value will be highlighted in the combined computation. The circle
diagram shows the partitioning of the sums of squares, and if you hover
over a part, it will show where that variation comes from. I tried to make
the plots look like plots from the R package ggplot2.

These plots are not designed to work on mobile phones.

Let’s check the calculations in R

To see if this works, let’s compute the ANOVA as I have described it
here.
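
For reference, the partitioning that the R code below checks can be sketched as a single identity. The notation is an assumption introduced here, not taken from the visualization: y_ij is observation i in group j, n_j is the size of group j, k is the number of groups, ȳ_j is the mean of group j, and ȳ is the grand mean.

```latex
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}\right)^2}_{\text{total SS}}
=
\underbrace{\sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2}_{\text{between groups SS}}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2}_{\text{within groups SS}}
```

With equal group sizes, as in the data used here (n_j = 4 for every group), the between-groups term reduces to 4 times the sum of squared deviations of the group means from the grand mean.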

# data  
grp1 <- c(1,2,3,4)  
grp2 <- c(5,6,7,8)  
grp3 <- c(9,10,11,12)
# total SS  
total_SS <- sum((c(grp1, grp2, grp3) - mean(c(grp1, grp2, grp3)))^2)  
total_SS  
[1] 143
# within groups SS  
within_SS <- sum((c(grp1 - mean(grp1), grp2 - mean(grp2), grp3 - mean(grp3)))^2)  
within_SS
[1] 15
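# between groups SS: group size (4) times the squared deviations
# of the group means from the grand mean
between_SS <- 4 * sum((c(mean(grp1), mean(grp2), mean(grp3)) - mean(c(grp1, grp2, grp3)))^2)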
between_SS  
[1] 128
# check calculation  
between_SS + within_SS == total_SS  

[1] TRUE
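
The exact comparison with == works here because every intermediate value happens to be exactly representable as a floating-point number. With less tidy data, rounding error can make the same check return FALSE even when the arithmetic is right. A minimal sketch of a tolerance-based check, using the same data (the variable names y and sums_match are introduced here for illustration only):

```r
# Same data as in the post
grp1 <- c(1, 2, 3, 4)
grp2 <- c(5, 6, 7, 8)
grp3 <- c(9, 10, 11, 12)
y <- c(grp1, grp2, grp3)

total_SS   <- sum((y - mean(y))^2)
within_SS  <- sum((grp1 - mean(grp1))^2) +
              sum((grp2 - mean(grp2))^2) +
              sum((grp3 - mean(grp3))^2)
between_SS <- 4 * sum((c(mean(grp1), mean(grp2), mean(grp3)) - mean(y))^2)

# all.equal() compares within a small numerical tolerance;
# isTRUE() turns its result into a plain TRUE/FALSE
sums_match <- isTRUE(all.equal(between_SS + within_SS, total_SS))
sums_match
```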

We see that total_SS, between_SS and within_SS are identical to
what is shown above in the visualization.

df1 <- 3-1 # number of groups - 1  
df2 <- 12 - 3 # N - number of groups  
F <- (between_SS/df1) / (within_SS/df2)  
F
[1] 38.4
1-pf(F, df1, df2) # p-value  
[1] 3.921015e-05
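
The steps above can be collected into one helper that also handles unequal group sizes. This is only a sketch: one_way_anova() is a hypothetical name introduced here, not a function from the original post or from base R, and it uses pf(..., lower.tail = FALSE) in place of 1 - pf(...), which gives the same upper-tail probability.

```r
# Hypothetical helper (not from the original post): one-way ANOVA by hand.
# `groups` is a list of numeric vectors, one per group.
one_way_anova <- function(groups) {
  y           <- unlist(groups)
  grand_mean  <- mean(y)
  n           <- lengths(groups)                    # group sizes
  group_means <- vapply(groups, mean, numeric(1))

  # Between: size-weighted squared deviations of group means from the grand mean
  between_SS <- sum(n * (group_means - grand_mean)^2)
  # Within: squared deviations of each observation from its own group mean
  within_SS  <- sum(mapply(function(g, m) sum((g - m)^2), groups, group_means))

  df1 <- length(groups) - 1          # number of groups - 1
  df2 <- length(y) - length(groups)  # N - number of groups
  F_value <- (between_SS / df1) / (within_SS / df2)
  p_value <- pf(F_value, df1, df2, lower.tail = FALSE)

  list(between_SS = between_SS, within_SS = within_SS,
       df1 = df1, df2 = df2, F = F_value, p = p_value)
}

res <- one_way_anova(list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(9, 10, 11, 12)))
res$F
res$p
```

Storing the statistic in F_value rather than F also avoids masking R's built-in F, which is an alias for FALSE.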

Let’s compare this to anova()

df <- data.frame(y=c(grp1,grp2,grp3))  
df$group <- gl(3,4)  
anova(lm(y ~ group, df))  
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)    
group      2    128  64.000    38.4 3.921e-05 ***
Residuals  9     15   1.667                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We have identical results.
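
Rather than comparing the printed table by eye, the same check can be done programmatically by indexing the table that anova() returns. A minimal sketch, assuming the data frame is named dat (a name chosen here to avoid masking the base function df()); the other variable names are likewise illustrative:

```r
grp1 <- c(1, 2, 3, 4)
grp2 <- c(5, 6, 7, 8)
grp3 <- c(9, 10, 11, 12)

dat <- data.frame(y = c(grp1, grp2, grp3), group = gl(3, 4))
tab <- anova(lm(y ~ group, data = dat))

# The ANOVA table is a data frame, so it can be indexed by row and column name
ss_between <- tab["group", "Sum Sq"]
ss_within  <- tab["Residuals", "Sum Sq"]
f_value    <- tab["group", "F value"]
p_value    <- tab["group", "Pr(>F)"]

# Compare against the values computed by hand earlier in the post
isTRUE(all.equal(c(ss_between, ss_within, f_value), c(128, 15, 38.4)))
```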
