X is for By

April 27, 2018
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.]

Today’s post will be rather short, demonstrating a set of functions from the psych package that allow you to conduct analyses by group. These functions add “By” to the end of existing function names. But first, a word of caution: with great power comes great responsibility. These functions could very easily turn an analysis into a fishing expedition (also known as p-hacking). Conducting planned group comparisons is fine. Conducting all possible group comparisons and cherry-picking any differences is problematic. So use these group-by functions with care.

Let’s pull up the Facebook dataset for this.

Facebook<-read.delim(file="full_facebook_set.txt", header=TRUE)

This is the full dataset, which includes all the variables I collected. I don’t want to run analyses on all variables, so I’ll pull out the ones most important for this blog post demonstration.

smallFB<-Facebook[,c(1:2,77:80,105:116,122,133:137,170,187)]

First, I’ll run descriptives on this smaller data frame by gender.

library(psych)
## Warning: package 'psych' was built under R version 3.4.4
describeBy(smallFB,smallFB$gender)
## 
## Descriptive statistics by group
## group: 0
##              vars  n      mean      sd   median   trimmed     mad      min
## RespondentId    1 73 164647.77 1711.78 164943.0 164587.37 2644.96 162373.0
## gender          2 73      0.00    0.00      0.0      0.00    0.00      0.0
## Rumination      3 73     37.66   14.27     37.0     37.41   13.34      8.0
## DepRelat        4 73     21.00    7.86     21.0     20.95    5.93      4.0
## Brood           5 73      8.49    3.76      9.0      8.42    2.97      1.0
## Reflect         6 73      8.16    4.44      8.0      8.24    4.45      0.0
## SavorPos        7 73     64.30   10.93     65.0     64.92    8.90     27.0
## SavorNeg        8 73     33.30   11.48     33.0     33.08   13.34     12.0
## SavorTot        9 73     31.00   20.15     34.0     31.15   19.27    -10.0
## AntPos         10 73     20.85    3.95     21.0     20.93    4.45     10.0
## AntNeg         11 73     11.30    4.23     11.0     11.22    4.45      4.0
## AntTot         12 73      9.55    6.90     10.0      9.31    7.41     -3.0
## MomPos         13 73     21.68    3.95     22.0     21.90    2.97      9.0
## MomNeg         14 73     11.45    4.63     11.0     11.41    5.93      4.0
## MomTot         15 73     10.23    7.63     11.0     10.36    8.90    -11.0
## RemPos         16 73     21.77    4.53     23.0     22.20    4.45      8.0
## RemNeg         17 73     10.55    4.39      9.0     10.27    4.45      4.0
## RemTot         18 73     11.22    8.05     14.0     11.68    7.41     -8.0
## LifeSat        19 73     24.63    6.80     25.0     24.93    7.41     10.0
## Extravert      20 73      4.32    1.58      4.5      4.33    1.48      1.5
## Agreeable      21 73      4.79    1.08      5.0      4.85    1.48      1.0
## Conscient      22 73      5.14    1.34      5.0      5.19    1.48      2.0
## EmotStab       23 73      5.10    1.22      5.0      5.15    1.48      1.0
## OpenExp        24 73      5.11    1.29      5.5      5.20    1.48      2.0
## Health         25 73     28.77   19.56     25.0     26.42   17.79      0.0
## Depression     26 73     10.26    7.27      9.0      9.56    5.93      0.0
##                 max  range  skew kurtosis     se
## RespondentId 168279 5906.0  0.21    -1.36 200.35
## gender            0    0.0   NaN      NaN   0.00
## Rumination       71   63.0  0.12    -0.53   1.67
## DepRelat         42   38.0  0.10    -0.04   0.92
## Brood            17   16.0  0.15    -0.38   0.44
## Reflect          19   19.0 -0.12    -0.69   0.52
## SavorPos         84   57.0 -0.69     0.76   1.28
## SavorNeg         57   45.0  0.14    -0.95   1.34
## SavorTot         72   82.0 -0.17    -0.75   2.36
## AntPos           28   18.0 -0.24    -0.46   0.46
## AntNeg           22   18.0  0.27    -0.55   0.49
## AntTot           24   27.0  0.11    -0.76   0.81
## MomPos           28   19.0 -0.69     0.55   0.46
## MomNeg           22   18.0  0.08    -0.98   0.54
## MomTot           24   35.0 -0.25    -0.55   0.89
## RemPos           28   20.0 -0.88     0.35   0.53
## RemNeg           22   18.0  0.56    -0.66   0.51
## RemTot           24   32.0 -0.53    -0.77   0.94
## LifeSat          35   25.0 -0.37    -0.84   0.80
## Extravert         7    5.5 -0.09    -0.93   0.19
## Agreeable         7    6.0 -0.60     1.04   0.13
## Conscient         7    5.0 -0.24    -0.98   0.16
## EmotStab          7    6.0 -0.60     0.28   0.14
## OpenExp           7    5.0 -0.49    -0.55   0.15
## Health           91   91.0  1.13     1.14   2.29
## Depression       36   36.0  1.02     0.95   0.85
## --------------------------------------------------------
## group: 1
##              vars   n      mean      sd    median   trimmed     mad
## RespondentId    1 184 164373.49 1515.34 164388.00 164253.72 1891.80
## gender          2 184      1.00    0.00      1.00      1.00    0.00
## Rumination      3 184     38.09   15.28     40.00     38.16   17.05
## DepRelat        4 184     21.67    8.78     21.00     21.66    8.90
## Brood           5 184      8.57    4.14      8.50      8.47    3.71
## Reflect         6 184      7.84    4.06      8.00      7.73    4.45
## SavorPos        7 184     67.22    9.63     68.00     67.71    8.90
## SavorNeg        8 184     29.75   11.62     27.50     28.72    9.64
## SavorTot        9 184     37.47   19.30     40.00     38.66   20.02
## AntPos         10 184     22.18    3.37     23.00     22.28    2.97
## AntNeg         11 184     10.10    4.44      9.00      9.78    4.45
## AntTot         12 184     12.08    6.85     14.00     12.36    5.93
## MomPos         13 184     22.28    3.88     23.00     22.59    2.97
## MomNeg         14 184     10.60    4.88      9.50     10.13    5.19
## MomTot         15 184     11.68    7.75     13.00     12.29    7.41
## RemPos         16 184     22.76    3.85     23.00     23.10    2.97
## RemNeg         17 184      9.05    3.79      8.00      8.68    2.97
## RemTot         18 184     13.71    6.97     15.00     14.34    5.93
## LifeSat        19 184     23.76    6.25     24.00     24.18    7.41
## Extravert      20 184      4.66    1.57      5.00      4.74    1.48
## Agreeable      21 184      5.22    1.06      5.50      5.26    1.48
## Conscient      22 184      5.32    1.24      5.50      5.42    1.48
## EmotStab       23 184      4.70    1.31      4.75      4.75    1.11
## OpenExp        24 184      5.47    1.08      5.50      5.56    0.74
## Health         25 184     32.54   16.17     30.00     31.43   16.31
## Depression     26 184     12.19    8.48      9.00     11.09    5.93
##                   min    max  range  skew kurtosis     se
## RespondentId 162350.0 167714 5364.0  0.46    -0.90 111.71
## gender            1.0      1    0.0   NaN      NaN   0.00
## Rumination        3.0     74   71.0 -0.05    -0.60   1.13
## DepRelat          0.0     42   42.0  0.00    -0.46   0.65
## Brood             0.0     19   19.0  0.19    -0.62   0.31
## Reflect           0.0     19   19.0  0.25    -0.48   0.30
## SavorPos         33.0     84   51.0 -0.59     0.36   0.71
## SavorNeg         12.0     64   52.0  0.79     0.25   0.86
## SavorTot        -18.0     72   90.0 -0.57    -0.10   1.42
## AntPos            9.0     28   19.0 -0.49     0.41   0.25
## AntNeg            4.0     22   18.0  0.63    -0.39   0.33
## AntTot           -8.0     24   32.0 -0.43    -0.48   0.50
## MomPos           10.0     28   18.0 -0.81     0.54   0.29
## MomNeg            4.0     24   20.0  0.81    -0.03   0.36
## MomTot          -13.0     24   37.0 -0.69    -0.03   0.57
## RemPos            9.0     28   19.0 -0.87     0.81   0.28
## RemNeg            4.0     21   17.0  0.83     0.33   0.28
## RemTot           -9.0     24   33.0 -0.82     0.50   0.51
## LifeSat           8.0     35   27.0 -0.53    -0.32   0.46
## Extravert         1.0      7    6.0 -0.36    -0.72   0.12
## Agreeable         2.5      7    4.5 -0.27    -0.63   0.08
## Conscient         1.0      7    6.0 -0.70     0.13   0.09
## EmotStab          1.5      7    5.5 -0.35    -0.73   0.10
## OpenExp           1.5      7    5.5 -0.91     0.62   0.08
## Health            2.0     85   83.0  0.60    -0.05   1.19
## Depression        0.0     39   39.0  1.14     0.66   0.62

In this dataset, I coded men as 0 and women as 1. The descriptive statistics table includes all scale and subscale scores, and gives me the sample size, mean, standard deviation, median, a trimmed mean (dropping very low and very high values), median absolute deviation, minimum and maximum values, range, skewness, kurtosis, and standard error. I’d need to run t-tests to find out if differences were significant, but this still gives me some idea of how men and women might differ on these measures.
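As a rough illustration of the follow-up t-test mentioned above, here is a minimal sketch in base R. The data frame `sim` is simulated (its values are made up and are not the Facebook data), and the last two lines show base R counterparts of the trimmed mean and median absolute deviation, assuming the 10% trim that psych's `describe` uses by default.

```r
# Simulated stand-in for the real data; values are invented for illustration
set.seed(42)
sim <- data.frame(
  gender     = rep(c(0, 1), times = c(73, 184)),
  Depression = c(rnorm(73, mean = 10, sd = 7), rnorm(184, mean = 12, sd = 8))
)

# Welch two-sample t-test comparing the two gender groups (a planned comparison)
tt <- t.test(Depression ~ gender, data = sim)
print(tt)

# Base R versions of two of the robust summaries in the table
x <- sim$Depression[sim$gender == 0]
trimmed_mean <- mean(x, trim = 0.1)  # drops 10% of values from each tail
mad_value    <- mad(x)               # median absolute deviation (scaled by 1.4826)
```

Because `t.test` uses the Welch correction by default, the two groups are not assumed to have equal variances.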

There are certain measures I included that we might hypothesize would show gender differences. For instance, some research suggests gender differences for rumination and depression. In addition to running descriptives by group, I might also want to display these differences in a violin plot. The psych package can quickly generate such a plot by group.

violinBy(smallFB,"Rumination","gender",grp.name=c("M","F"))
violinBy(smallFB,"Depression","gender",grp.name=c("M","F"))

ggplot2 will also generate a violin plot by group, so this feature might not be as useful for final displays, but it could help in quickly visualizing the data during analysis. And you may find that you prefer the appearance of these plots. To each his own.
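For comparison, a ggplot2 version of a grouped violin plot might look like the sketch below. It assumes the ggplot2 package is installed and uses a simulated data frame (`sim`, with invented values) in place of the real dataset.

```r
library(ggplot2)

# Simulated stand-in for the real data; values are invented for illustration
set.seed(42)
sim <- data.frame(
  gender     = factor(rep(c("M", "F"), times = c(73, 184)), levels = c("M", "F")),
  Rumination = c(rnorm(73, mean = 38, sd = 14), rnorm(184, mean = 38, sd = 15))
)

# One violin per gender group
p <- ggplot(sim, aes(x = gender, y = Rumination)) +
  geom_violin() +
  labs(x = "Gender", y = "Rumination")
print(p)
```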

Another function is error.bars.by, which plots means and confidence intervals by group for multiple variables. Again, this is a way to get some quick visuals, though differences in scale among measures should be taken into consideration when generating this plot. One set of variables for which this display might be useful is the 5 subscales of the Five-Factor Personality Inventory. This 10-item measure assesses where participants fall on the so-called Big Five personality traits: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (Emotional Stability). These subscales are all on the same metric.

error.bars.by(smallFB[,c(20:24)],group=smallFB$gender,xlab="Big Five Personality Traits",ylab="Score on Subscale")
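To make clear what kind of quantities such a plot displays, the sketch below computes a group mean and a t-based 95% confidence interval by hand in base R. The data are simulated and `mean_ci` is a hypothetical helper written for this example; it shows one common way to build the interval and may not match the internals of error.bars.by exactly.

```r
# Simulated stand-in for one personality subscale; values are invented
set.seed(42)
sim <- data.frame(
  gender    = rep(c(0, 1), times = c(73, 184)),
  Extravert = c(rnorm(73, mean = 4.3, sd = 1.6), rnorm(184, mean = 4.7, sd = 1.6))
)

# Mean and t-based confidence interval for one numeric vector
mean_ci <- function(x, level = 0.95) {
  x    <- x[!is.na(x)]
  n    <- length(x)
  se   <- sd(x) / sqrt(n)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# One row per gender group, with columns mean, lower, upper
ci_by_gender <- t(sapply(split(sim$Extravert, sim$gender), mean_ci))
print(ci_by_gender)
```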

Finally, we have the statsBy function, which gives descriptive statistics by group as well as between-group statistics. This function generates a lot of output, and you can read more about everything it gives you in the function's help page (?statsBy).

FBstats<-statsBy(smallFB[,2:26],"gender",cors=TRUE,method="pearson",use="pairwise")
print(FBstats,short=FALSE)
## Statistics within and between groups  
## Call: statsBy(data = smallFB[, 2:26], group = "gender", cors = TRUE,
## method = "pearson", use = "pairwise")
## Intraclass Correlation 1 (Percentage of variance due to groups)
##     gender Rumination   DepRelat      Brood    Reflect   SavorPos
##       1.00      -0.01      -0.01      -0.01      -0.01       0.03
##   SavorNeg   SavorTot     AntPos     AntNeg     AntTot     MomPos
##       0.03       0.04       0.05       0.02       0.05       0.00
##     MomNeg     MomTot     RemPos     RemNeg     RemTot    LifeSat
##       0.00       0.01       0.02       0.05       0.04       0.00
##  Extravert  Agreeable  Conscient   EmotStab    OpenExp     Health
##       0.01       0.05       0.00       0.03       0.03       0.01
## Depression
##       0.01
## Intraclass Correlation 2 (Reliability of group differences)
##     gender Rumination   DepRelat      Brood    Reflect   SavorPos
##       1.00     -22.34      -2.06     -50.93      -2.21       0.77
##   SavorNeg   SavorTot     AntPos     AntNeg     AntTot     MomPos
##       0.80       0.83       0.86       0.75       0.86       0.19
##     MomNeg     MomTot     RemPos     RemNeg     RemTot    LifeSat
##       0.39       0.46       0.68       0.87       0.84      -0.04
##  Extravert  Agreeable  Conscient   EmotStab    OpenExp     Health
##       0.60       0.88       0.05       0.80       0.81       0.60
## Depression
##       0.66
## eta^2 between groups
## Rumination.bg   DepRelat.bg      Brood.bg    Reflect.bg   SavorPos.bg
##          0.00          0.00          0.00          0.00          0.02
##   SavorNeg.bg   SavorTot.bg     AntPos.bg     AntNeg.bg     AntTot.bg
##          0.02          0.02          0.03          0.02          0.03
##     MomPos.bg     MomNeg.bg     MomTot.bg     RemPos.bg     RemNeg.bg
##          0.00          0.01          0.01          0.01          0.03
##     RemTot.bg    LifeSat.bg  Extravert.bg  Agreeable.bg  Conscient.bg
##          0.02          0.00          0.01          0.03          0.00
##   EmotStab.bg    OpenExp.bg     Health.bg Depression.bg
##          0.02          0.02          0.01          0.01
## Correlation between groups
##               Rmnt. DpRl. Brd.b Rflc. SvrP. SvrN. SvrT. AntP. AntN. AntT.
## Rumination.bg     1
## DepRelat.bg       1     1
## Brood.bg          1     1     1
## Reflect.bg       -1    -1    -1     1
## SavorPos.bg       1     1     1    -1     1
## SavorNeg.bg      -1    -1    -1     1    -1     1
## SavorTot.bg       1     1     1    -1     1    -1     1
## AntPos.bg         1     1     1    -1     1    -1     1     1
## AntNeg.bg        -1    -1    -1     1    -1     1    -1    -1     1
## AntTot.bg         1     1     1    -1     1    -1     1     1    -1     1
## MomPos.bg         1     1     1    -1     1    -1     1     1    -1     1
## MomNeg.bg        -1    -1    -1     1    -1     1    -1    -1     1    -1
## MomTot.bg         1     1     1    -1     1    -1     1     1    -1     1
## RemPos.bg         1     1     1    -1     1    -1     1     1    -1     1
## RemNeg.bg        -1    -1    -1     1    -1     1    -1    -1     1    -1
## RemTot.bg         1     1     1    -1     1    -1     1     1    -1     1
## LifeSat.bg       -1    -1    -1     1    -1     1    -1    -1     1    -1
## Extravert.bg      1     1     1    -1     1    -1     1     1    -1     1
## Agreeable.bg      1     1     1    -1     1    -1     1     1    -1     1
## Conscient.bg      1     1     1    -1     1    -1     1     1    -1     1
## EmotStab.bg      -1    -1    -1     1    -1     1    -1    -1     1    -1
## OpenExp.bg        1     1     1    -1     1    -1     1     1    -1     1
## Health.bg         1     1     1    -1     1    -1     1     1    -1     1
## Depression.bg     1     1     1    -1     1    -1     1     1    -1     1
##               MmPs. MmNg. MmTt. RmPs. RmNg. RmTt. LfSt. Extr. Agrb. Cnsc.
## MomPos.bg         1
## MomNeg.bg        -1     1
## MomTot.bg         1    -1     1
## RemPos.bg         1    -1     1     1
## RemNeg.bg        -1     1    -1    -1     1
## RemTot.bg         1    -1     1     1    -1     1
## LifeSat.bg       -1     1    -1    -1     1    -1     1
## Extravert.bg      1    -1     1     1    -1     1    -1     1
## Agreeable.bg      1    -1     1     1    -1     1    -1     1     1
## Conscient.bg      1    -1     1     1    -1     1    -1     1     1     1
## EmotStab.bg      -1     1    -1    -1     1    -1     1    -1    -1    -1
## OpenExp.bg        1    -1     1     1    -1     1    -1     1     1     1
## Health.bg         1    -1     1     1    -1     1    -1     1     1     1
## Depression.bg     1    -1     1     1    -1     1    -1     1     1     1
##               EmtS. OpnE. Hlth. Dprs.
## EmotStab.bg       1
## OpenExp.bg       -1     1
## Health.bg        -1     1     1
## Depression.bg    -1     1     1     1
## Correlation within groups
##               Rmnt. DpRl. Brd.w Rflc. SvrP. SvrN. SvrT. AntP. AntN. AntT.
## Rumination.wg  1.00
## DepRelat.wg    0.95  1.00
## Brood.wg       0.88  0.78  1.00
## Reflect.wg     0.80  0.63  0.59  1.00
## SavorPos.wg   -0.20 -0.20 -0.18 -0.15  1.00
## SavorNeg.wg    0.43  0.43  0.36  0.30 -0.64  1.00
## SavorTot.wg   -0.36 -0.36 -0.31 -0.25  0.89 -0.92  1.00
## AntPos.wg     -0.06 -0.05 -0.08 -0.03  0.86 -0.49  0.73  1.00
## AntNeg.wg      0.32  0.32  0.28  0.21 -0.54  0.89 -0.80 -0.50  1.00
## AntTot.wg     -0.23 -0.23 -0.21 -0.15  0.78 -0.82  0.89  0.83 -0.89  1.00
## MomPos.wg     -0.26 -0.26 -0.22 -0.19  0.86 -0.60  0.80  0.60 -0.47  0.61
## MomNeg.wg      0.46  0.46  0.39  0.35 -0.51  0.88 -0.78 -0.33  0.66 -0.59
## MomTot.wg     -0.42 -0.42 -0.36 -0.32  0.75 -0.85  0.89  0.51 -0.65  0.68
## RemPos.wg     -0.20 -0.19 -0.17 -0.15  0.89 -0.56  0.79  0.66 -0.44  0.62
## RemNeg.wg      0.34  0.35  0.28  0.23 -0.65  0.87 -0.85 -0.49  0.69 -0.69
## RemTot.wg     -0.29 -0.30 -0.25 -0.21  0.85 -0.79  0.90  0.63 -0.62  0.72
## LifeSat.wg    -0.47 -0.47 -0.43 -0.31  0.54 -0.50  0.57  0.39 -0.33  0.41
## Extravert.wg  -0.20 -0.19 -0.11 -0.20  0.34 -0.35  0.38  0.21 -0.29  0.29
## Agreeable.wg  -0.18 -0.18 -0.20 -0.10  0.35 -0.45  0.45  0.28 -0.39  0.39
## Conscient.wg  -0.25 -0.30 -0.20 -0.10  0.24 -0.21  0.25  0.16 -0.14  0.17
## EmotStab.wg   -0.48 -0.44 -0.49 -0.34  0.34 -0.44  0.43  0.20 -0.33  0.32
## OpenExp.wg    -0.16 -0.14 -0.21 -0.10  0.37 -0.31  0.37  0.27 -0.27  0.31
## Health.wg      0.44  0.47  0.36  0.29 -0.30  0.34 -0.35 -0.21  0.26 -0.27
## Depression.wg  0.57  0.58  0.49  0.38 -0.44  0.55 -0.55 -0.27  0.39 -0.39
##               MmPs. MmNg. MmTt. RmPs. RmNg. RmTt. LfSt. Extr. Agrb. Cnsc.
## MomPos.wg      1.00
## MomNeg.wg     -0.56  1.00
## MomTot.wg      0.86 -0.91  1.00
## RemPos.wg      0.65 -0.42  0.59  1.00
## RemNeg.wg     -0.55  0.63 -0.67 -0.65  1.00
## RemTot.wg      0.66 -0.58  0.69  0.91 -0.91  1.00
## LifeSat.wg     0.55 -0.55  0.62  0.48 -0.42  0.49  1.00
## Extravert.wg   0.39 -0.37  0.43  0.28 -0.25  0.29  0.27  1.00
## Agreeable.wg   0.33 -0.43  0.43  0.31 -0.36  0.37  0.25  0.12  1.00
## Conscient.wg   0.25 -0.16  0.22  0.23 -0.26  0.26  0.33  0.03  0.29  1.00
## EmotStab.wg    0.40 -0.50  0.51  0.27 -0.32  0.32  0.44  0.12  0.41  0.27
## OpenExp.wg     0.39 -0.26  0.36  0.30 -0.28  0.32  0.34  0.29  0.36  0.14
## Health.wg     -0.30  0.33 -0.36 -0.27  0.29 -0.31 -0.42 -0.10 -0.25 -0.24
## Depression.wg -0.45  0.56 -0.58 -0.41  0.49 -0.50 -0.65 -0.24 -0.29 -0.26
##               EmtS. OpnE. Hlth. Dprs.
## EmotStab.wg    1.00
## OpenExp.wg     0.24  1.00
## Health.wg     -0.31 -0.18  1.00
## Depression.wg -0.54 -0.28  0.56  1.00
##
## Many results are not shown directly. To see specific objects select from the following list:
## mean sd n F ICC1 ICC2 ci1 ci2 r within pooled sd.r raw rbg pbg rwg nw pwg etabg etawg nwg nG Call
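The last lines of the output list objects stored in the result that are not printed. As a small sketch of how to pull a few of them out, the example below runs statsBy on a simulated data frame (`sim`, with invented values) and extracts components by name with `$`. It assumes the psych package is installed; the component names used are taken from the list printed above.

```r
library(psych)

# Simulated stand-in for the real data; values are invented for illustration
set.seed(42)
sim <- data.frame(
  gender     = rep(c(0, 1), times = c(73, 184)),
  Rumination = c(rnorm(73, mean = 38, sd = 14), rnorm(184, mean = 38, sd = 15)),
  Depression = c(rnorm(73, mean = 10, sd = 7), rnorm(184, mean = 12, sd = 8))
)

simStats <- statsBy(sim, "gender")

# Individual pieces of the result are extracted by name
simStats$mean   # group means for each variable
simStats$n      # group sizes
simStats$ICC1   # intraclass correlation 1 for each variable
```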

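The variance explained by gender is quite small for all of the variables. Instead, the relationships between the variables seem to be more meaningful. (The between-group correlations are all 1 or -1 only because there are just two groups: with two group means per variable, any pair of variables falls on a perfect line.)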
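To show what "variance explained by gender" means in practice, here is a minimal base R sketch that computes eta-squared by hand as the between-group sum of squares divided by the total sum of squares. The data are simulated (invented values), and this is the conventional textbook definition, which may not reproduce statsBy's own calculation digit for digit.

```r
# Simulated stand-in for one scale score; values are invented for illustration
set.seed(42)
sim <- data.frame(
  gender   = rep(c(0, 1), times = c(73, 184)),
  SavorPos = c(rnorm(73, mean = 64, sd = 11), rnorm(184, mean = 67, sd = 10))
)

grand_mean  <- mean(sim$SavorPos)
group_means <- tapply(sim$SavorPos, sim$gender, mean)
group_n     <- tapply(sim$SavorPos, sim$gender, length)

# Between-group and total sums of squares
ss_between <- sum(group_n * (group_means - grand_mean)^2)
ss_total   <- sum((sim$SavorPos - grand_mean)^2)

eta_sq <- ss_between / ss_total
print(eta_sq)

# Cross-check: with a single grouping factor, eta-squared equals the model R-squared
r_sq <- summary(lm(SavorPos ~ factor(gender), data = sim))$r.squared
print(all.equal(eta_sq, r_sq))
```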

A to Z is almost done! Just Y and Z, plus look for an A-to-Z-influenced Statistics Sunday post!
