# Descriptive summary: Proportions of values in a vector #rstats

March 6, 2017

(This article was first published on R – Strenge Jacke!, and kindly contributed to R-bloggers)

When describing a sample, researchers in my field often report the proportions of specific characteristics, for instance the proportion of female persons or of persons with higher or lower income. Since I often want to know these characteristics when exploring data, I wrote a function, `prop()`, which is part of my sjstats-package, a package dedicated to summary functions, mostly for fit or association measures of regression models and for descriptive statistics.

`prop()` is designed in a similar fashion to most functions of my sjmisc-package: first the data, then a user-defined number of logical comparisons that define the proportions. A single comparison returns a vector; multiple comparisons return a tibble, where the first column contains the comparison and the second the related proportion.

An example from the mtcars dataset:

```
library(sjstats)
data(mtcars)
# proportions of observations in mpg that are greater than 25
prop(mtcars, mpg > 25)
#> [1] 0.1875

prop(mtcars, mpg > 25, disp > 200, gear == 4)
#> # A tibble: 3 × 2
#>   condition   prop
#>
#> 1    mpg>25 0.1875
#> 2  disp>200 0.5000
#> 3   gear==4 0.3750
```
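
For readers who want to see where these numbers come from: a proportion of this kind is simply the mean of a logical vector. The following is a minimal base R sketch of that idea. It is not how `prop()` is implemented, and the names `p_mpg`, `conditions` and `props` are only illustrative.

```r
data(mtcars)

# A logical vector is coerced to 0/1, so its mean is the share of TRUE values.
p_mpg <- mean(mtcars$mpg > 25)
p_mpg
#> [1] 0.1875

# Several conditions at once: one logical vector per comparison,
# named like the "condition" column in the tibble above.
conditions <- list(
  "mpg>25"   = mtcars$mpg > 25,
  "disp>200" = mtcars$disp > 200,
  "gear==4"  = mtcars$gear == 4
)

# sapply() returns a named numeric vector with one proportion per condition.
props <- sapply(conditions, mean)
props
```

The values in `props` match the `prop` column shown above. Note that `mean()` returns `NA` as soon as a vector contains missing values, so for data with missing values you would have to pass `na.rm = TRUE` yourself; mtcars has none.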

The function also works on grouped data frames and with labelled data. In the following example, we group a dataset on family carers by the carers' gender and education, and then get two proportions per group: the proportion of care-receivers who are at least moderately dependent, and the proportion of care-receivers who are male. To get an impression of what the raw variables look like, we first compute simple frequency tables with `frq()`.

```
library(sjmisc) # for frq()-function
library(dplyr)  # for %>%, select() and group_by()
data(efc)
frq(efc, e42dep)
#> # elder's dependency
#>
#>  val                label frq raw.prc valid.prc cum.prc
#>    1          independent  66    7.27      7.33    7.33
#>    2   slightly dependent 225   24.78     24.97   32.30
#>    3 moderately dependent 306   33.70     33.96   66.26
#>    4   severely dependent 304   33.48     33.74  100.00
#>    5                   NA   7    0.77        NA      NA

frq(efc, e16sex)
#> # elder's gender
#>
#>  val  label frq raw.prc valid.prc cum.prc
#>    1   male 296   32.60     32.85   32.85
#>    2 female 605   66.63     67.15  100.00
#>    3     NA   7    0.77        NA      NA

# group by carer's gender and education, then compute both proportions per group
efc %>%
  select(e42dep, c161sex, c172code, e16sex) %>%
  group_by(c161sex, c172code) %>%
  prop(e42dep > 2, e16sex == 1)

#> # A tibble: 6 × 4
#>   `carer's gender`    `carer's level of education` `e42dep>2` `e16sex==1`
#>
#> 1             Male          low level of education     0.6829      0.3659
#> 2             Male intermediate level of education     0.6590      0.3155
#> 3             Male         high level of education     0.7872      0.2766
#> 4           Female          low level of education     0.7101      0.4638
#> 5           Female intermediate level of education     0.5929      0.2832
#> 6           Female         high level of education     0.6881      0.2752
```

So, among male family carers with a low level of education, 68.29% of care-receivers are moderately or severely dependent, and 36.59% of care-receivers are male. Among female family carers with a high level of education, 68.81% of care-receivers are at least moderately dependent and 27.52% are male.
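
To make explicit what a grouped proportion is, here is a rough sketch of the same idea with plain dplyr verbs. It uses mtcars instead of the efc data, and grouping by `cyl` is just a choice for illustration. It does not reproduce the internals of `prop()`, and it does not return value labels for the groups as in the output above.

```r
library(dplyr)
data(mtcars)

# Within each group, the mean of a logical comparison is the proportion
# of observations in that group for which the comparison is TRUE.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    `mpg>25`  = mean(mpg > 25),
    `gear==4` = mean(gear == 4)
  )

# One row per number of cylinders; for example, 6 of the 11
# four-cylinder cars have more than 25 miles per gallon.
by_cyl
```

Each row is read like the rows of the table above: within a group, every column gives the share of observations in that group that meet the respective condition.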

