[This article was first published on Data Analysis in R » Quick Guide for Statistics & R » finnstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The post Stratified Sampling in R With Examples appeared first on finnstats.

Are you looking for the latest Data Science Job vacancies then click here The post Stratified Sampling in R With Examples appeared first on finnstats.

Researchers frequently take samples from a population and use the data from the sample to make generalizations about the entire population.

A typical sampling approach is stratified random sampling, which divides a population into groups and selects a random number of people from each category to be included in the sample.

This article shows you how to use R to achieve stratified random sampling.

Principal Component Analysis in R » finnstats

## Approach: Stratified Sampling in R

A corporation has 400 employees who are either freshers, juniors, mid-level employees, or senior employees.

Let’s say we want to obtain a stratified sample of 40 employees, with 10 employees from each level represented.

The following code explains how to create a 400-employee sample data frame.

With the help of set.seed, we can make this example repeatable.

`set.seed(1)`

Now let’s create a data frame

`data <- data.frame(Level = rep(c("freshers", "juniors", "mid-level", "Senior"), each=100),                 Score = rnorm(400, mean=45, sd=2.2))`

view the first six rows of a data frame

Free Data Science Books » EBooks » finnstats

```head(data)
Level    Score
1 freshers 46.81129
2 freshers 45.61885
3 freshers 47.13777
4 freshers 45.54551
5 freshers 45.06891
6 freshers 45.68639```

The following code demonstrates how to use the dplyr package’s group_by() and sample_n() methods to create a stratified random sample of 40 employees, with 10 employees from each Level.

`library(dplyr)`

To get a stratified sample from a data frame.

```stratified <- data %>%
group_by(Level) %>%
sample_n(size=10)```

To find the frequency of employees from each Level.

NLP Courses Online (Natural Language Processing) » finnstats

```table(stratified\$Score)
40.6277541808117 41.8867328984806 42.1225665842419 42.5233762802742 42.5544884803451
1                1                1                1                1
42.7536151417636 42.8846937474664 42.9742927968522 43.1218453854941 43.1558424722147
1                1                1                1                1
43.6575315133425 43.7415578635583 43.7732881183767 44.6932550551858 44.8755449387381
1                1                1                1                1
45.0020656995027 45.2668319456886 45.3899139820568 45.4797068293891 45.5017168903959
1                1                1                1                1
45.5455064157118 46.1478255944327 46.3450739535307 46.3836008714994 46.5858975045594
1                1                1                1                1
46.6546954492613 46.7620971328865 46.9493723718007 47.0418493618535 47.1284691388457
1                1                1                1                1
47.1753773706728 47.2486845777309 47.3834597232738 47.4520743699156 47.6813717922399
1                1                1                1                1
47.6916655311883 48.4030768433805 48.7269106424762 48.9858858605196 49.0114190243513
1                1                1                1                1```

Conclusions

We’ve discussed the most important sampling technique a data scientist should know in this article.

Remember that in machine learning, a well-generated sample can make all the difference because it allows us to work with less data while maintaining statistical significance.

To read more visit Stratified Sampling in R With Examples.

If you are interested to learn more about data science, you can find more articles here finnstats.

The post Stratified Sampling in R With Examples appeared first on finnstats.