# Stratified Sampling in R With Examples

[This article was first published on **Data Analysis in R » Quick Guide for Statistics & R » finnstats**, and kindly contributed to R-bloggers.]


The post Stratified Sampling in R With Examples appeared first on finnstats.


Researchers frequently take samples from a population and use the data from the sample to make generalizations about the entire population.

A common sampling approach is stratified random sampling, which divides a population into groups (strata) and draws a random sample of members from each group to include in the overall sample.

This article shows you how to use R to achieve stratified random sampling.
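Before the full example, the idea can be sketched in base R on a toy population. This is only an illustrative sketch: the data frame `population`, its columns `group` and `value`, and the choice of 2 rows per group are all made up for the demonstration and are not part of the original article.

```r
# Toy population with two groups (hypothetical data, for illustration only)
set.seed(42)
population <- data.frame(
  group = rep(c("A", "B"), times = c(6, 4)),
  value = round(runif(10), 2)
)

# 1. Divide the population into strata (one data frame per group)
strata <- split(population, population$group)

# 2. Draw a simple random sample of 2 rows within each stratum
sampled <- lapply(strata, function(s) s[sample(nrow(s), size = 2), ])

# 3. Combine the per-stratum samples into one stratified sample
stratified_toy <- do.call(rbind, sampled)

# Each group contributes exactly 2 rows
table(stratified_toy$group)
```

The split, sample, and combine steps are the same ones that the dplyr functions used later in the article perform in a single pipeline.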


## Approach: Stratified Sampling in R

A corporation has 400 employees who are either freshers, juniors, mid-level employees, or senior employees.

Let’s say we want to obtain a stratified sample of 40 employees, with 10 employees from each level represented.

The following code creates an example data frame of 400 employees, 100 at each level.

Calling set.seed() first makes the example reproducible.

    set.seed(1)

Now let’s create the data frame:

    data <- data.frame(
      Level = rep(c("freshers", "juniors", "mid-level", "Senior"), each = 100),
      Score = rnorm(400, mean = 45, sd = 2.2)
    )

View the first six rows of the data frame:

    head(data)

         Level    Score
    1 freshers 46.81129
    2 freshers 45.61885
    3 freshers 47.13777
    4 freshers 45.54551
    5 freshers 45.06891
    6 freshers 45.68639
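It can also help to confirm the size of each stratum before sampling. The sketch below rebuilds the same data frame so that it runs on its own; the variable name `strata_sizes` is introduced here for illustration only.

```r
set.seed(1)
data <- data.frame(
  Level = rep(c("freshers", "juniors", "mid-level", "Senior"), each = 100),
  Score = rnorm(400, mean = 45, sd = 2.2)
)

# Number of employees in each Level (the strata sizes)
strata_sizes <- table(data$Level)
strata_sizes
```

Because rep() is called with each = 100, every level holds 100 employees, so taking 10 per level corresponds to a 10% sample within each stratum.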

The following code uses the group_by() and sample_n() functions from the dplyr package to draw a stratified random sample of 40 employees, with 10 employees from each Level.

    library(dplyr)

Draw the stratified sample from the data frame:

    stratified <- data %>%
      group_by(Level) %>%
      sample_n(size = 10)
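In dplyr 1.0.0 and later, sample_n() is marked as superseded in favour of slice_sample(). A minimal sketch of the same stratified draw with the newer function follows; it assumes dplyr 1.0.0 or later is installed, and the name `stratified_alt` is introduced here only to avoid overwriting the object created above.

```r
library(dplyr)

set.seed(1)
data <- data.frame(
  Level = rep(c("freshers", "juniors", "mid-level", "Senior"), each = 100),
  Score = rnorm(400, mean = 45, sd = 2.2)
)

# slice_sample() respects the grouping, so n = 10 means 10 rows per Level
stratified_alt <- data %>%
  group_by(Level) %>%
  slice_sample(n = 10) %>%
  ungroup()

# 4 levels x 10 rows each = 40 rows
nrow(stratified_alt)
```

slice_sample() also accepts prop instead of n (for example prop = 0.1) when a fixed fraction of each group is wanted rather than a fixed count.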

To find the number of employees sampled from each Level, tabulate the Level column:

    table(stratified$Level)

Each of the four levels appears 10 times, for a total of 40 sampled employees.
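One way to check the per-level frequencies is to summarise the sample by Level. The sketch below rebuilds the data and the sample so that it runs on its own; `level_summary`, `n`, and `mean_score` are names chosen here for illustration.

```r
library(dplyr)

set.seed(1)
data <- data.frame(
  Level = rep(c("freshers", "juniors", "mid-level", "Senior"), each = 100),
  Score = rnorm(400, mean = 45, sd = 2.2)
)

stratified <- data %>%
  group_by(Level) %>%
  sample_n(size = 10)

# Number of sampled rows and mean Score within each Level
level_summary <- stratified %>%
  summarise(n = n(), mean_score = mean(Score))

level_summary
```

The n column should show 10 for every level, while mean_score gives a quick sense of how close each stratum's sample mean is to the simulated population mean of 45.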

## Conclusions

In this article, we’ve covered stratified random sampling, one of the most useful sampling techniques for a data scientist to know.

Remember that in machine learning, a well-constructed sample can make all the difference, because it lets us work with less data while keeping every group in the population represented.
