[This article was first published on R Archives » Data Science Tutorials, and kindly contributed to R-bloggers].

The post Descriptive Statistics in R appeared first on Data Science Tutorials


Descriptive Statistics in R: A Step-by-Step Guide

Descriptive statistics are a crucial part of data analysis, as they provide a snapshot of the central tendency and variability of a dataset.

In base R, two convenient ways to calculate descriptive statistics for a data frame are the `summary()` function and the `sapply()` function, which applies a statistic of your choice to every column.

In this article, we will explore how to use these functions to gain a deeper understanding of our data.


Method 1: Using the `summary()` Function

The `summary()` function is a simple and efficient way to calculate several descriptive statistics for every variable in a data frame. To use it, call it on your data frame:

`summary(my_data)`

For each numeric variable, `summary()` returns the minimum, first quartile, median, mean, third quartile, and maximum. If a variable contains missing values, it also reports how many there are (`NA's`).
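To see the missing-value count in action, here is a minimal sketch on a small made-up vector (the names `v` and `s` are purely illustrative). The statistics are computed on the non-missing values, and the number of `NA`s is appended as an extra entry:

```r
# Hypothetical numeric vector with one missing value
v <- c(2, 4, NA, 6)

# summary() ignores the NA for the statistics and adds an "NA's" count
s <- summary(v)
print(s)

# The entries are named, so they can be inspected individually
names(s)
```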

For example, let’s say we have the following data frame:

```r
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))
```

We can use the `summary()` function to calculate descriptive statistics for each variable:

`summary(df)`

This will output:

```
       x                y                z
 Min.   : 1.000   Min.   : 2.000   Min.   : 8.00
 1st Qu.: 4.000   1st Qu.: 2.750   1st Qu.: 9.00
 Median : 5.500   Median : 3.500   Median : 9.50
 Mean   : 6.125   Mean   : 5.125   Mean   :11.25
 3rd Qu.: 7.750   3rd Qu.: 6.500   3rd Qu.:13.50
 Max.   :12.000   Max.   :11.000   Max.   :17.00
```
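The quartiles in this output can be reproduced with `quantile()`, whose default method (type 7) places the p-th quantile at position h = (n - 1)p + 1 in the sorted data and interpolates linearly between neighbouring values. A short check on the `x` column (the helper names `xs`, `h`, and `q3_manual` are only illustrative):

```r
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

# quantile() with its default type gives the quartiles summary() printed for x
q <- quantile(df$x, probs = c(0.25, 0.5, 0.75))
print(q)

# Manual version of the third quartile
xs <- sort(df$x)
n <- length(xs)
h <- (n - 1) * 0.75 + 1   # 6.25: between the 6th and 7th sorted values
q3_manual <- xs[floor(h)] + (h - floor(h)) * (xs[ceiling(h)] - xs[floor(h)])
print(q3_manual)          # 7 + 0.25 * (10 - 7) = 7.75
```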

Method 2: Using the `sapply()` Function

The `sapply()` function is a more versatile option. Because a data frame is a list of columns, `sapply()` applies any function you supply, built-in or custom, to each column and simplifies the results into a vector or matrix.
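Because `sapply()` accepts any function, you can even pass `summary` itself. As a sketch (the object name `stats_mat` is only illustrative), this collects the same six statistics into a numeric matrix with one column per variable, which is easier to index than the printed table:

```r
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

# summary() returns six named statistics per numeric column;
# sapply() binds them into a 6 x 3 matrix (rows = statistics, columns = variables)
stats_mat <- sapply(df, summary)
print(stats_mat)

# Individual values can be looked up by statistic and column name
stats_mat["Mean", "x"]
```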

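For example, we can use `sapply()` to calculate the standard deviation of each variable. Any extra arguments given after the function name, such as `na.rm=TRUE`, are passed on to `sd()`, which then ignores missing values: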

`sapply(df, sd, na.rm=TRUE)`

This will output:

```
       x        y        z
3.522884 3.758324 3.327376
```
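`sd()` computes the sample standard deviation, dividing the sum of squared deviations from the mean by n - 1 rather than n. A minimal check that this formula reproduces the values above (the names `var_manual`, `sd_manual`, and `sd_all` are illustrative):

```r
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

x <- df$x
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
var_manual <- sum((x - mean(x))^2) / (n - 1)   # 86.875 / 7
sd_manual <- sqrt(var_manual)
print(sd_manual)   # about 3.522884, matching sd(df$x)

# The same formula applied to every column with sapply()
sd_all <- sapply(df, function(v) sqrt(sum((v - mean(v))^2) / (length(v) - 1)))
print(sd_all)
```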

We can also use `sapply()` to calculate more complex descriptive statistics by passing it an anonymous (inline) function.

For example, let’s say we want to calculate the range of each variable:

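`sapply(df, function(v) max(v, na.rm = TRUE) - min(v, na.rm = TRUE))`

Here `na.rm = TRUE` is passed to `max()` and `min()` inside the function. Supplying it as an extra `sapply()` argument instead would raise an "unused argument" error, because this anonymous function accepts only one parameter.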

This will output:

```
 x  y  z
11  9  9
```
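A custom function can also return several statistics at once; `sapply()` then simplifies the results into a matrix with one column per variable. A sketch, where the helper `describe_col` and the copy `df_na` with one injected `NA` are purely hypothetical, added to show missing-value handling:

```r
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

# Hypothetical copy with one missing value in y
df_na <- df
df_na$y[3] <- NA

# Hypothetical helper: returns a named vector of statistics for one column
describe_col <- function(v) {
  c(mean   = mean(v, na.rm = TRUE),
    sd     = sd(v, na.rm = TRUE),
    range  = max(v, na.rm = TRUE) - min(v, na.rm = TRUE),
    n_miss = sum(is.na(v)))
}

# One column per variable, one row per statistic
stats <- sapply(df_na, describe_col)
print(round(stats, 3))
```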

Conclusion

In this article, we have explored two methods for calculating descriptive statistics in R: the `summary()` function and the `sapply()` function.

The `summary()` function provides a quick and easy way to calculate common descriptive statistics for each variable in a data frame.

The `sapply()` function offers more flexibility and allows us to define custom functions to calculate more complex descriptive statistics.

By using these functions effectively, we can gain a deeper understanding of our data and make more informed decisions about our analysis and visualization strategies.

