Useful functions for data frames in R

[This article was first published on Software for Exploratory Data Analysis and Statistical Modelling » R Environment, and kindly contributed to R-bloggers].

This post will consider some useful functions for dealing with data frames during data processing and validation.

Consider an artificial data set created using the expand.grid function, in which some rows of the data frame are duplicated.

> des = expand.grid(A = c(2,2,3,4), B = c(1,3,5,5,7))
> des
   A B
1  2 1
2  2 1
3  3 1
4  4 1
5  2 3
6  2 3
7  3 3
8  4 3
9  2 5
10 2 5
11 3 5
12 4 5
13 2 5
14 2 5
15 3 5
16 4 5
17 2 7
18 2 7
19 3 7
20 4 7

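If we want to identify rows that are duplicates, the duplicated function comes in handy. It returns a logical vector with one element per row, which is TRUE when the row repeats an earlier row: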

> duplicated(des)
 [1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
[13]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE
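
Because the result is a logical vector, it can also be summarised rather than read by eye. The following is a small sketch that rebuilds the same des data frame; the names dup, n_dup and dup_rows are introduced here for illustration and are not part of the original post.

```r
# Rebuild the example data frame so the snippet runs on its own
des <- expand.grid(A = c(2, 2, 3, 4), B = c(1, 3, 5, 5, 7))

# Logical vector: TRUE where a row repeats an earlier row
dup <- duplicated(des)

# Count the duplicated rows (TRUE counts as 1 when summed)
n_dup <- sum(dup)

# Row indices of the duplicates
dup_rows <- which(dup)

print(n_dup)
print(dup_rows)
```

This can be a quick validation step: a count of zero would mean the data frame contains no repeated rows.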

We can pick out the unique rows of the data frame with the following code:

> des[!duplicated(des),]
   A B
1  2 1
3  3 1
4  4 1
5  2 3
7  3 3
8  4 3
9  2 5
11 3 5
12 4 5
17 2 7
19 3 7
20 4 7
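
As an alternative to subsetting with the negated logical vector, base R also provides the unique function, which has a method for data frames. The sketch below compares the two approaches on the same des data frame; the variable names uniq_subset and uniq_fun are assumptions made for this example only.

```r
# Rebuild the example data frame so the snippet runs on its own
des <- expand.grid(A = c(2, 2, 3, 4), B = c(1, 3, 5, 5, 7))

# Approach from the text: drop rows flagged as duplicates
uniq_subset <- des[!duplicated(des), ]

# Alternative: let unique() do the same job in one call
uniq_fun <- unique(des)

# Both keep the first occurrence of each row and the original row names
print(nrow(uniq_fun))
print(identical(uniq_subset, uniq_fun))
```

Either form keeps the first occurrence of each distinct row, so the choice between them is mostly a matter of readability.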

After loading a large file into a data frame, we might want to check some of the data to ensure that it is as expected. Rather than printing the entire data frame, we can use the head and tail functions to view the first or last few rows (six by default). An example using the rock data set that ships with R:

> head(rock)
  area    peri     shape perm
1 4990 2791.90 0.0903296  6.3
2 7002 3892.60 0.1486220  6.3
3 7558 3930.66 0.1833120  6.3
4 7352 3869.32 0.1170630  6.3
5 7943 3948.54 0.1224170 17.1
6 7979 4010.15 0.1670450 17.1
> tail(rock)
   area     peri    shape perm
43 5605 1145.690 0.464125 1300
44 8793 2280.490 0.420477 1300
45 3475 1174.110 0.200744  580
46 1651  597.808 0.262651  580
47 5514 1455.880 0.182453  580
48 9718 1485.580 0.200447  580
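
Both functions accept an n argument when six rows is not the right amount. The following is a minimal sketch using the same rock data set; the names top3 and bottom2 are hypothetical and chosen only for this example.

```r
# rock ships with R in the datasets package, so no loading step is needed

# First three rows instead of the default six
top3 <- head(rock, n = 3)

# Last two rows; the original row names are kept
bottom2 <- tail(rock, n = 2)

print(top3)
print(bottom2)

# Overall dimensions, as a complement to looking at a few rows
print(dim(rock))
```

Checking dim alongside head and tail gives a quick sense of whether the whole file was read, not just whether the first and last rows look sensible.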
