# Useful functions for data frames

August 9, 2010

(This article was first published on Software for Exploratory Data Analysis and Statistical Modelling, and kindly contributed to R-bloggers)

R is primarily command-line based, so large data frames are not easy to browse. Several functions make it easier to inspect and work with them.

For example, after loading data from a text file we might want to view the first few lines of the data set. The functions head and tail return the first or last parts (six elements or rows by default) of a vector, matrix, table, data frame or function.
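As a small illustration, here is a sketch that uses throwaway objects x and m, made up for this example. It shows that head and tail behave the same way on vectors and matrices, and that the optional n argument controls how many elements or rows are returned:

```r
# Hypothetical example objects, created only for this illustration.
x <- 1:10
head(x, n = 3)      # first three elements: 1 2 3
tail(x, n = 2)      # last two elements: 9 10

# A 5 x 4 matrix, filled column by column.
m <- matrix(1:20, nrow = 5)
head(m, n = 2)      # first two rows of the matrix

# Without n, six elements are returned.
head(letters)       # "a" "b" "c" "d" "e" "f"
```

For data frames the same n argument counts rows rather than elements.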

Consider the Orange data set, which ships with R. We can view the first few lines:

```
> head(Orange)
  Tree  age circumference
1    1  118            30
2    1  484            58
3    1  664            87
4    1 1004           115
5    1 1231           120
6    1 1372           142
```

or the last few lines:

```
> tail(Orange)
   Tree  age circumference
30    5  484            49
31    5  664            81
32    5 1004           125
33    5 1231           142
34    5 1372           174
35    5 1582           177
```
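The number of rows can be changed with the n argument, and a negative n drops rows from the other end instead. The sketch below assumes only the built-in Orange data set (35 rows); the variable names are made up for the example:

```r
# Assumes the built-in Orange data set (35 rows, 3 columns).
first3 <- head(Orange, n = 3)             # rows 1 to 3
last2  <- tail(Orange, n = 2)             # rows 34 and 35

# A negative n means "all except the last (or first) |n| rows".
all_but_last_30  <- head(Orange, n = -30) # rows 1 to 5
all_but_first_30 <- tail(Orange, n = -30) # rows 31 to 35

nrow(first3)
rownames(all_but_first_30)
```

Negative values are handy for skipping a fixed number of header or trailer records without computing the total number of rows first.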

Another useful function is str, which compactly displays the internal structure of an R object. On this set of data we get:

```
> str(Orange)
Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame':	35 obs. of  3 variables:
 $ Tree         : Ord.factor w/ 5 levels "3"<"1"<"5"<"2"<..: 2 2 2 2 2 2 2 4 4 4 ...
 $ age          : num  118 484 664 1004 1231 ...
 $ circumference: num  30 58 87 115 120 142 145 33 69 111 ...
 - attr(*, "formula")=Class 'formula' length 3 circumference ~ age | Tree
  .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
 - attr(*, "labels")=List of 2
  ..$ x: chr "Time since December 31, 1968"
  ..$ y: chr "Trunk circumference"
 - attr(*, "units")=List of 2
  ..$ x: chr "(days)"
  ..$ y: chr "(mm)"
```

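There is quite a bit of additional information attached to this data frame. Orange is not just a data.frame: it also belongs to the grouped-data classes nfnGroupedData, nfGroupedData and groupedData, and these carry the formula, labels and units attributes listed at the end of the output.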
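The classes and attributes can also be inspected one at a time. The following is a sketch using only base R functions on the built-in Orange data; the small data frame df is a made-up example for comparison. The give.attr = FALSE argument asks str to leave the attributes out of its display:

```r
# Assumes the built-in Orange data set.
class(Orange)                      # grouped-data classes plus "data.frame"
inherits(Orange, "data.frame")     # TRUE, so data-frame functions still apply
names(attributes(Orange))          # includes "formula", "labels" and "units"
attr(Orange, "units")              # x = "(days)", y = "(mm)"

# Show only the columns, hiding the attributes.
str(Orange, give.attr = FALSE)

# Hypothetical plain data frame for comparison: one class, no extra attributes.
df <- data.frame(id = 1:3, size = c(2.5, 3.1, 4.0))
class(df)
str(df)
```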