str Implementation for Data Frames

June 5, 2014

(This article was first published on Jason.Bryer.org Blog - R, and kindly contributed to R-bloggers)

The str function is perhaps the most useful function in R: it gives a compact summary of the structure of any object. When I teach R, especially to those coming from SPSS, the str function for data frames provides the information they are used to seeing on the Variable View tab. However, sometimes I want to display the information str returns in a better format (e.g. as an HTML or LaTeX table). I wrote a function, strtable, that provides the same information as str.data.frame but returns the results as a data.frame. This provides much more flexibility for controlling how the output is formatted. Specifically, it returns a data.frame with four columns: variable, class, levels, and examples.

The function can be sourced from Gist using the devtools package.


devtools::source_gist('4a0a5ab9fe7e1cf3be0e')

For the first example, we’ll use the iris data frame.


data(iris)
str(iris)

## 'data.frame':	150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

The strtable function has five parameters in addition to the data frame:

  • n: the number of elements of each variable to show, starting from the first.
  • width: the maximum width, in characters, of the examples shown.
  • n.levels: the number of factor levels to show, starting from the first.
  • width.levels: the maximum width, in characters, of the levels shown.
  • factor.values: a function defining how factor examples are printed. Possible values are as.character or as.integer.
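
To see what the two factor.values options do, it helps to apply them to a factor directly. This is a small base R illustration that does not call strtable; the variable names are chosen only for this example.

```r
# First four values of the Species factor from the built-in iris data
species <- head(iris$Species, 4)

# as.character returns the level labels
labels_shown <- as.character(species)
print(labels_shown)
# "setosa" "setosa" "setosa" "setosa"

# as.integer returns the underlying integer codes
codes_shown <- as.integer(species)
print(codes_shown)
# 1 1 1 1
```

The two calls below show the same difference in the examples column.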

print(strtable(iris), na.print='')

##      variable              class                              levels
##  Sepal.Length            numeric                                    
##   Sepal.Width            numeric                                    
##  Petal.Length            numeric                                    
##   Petal.Width            numeric                                    
##       Species Factor w/ 3 levels "setosa", "versicolor", "virginica"
##                                     examples
##                      5.1, 4.9, 4.7, 4.6, ...
##                        3.5, 3, 3.2, 3.1, ...
##                      1.4, 1.4, 1.3, 1.5, ...
##                      0.2, 0.2, 0.2, 0.2, ...
##  "setosa", "setosa", "setosa", "setosa", ...

print(strtable(iris, factor.values=as.integer), na.print='')

##      variable              class                              levels
##  Sepal.Length            numeric                                    
##   Sepal.Width            numeric                                    
##  Petal.Length            numeric                                    
##   Petal.Width            numeric                                    
##       Species Factor w/ 3 levels "setosa", "versicolor", "virginica"
##                 examples
##  5.1, 4.9, 4.7, 4.6, ...
##    3.5, 3, 3.2, 3.1, ...
##  1.4, 1.4, 1.3, 1.5, ...
##  0.2, 0.2, 0.2, 0.2, ...
##          1, 1, 1, 1, ...

Here’s a second example using the diamonds data from the ggplot2 package.


library(ggplot2)
data(diamonds)
str(diamonds)

## 'data.frame':	53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

print(strtable(diamonds), na.print='')

##  variable              class                                      levels
##     carat            numeric                                            
##       cut Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
##     color Factor w/ 7 levels                     "D", "E", "F", "G", ...
##   clarity Factor w/ 8 levels              "I1", "SI2", "SI1", "VS2", ...
##     depth            numeric                                            
##     table            numeric                                            
##     price            integer                                            
##         x            numeric                                            
##         y            numeric                                            
##         z            numeric                                            
##                                    examples
##                 0.23, 0.21, 0.23, 0.29, ...
##  "Ideal", "Premium", "Good", "Premium", ...
##                     "E", "E", "E", "I", ...
##             "SI2", "SI1", "VS1", "VS2", ...
##                 61.5, 59.8, 56.9, 62.4, ...
##                         55, 61, 65, 58, ...
##                     326, 326, 327, 334, ...
##                  3.95, 3.89, 4.05, 4.2, ...
##                 3.98, 3.84, 4.07, 4.23, ...
##                 2.43, 2.31, 2.31, 2.63, ...

print(strtable(diamonds, factor.values=as.integer), na.print='')

##  variable              class                                      levels
##     carat            numeric                                            
##       cut Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
##     color Factor w/ 7 levels                     "D", "E", "F", "G", ...
##   clarity Factor w/ 8 levels              "I1", "SI2", "SI1", "VS2", ...
##     depth            numeric                                            
##     table            numeric                                            
##     price            integer                                            
##         x            numeric                                            
##         y            numeric                                            
##         z            numeric                                            
##                     examples
##  0.23, 0.21, 0.23, 0.29, ...
##              5, 4, 2, 4, ...
##              2, 2, 2, 6, ...
##              2, 3, 5, 4, ...
##  61.5, 59.8, 56.9, 62.4, ...
##          55, 61, 65, 58, ...
##      326, 326, 327, 334, ...
##   3.95, 3.89, 4.05, 4.2, ...
##  3.98, 3.84, 4.07, 4.23, ...
##  2.43, 2.31, 2.31, 2.63, ...
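
Because the result is an ordinary data frame, it can be passed to any table-rendering tool. The sketch below assumes the knitr package is installed and uses knitr::kable as one possible renderer (xtable would be another). To stay self-contained it builds a small stand-in data frame by hand, with values copied from the iris output above, rather than calling strtable.

```r
# Stand-in for part of what strtable(iris) returns; values are copied
# from the printed output earlier in this post.
tab <- data.frame(
  variable = c("Sepal.Length", "Species"),
  class    = c("numeric", "Factor w/ 3 levels"),
  levels   = c(NA, '"setosa", "versicolor", "virginica"'),
  examples = c("5.1, 4.9, 4.7, 4.6, ...",
               '"setosa", "setosa", "setosa", "setosa", ...'),
  stringsAsFactors = FALSE
)

# Blank out missing levels so the rendered table does not show "NA"
tab$levels[is.na(tab$levels)] <- ""

# Render only if knitr is available (an assumption about the reader's setup)
if (requireNamespace("knitr", quietly = TRUE)) {
  html_table  <- knitr::kable(tab, format = "html")
  latex_table <- knitr::kable(tab, format = "latex")
  print(html_table)
}
```

With the real function sourced, tab would simply be replaced by the result of strtable(iris) or strtable(diamonds).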

The source code for strtable is in the Gist sourced above (ID 4a0a5ab9fe7e1cf3be0e).
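
As a rough illustration of how a function like this could be put together, the following is a minimal sketch. It is not the code from the Gist: the name strtable_sketch, the default values, and the truncation rules are all assumptions made for illustration, and only the parameter names and the four output columns come from the description above.

```r
# Hypothetical stand-in for strtable; NOT the author's implementation.
# Defaults (n = 4, width = 60) are assumptions, not the Gist's values.
strtable_sketch <- function(df, n = 4, width = 60,
                            n.levels = n, width.levels = width,
                            factor.values = as.character) {
  stopifnot(is.data.frame(df))

  # Join the first k values into one string, quote character values,
  # mark omitted values with "...", and cap the result at max.width.
  collapse_head <- function(x, k, max.width) {
    shown <- head(x, k)
    if (is.character(shown)) shown <- paste0('"', shown, '"')
    out <- paste(shown, collapse = ", ")
    if (length(x) > k) out <- paste0(out, ", ...")
    if (nchar(out) > max.width) {
      out <- paste0(substr(out, 1, max.width - 3), "...")
    }
    out
  }

  # Build one row of the result per column of the input
  rows <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    if (is.factor(x)) {
      data.frame(
        variable = nm,
        class    = paste("Factor w/", nlevels(x), "levels"),
        levels   = collapse_head(levels(x), n.levels, width.levels),
        examples = collapse_head(factor.values(x), n, width),
        stringsAsFactors = FALSE
      )
    } else {
      data.frame(
        variable = nm,
        class    = class(x)[1],
        levels   = NA_character_,   # non-factors have no levels
        examples = collapse_head(x, n, width),
        stringsAsFactors = FALSE
      )
    }
  })
  do.call(rbind, rows)
}

print(strtable_sketch(iris), na.print = "")
```

Building each row as a one-row data frame and binding them keeps the sketch short; for very wide data frames, filling preallocated character vectors would be faster.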
