str Implementation for Data Frames
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The str function is perhaps the most useful function in R. It provides a great deal of information about the structure of an object. When I teach R, especially to those coming from SPSS, the str function for data frames provides the information they are used to seeing on the Variable View tab. However, sometimes I want to display the information str returns in a better format (e.g. as an HTML or LaTeX table). I wrote a function, strtable, that provides the information str.data.frame does but returns the results as a data.frame. This provides much more flexibility for controlling how the output is formatted. Specifically, it returns a data.frame with four columns: variable, class, levels, and examples.
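To illustrate why a data.frame is easier to format than printed str output, here is a small sketch. It does not use strtable: the structure_df object is built by hand as a stand-in, and the use of knitr::kable is my assumption about one convenient renderer (xtable or similar packages would work as well). The rendering step is skipped if knitr is not installed.

```r
# Stand-in for the kind of table a str-like summary could return.
# (structure_df is a hypothetical name used only for this example.)
structure_df <- data.frame(
  variable = names(iris),
  class = vapply(iris, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Render the same data frame as HTML and as LaTeX, if knitr is available.
html_table <- NULL
latex_table <- NULL
if (requireNamespace("knitr", quietly = TRUE)) {
  html_table <- knitr::kable(structure_df, format = "html")
  latex_table <- knitr::kable(structure_df, format = "latex")
  print(html_table)
}
```

Because the summary is an ordinary data frame, the same object can also be subset, merged, or written to a file before it is rendered.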
The function can be sourced from Gist using the devtools package.
devtools::source_gist('4a0a5ab9fe7e1cf3be0e')
For the first example, we’ll use the iris data frame.
data(iris)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
##  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
The strtable function has five parameters:
n: the first n elements to show.
width: maximum width in characters for the examples to show.
n.levels: the first n levels of a factor to show.
width.levels: maximum width in characters for the number of levels to show.
factor.values: function defining how factor examples should be printed. Possible values are as.character or as.integer.
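As a rough illustration of how a function with these five parameters could be put together, here is a minimal base R sketch. It is not the code from the Gist: the name str_table_sketch, the helper preview, and all default values are assumptions made for this example, and the class column simply reports the first class name rather than the "Factor w/ 3 levels" text that strtable prints.

```r
# Hypothetical sketch only; names and defaults are assumptions, not the Gist's code.
str_table_sketch <- function(df, n = 4, width = 60,
                             n.levels = n, width.levels = width,
                             factor.values = as.character) {
  stopifnot(is.data.frame(df))

  # Collapse the first k values of x into one string of at most w characters.
  preview <- function(x, k, w) {
    shown <- utils::head(x, k)
    if (is.character(shown)) shown <- paste0('"', shown, '"')
    out <- paste(shown, collapse = ", ")
    if (length(x) > k) out <- paste0(out, ", ...")
    if (nchar(out) > w) out <- paste0(substr(out, 1, w - 3), "...")
    out
  }

  data.frame(
    variable = names(df),
    # First class name of each column, e.g. "numeric" or "factor".
    class = vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    # Factor levels for factors, NA for everything else.
    levels = vapply(df, function(x) {
      if (is.factor(x)) preview(levels(x), n.levels, width.levels) else NA_character_
    }, character(1), USE.NAMES = FALSE),
    # First n values; factors are converted with factor.values first.
    examples = vapply(df, function(x) {
      if (is.factor(x)) x <- factor.values(x)
      preview(x, n, width)
    }, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

res <- str_table_sketch(iris)
print(res, na.print = "")
```

Passing factor.values = as.integer to the sketch switches the factor examples from labels to their underlying integer codes, which mirrors the choice the real function offers.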
print(strtable(iris), na.print='')
## variable     class              levels
## Sepal.Length numeric
## Sepal.Width  numeric
## Petal.Length numeric
## Petal.Width  numeric
## Species      Factor w/ 3 levels "setosa", "versicolor", "virginica"
## examples
## 5.1, 4.9, 4.7, 4.6, ...
## 3.5, 3, 3.2, 3.1, ...
## 1.4, 1.4, 1.3, 1.5, ...
## 0.2, 0.2, 0.2, 0.2, ...
## "setosa", "setosa", "setosa", "setosa", ...

print(strtable(iris, factor.values=as.integer), na.print='')
## variable     class              levels
## Sepal.Length numeric
## Sepal.Width  numeric
## Petal.Length numeric
## Petal.Width  numeric
## Species      Factor w/ 3 levels "setosa", "versicolor", "virginica"
## examples
## 5.1, 4.9, 4.7, 4.6, ...
## 3.5, 3, 3.2, 3.1, ...
## 1.4, 1.4, 1.3, 1.5, ...
## 0.2, 0.2, 0.2, 0.2, ...
## 1, 1, 1, 1, ...
Here’s a second example using the diamonds data from the ggplot2 package.
library(ggplot2)  # the diamonds data set ships with ggplot2
data(diamonds)
str(diamonds)
## 'data.frame': 53940 obs. of 10 variables:
##  $ carat  : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

print(strtable(diamonds), na.print='')
## variable class              levels
## carat    numeric
## cut      Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
## color    Factor w/ 7 levels "D", "E", "F", "G", ...
## clarity  Factor w/ 8 levels "I1", "SI2", "SI1", "VS2", ...
## depth    numeric
## table    numeric
## price    integer
## x        numeric
## y        numeric
## z        numeric
## examples
## 0.23, 0.21, 0.23, 0.29, ...
## "Ideal", "Premium", "Good", "Premium", ...
## "E", "E", "E", "I", ...
## "SI2", "SI1", "VS1", "VS2", ...
## 61.5, 59.8, 56.9, 62.4, ...
## 55, 61, 65, 58, ...
## 326, 326, 327, 334, ...
## 3.95, 3.89, 4.05, 4.2, ...
## 3.98, 3.84, 4.07, 4.23, ...
## 2.43, 2.31, 2.31, 2.63, ...

print(strtable(diamonds, factor.values=as.integer), na.print='')
## variable class              levels
## carat    numeric
## cut      Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
## color    Factor w/ 7 levels "D", "E", "F", "G", ...
## clarity  Factor w/ 8 levels "I1", "SI2", "SI1", "VS2", ...
## depth    numeric
## table    numeric
## price    integer
## x        numeric
## y        numeric
## z        numeric
## examples
## 0.23, 0.21, 0.23, 0.29, ...
## 5, 4, 2, 4, ...
## 2, 2, 2, 6, ...
## 2, 3, 5, 4, ...
## 61.5, 59.8, 56.9, 62.4, ...
## 55, 61, 65, 58, ...
## 326, 326, 327, 334, ...
## 3.95, 3.89, 4.05, 4.2, ...
## 3.98, 3.84, 4.07, 4.23, ...
## 2.43, 2.31, 2.31, 2.63, ...
Here’s the source code from Gist: