T is for tibble

April 23, 2018
(This article was first published on Deeply Trivial, and kindly contributed to R-bloggers)

For the letter D, I introduced data frames, a built-in R object type. But as I’ve learned more about R and, in particular, the tidyverse (most recently when I finally started reading Text Mining with R: A Tidy Approach), I came across a more modern version of the R data frame: the tibble.

According to the tibble overview on the tidyverse website:

Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code.

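What does this mean? Well, remember when I noted that a character variable in my measures data frame had been changed to a factor? I manually changed it back to character. But had I simply created a tibble with that information, I wouldn’t have had to do anything. (Since R 4.0.0, data.frame() also leaves strings as character by default; tibbles have always behaved this way.) Data frames will also do partial matching on variable names with $, so if I requested Facebook$gen, it would have given me the gender variable, because that is the only variable whose name starts with “gen”. An ambiguous prefix such as Facebook$R, which matches many variables, quietly returns NULL instead. If I tried either of those with a tibble, I’d get NULL along with a warning about an unknown column, because a tibble matches variable references literally.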
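Here is a minimal sketch of that difference. The toy table and the names df, tb, and tb_chr are made up for illustration; this is not the Facebook dataset.

```r
library(tibble)

# Hypothetical toy data, not the real Facebook dataset
df <- data.frame(Rum1 = c(3, 1, 4), gender = c(1, 1, 0))
tb <- as_tibble(df)

# Data frame: `$` partially matches "gen" to the only column starting with it
df_partial <- df$gen

# Tibble: no partial matching; the result is NULL plus a warning
# (the warning is suppressed here only to keep the output quiet)
tb_partial <- suppressWarnings(tb$gen)

# Strings stay character in a tibble
tb_chr <- tibble(name = c("a", "b"))
class(tb_chr$name)
```

The data frame hands back the gender column, while the tibble returns NULL, which is usually a quicker signal that a variable name was mistyped.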

There are a few ways to create a tibble, one using the tibble package and another using the readr package. Fortunately, you don’t need to worry about that, because we’re just going to use the tidyverse package, which contains those two and more.

install.packages("tidyverse")
## Installing package into '~/R/win-library/3.4'
## (as 'lib' is unspecified)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats

First, let’s create a new tibble from scratch. The syntax is almost exactly the same as it was in the data frame post.

measures <- tibble(
  meas_id = 1:6,
  name = c("Ruminative Response Scale", "Savoring Beliefs Inventory",
           "Satisfaction with Life Scale", "Ten-Item Personality Measure",
           "Cohen-Hoberman Inventory of Physical Symptoms",
           "Center for Epidemiologic Studies Depression Scale"),
  num_items = c(22, 24, 5, 10, 32, 16),
  rev_items = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
)
measures
## # A tibble: 6 x 4
##   meas_id name                                              num_items
##     <int> <chr>                                                 <dbl>
## 1       1 Ruminative Response Scale                                22
## 2       2 Savoring Beliefs Inventory                               24
## 3       3 Satisfaction with Life Scale                              5
## 4       4 Ten-Item Personality Measure                             10
## 5       5 Cohen-Hoberman Inventory of Physical Symptoms            32
## 6       6 Center for Epidemiologic Studies Depression Scale        16
## # ... with 1 more variables: rev_items <lgl>

As you can see, the name variable is character, not factor. I didn’t have to do anything. Alternatively, you could convert an existing data frame, whether it’s one you created or one that came with R/an R package.

car<-as_tibble(mtcars)
car
## # A tibble: 32 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
##  2  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
##  3  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
##  4  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
##  5  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
##  6  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
##  7  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
##  8  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
##  9  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
## 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
## # ... with 22 more rows

But chances are you’ll be reading in data from an external file. The readr package can handle delimited and fixed-width files. For instance, to read in the tab-delimited Facebook dataset I’ve been using, I just need the function read_tsv.

Facebook<-read_tsv("small_facebook_set.txt",col_names=TRUE)
## Parsed with column specification:
## cols(
## .default = col_integer()
## )
## See spec(...) for full column specifications.
Facebook
## # A tibble: 257 x 111
##       ID gender  Rum1  Rum2  Rum3  Rum4  Rum5  Rum6  Rum7  Rum8  Rum9
##    <int>  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1      1     3     1     3     2     3     1     2     1     1
##  2     2      1     1     1     1     1     1     1     0     0     1
##  3     3      1     4     3     3     4     3     4     2     3     3
##  4     4      0     4     0     0     2     0     0     4     0     2
##  5     5      1     2     2     2     1     2     1     1     1     1
##  6     6      0     2     4     3     4     2     3     2     2     3
##  7     7      1     1     2     3     2     0     2     3     1     2
##  8     8      0     2     1     1     2     0     2     3     3     3
##  9     9      1     4     1     4     4     3     2     2     1     1
## 10    10      1     4     2     0     3     4     2     4     1     2
## # ... with 247 more rows, and 100 more variables: Rum10 <int>,
## #   Rum11 <int>, Rum12 <int>, Rum13 <int>, Rum14 <int>, Rum15 <int>,
## #   Rum16 <int>, Rum17 <int>, Rum18 <int>, Rum19 <int>, Rum20 <int>,
## #   Rum21 <int>, Rum22 <int>, Sav1 <int>, Sav2 <int>, Sav3 <int>,
## #   Sav4 <int>, Sav5 <int>, Sav6 <int>, Sav7 <int>, Sav8 <int>,
## #   Sav9 <int>, Sav10 <int>, Sav11 <int>, Sav12 <int>, Sav13 <int>,
## #   Sav14 <int>, Sav15 <int>, Sav16 <int>, Sav17 <int>, Sav18 <int>,
## #   Sav19 <int>, Sav20 <int>, Sav21 <int>, Sav22 <int>, Sav23 <int>,
## #   Sav24 <int>, LS1 <int>, LS2 <int>, LS3 <int>, LS4 <int>, LS5 <int>,
## #   Extraverted <int>, Critical <int>, Dependable <int>, Anxious <int>,
## #   NewExperiences <int>, Reserved <int>, Sympathetic <int>,
## #   Disorganized <int>, Calm <int>, Conventional <int>, Health1 <int>,
## #   Health2 <int>, Health3 <int>, Health4 <int>, Health5 <int>,
## #   Health6 <int>, Health7 <int>, Health8 <int>, Health9 <int>,
## #   Health10 <int>, Health11 <int>, Health12 <int>, Health13 <int>,
## #   Health14 <int>, Health15 <int>, Health16 <int>, Health17 <int>,
## #   Health18 <int>, Health19 <int>, Health20 <int>, Health21 <int>,
## #   Health22 <int>, Health23 <int>, Health24 <int>, Health25 <int>,
## #   Health26 <int>, Health27 <int>, Health28 <int>, Health29 <int>,
## #   Health30 <int>, Health31 <int>, Health32 <int>, Dep1 <int>,
## #   Dep2 <int>, Dep3 <int>, Dep4 <int>, Dep5 <int>, Dep6 <int>,
## #   Dep7 <int>, Dep8 <int>, Dep9 <int>, Dep10 <int>, Dep11 <int>,
## #   Dep12 <int>, Dep13 <int>, Dep14 <int>, Dep15 <int>, Dep16 <int>
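If you would rather not rely on readr guessing the column types, you can state them yourself with the col_types argument. The sketch below assumes a tiny three-column file written to a temporary path purely for illustration; the file contents and the name toy are made up and only mimic the layout of the Facebook data.

```r
library(readr)

# Hypothetical three-column, tab-delimited file written to a temporary path
path <- tempfile(fileext = ".txt")
writeLines(c("ID\tgender\tRum1", "1\t1\t3", "2\t1\t1", "3\t0\t4"), path)

# Declare the column types up front instead of relying on readr's guesses
toy <- read_tsv(
  path,
  col_names = TRUE,
  col_types = cols(
    ID = col_integer(),
    gender = col_integer(),
    Rum1 = col_integer()
  )
)

spec(toy)  # shows the column specification that was applied
```

Supplying col_types also silences the “Parsed with column specification” message, since there is nothing left for readr to guess.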

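Finally, if you’re working with SAS, SPSS, or Stata files, you can read those in with the haven package and its functions read_sas, read_sav, and read_dta, respectively. haven is installed along with the tidyverse, but library(tidyverse) does not attach it, so you need to load it yourself with library(haven).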
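As a rough sketch of how that looks for SPSS, the example below first writes a small made-up table to a temporary .sav file with write_sav, so that there is something to read back. The data and the name spss_data are hypothetical; with a real file you would pass its path straight to read_sav.

```r
library(haven)

# Hypothetical toy data written to a temporary SPSS file for illustration
path <- tempfile(fileext = ".sav")
write_sav(data.frame(ID = c(1, 2, 3), gender = c(1, 1, 0)), path)

# read_sav returns a tibble
spss_data <- read_sav(path)
class(spss_data)
```

read_sas and read_dta take a file path in the same way.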

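If for some reason you need a data frame rather than a tibble, you can convert a tibble to a data frame with as.data.frame(tibble_name). Wrapping that call in class(), as in class(as.data.frame(tibble_name)), only reports the class of the result, which is a handy way to confirm that the conversion worked.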
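A short sketch of the round trip, using the built-in mtcars data (the names tb and df are just for illustration):

```r
library(tibble)

tb <- as_tibble(mtcars)   # data frame -> tibble
df <- as.data.frame(tb)   # tibble -> plain data frame

class(tb)  # "tbl_df" "tbl" "data.frame"
class(df)  # "data.frame"
```

Note that a tibble is still a data frame underneath, which is why most functions that expect a data frame accept a tibble without any conversion.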

You can learn more about tibbles here and here.
