tibble 1.0.0

March 24, 2016
By

(This article was first published on RStudio Blog, and kindly contributed to R-bloggers)

I’m pleased to announce tibble, a new package for manipulating and printing data frames in R. Tibbles are a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. The name comes from dplyr: originally you created these objects with tbl_df(), which was most easily pronounced as “tibble diff”.

Install tibble with:

install.packages("tibble")

This package extracts out the tbl_df class associated functions from dplyr. Kirill Müller extracted the code from dplyr, enhanced the tests, and added a few minor improvements.

Creating tibbles

You can create a tibble from an existing object with as_data_frame():

as_data_frame(iris)
#> Source: local data frame [150 x 5]
#> 
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
#> 1           5.1         3.5          1.4         0.2  setosa
#> 2           4.9         3.0          1.4         0.2  setosa
#> 3           4.7         3.2          1.3         0.2  setosa
#> 4           4.6         3.1          1.5         0.2  setosa
#> 5           5.0         3.6          1.4         0.2  setosa
#> 6           5.4         3.9          1.7         0.4  setosa
#> 7           4.6         3.4          1.4         0.3  setosa
#> 8           5.0         3.4          1.5         0.2  setosa
#> 9           4.4         2.9          1.4         0.2  setosa
#> 10          4.9         3.1          1.5         0.1  setosa
#> ..          ...         ...          ...         ...     ...

This works for data frames, lists, matrices, and tables.

You can also create a new tibble from individual vectors with data_frame():

data_frame(x = 1:5, y = 1, z = x ^ 2 + y)
#> Source: local data frame [5 x 3]
#> 
#>       x     y     z
#>   (int) (dbl) (dbl)
#> 1     1     1     2
#> 2     2     1     5
#> 3     3     1    10
#> 4     4     1    17
#> 5     5     1    26

data_frame() does much less than data.frame(): it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, and it never creates row.names(). You can read more about these features in the vignette, vignette("tibble").

You can define a tibble row-by-row with frame_data():

frame_data(
  ~x, ~y,  ~z,
  "a", 2,  3.6,
  "b", 1,  8.5
)
#> Source: local data frame [2 x 3]
#> 
#>       x     y     z
#>   (chr) (dbl) (dbl)
#> 1     a     2   3.6
#> 2     b     1   8.5

Tibbles vs data frames

There are two main differences in the usage of a data frame vs a tibble: printing, and subsetting.

Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen. This makes it much easier to work with large data. In addition to its name, each column reports its type, a nice feature borrowed from str():

library(nycflights13)
flights
#> Source: local data frame [336,776 x 16]
#> 
#>     year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
#>    (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
#> 1   2013     1     1      517         2      830        11      UA  N14228
#> 2   2013     1     1      533         4      850        20      UA  N24211
#> 3   2013     1     1      542         2      923        33      AA  N619AA
#> 4   2013     1     1      544        -1     1004       -18      B6  N804JB
#> 5   2013     1     1      554        -6      812       -25      DL  N668DN
#> 6   2013     1     1      554        -4      740        12      UA  N39463
#> 7   2013     1     1      555        -5      913        19      B6  N516JB
#> 8   2013     1     1      557        -3      709       -14      EV  N829AS
#> 9   2013     1     1      557        -3      838        -8      B6  N593JB
#> 10  2013     1     1      558        -2      753         8      AA  N3ALAA
#> ..   ...   ...   ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl).

Tibbles are strict about subsetting. If you try to access a variable that does not exist, you’ll get an error:

flights$yea
#> Error: Unknown column 'yea'

Tibbles also clearly delineate [ and [[: [ always returns another tibble, [[ always returns a vector. No more drop = FALSE!

class(iris[ , 1])
#> [1] "numeric"
class(iris[ , 1, drop = FALSE])
#> [1] "data.frame"
class(as_data_frame(iris)[ , 1])
#> [1] "tbl_df"     "tbl"        "data.frame"

Interacting with legacy code

A handful of functions are don’t work with tibbles because they expect df[, 1] to return a vector, not a data frame. If you encounter one of these functions, use as.data.frame() to turn a tibble back to a data frame:

class(as.data.frame(tbl_df(iris)))

To leave a comment for the author, please follow the link and comment on their blog: RStudio Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Mango solutions





RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training



http://www.eoda.de









ODSC

CRC R books series













Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)