Data types in R

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This article presents the different data types in R. To learn about the different variable types from a statistical point of view, read “Variable types and examples”.

What data types exist in R?

R has six basic (atomic) data types. Leaving aside the rarely used raw type, the five you will meet most often are:

  1. Numeric
  2. Integer
  3. Complex
  4. Character
  5. Logical

Datasets in R are often a combination of these five data types. Below we explore each of them in more detail, one by one, except the complex type: we focus on the main ones, and complex numbers are rarely used in practice.
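
As a quick overview, here is a minimal sketch that creates one value of each of the five types and checks it with class(). The list name `examples` is only illustrative, and the `L` suffix (which marks an integer value) is covered in the Integer section.

```r
# one illustrative value per data type (the name `examples` is hypothetical)
examples <- list(
  numeric   = 3.4,
  integer   = 3L,          # L suffix: store as integer
  complex   = 2 + 3i,
  character = "some text",
  logical   = TRUE
)

# class() of each element; each class should match its name in the list
sapply(examples, class)
```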

Numeric

The most common data type in R is numeric. A variable or a series is stored as numeric data if its values are numbers, whether or not they contain decimals. For example, both of the following series are stored as numeric by default:

# numeric series without decimals
num_data <- c(3, 7, 2)
num_data
## [1] 3 7 2
class(num_data)
## [1] "numeric"
# numeric series with decimals
num_data_dec <- c(3.4, 7.1, 2.9)
num_data_dec
## [1] 3.4 7.1 2.9
class(num_data_dec)
## [1] "numeric"
# also possible to check the class thanks to str()
str(num_data_dec)
##  num [1:3] 3.4 7.1 2.9

In other words, if you assign one or several numbers to an object in R, they are stored as numeric by default (double-precision numbers, which can hold decimals), unless specified otherwise.
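
Under the hood, numeric values are stored as double-precision floating-point numbers: typeof() reports the storage type as "double", while class() reports "numeric". A small check, reusing the `num_data` series from above:

```r
num_data <- c(3, 7, 2)

class(num_data)      # "numeric"
typeof(num_data)     # "double": the storage type behind numeric
is.numeric(num_data) # TRUE
```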

Integer

The integer data type is a special case of numeric data: integers are numbers without decimals. It can be used when you are sure that the numbers you store will never contain decimals. For example, say you are interested in the number of children in a sample of 10 families. This variable is a discrete variable (see a reminder on the variable types if you do not remember what a discrete variable is) and will never have decimals. Therefore, it can be stored as integer data with the as.integer() function:

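# number of children in each of the 10 families
children <- c(1, 3, 2, 2, 4, 4, 1, 1, 1, 4)
children
##  [1] 1 3 2 2 4 4 1 1 1 4
class(children)
## [1] "numeric"
children <- as.integer(children)
class(children)
## [1] "integer"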

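Note that R does not automatically store whole numbers as integers: as shown with num_data above, c(3, 7, 2) is stored as numeric. To get integers, you need to convert the values with as.integer() or append L to each value (e.g., 3L).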
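
A short sketch of both routes to integers, plus one caveat worth knowing: as.integer() truncates decimals toward zero rather than rounding them. The name `int_data` is only illustrative.

```r
# without a suffix, whole numbers are still stored as numeric (double)
class(c(1, 3, 2))          # "numeric"

# the L suffix creates integers directly (int_data is a hypothetical name)
int_data <- c(1L, 3L, 2L)
class(int_data)            # "integer"

# as.integer() truncates toward zero instead of rounding
as.integer(c(2.9, -2.9))   # 2 -2
```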

Character

The character data type is used to store text, known as strings in R. The simplest way to store data in the character format is to put "" around the piece of text:

char <- "some text"
char
## [1] "some text"
class(char)
## [1] "character"

If you want to force any kind of data to be stored as character, you can use the as.character() function:

char2 <- as.character(children)
char2
##  [1] "1" "3" "2" "2" "4" "4" "1" "1" "1" "4"
class(char2)
## [1] "character"

Note that everything inside "" is considered character, whether or not it looks like text. For example, a number written in quotes is a string:

chars <- c("7.42")
chars
## [1] "7.42"
class(chars)
## [1] "character"
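
A string that looks like a number can be converted back with as.numeric() before doing arithmetic on it; text that does not look like a number becomes NA, and R emits a warning. A minimal sketch (the name `num_from_char` is illustrative):

```r
chars <- c("7.42")

# "7.42" * 2 would fail with the error: non-numeric argument to binary operator
num_from_char <- as.numeric(chars)  # hypothetical name for the converted value
num_from_char * 2                   # 14.84

# non-numeric text cannot be converted: the result is NA (warning silenced here)
suppressWarnings(as.numeric("text"))
```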

Furthermore, as soon as there is at least one character value inside a variable or vector, the whole variable or vector is considered character, and every element is converted to a string:

char_num <- c("text", 1, 3.72, 4)
char_num
## [1] "text" "1"    "3.72" "4"
class(char_num)
## [1] "character"
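
The same rule applies more generally: when a vector mixes types, R converts all elements to the most flexible type present, in the order logical, integer, numeric, complex, character. A quick illustration:

```r
class(c(TRUE, 2L))      # "integer": TRUE becomes 1L
class(c(2L, 3.5))       # "numeric": 2L becomes 2
class(c(TRUE, "text"))  # "character"
c(TRUE, "text")         # "TRUE" "text": TRUE becomes the string "TRUE"
```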

Last but not least, although spaces do not matter in numeric data, they do matter for character data:

num_space <- c(1 )
num_nospace <- c(1)
# is num_space equal to num_nospace?
num_space == num_nospace
## [1] TRUE
char_space <- "text "
char_nospace <- "text"
# is char_space equal to char_nospace?
char_space == char_nospace
## [1] FALSE

As you can see from the results above, a space within character data (i.e., within "") makes it a different string in R!
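
When such spaces are not intended (for instance in imported data), base R's trimws() removes leading and trailing whitespace, so the two strings from the example compare as equal:

```r
char_space <- "text "
char_nospace <- "text"

char_space == char_nospace          # FALSE
trimws(char_space) == char_nospace  # TRUE: the trailing space is removed
```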

Logical

A logical variable is a variable with only two possible values, TRUE or FALSE (plus NA for missing values). Logical values are typically the result of a comparison:

value1 <- 7
value2 <- 9

# is value1 greater than value2?
greater <- value1 > value2
greater
## [1] FALSE
class(greater)
## [1] "logical"
# is value1 less than or equal to value2?
less <- value1 <= value2
less
## [1] TRUE
class(less)
## [1] "logical"
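
Comparisons also work element-wise on whole vectors, returning one logical value per element. For example, with the num_data series from the Numeric section (the name `above_five` is only illustrative):

```r
num_data <- c(3, 7, 2)

# one TRUE/FALSE per element (above_five is a hypothetical name)
above_five <- num_data > 5
above_five          # FALSE TRUE FALSE
class(above_five)   # "logical"
```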

It is also possible to transform logical data into numeric data. After the transformation with the as.numeric() function, FALSE becomes 0 and TRUE becomes 1:

greater_num <- as.numeric(greater)
greater_num
## [1] 0
less_num <- as.numeric(less)
less_num
## [1] 1
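
Because TRUE counts as 1 and FALSE as 0, functions such as sum() and mean() convert logical vectors automatically, which gives a convenient way to count how many values meet a condition or compute the proportion that do. A sketch reusing the children data from the Integer section (the name `many` is illustrative):

```r
children <- c(1, 3, 2, 2, 4, 4, 1, 1, 1, 4)

# which families have more than 2 children? (many is a hypothetical name)
many <- children > 2

sum(many)   # 4: number of families with more than 2 children
mean(many)  # 0.4: proportion of such families
```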

Conversely, numeric data can be converted to logical data with as.logical(): values equal to 0 become FALSE and all other values become TRUE.

x <- 0
as.logical(x)
## [1] FALSE
y <- 5
as.logical(y)
## [1] TRUE
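
The conversion also works element by element on a whole vector, and negative or fractional values count as TRUE too, since only exactly 0 maps to FALSE (the name `vals` is illustrative):

```r
vals <- c(0, 5, -2.3, 0.1)

# only the exact 0 becomes FALSE
as.logical(vals)  # FALSE TRUE TRUE TRUE
```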

Thanks for reading. I hope this article helped you to understand the basic data types in R and their particularities. If you would like to learn more about the different variable types from a statistical point of view, read “Variable types and examples”. As always, if you find a mistake/bug or if you have any questions do not hesitate to let me know in the comment section below, raise an issue on GitHub or contact me. Get updates every time a new article is published by subscribing to this blog.
