This article presents the different data types in R. To learn about the different variable types from a statistical point of view, read “Variable types and examples”.
What data types exist in R?
The 6 most common data types in R are: numeric, integer, complex, character, factor and logical.
Datasets in R are often a combination of these 6 different data types. Below we explore each data type in more detail, one by one, except the data type “complex”, which is rarely used in practice.
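As a quick orientation before the details, here is a minimal sketch with one value of each of the six types and what class() reports for it. The object names (examples, types) are made up for illustration.

```r
# one illustrative value per data type (names are hypothetical)
examples <- list(
  numeric   = 3.14,
  integer   = 3L,          # the L suffix requests an integer
  complex   = 1 + 2i,
  character = "some text",
  factor    = factor("a"),
  logical   = TRUE
)

# class() of each element, returned as a named character vector
types <- sapply(examples, class)
types
```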
The most common data type in R is numeric. A variable or a series will be stored as numeric data if the values are numbers, with or without decimals. For example, the following two series are stored as numeric by default:
# numeric series without decimals
num_data <- c(3, 7, 2)
num_data
## [1] 3 7 2
class(num_data)
## [1] "numeric"

# numeric series with decimals
num_data_dec <- c(3.4, 7.1, 2.9)
num_data_dec
## [1] 3.4 7.1 2.9
class(num_data_dec)
## [1] "numeric"

# also possible to check the class thanks to str()
str(num_data_dec)
##  num [1:3] 3.4 7.1 2.9
In other words, if you assign one or several numbers to an object in R, it will be stored as numeric by default (numbers with decimals), unless specified otherwise.
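To see what "numeric by default" means in practice, the short sketch below compares class() with typeof(), which reports the internal storage mode. The object name x is only illustrative.

```r
x <- c(3, 7, 2)

class(x)       # "numeric"
typeof(x)      # "double": the internal storage mode
is.numeric(x)  # TRUE
is.integer(x)  # FALSE, even though no value has decimals
```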
The integer data type is actually a special case of numeric data. Integers are numeric data without decimals. It can be used if you are sure that the numbers you store will never contain decimals. For example, let’s say you are interested in the number of children in a sample of 10 families. This variable is a discrete variable (see a reminder on the variable types if you do not remember what a discrete variable is) and will never have decimals. Therefore, it can be stored as integer data thanks to the as.integer() command:
# number of children in the 10 families
children <- c(1, 3, 2, 2, 4, 4, 1, 1, 1, 4)
children
##  [1] 1 3 2 2 4 4 1 1 1 4
children <- as.integer(children)
class(children)
## [1] "integer"
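Note that numbers typed directly in R are stored as numeric even when they have no decimals, as seen with the numeric series above. However, when you import a dataset (for instance with read.csv()), R will automatically set the type of a column without decimals as integer instead of numeric.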
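The sketch below illustrates a few common ways integers arise in base R: the L suffix, the colon operator, and importing data. The object names and the small inline dataset are made up for the example.

```r
a <- 3        # typed without a suffix: numeric
b <- 3L       # L suffix: integer
s <- 1:3      # the colon operator returns integers

class(a)      # "numeric"
class(b)      # "integer"
class(s)      # "integer"

# dividing two integers gives a numeric result
class(b / 2L) # "numeric"

# importing data: a column without decimals is read as integer
df <- read.csv(text = "kids,height\n2,1.75\n3,1.62")
col_types <- sapply(df, class)
col_types
```

Here the kids column should come out as integer and the height column as numeric.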
The data type character is used when storing text, known as strings in R. The simplest way to store data under the character format is by using "" around the piece of text:
char <- "some text"
char
## [1] "some text"
class(char)
## [1] "character"
If you want to force any kind of data to be stored as character, you can do it by using the command as.character():
char2 <- as.character(children)
char2
##  [1] "1" "3" "2" "2" "4" "4" "1" "1" "1" "4"
class(char2)
## [1] "character"
Note that everything inside "" will be considered as character, even if it looks like a number. For example:
chars <- c("7.42")
chars
## [1] "7.42"
class(chars)
## [1] "character"
Furthermore, as soon as there is at least one character value inside a variable or vector, the whole variable or vector will be considered as character:
char_num <- c("text", 1, 3.72, 4)
char_num
## [1] "text" "1"    "3.72" "4"
class(char_num)
## [1] "character"
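Going the other way, a character vector can be converted back to numbers with as.numeric(). The sketch below, with a made-up vector, shows that strings that look like numbers convert cleanly while the others become NA.

```r
chars <- c("7.42", "3", "text")

# suppressWarnings() hides the "NAs introduced by coercion" warning
nums <- suppressWarnings(as.numeric(chars))
nums
class(nums)   # "numeric"
is.na(nums)   # FALSE FALSE TRUE
```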
Last but not least, although space does not matter in numeric data, it does matter for character data:
num_space <- c(1 )
num_nospace <- c(1)
# is num_space equal to num_nospace?
num_space == num_nospace
## [1] TRUE

char_space <- "text "
char_nospace <- "text"
# is char_space equal to char_nospace?
char_space == char_nospace
## [1] FALSE
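The same comparison can be inspected a little further. As a small sketch, nchar() counts the characters in each string, and the base R function trimws() removes leading and trailing whitespace if the extra space is not wanted:

```r
char_space <- "text "
char_nospace <- "text"

nchar(char_space)     # 5: the trailing space counts as a character
nchar(char_nospace)   # 4

# after trimming, the two strings are equal
trimmed_equal <- trimws(char_space) == char_nospace
trimmed_equal         # TRUE
```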
As you can see from the results above, a space within character data (i.e., within "") makes it a different string in R!
Factor variables are a special case of character variables in the sense that they also contain text. However, factor variables are used when there is a limited number of unique character strings. A factor often represents a categorical variable. For instance, the gender will usually take on only two values, “female” or “male” (and will be considered as a factor variable), whereas the name will generally have lots of possibilities (and thus will be considered as a character variable). To create a factor variable, use the factor() function:
gender <- factor(c("female", "female", "male", "female", "male"))
gender
## [1] female female male   female male
## Levels: female male
To know the different levels of a factor variable, use levels():
levels(gender)
## [1] "female" "male"
By default, the levels are sorted alphabetically. You can reorder the levels with the argument levels in the factor() function:
gender <- factor(gender, levels = c("male", "female"))
levels(gender)
## [1] "male"   "female"
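The order of the levels matters because a factor is stored internally as integer codes that point to its levels. The sketch below recreates the reordered gender factor and looks at those codes and at the counts per level:

```r
gender <- factor(c("female", "female", "male", "female", "male"),
                 levels = c("male", "female"))

nlevels(gender)              # 2
codes <- as.integer(gender)  # position of each value in levels(gender)
codes                        # 2 2 1 2 1
counts <- table(gender)      # counts per level, in level order
counts                       # male: 2, female: 3
```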
Character strings can be converted to factors with as.factor():
text <- c("test1", "test2", "test1", "test1") # create a character vector
class(text) # to know the class
## [1] "character"
text_factor <- as.factor(text) # transform to factor
class(text_factor) # recheck the class
## [1] "factor"
The character strings have been transformed to factors, as shown by the class, which is now factor.
A logical variable is a variable with only two possible values, TRUE or FALSE:
value1 <- 7
value2 <- 9

# is value1 greater than value2?
greater <- value1 > value2
greater
## [1] FALSE
class(greater)
## [1] "logical"

# is value1 less than or equal to value2?
less <- value1 <= value2
less
## [1] TRUE
class(less)
## [1] "logical"
It is also possible to transform logical data into numeric data. After the transformation from logical to numeric with the as.numeric() command, FALSE values equal 0 and TRUE values equal 1:
greater_num <- as.numeric(greater)
sum(greater_num)
## [1] 0
less_num <- as.numeric(less)
sum(less_num)
## [1] 1
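This 0/1 coding is what makes logical vectors convenient for counting. As a small sketch with made-up ages, sum() gives the number of TRUE values and mean() gives their proportion, with no explicit conversion needed:

```r
ages <- c(15, 22, 37, 12, 41)   # hypothetical data
adult <- ages >= 18             # logical vector

adult        # FALSE TRUE TRUE FALSE TRUE
sum(adult)   # 3: number of TRUE values
mean(adult)  # 0.6: proportion of TRUE values
```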
Conversely, numeric data can be converted to logical data, with FALSE for all values equal to 0 and TRUE for all other values.
x <- 0
as.logical(x)
## [1] FALSE
y <- 5
as.logical(y)
## [1] TRUE
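The same rule applies element by element to a whole vector. The sketch below uses made-up values to show that negative and decimal numbers also become TRUE, that missing values stay NA, and that character strings are only converted when they spell out a logical value:

```r
v <- c(0, 1, -2, 0.5, NA)
from_num <- as.logical(v)
from_num    # FALSE TRUE TRUE TRUE NA

# strings such as "TRUE", "false" or "T" are recognized; others give NA
from_chr <- as.logical(c("TRUE", "false", "T", "yes"))
from_chr    # TRUE FALSE TRUE NA
```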
Thanks for reading. I hope this article helped you to understand the basic data types in R and their particularities. If you would like to learn more about the different variable types from a statistical point of view, read the article “Variable types and examples”.
As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.