Most of us are pretty familiar with data types in our daily lives — we can easily tell that things like 1, 2, 3, and 4 are numbers (in this case, integers). 15.7 is still a number, but has a decimal. We know that every single word I’m typing in this sentence is composed of characters, and we know that in math, “true” and “false” are the answers to logical statements.
Just as we do in our heads, R also categorizes our data into different classes. These categories are similar to the real-life ones I described above, but can be a little different in terms of syntax and things to watch out for in your code.
To work in R and perform data analyses, you’ll need to have a solid understanding of data types. In this tutorial, I’m going to introduce several different types of data, explain how to use and manipulate each of them, and show you how to check what type of data you have. Let’s dive in.
Types of data
There are five main types of data in R that you’d come across as an ecologist. I’ll discuss all of them below except complex numbers, which are rarely used for data analysis in R.
Numeric (1.2, 5, 7, 3.14159)
Integer (1, 2, 3, 4, 5)
Complex (i + 4)
Logical (TRUE / FALSE)
Character ("These", "are", "characters")
I’m also going to discuss a sixth, related category that helps you work with categorical variables: factors.
Numeric data types are pretty straightforward. These are just numbers, written as either integers or decimals. We can check if our vector is numeric by using the function is.numeric().

    # Create a numeric vector
    x <- c(3, 5, 6, 10.7)

    # Is our vector numeric? Yes!
    is.numeric(x)
    ## [1] TRUE
We can check our data type by using the functions class() and typeof(). class() tells us that we’re working with numeric values, while typeof() is more specific and tells us we’re working with doubles (i.e., numbers stored in a format that can hold decimals).

    # Check the type of data class we have
    class(x)
    ## [1] "numeric"

    # Check the specific type of data that you have
    typeof(x)
    ## [1] "double"
You can, of course, perform mathematical operations with numeric values.
    # Add 4 to all the values in the vector
    x + 4
    ## [1]  7.0  9.0 10.0 14.7
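To make that a bit more concrete, here’s a small sketch of a few other common operations on the same kind of numeric vector. The name `heights` is just something I made up for illustration.

```r
# Hypothetical numeric vector (same values as the example above)
heights <- c(3, 5, 6, 10.7)

# Arithmetic is applied to every element of the vector
doubled <- heights * 2

# Summary functions collapse the vector into a single number
total <- sum(heights)
average <- mean(heights)

doubled
total
average
```

The arithmetic returns a vector of the same length, while sum() and mean() each return one value.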
You can also do math with integers, which represent numbers without decimal places. These are usually used if you’re counting something — for example, you can observe 7 butterflies in a plot, but you can’t observe 7.2 butterflies (or at least I hope not!).
If you create a vector manually and don’t have any decimal values, R will still identify your vector as the class “numeric”.
    # Create a vector with only integers
    x <- c(1, 4, 2, 7, 8)

    # Look at the class
    class(x)
    ## [1] "numeric"
You can change this vector to be an integer by using the function as.integer().

    # Change the vector class
    x <- as.integer(x)

    # Look at the class
    class(x)
    ## [1] "integer"
Alternatively, you can generate an integer vector like this. The “L” after each number tells R that you want it to be an integer.
    # Create an integer vector
    x <- c(1L, 2L, 5L, 3L, 10L)

    # View vector
    x
    ## [1]  1  2  5  3 10

    # View class
    class(x)
    ## [1] "integer"
You could also create an integer vector like this. The colon (:) tells R to generate a sequence of integers from 1 to 10, going up by 1 each time.

    # Create a sequence of integers
    x <- 1:10

    # View vector
    x
    ##  [1]  1  2  3  4  5  6  7  8  9 10

    # View data class
    class(x)
    ## [1] "integer"
Some functions will also return integer vectors, like the function sample(). This function randomly samples a certain number of values from the vector you give it, so sampling from an integer sequence gives you integers back. I asked sample() to choose ten values between 1 and 10.

    # Create a random sequence of integers from 1 to 10:
    set.seed(123) # use set.seed to get the same random values as me
    x <- sample(1:10, 10)

    # View vector
    x
    ##  [1]  3 10  2  8  6  9  1  7  5  4

    # View data class
    class(x)
    ## [1] "integer"
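One thing to watch out for is that integers don’t always stay integers once you start doing math with them. The sketch below uses typeof() to check how R stores the result of a few operations; the variable names are made up for illustration.

```r
# Whole numbers typed without the L suffix are stored as doubles
is.integer(5)    # FALSE
is.integer(5L)   # TRUE

# typeof() shows how the result of each operation is stored
int_sum  <- typeof(5L + 2L)    # "integer"
int_div  <- typeof(5L / 2L)    # "double": regular division always returns a double
int_idiv <- typeof(5L %/% 2L)  # "integer": integer division keeps the integer type
mixed    <- typeof(5L + 1)     # "double": mixing in a double converts the result

# as.integer() drops the decimal part rather than rounding
truncated <- as.integer(7.9)   # 7
```

For most analyses this difference won’t matter, but it explains why a column of counts can quietly turn into "numeric" after a calculation.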
I’m not going to discuss complex numbers because they aren’t used much in R for data analysis, though they exist. These are just numbers with real and imaginary components (containing the number i, or the square root of -1).
Characters are another common data type. These are used to store text in R (also called “strings”). To indicate something is a character, we put quotation marks around it.

    # Create a vector of characters
    x <- c("These", "are", "characters")

    # View class
    class(x)
    ## [1] "character"
Putting quotation marks around numbers will also turn them into characters, which can get confusing.
    # Create a vector of characters
    x <- c("1", "4", "5", "7", "8")

    # View vector
    x
    ## [1] "1" "4" "5" "7" "8"

You can’t do math with a vector of numbers that are classed as characters.

    # Try to do math
    mean(x)
    ## Warning in mean.default(x): argument is not numeric or logical: returning NA
    ## [1] NA

Why? Because R views them as text!

    # View class
    class(x)
    ## [1] "character"
You can turn this character vector of numbers into a numeric vector using the as.numeric() function, which is one way to resolve that issue. Any values that can’t be read as numbers (for example, stray text in a column of a CSV you read in) will be converted to NAs, and R will warn you that it did so. In that scenario you’ll probably want to go back and fix your raw CSV file, but at least the NAs will help you find where the problem was.

    # Turn it into a numeric vector
    x <- as.numeric(x)

    # View vector
    x
    ## [1] 1 4 5 7 8

    # View class
    class(x)
    ## [1] "numeric"
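Here’s a minimal sketch of that NA behavior, assuming a column where one value was accidentally typed as a word. The vector `raw_heights` and its values are made up for illustration.

```r
# Hypothetical column read in from a CSV, with one value typed as a word
raw_heights <- c("15", "10", "twelve", "9", "17")

# as.numeric() warns that NAs were introduced; suppressWarnings() just hides
# that warning here so the example runs quietly
heights <- suppressWarnings(as.numeric(raw_heights))
heights

# Find the positions of the values that could not be converted
bad_rows <- which(is.na(heights))
bad_rows

# Look at the original text at those positions
raw_heights[bad_rows]
```

In your own code you probably want to leave the warning visible, since it’s the first sign that something in the raw data needs fixing.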
And then you can turn it back into a character using as.character().

    # Turn it back into a character
    x <- as.character(x)

    # View vector
    x
    ## [1] "1" "4" "5" "7" "8"

    # View class
    class(x)
    ## [1] "character"
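The logical class is represented by only two possible values: TRUE (which can also be written T) and FALSE (which can also be written F). R is case-sensitive, so never write these as True or true, or as False or false.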
These values result from any logical statements that are made. For example, in the code below I asked R if the elements of my vector were greater than 5. This returns a logical vector where each element is either TRUE or FALSE.

    # Create a vector
    x <- c(1, 5, 6, 7, 2, 8)

    # Are the elements of vector x greater than 5? Store results in vector y
    y <- x > 5

    # View y
    y
    ## [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE

    # View class
    class(y)
    ## [1] "logical"
You can also create a vector of logical values directly.

    # Create logical vector
    x <- c(T, F, T, F, F, T)

    # View vector
    x
    ## [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE
And you can convert logical values to numeric values, and back. FALSE is the same as 0, and TRUE is the same as 1.

    # Convert to numeric vector
    x <- as.numeric(x)

    # View vector
    x
    ## [1] 1 0 1 0 0 1

    # Convert back to logical vector
    x <- as.logical(x)

    # View vector again
    x
    ## [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE
This also means that you can do math with logical values. This is useful if, for example, you’re trying to see how many TRUE values you have in your vector. In fact, applying any math operations to a logical vector will automatically convert the values to 1s and 0s.

    # View vector
    x
    ## [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE

    # Count how many "TRUE" values there are. There are 3!
    sum(x)
    ## [1] 3
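The same trick works directly on a comparison, so you can count or summarize values that meet a condition without storing the TRUE/FALSE vector yourself. This is a small sketch using the vector from the greater-than-5 example; the names `above_5`, `n_above`, `prop_above`, and `values_above` are made up for illustration.

```r
# Same values as the greater-than-5 example
x <- c(1, 5, 6, 7, 2, 8)

# Logical vector: which elements are greater than 5?
above_5 <- x > 5

# Number of elements greater than 5 (each TRUE counts as 1)
n_above <- sum(above_5)

# Proportion of elements greater than 5
prop_above <- mean(above_5)

# A logical vector can also be used to pull out the matching values
values_above <- x[above_5]

n_above
prop_above
values_above
```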
Factors are a special data type that is primarily used to represent repeating categories (i.e., categorical variables). When you specify an object as a factor, you’re telling R to think of it as a categorical variable, with different levels. This can be helpful when analyzing your data, as categorical variables and continuous variables are often handled differently in statistical analyses.
In the code below, I created a data frame showing the height and sex of five individuals.
    # Create an example data frame
    example <- data.frame(indiv = c("A", "B", "C", "D", "E"),
                          height = c(15, 10, 12, 9, 17),
                          sex = c("female", "female", "female", "male", "female"))

    # View structure of data frame
    str(example)
    ## 'data.frame': 5 obs. of  3 variables:
    ##  $ indiv : chr  "A" "B" "C" "D" ...
    ##  $ height: num  15 10 12 9 17
    ##  $ sex   : chr  "female" "female" "female" "male" ...
Right now, the sex column is a character vector because I entered the data in quotation marks. But really what I want to do is tell R that sex is a categorical variable, with “female” and “male” as levels. To do that, all I have to do is use the as.factor() function.

    # Change the sex column to be a factor
    example$sex <- as.factor(example$sex)

    # View the factor
    example$sex
    ## [1] female female female male   female
    ## Levels: female male
You can see that R listed the vector and then, beneath that, figured out on its own that the levels are “female” and “male”. When writing the levels, R will sort them in alphabetical order. That’s why the levels are female male instead of male female.
You may want to change the order of your factor levels (this can be useful when plotting your data and determining the order in which they appear).
For example, you might have a vector like this:
    # Create vector
    places <- factor(c("first", "first", "second", "third", "fifth", "fourth", "second"))

    # View factor
    places
    ## [1] first  first  second third  fifth  fourth second
    ## Levels: fifth first fourth second third
The order of the levels doesn’t make sense. We want it to go from first through fifth in the implied numeric order, not alphabetically. So let’s change the order using factor(vector, levels = c("first", "second", "third", etc.)).

    # Change level order
    places <- factor(places, levels = c("first", "second", "third", "fourth", "fifth"))

    # View factor
    places
    ## [1] first  first  second third  fifth  fourth second
    ## Levels: first second third fourth fifth
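If you’re curious why the level order matters, it helps to know that R stores a factor as integer codes that point to the levels. Here’s a small sketch that rebuilds the reordered factor and looks at those codes; `codes` and `counts` are names I made up for illustration.

```r
# Rebuild the reordered factor from the example above
places <- factor(c("first", "first", "second", "third", "fifth", "fourth", "second"),
                 levels = c("first", "second", "third", "fourth", "fifth"))

# The levels, in the order we specified
levels(places)

# Each element is stored as the position of its level
codes <- as.integer(places)
codes

# Functions like table() report results in level order
counts <- table(places)
counts
```

Because "fifth" is now the fifth level, its code is 5, and anything that works through the levels in order (tables, plot axes) will follow first through fifth.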
Factors don’t just have to be text. They can also be integers. For example, in the code below I created a data frame describing the stream width and order of several stream sites. Stream order is not a continuous variable, even though it’s represented by numbers. It would be best to convert stream order to a factor.
    # Create data frame
    example2 <- data.frame(stream = c("Patuxent", "Patapsco", "Deer Creek", "Town Creek", "Browns Branch"),
                           width = c(37, 42, 25, 32, 22),
                           order = c(6, 6, 4, 5, 3))

    # View data frame structure
    str(example2)
    ## 'data.frame': 5 obs. of  3 variables:
    ##  $ stream: chr  "Patuxent" "Patapsco" "Deer Creek" "Town Creek" ...
    ##  $ width : num  37 42 25 32 22
    ##  $ order : num  6 6 4 5 3
R sees stream order as being numeric, which makes sense. But let’s tell R that stream order is a factor.
    # Change stream order to a factor
    example2$order <- as.factor(example2$order)

    # View stream order
    example2$order
    ## [1] 6 6 4 5 3
    ## Levels: 3 4 5 6
Looks good. Since these are numbers, R just orders the levels in ascending order.
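One thing to watch out for with numeric-looking factors is converting them back to numbers. Calling as.numeric() directly on a factor returns the internal level codes rather than the original values. The sketch below shows the difference using the same stream order values; `stream_order`, `codes`, and `values` are names made up for illustration.

```r
# Stream orders stored as a factor (levels are 3, 4, 5, 6)
stream_order <- factor(c(6, 6, 4, 5, 3))

# as.numeric() on a factor returns the position of each level, not the value
codes <- as.numeric(stream_order)
codes

# Converting to character first recovers the original numbers
values <- as.numeric(as.character(stream_order))
values
```

Going through as.character() first is a common way to get the original numbers back.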
How to check and manipulate data types
As demonstrated throughout this tutorial, it can be useful to check the type of data you’re working with and be able to change it to another type if you need. You might need this especially in situations where you’re reading in data from a .csv, and need to check that all your numbers are numeric instead of characters.
The main way to check your data type is to use the function class(). If you have a data frame, another easy way to check data types is to use the str() function. This displays the structure of your data frame and tells you what data type each of your columns is. The example below lists heights over time for five individuals.

    # Create an example data frame
    example <- data.frame(indiv = c("A", "B", "C", "D", "E"),
                          height_0 = c(15, 10, 12, 9, 17),
                          height_10 = c(20, 18, 14, 15, 19),
                          height_20 = c(23, 24, 18, 17, 26))

    str(example)
    ## 'data.frame': 5 obs. of  4 variables:
    ##  $ indiv    : chr  "A" "B" "C" "D" ...
    ##  $ height_0 : num  15 10 12 9 17
    ##  $ height_10: num  20 18 14 15 19
    ##  $ height_20: num  23 24 18 17 26
You can see that the column indiv is a character vector (abbreviated “chr”), while each successive column is numeric (abbreviated “num”).
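If you’d rather get the column types back as a vector you can work with (instead of reading them off the str() printout), one option is to apply class() to every column with sapply(). This is just a sketch; the data frame `growth` and its columns are made up for illustration.

```r
# Hypothetical data frame with a character, a numeric, and an integer column
growth <- data.frame(indiv = c("A", "B", "C"),
                     height_0 = c(15, 10, 12),
                     n_leaves = c(4L, 6L, 5L),
                     stringsAsFactors = FALSE)

# Apply class() to each column; returns a named character vector
col_classes <- sapply(growth, class)
col_classes

# Names of the columns that R treats as numbers (doubles or integers)
numeric_cols <- names(growth)[sapply(growth, is.numeric)]
numeric_cols
```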
You may also have noticed me using functions like is.numeric() and as.character(). All of the data types have is. and as. functions. The is. functions are logical tests that check for a specific data type, asking “is this object of the class XXX?”, and return TRUE or FALSE. The as. functions are actions that convert objects into a new data type. You may find yourself using these often when you’re first formatting your data and preparing it for analysis.
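The two kinds of functions work well together: check first, then convert only if you need to. Here’s a minimal sketch, assuming a count column that was read in as text. The data frame `survey` and its columns are made up for illustration.

```r
# Hypothetical survey data where the counts were read in as text
survey <- data.frame(plot = c("A", "B", "C"),
                     butterflies = c("7", "3", "12"),
                     stringsAsFactors = FALSE)

# Check: is the column numeric? Not yet.
is.numeric(survey$butterflies)

# Convert only if the check fails
if (!is.numeric(survey$butterflies)) {
  survey$butterflies <- as.integer(survey$butterflies)
}

# Now the column is an integer vector and we can do math with it
is.integer(survey$butterflies)
sum(survey$butterflies)
```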
That’s it for data types in R! Keep an eye out for our next tutorial, which will go over different data structures in R like vectors, lists, data frames, and tibbles. I hope this tutorial was helpful! Happy coding!
Also be sure to check out R-bloggers for other great tutorials on learning R