R Data types 101, or What kind of data do I have?
Most of us are pretty familiar with data types in our daily lives — we can easily tell that things like 1, 2, 3, and 4 are numbers (in this case, integers). 15.7 is still a number, but has a decimal. We know that every single word I’m typing in this sentence is composed of characters, and we know that in math, “true” and “false” are the answers to logical statements.
Just as we do in our heads, R also categorizes our data into different classes. These categories are similar to the real-life ones I described above, but can be a little different in terms of syntax and things to watch out for in your code.
To work in R and perform data analyses, you’ll need to have a solid understanding of data types. In this tutorial, I’m going to introduce several different types of data, explain how to use and manipulate each of them, and show you how to check what type of data you have. Let’s dive in.
Types of data
There are five main types of data in R that you’d come across as an ecologist. I’ll discuss all of them below except complex numbers, which are rarely used for data analysis in R.

- Numeric (1.2, 5, 7, 3.14159)
- Integer (1, 2, 3, 4, 5)
- Complex (4 + 1i)
- Logical (TRUE / FALSE)
- Character ("a", "apple")
I’m also going to discuss a sixth, related category that helps you work with categorical variables:
- Factor
Numeric
Numeric data types are pretty straightforward. These are just numbers, written as either integers or decimals. We can check if our vector is numeric by using the function is.numeric().
# Create a numeric vector
x <- c(3, 5, 6, 10.7)
# Is our vector numeric? Yes!
is.numeric(x)
## [1] TRUE
We can check our data type by using the functions class() and typeof(). class() tells us that we’re working with numeric values, while typeof() is more specific and tells us how R stores them: as doubles (double-precision floating-point numbers). This is R’s default storage for any number you type, whether or not it has a decimal.
# Check the type of data class we have
class(x)
## [1] "numeric"
# Check the specific type of data that you have
typeof(x)
## [1] "double"
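To see that "double" describes how R stores a number rather than whether you typed a decimal point, here’s a quick sketch comparing a vector of whole numbers with one built using the L suffix covered in the Integer section below. The names y and z are just illustrative.

```r
# Whole numbers typed without an L suffix are still stored as doubles
y <- c(1, 2, 3)
class(y)    # "numeric"
typeof(y)   # "double"

# Adding an L suffix stores the values as integers instead
z <- c(1L, 2L, 3L)
class(z)    # "integer"
typeof(z)   # "integer"

# is.double() and is.integer() test the storage type directly
is.double(y)    # TRUE
is.integer(z)   # TRUE
```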
You can, of course, perform mathematical operations with numeric values.
# Add 4 to all the values in the vector
x + 4
## [1] 7.0 9.0 10.0 14.7
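Arithmetic in R works element by element. Reusing the same x, this sketch doubles every value, adds two vectors of the same length position by position, and collapses the vector to one number with mean().

```r
x <- c(3, 5, 6, 10.7)

# Multiply every element by 2
x * 2                  # 6.0 10.0 12.0 21.4

# Vectors of the same length are combined position by position
x + c(1, 2, 3, 4)      # 4.0 7.0 9.0 14.7

# Summary functions reduce the vector to a single value
mean(x)                # 6.175
```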
Integer
You can also do math with integers, which represent numbers without decimal places. These are usually used if you’re counting something — for example, you can observe 7 butterflies in a plot, but you can’t observe 7.2 butterflies (or at least I hope not!).
If you create a vector manually and don’t have any decimal values, R will still identify your vector as the class “numeric”.
# Create a vector with only integers
x <- c(1, 4, 2, 7, 8)
# Look at the class
class(x)
## [1] "numeric"
You can change this vector to be an integer by using the function as.integer().
# Change the vector class
x <- as.integer(x)
# Look at the class
class(x)
## [1] "integer"
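One thing to watch out for: as.integer() doesn’t round. It drops everything after the decimal point (truncating toward zero), so if you want the nearest whole number, call round() first. The values in counts below are made up for illustration.

```r
counts <- c(2.2, 2.9, -2.9)

# as.integer() truncates toward zero
as.integer(counts)          # 2 2 -2

# Round first if you want the nearest whole number
as.integer(round(counts))   # 2 3 -3
```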
Alternatively, you can generate an integer vector like this. The “L” after each number tells R that you want it to be an integer.
# Create an integer vector
x <- c(1L, 2L, 5L, 3L, 10L)
# View vector
x
## [1] 1 2 5 3 10
# View class
class(x)
## [1] "integer"
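Integers don’t always stay integers when you do math with them. This small sketch shows which operations keep the integer type and which quietly switch to doubles; %/% is R’s integer-division operator and %% gives the remainder.

```r
a <- 5L
b <- 2L

typeof(a + b)     # "integer": adding two integers stays integer
typeof(a + 2)     # "double": 2 without an L is a double
typeof(a / b)     # "double": regular division can return fractions
a / b             # 2.5
a %/% b           # 2 (integer division drops the remainder)
a %% b            # 1 (the remainder)
```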
You could also create an integer vector like this. The colon (:) tells R to generate a sequence of integers from 1 to 10, going up by 1 each time.
# Create a sequence of integers
x <- 1:10
# View vector
x
## [1] 1 2 3 4 5 6 7 8 9 10
# View data class
class(x)
## [1] "integer"
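The colon is a bit more flexible than that: it can count down, and it only produces integers when the starting value is a whole number. For a sequence of a given length, seq_len() is another base R option that returns an integer vector. A few quick checks:

```r
# Counting down works too
10:8               # 10 9 8
class(10:8)        # "integer"

# A non-whole starting value gives doubles, still stepping by 1
1.5:4              # 1.5 2.5 3.5
class(1.5:4)       # "numeric"

# seq_len(n) generates 1, 2, ..., n as integers
seq_len(5)         # 1 2 3 4 5
class(seq_len(5))  # "integer"
```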
Some functions will also automatically generate integer vectors, like the function sample(). This function randomly draws a given number of values from a set of values. I asked sample() to choose ten values from the integers 1 through 10.
# Create a random sequence of integers from 1 to 10:
set.seed(123) # use set.seed to get the same random values as me
x <- sample(1:10, 10)
# View vector
x
## [1] 3 10 2 8 6 9 1 7 5 4
# View data class
class(x)
## [1] "integer"
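Because sample(1:10, 10) draws without replacement, it returns each number from 1 to 10 exactly once, just shuffled. If you want repeats allowed (say, to simulate 20 random counts), set replace = TRUE. The sketch also shows that sample() returns the same type you give it, so sampling a character vector gives characters. The seed and the fruit names are arbitrary, and the actual draws aren’t listed because they depend on the seed.

```r
set.seed(42)  # arbitrary seed, only so the draws are reproducible

# Twenty draws from 1:10, allowing the same value more than once
counts <- sample(1:10, 20, replace = TRUE)
class(counts)    # "integer"
length(counts)   # 20

# Sampling from a character vector returns characters
fruit <- sample(c("apple", "pear", "plum"), 2)
class(fruit)     # "character"
```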
Complex
I’m not going to discuss this one because complex numbers aren’t used much in R for data analysis, though they exist. These are just numbers with real and imaginary components (containing the number i, the square root of -1).
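Just for reference, here’s roughly what a complex number looks like in R syntax: the imaginary part is written with an i directly after a number, as in 4 + 1i. Note that sqrt(-1) on a regular numeric value returns NaN with a warning, so you have to ask for a complex result explicitly.

```r
z <- 4 + 1i
class(z)    # "complex"
Re(z)       # 4, the real part
Im(z)       # 1, the imaginary part

# The square root of -1 only works if the input is complex
sqrt(as.complex(-1))   # 0+1i
```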
Character
Characters are another common data type. These are used to store text in R (also called “strings”). To indicate something is a character, we put quotation marks around it ("").
# Create a vector of characters
x <- c("These", "are", "characters")
# View class
class(x)
## [1] "character"
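As a quick illustration of working with text, here are a few base R functions for character vectors: nchar() counts characters, paste() glues strings together, and toupper() changes case.

```r
x <- c("These", "are", "characters")

nchar(x)                    # 5 3 10
paste(x, collapse = " ")    # "These are characters"
toupper(x[1])               # "THESE"
```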
Putting quotation marks around numbers will also turn them into characters, which can get confusing.
# Create a vector of characters
x <- c("1", "4", "5", "7", "8")
# View vector
x
## [1] "1" "4" "5" "7" "8"
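Quotation marks aren’t the only way to end up with characters. A vector can hold only one data type, so when you combine types with c(), R silently converts everything to the most flexible type present (logical, then integer, then double, then character). A quick sketch; the name mixed is just illustrative:

```r
# One quoted value turns the whole vector into characters
mixed <- c(1, 4, "5")
class(mixed)         # "character"
mixed                # "1" "4" "5"

# Logical + integer becomes integer (TRUE becomes 1)
class(c(TRUE, 2L))   # "integer"

# Integer + double becomes double
class(c(1L, 2.5))    # "numeric"
```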
You can’t do math with a vector of numbers that are classed as characters.
# Try to do math
mean(x)
## Warning in mean.default(x): argument is not numeric or logical: returning NA
## [1] NA
Why? Because R views them as text!
# View class
class(x)
## [1] "character"
You can turn this character vector of numbers into a numeric vector using the as.numeric() function.
as.numeric() is also one way to deal with a column that R read in as character because a few entries aren’t valid numbers (for example, a typo in your raw CSV file). Any values that can’t be interpreted as numbers will be converted to NAs, with a warning. In that scenario you’ll probably want to go back and fix your raw CSV file, but at least now the NAs will help you find where the problem was.
# Turn it into a numeric vector
x <- as.numeric(x)
# View vector
x
## [1] 1 4 5 7 8
# View class
class(x)
## [1] "numeric"
And then you can turn it back into a character using as.character().
# Turn it back into a character
x <- as.character(x)
# View vector
x
## [1] "1" "4" "5" "7" "8"
# View class
class(x)
## [1] "character"
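To see the NA behavior described above, here’s a sketch with a hypothetical set of counts where one entry was typed as a word. as.numeric() normally warns "NAs introduced by coercion" (wrapped in suppressWarnings() here only to keep the output tidy), and is.na() then points you straight to the problem entry.

```r
# Hypothetical raw values, e.g. read from a CSV, with one typo
raw <- c("1", "4", "five", "7", "8")

# Values that can't be read as numbers become NA
num <- suppressWarnings(as.numeric(raw))
num                   # 1 4 NA 7 8

# Find which entries failed to convert, and what they were
which(is.na(num))     # 3
raw[is.na(num)]       # "five"
```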
Logical
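The logical class is represented by only two possible values: TRUE or FALSE (which can also be written T / F, but never true / false or t / f). Spelling out TRUE and FALSE is the safer habit, because T and F are ordinary variables that can be overwritten (for example, if you name an object T).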
These values result from logical statements, such as comparisons. For example, in the code below I asked R if the elements of my vector were greater than 5. This returns a logical vector where each element is either TRUE or FALSE.
# Create a vector
x <- c(1, 5, 6, 7, 2, 8)
# Are the elements of vector x greater than 5? Store results in vector y
y <- x > 5
# View y
y
## [1] FALSE FALSE TRUE TRUE FALSE TRUE
# View class
class(y)
## [1] "logical"
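Logical vectors are most useful when you feed them back into your code. Using the same x, this sketch keeps only the elements where the condition is TRUE and combines conditions with & (and), | (or), and ! (not).

```r
x <- c(1, 5, 6, 7, 2, 8)
y <- x > 5

# Keep only the elements where y is TRUE
x[y]                 # 6 7 8

# Both conditions must hold: greater than 2 AND less than 7
x > 2 & x < 7        # FALSE TRUE TRUE FALSE FALSE FALSE

# Either condition can hold: less than 2 OR greater than 7
x < 2 | x > 7        # TRUE FALSE FALSE FALSE FALSE TRUE

# Flip TRUE and FALSE
!y                   # TRUE TRUE FALSE FALSE TRUE FALSE
```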
You can also create a vector of logical values directly.
# Create logical vector
x <- c(T, F, T, F, F, T)
# View vector
x
## [1] TRUE FALSE TRUE FALSE FALSE TRUE
And you can convert logical values to numeric values, and back. FALSE is the same as 0, while TRUE is the same as 1.
# Convert to numeric vector
x <- as.numeric(x)
# View vector
x
## [1] 1 0 1 0 0 1
# Convert back to logical vector
x <- as.logical(x)
# View vector again
x
## [1] TRUE FALSE TRUE FALSE FALSE TRUE
This also means that you can do math with logical values. This is useful if, for example, you’re trying to see how many TRUE values you have in your vector. In fact, applying any math operations to a logical vector will automatically convert the values to 1s and 0s.
# View vector
x
## [1] TRUE FALSE TRUE FALSE FALSE TRUE
# Count how many "TRUE" values there are. There are 3!
sum(x)
## [1] 3
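Two related tricks, sketched below: mean() of a logical vector gives the proportion of TRUE values, and as.logical() treats any nonzero number as TRUE (not just 1). It also recognizes a few spellings of true and false in text, such as "TRUE", "true", and "T", but anything else becomes NA.

```r
x <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)

# Proportion of TRUE values: 3 out of 6
mean(x)                                    # 0.5

# Any nonzero number converts to TRUE
as.logical(c(0, 2, -3))                    # FALSE TRUE TRUE

# Text only converts if it spells out a recognized value
as.logical(c("TRUE", "true", "T", "yes"))  # TRUE TRUE TRUE NA
```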
Factor
Factors are a special data type that is primarily used to represent repeating categories (i.e., categorical variables). When you specify an object as a factor, you’re telling R to think of it as a categorical variable, with different levels. This can be helpful when analyzing your data, as categorical variables and continuous variables are often handled differently in statistical analyses.
In the code below, I created a data frame showing the height and sex of five individuals.
# Create an example data frame
example <- data.frame(indiv = c("A", "B", "C", "D", "E"),
                      height = c(15, 10, 12, 9, 17),
                      sex = c("female", "female", "female", "male", "female"))
# View structure of data frame
str(example)
## 'data.frame': 5 obs. of 3 variables:
##  $ indiv : chr "A" "B" "C" "D" ...
##  $ height: num 15 10 12 9 17
##  $ sex   : chr "female" "female" "female" "male" ...
Right now, the sex column is a character vector because I entered the data in quotation marks. But really what I want to do is tell R that sex is a categorical variable, with “female” and “male” as levels. To do that, all I have to do is use the as.factor() function.
# Change the sex column to be a factor
example$sex <- as.factor(example$sex)
# View the factor
example$sex
## [1] female female female male female
## Levels: female male
You can see that R listed the vector and then, beneath that, figured out on its own that the levels are “female” and “male”. By default, R sorts the levels in alphabetical order. That’s why the levels are female male instead of male female.
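Under the hood, a factor is stored as integer codes that point into its levels, which is why typeof() reports "integer". Here’s a minimal sketch, rebuilding the same sex column as a standalone vector, showing how to inspect the levels and codes and count observations per level with table().

```r
sex <- as.factor(c("female", "female", "female", "male", "female"))

levels(sex)       # "female" "male"
nlevels(sex)      # 2
typeof(sex)       # "integer"
as.integer(sex)   # 1 1 1 2 1, the position of each value in levels(sex)

# Count how many observations fall in each level
table(sex)        # female: 4, male: 1
```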
You may want to change the order of your factor levels (this can be useful when plotting your data and determining the order in which they appear).
For example, you might have a vector like this:
# Create vector
places <- factor(c("first", "first", "second", "third", "fifth", "fourth", "second"))
# View factor
places
## [1] first first second third fifth fourth second
## Levels: fifth first fourth second third
The order of the levels doesn’t make sense. We want it to go from first through fifth in the implied numeric order, not alphabetically. So let’s change the order using factor(vector, levels = c("first", "second", "third", etc.)).
# Change level order
places <- factor(places, levels = c("first", "second", "third", "fourth", "fifth"))
# View factor
places
## [1] first first second third fifth fourth second
## Levels: first second third fourth fifth
Much better!
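If the levels have a natural ranking, as these places do, you can go one step further and make an ordered factor by adding ordered = TRUE. That tells R the levels are actually ranked, not just listed in a display order, so comparisons like < work. A sketch:

```r
places <- factor(c("first", "first", "second", "third", "fifth", "fourth", "second"),
                 levels = c("first", "second", "third", "fourth", "fifth"),
                 ordered = TRUE)

is.ordered(places)    # TRUE

# Which finishes were better than third?
places < "third"      # TRUE TRUE TRUE FALSE FALSE FALSE TRUE
```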
Factors don’t have to be text. They can also be numbers. For example, in the code below I created a data frame describing the stream width and order of several stream sites. Stream order is not a continuous variable, even though it’s represented by numbers, so it’s best to convert it to a factor.
# Create data frame
example2 <- data.frame(stream = c("Patuxent", "Patapsco", "Deer Creek", "Town Creek", "Browns Branch"),
                       width = c(37, 42, 25, 32, 22),
                       order = c(6, 6, 4, 5, 3))
# View data frame structure
str(example2)
## 'data.frame': 5 obs. of 3 variables:
##  $ stream: chr "Patuxent" "Patapsco" "Deer Creek" "Town Creek" ...
##  $ width : num 37 42 25 32 22
##  $ order : num 6 6 4 5 3
R sees stream order as being numeric, which makes sense. But let’s tell R that stream order is a factor.
# Change stream order to a factor
example2$order <- as.factor(example2$order)
# View stream order
example2$order
## [1] 6 6 4 5 3
## Levels: 3 4 5 6
Looks good. Since these are numbers, R just orders the levels in ascending order.
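One common pitfall with number-like factors: as.numeric() on a factor returns the internal level codes, not the numbers you see printed. To get the original stream orders back, convert to character first. A sketch using the same order values (order_f is just an illustrative name):

```r
order_f <- as.factor(c(6, 6, 4, 5, 3))

# Wrong: these are the positions of each value in the levels 3 4 5 6
as.numeric(order_f)                  # 4 4 2 3 1

# Right: go through character to recover the actual numbers
as.numeric(as.character(order_f))    # 6 6 4 5 3
```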
How to check and manipulate data types
As demonstrated throughout this tutorial, it can be useful to check the type of data you’re working with and be able to change it to another type if you need to. This is especially handy when you’re reading in data from a .csv file and need to check that all your numbers were read in as numeric rather than as characters.
The main way to check your data type is to use the function class(). If you have a data frame, another easy way to check data types is to use the str() function. This displays the structure of your data frame and tells you what data type each of your columns is. The example below lists heights over time for five individuals.
# Create an example data frame
example <- data.frame(indiv = c("A", "B", "C", "D", "E"),
                      height_0 = c(15, 10, 12, 9, 17),
                      height_10 = c(20, 18, 14, 15, 19),
                      height_20 = c(23, 24, 18, 17, 26))
str(example)
## 'data.frame': 5 obs. of 4 variables:
##  $ indiv    : chr "A" "B" "C" "D" ...
##  $ height_0 : num 15 10 12 9 17
##  $ height_10: num 20 18 14 15 19
##  $ height_20: num 23 24 18 17 26
You can see that the column indiv is a character vector (abbreviated “chr”), while each successive column is numeric (abbreviated “num”).
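For a wide data frame, str() output can get long. One compact alternative, sketched below, is to apply class() to every column with sapply(), or to use is.numeric() to flag which columns are numeric. This rebuilds the same example data frame so it runs on its own; stringsAsFactors = FALSE is there so indiv stays a character column in older versions of R too.

```r
example <- data.frame(indiv = c("A", "B", "C", "D", "E"),
                      height_0 = c(15, 10, 12, 9, 17),
                      height_10 = c(20, 18, 14, 15, 19),
                      height_20 = c(23, 24, 18, 17, 26),
                      stringsAsFactors = FALSE)

# One class per column, named by column
sapply(example, class)

# TRUE for every numeric column
sapply(example, is.numeric)
```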
You may also have noticed me using functions like is.numeric() or as.character(). Each of the data types has an is. function and an as. function. The is. functions are logical checks that ask “is this object of the class XXX?” and return TRUE or FALSE. The as. functions are actions that convert objects into a new data type. You may find yourself using these often when you’re first formatting your data and preparing it for analysis.
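As a sketch of how the is. and as. functions fit into preparing data, the example below reads a small hypothetical CSV (typed inline with read.csv(text = ...) so it runs on its own), checks the column types, and converts the site column to a factor. Note that read.csv() guesses each column’s type by itself: whole numbers come in as integers and decimals as doubles.

```r
# Hypothetical CSV contents, typed inline instead of read from a file
csv_text <- "site,count,width
A,7,3.5
B,12,4.1
C,9,2.8"

# stringsAsFactors = FALSE keeps text columns as characters in any R version
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

is.character(dat$site)   # TRUE
is.integer(dat$count)    # TRUE
is.numeric(dat$width)    # TRUE

# Convert the site column to a categorical variable
dat$site <- as.factor(dat$site)
is.factor(dat$site)      # TRUE
```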
That’s it for data types in R! Keep an eye out for our next tutorial, which will go over different data structures in R like vectors, lists, data frames, and tibbles. I hope this tutorial was helpful! Happy coding!