Classes and objects in R
c() and I’ll go a bit beyond that this time. Speaking of the c() function, I’ll begin this post by divulging the answer to the Challenge from last time.
The answer uses the c() function and the multiplication operator, all packed inside another call to c():
c(c(1, 2, 3, 4, 5) * 1, c(1, 2, 3, 4, 5) * 2, c(1, 2, 3, 4, 5) * 3)
[1]  1  2  3  4  5  2  4  6  8 10  3  6  9 12 15
Alternatively, you could have used a function to get the same products (in a different order; swap the two arguments to match the order above):
kronecker(c(1, 2, 3, 4, 5), c(1, 2, 3))
[1]  1  2  3  2  4  6  3  6  9  4  8 12  5 10 15
* because that’s what the post was about, but Bill Gates once said “I’d hire a lazy person over a hardworking one because the lazy person would find the simplest way to do something.” My interpretation of this: code hard, but code smart.
The Kronecker product example is a good lead-in to this post also because I gave an example with a pair of matrices. How did I tell R that my two groups of 4 numbers were matrices and not vectors or something else? The short answer is to reveal the code:
a <- matrix(c(1, 1, 1, 2), nrow = 2)
b <- matrix(c(1, 3, 2, 4), ncol = 2)
kronecker(a, b)
     [,1] [,2] [,3] [,4]
[1,]    1    2    1    2
[2,]    3    4    3    4
[3,]    1    2    2    4
[4,]    3    4    6    8
=. These create an object in R's memory that can be called back into the command window at any time. Once I've defined a and b as I have above, I can simply call them by name like I did in my call to the function kronecker(). This is great for many obvious reasons. Like I said earlier, programming in R requires lots of trial and error, and you can certainly save some time and keystrokes by naming something once and calling it by a few letters afterwards. Some facts of life pertaining to the assignment operator:
- The arrow assignment operators always point towards the object name. (Try reversing the arrow in the statement above that defines a: you get an error because you can't assign letters to a matrix; that doesn't make sense.)
- Always use either <- or ->; the equal sign can get confusing. It's not always clear what is the “name” and what is the “meat” of your object. An arrow does away with this confusion.
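Here is a minimal sketch of the three assignment forms side by side. The object names x, y and z are made up for this illustration and don't appear anywhere else in the post.

```r
# Leftward arrow: the name sits on the left, the value on the right.
x <- c(1, 2, 3)

# Rightward arrow: the value comes first, and the arrow still points at the name.
c(4, 5, 6) -> y

# The equal sign also assigns at the top level, but it is easier to misread.
z = x + y

# Typing a name on its own prints the stored object.
z
```

Whichever form you pick, the arrow (or the left side of the equal sign) marks the name and everything else is the “meat”.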
This is a good segue into the main portion of this post. Our two objects a and b are each described as a “2x2 double matrix”. What does this mean? It means 3 things.
1. This object is a matrix. This is a consequence of how we defined it: we used the matrix() function to create a and b, hence they are matrices.
2. The size of the matrix is 2 by 2. A matrix with 3 rows and 2 columns would be 3x2, etc.
3. Our matrix is populated by numbers of the class “double”. To help explain this, I'll steal a quote from the R help page about double objects:
All R platforms are required to work with values conforming to the IEC 60559 (also known as IEEE 754) standard. This basically works with a precision of 53 bits, and represents to that precision a range of absolute values from about 2e-308 to 2e+308. It also has special values NaN (many of them), plus and minus infinity and plus and minus zero (although R acts as if these are the same).
In other words, this is a pretty standard way of representing numbers so that most computers and programs can universally recognize them for what they are.
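To make the quote a bit more concrete, here is a small sketch that pokes at doubles directly. The variable names are my own, and the 53-bit figure assumes a standard IEEE 754 machine, which is what the help page says R requires.

```r
# A number typed without any special suffix is stored as a double.
x <- 1.5
typeof(x)

# Number of significant bits available (53 under IEEE 754).
bits <- .Machine$double.digits

# The special values mentioned in the help page.
pos_inf <- 1 / 0                 # Inf
not_a_number <- 0 / 0            # NaN
zeros_equal <- (0 == -0)         # R treats plus and minus zero as equal

# Limited precision in action: 0.1 + 0.2 is not stored as exactly 0.3.
exactly_equal <- (0.1 + 0.2 == 0.3)
nearly_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```

The last two lines are the practical takeaway: compare doubles with all.equal() rather than == when the values come out of arithmetic.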
class(a)
[1] "matrix"
a, as a whole, is a matrix. (In R 4.0.0 and later, this call prints "matrix" "array".) The class() function is extremely useful.
dim(a)
[1] 2 2
a is 2x2 (“dim” is short for “dimensions”). I also frequently use the
class(a[1])
[1] "numeric"
This tells us that the first element in a is of the class “numeric”. You're thinking, “Wait, what were the brackets in there? What do those do?” Excellent question! Brackets are how you index into objects to pull out individual components. If I want to know what the fourth element in a is, I would type:
a[4]
[1] 2
a for you and displays it neatly in the corner of your screen to help you keep track of the properties of different objects as you accumulate many objects in your work space. Click on one of the objects in the “Work space”. Neat, huh? These are all little perks of R Studio that make life in R a little more organized.
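Since brackets come up constantly, here is a short sketch of a few more ways to index into the same matrix. It rebuilds a so the block runs on its own; the other names (fourth, bottom_right and so on) are just labels I picked.

```r
a <- matrix(c(1, 1, 1, 2), nrow = 2)

# A single index counts down the columns, so a[4] is the bottom-right entry.
fourth <- a[4]

# Two indices mean [row, column]; this is the same entry as a[4].
bottom_right <- a[2, 2]

# Leaving an index blank selects the whole row or the whole column.
first_row <- a[1, ]
second_col <- a[, 2]
```

Note that a single index walks down the first column before moving to the second, which is also the order matrix() fills its values in.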
Back to the issue of classes
Remember the c() function we used in the last post to play with operators? Try this:
class(c(1, 2, 3, 4, 5))
[1] "numeric"
c <- pi
d <- sqrt(2)
pi.) Above, I have not explicitly defined c and d, but rather defined them as the result of some mathematical operation. After all, I could not explicitly define either of these numbers: they are both irrational!
If you want to try to coerce some object into a numeric value, a function exists for this:
as.numeric(a)
[1] 1 1 1 2
c() function instead of the
If you want to check if an object is numeric already, there is a function for that too:
is.numeric(a)
[1] TRUE
a was a matrix, but now it's numeric? That's right: because every element of a is numeric, is.numeric() returns TRUE. If one element of a were a letter, for example, R would return FALSE. The is.“something”() and as.“something”() functions are sort of universal for any class.
e <- as.integer(3)
is.integer(e)
[1] TRUE
is.numeric(e)
[1] TRUE
e belongs to two classes: integer and numeric.
Note that the converse is NOT true:
f <- 3
is.integer(f)
[1] FALSE
is.numeric(f)
[1] TRUE
f <- 3 is the same as f <- as.numeric(3).
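If you want an integer without calling as.integer(), R also accepts an L suffix on the number. The sketch below compares the two kinds of 3; the names same_value and same_object are mine, chosen only for illustration.

```r
f <- 3      # a plain number is stored as a double
e <- 3L     # the L suffix asks for an integer, like as.integer(3)

class(f)    # "numeric"
class(e)    # "integer"
typeof(f)   # "double"

# Both count as numeric, and they compare as equal in value...
same_value <- (e == f)

# ...but they are not identical objects, because the storage type differs.
same_object <- identical(e, f)
```

For everyday arithmetic the difference rarely matters, since R converts integers to doubles whenever it needs to.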
g <- c(1, 2, 3, 4, 5) <= 3
class(g)
[1] "logical"
g is a logical vector. There exist functions is.logical() and as.logical(), just like for the other classes we've discussed. The as.logical() function classifies 0s as FALSE and anything other than 0 as TRUE. Observe:
as.logical(c(0, 1, 2))
[1] FALSE  TRUE  TRUE
as.numeric(g)
[1] 1 1 1 0 0
as.integer() would do the same thing, but the result would be of the class “integer”, which is a subclass of “numeric”. This conversion of logical values to numeric values can be quite useful. For example, suppose I want to know how many students in a class are of legal drinking age and I have a list of their ages:
ages <- c(20, 21, 19, 22, 19, 20, 22, 21, 20, 19, 21)
sum(ages >= 21)
[1] 5
I applied the sum() function to a logical vector and it returned a numeric answer. If I have a long vector of ages, this method is much easier than counting by eye the number of students aged 21 or older.
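The same trick works with other functions that expect numbers. Here is a sketch using the ages vector from the example; of_age, share and positions are names I made up for the intermediate results.

```r
ages <- c(20, 21, 19, 22, 19, 20, 22, 21, 20, 19, 21)

# TRUE where the student is 21 or older, FALSE otherwise.
of_age <- ages >= 21

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs...
how_many <- sum(of_age)

# ...and mean() gives the proportion of TRUEs.
share <- mean(of_age)

# which() reports the positions of the TRUEs instead of counting them.
positions <- which(of_age)
```

Here how_many is 5, share is 5/11, and positions holds 2, 4, 7, 8 and 11.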
I can also use a logical vector to pick out elements of a vector that satisfy some condition (or many conditions). Recall the brackets used to identify certain elements within an object.
ages[ages < 21]
[1] 20 19 19 20 20 19
This pulls out of the ages vector all values that satisfy my condition (are less than 21). Once again, very useful.
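To pick out elements that satisfy several conditions at once, you can combine logical vectors with & (and), | (or) and ! (not) before putting them inside the brackets. A small sketch with the same ages vector, using result names of my own choosing:

```r
ages <- c(20, 21, 19, 22, 19, 20, 22, 21, 20, 19, 21)

# & means "and": keep ages that are at least 20 and below 22.
twenty_or_21 <- ages[ages >= 20 & ages < 22]

# | means "or": keep only the youngest and the oldest values.
extremes <- ages[ages == min(ages) | ages == max(ages)]

# ! flips TRUE and FALSE: keep everything that is not 19.
not_19 <- ages[!(ages == 19)]
```

Each condition produces a logical vector the same length as ages, so they line up element by element when combined.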
h <- "string"
h
[1] "string"
There is an as.character() function and an is.character() function, as there were for other classes, but note that many operators no longer work with strings. One might think that 'ab' + 'c' would yield 'abc', but this is not the case: R returns an error. Similarly, the other mathematical operators return an error when applied to strings. Some logical operators still work though. Try:
"string" %in% "character string"
[1] FALSE
"string" %in% c("character", "string")
[1] TRUE
As an aside, I'd like to point out that some other operators also work on strings. They compare them in alphabetical (dictionary) order, which can look somewhat nonsensical:
"a" < "b"
[1] TRUE
"b" < "abc"
[1] FALSE
Back to comparing strings. If "string" is not %in% "character string", how do we search for certain patterns regardless of whether they constitute a whole character object or just part of one? Excellent question. This comes up sometimes in statistics when you deal with categorical data. Not everything you measure is a number. Some data is more “multiple choice”: the things you're observing belong to categories (e.g. blue, green, purple, blue-green, or black). What if I simply want to know how many observations contained 'green'? I could of course search for 'green' and 'blue-green' and add them, but I could also do something more elegant. Meet the “g-something” family of functions:
i <- c("blue", "green", "purple", "blue-green", "black")
grep("green", i)
[1] 2 4
Elements 2 and 4 of i contained the pattern 'green'.
Note the arguments of this function are: grep('pattern', x), where the pattern is what you're searching for and x is what you're searching through. In our case, x is i and it is a character object with 5 elements. (I often forget what comes first, the pattern or the x.) There is an additional optional argument, ignore.case, which is FALSE by default but can be set to TRUE. For example:
grep("Green", i, ignore.case = FALSE)
integer(0)
i[1] returns the first element of i. Likewise, i[grep('green', i)] returns all the elements in i that contain the pattern 'green'.
This answers the question “Which elements contain my pattern?” one way, but there's another way to answer the same question.
grepl("string", "character string")
[1] TRUE
The “l” in grepl() stands for “logical”. This function returns a logical vector of the same length as your initial vector.
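Here is a side-by-side sketch of what the two functions hand back, which also answers the earlier “how many observations contained 'green'?” question. I've used a fresh vector called shades (a made-up name) with the same five colors so it doesn't interfere with i.

```r
shades <- c("blue", "green", "purple", "blue-green", "black")

# grep() returns the positions of the matching elements.
where <- grep("green", shades)

# With value = TRUE it returns the matching elements themselves.
which_ones <- grep("green", shades, value = TRUE)

# grepl() returns one TRUE or FALSE per element.
hits <- grepl("green", shades)

# Counting the TRUEs tells us how many observations contained the pattern.
n_green <- sum(hits)

# ignore.case = TRUE makes the search blind to capitalization.
where_any_case <- grep("GREEN", shades, ignore.case = TRUE)
```

Use grep() when you care about where the matches are, and grepl() when you want something you can count or combine with other conditions.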
grepl() to pull out only those elements of i that contain the pattern
Hint: Set it up like I did with grep(), but throw in a logical operator too.
gsub("bl", "X", i)
[1] "Xue"       "green"     "purple"    "Xue-green" "Xack"
The “sub” in gsub() stands for… you guessed it, “substitute”. It searches for a pattern and when it finds that pattern, substitutes it with some replacement that you specify.
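gsub() has a sibling, sub(), that takes the same arguments but stops after the first match in each element. A quick sketch of the difference, using a made-up string so both behaviors show up:

```r
word <- "blue-black"

# sub() replaces only the first occurrence of the pattern.
first_only <- sub("bl", "X", word)

# gsub() replaces every occurrence.
every_one <- gsub("bl", "X", word)
```

Here first_only is "Xue-black" while every_one is "Xue-Xack".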
subject.names <- c("Jane", "Jill", "Bob", "Bill", "Grace", "Patrick")
treatment <- c("A", "A", "B", "B", "Placebo", "Placebo")
treatment.f <- as.factor(treatment)
(There are functions as.factor() and is.factor(), and you can guess what they do.)
treatment and treatment.f are now totally different objects. This is especially useful for statistical analysis, which I'll talk a lot about later on, but for now I just want you to know that factors exist and that they are similar to strings because they deal with non-numeric information, but they are also very different from strings. There are a couple of functions that you can call on factors that are very useful. The first is levels():
levels(treatment.f)
[1] "A"       "B"       "Placebo"
R treats treatment.f as categorical data and automatically identifies all of the categories for you. These are returned by using the levels() function.
summary(treatment.f)
      A       B Placebo
      2       2       2
summary() is extremely useful. It shows you the categories and how many members each has. Try calling summary() on plain old treatment. This still returns some information about treatment, but it is much less informative if we are treating this as categorical data instead of just a character object. Try calling summary() on some other objects we've created as well. This is a very useful function in general.
subject.names[grep("placebo", treatment.f, ignore.case = TRUE)]
[1] "Grace"   "Patrick"
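To see a little of what makes a factor different from a plain character vector, here is a short sketch. It rebuilds the treatment data so it runs on its own; placebo_first is a name I invented to show that you can set the order of the categories yourself.

```r
treatment <- c("A", "A", "B", "B", "Placebo", "Placebo")
treatment.f <- factor(treatment)

# The categories, and how many there are.
levels(treatment.f)
n_groups <- nlevels(treatment.f)

# Under the hood each value is stored as the number of its category.
codes <- as.integer(treatment.f)

# table() counts the members of each category, much like summary().
counts <- table(treatment.f)

# You can choose the order of the categories instead of the default sorting.
placebo_first <- factor(treatment, levels = c("Placebo", "A", "B"))
codes_reordered <- as.integer(placebo_first)
```

The labels stay the same in both factors; only the numbering behind them changes, which matters later when a statistical function treats the first level as the baseline.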
Dates are crucial. Almost every experiment takes place over time and a good experimenter accounts for this. If you ever do research, you will at some point encounter dates in your data. The passage of time is the only thing more certain than gravity and taxes. Date values in a computer program are tricky. They can't be alphabetized, but they obviously have a natural order. For the computer to recognize and take advantage of this, you must first tell the computer that it's dealing with dates and not funny division problems (10/21/2012) or subtraction problems (10-21-2012). Here's an example:
as.Date("10/21/2012", format = "%m/%d/%Y")
[1] "2012-10-21"
1. I used an as.something() function. You saw this coming. This one capitalizes Date though; all the others were lower case (as.integer(), etc). Curve ball. Whoa.
2. I entered my date value as a string. If I hadn't, it would have tried to convert 10 divided by 21 divided by 2012 into a date.
3. The computer returns it in a different format (Year-Month-Day). This is the computer's preferred format and what it will always convert dates to, regardless of how you enter it.
4. I supplied the format = argument. This is crucial.
The percent sign followed by a letter causes R to expect a specific type of entry. For example, where you specify %m, R now expects a number 1-12 that it assumes corresponds to a month. (Also note the delimiters in between my %something's. In this case I have separated my days/months/years with a slash, but I could have also used a dash or a space.) If you tried to put a 13 in the %m (month) spot, R would be confused and angry, and it would return an NA instead of a Date object. R returns an NA (a missing value, essentially) for other impossible inputs as well. Take for example Feb. 29th, 1900: it looks like a leap day, except that century years are leap years only when they are divisible by 400, so 1900 had no Feb. 29th:
as.Date("29-2-1900", format = "%d-%m-%Y")
[1] NA
The capital Y in %Y indicates that this is where you are going to put a year, and the fact that it's capital means that it is going to be 4 digits instead of 2. There are a bunch of these %something formattings that you can use. For a decent overview, type ?strptime into your command line, look at the panel to the right of your command panel in R Studio and scroll down some.
BTW, you've just discovered one use for this panel in R Studio. Type ? and the name of any function you have a question about. The help documentation on that function pops up in the lower right-hand panel of your R Studio window. This is infinitely useful.
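Here is a sketch of a few more %something codes in action. I've stuck to all-number formats (month names depend on your computer's language settings), and the variable names are just labels for this example.

```r
# With no format argument, as.Date() expects Year-Month-Day.
d1 <- as.Date("2012-10-21")

# Lower-case %y reads a 2-digit year; the delimiter here is a dot.
d2 <- as.Date("21.10.12", format = "%d.%m.%y")

# No delimiters at all also works, as long as the format says so.
d3 <- as.Date("20121021", format = "%Y%m%d")

# The same codes work in reverse: format() turns a Date back into a string.
european <- format(d1, "%d/%m/%Y")

# An impossible month gives NA rather than an error.
bad <- as.Date("13/21/2012", format = "%m/%d/%Y")
```

All three of d1, d2 and d3 end up as the same date, european is "21/10/2012", and bad is NA.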
Challenge, Part II:
Convert 'Feb 28 1900' to a date.
Hint: Use the help page.
thanksgiving <- as.Date("11/22/2012", format = "%m/%d/%Y")
christmas <- as.Date("12/25/2012", format = "%m/%d/%Y")
christmas - thanksgiving
Time difference of 33 days
christmas < thanksgiving
[1] FALSE
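Once R knows something is a date, a few other kinds of arithmetic come for free. A small sketch using the same two holidays, written in Year-Month-Day so no format argument is needed; the result names are my own.

```r
thanksgiving <- as.Date("2012-11-22")
christmas <- as.Date("2012-12-25")

# Subtracting dates gives a time difference; as.numeric() strips it to a number.
days_between <- as.numeric(christmas - thanksgiving)

# Adding a number to a date moves it forward by that many days.
week_later <- thanksgiving + 7

# seq() can step through dates; here, three dates one week apart.
weekly <- seq(thanksgiving, by = "week", length.out = 3)

# difftime() lets you ask for the gap in other units.
weeks_between <- as.numeric(difftime(christmas, thanksgiving, units = "weeks"))
```

Here days_between is 33, week_later is 2012-11-29, and weekly runs 2012-11-22, 2012-11-29, 2012-12-06.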