
Welcome to Introduction to R for Data Science Session 2! The course is co-organized by Data Science Serbia and Startit. You will find all course material (R scripts, data sets, SlideShare presentations, readings) on these pages.


## Summary of Session 2, 5 May 2016 :: Introduction to R: vectors, matrices, and data frames

This session introduces vectors, matrices, and data frames in R. R is a vector-oriented language, so you will work with vectors, matrices, and n-dimensional arrays constantly. Vectorizing your code, that is, applying operations to whole vectors instead of looping over their elements, usually makes it faster. Data frames are the basic containers for most of your data in R; unlike vectors and matrices, a data frame can hold columns of different types.
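As a rough illustration of the two claims above, the sketch below computes the same result with an explicit loop and with a vectorized expression, then contrasts a matrix (one type for everything) with a small data frame (one type per column). The names `x`, `squares_loop`, `squares_vec`, `mixed_matrix`, and `people` are made up for this example and are not part of the course script.

```r
# Hypothetical example data: not part of the course script
x <- c(1.5, 2, 3.25, 4)

# Loop version: square each element one at a time
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the ^ operator works on the whole vector at once
squares_vec <- x^2

identical(squares_loop, squares_vec) # both approaches give the same values

# A matrix coerces all of its elements to a single type...
mixed_matrix <- cbind(c(1, 2), c("a", "b"))
typeof(mixed_matrix) # the numbers have become character strings

# ...while a data frame keeps one type per column
people <- data.frame(
  name = c("Ana", "Marko"),
  age = c(31, 28),
  member = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
sapply(people, class)
```

On a four-element vector the speed difference is invisible; it tends to show up only on much longer vectors, where the per-iteration overhead of the loop adds up.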

## R script :: Session 2

```
########################################################
# Introduction to R for Data Science
# SESSION 2 :: 5 May, 2016
# Data Science Community Serbia + Startit
# :: Goran S. Milovanović and Branko Kovač ::
########################################################

# clear the workspace
rm(list = ls())

char_list <- character(length = 0) #empty character vector
num_list <- numeric(length = 10) #length defaults to 0; elements are initialized to 0
log_list <- logical(length = 3) #elements are initialized to FALSE

# But you can always use good ol' c() for the same purpose
log_list_2 <- c(TRUE, FALSE, FALSE, TRUE, TRUE, TRUE) #some Ts and Fs
num_list_2 <- c(1, 4, 12, NA, 101, 999) #numbers, including a missing value (NA)
char_list_2 <- c("abc", "qwerty", "test", "data", "science")

# Factor vectors are also part of R
fac_list <- gl(n = 4, k = 1, length = 8, ordered = TRUE,
               labels = c("low", "med", "high", "very high")) #only mentioning now :)

# Subsetting is a routine operation in R
char_list_2[5] #single element can be selected
log_list_2[2:4] #or some interval
num_list_2[3:length(num_list_2)] #or even length() function

# New objects can be created when subsetting
test <- num_list_2[-c(2,4)] #or something like this - keeps all but the 2nd and 4th element
test_2 <- num_list_2 %in% test #operator %in% can be very useful
not_na <- num_list_2[!is.na(num_list_2)] #removing NAs using operator ! and is.na() function

# Vector ordering
sort(test, decreasing = TRUE) #using sort() function
test[order(test, decreasing = TRUE)] #or with order() function

# Vector sequences
seq(1,22,by = 2) #we already mentioned seq()
rep(1, 4) #but rep() is something new :)
rep(num_list_2, 2) #replicate num_list_2, 2 times

# Concatenation
new_num_vect <- c(num_list, num_list_2) #using 2 vectors to create new one
new_num_vect
new_combo_vect <- c(num_list_2, log_list) #combination of num and log vector
new_combo_vect #all numbers? false to zero? coercion in action

new_combo_vect_2 <- c(char_list_2, num_list_2) #works as well
new_combo_vect_2 #where are the numbers?
class(new_combo_vect_2) #all characters

# Matrices are available in R
matr <- matrix(data = c(1,3,5,7,NA,11), nrow = 2, ncol = 3) #2x3 matrix
class(matr) #yes, it's a matrix (R >= 4.0 reports "matrix" "array")
typeof(matr) #double as expected

matr[,2] #2nd column
try(matr[3,]) #oops, "subscript out of bounds": there's no 3rd row (try() lets the script continue)
matr[2,3] #element in 2nd row and 3rd column

matr_2 <- matrix(data = c(1,3,5,"7",NA,11), nrow = 2, ncol = 3) #another 2x3 matrix
class(matr_2) #matrix again
typeof(matr_2) #but not double anymore, type conversion in action!
t(matr_2) #transposed matr_2

# What can we do if a matrix needs to encompass different types of data?
# Introducing data frame!

library(datasets) #there are some datasets in base R like mtcars
cars_data <- mtcars

# Some useful information about data frames
str(cars_data) #let's see what we have here
names(cars_data) #column names
?mtcars #dataset documentation is *very* important

# Think of data frame columns as vectors! Because they are!
mean(cars_data$mpg) #mean of cars_data mpg (miles per gallon) column
median(cars_data$cyl) #median of cars_data cyl (cylinders) column

is.list(cars_data[1,]) #TRUE: a single row is still a data frame, i.e. a list!

# Let's do some data frame subsetting

cars_data[-1, ] # first row out
cars_data[ ,-1] # first column out

cars_data[c(1,3)] #keeping 1st and 3rd column only
cars_data[-c(1,3)] #removing 1st and 3rd column
cars_data[ ,-c(1,3)] #same as the previous line of code

cars_data[!duplicated(cars_data$mpg), ] #maybe we want to remove all cars with same mpg?
#remember it keeps only the first occurrence!

subset(cars_data, mpg < 19) #this is one way (and it can be slow!)
cars_data[cars_data$mpg < 19, ] #this is another one (usually faster)
cars_data[which(cars_data$mpg < 19), ] #and another one (often faster still; which() also drops NA matches)

cars_data[cars_data$mpg > 20 & cars_data$am == 1, ] #multiple conditions

cars_data[grep("Merc", row.names(cars_data), value = TRUE), ] #filtering by pattern match

# Data frame transformations
cars_data$trans <- ifelse(cars_data$am == 0, "automatic", "manual") #we can add new columns
cars_data$trans <- NULL #or we can remove them

cars_data[c(1:3,11,4,7,5:6,8:10)] #this way we change column order

# Separation and joining of data frames
low_mpg <- cars_data[cars_data$mpg < 15, ] #new data frame with mpg < 15
high_mpg <- cars_data[cars_data$mpg >= 15, ] #new data frame with mpg >= 15

mpg_join <- rbind(low_mpg, high_mpg) # we can combine 2 data frames like this

car_condition <- data.frame(sample(c("old","new"), replace = TRUE, size = 32)) #creating random data frame
#with "old" and "new" values
names(car_condition) <- "condition" #for all kinds of objects
colnames(car_condition) <- "condition" #for "matrix-like" objects, but same effect here
rownames(car_condition) <- rownames(cars_data) #use row names of one data frame as row names of other

mpg_join <- cbind(mpg_join, car_condition) #or combine data frames like this
```
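
One thing worth keeping in mind about the last step of the script: `cbind()` glues data frames together by position and does not match on row names. In the script, `mpg_join` was reordered by `rbind()` (low-mpg cars first), so each `condition` value ends up next to a different car than the row names of `car_condition` suggest. That hardly matters for random labels, but it would for real data. Below is a minimal sketch of joining by row name instead, using `merge()` with `by = "row.names"`; the objects `left`, `right`, `by_position`, and `by_name` are illustrative and not from the course script.

```r
# Illustrative data frames (not from the course script)
left <- data.frame(mpg = c(10, 30, 20), row.names = c("a", "c", "b"))
right <- data.frame(condition = c("old", "new", "old"),
                    row.names = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# cbind() pairs rows by position: row "c" of left gets right's 2nd value
by_position <- cbind(left, right)
by_position["c", "condition"] # "new", although right["c", ] says "old"

# merge() on row names pairs rows by name instead
by_name <- merge(left, right, by = "row.names")
rownames(by_name) <- by_name$Row.names # restore row names from the key column
by_name$Row.names <- NULL
by_name["c", "condition"] # "old"
```

Note that `merge()` sorts the result by the key, so the original row order of `left` is not preserved.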

## Readings :: Session 3 [12 May 2016, @Startit.rs, 19h CET]

Chapters 1 - 5, The Art of R Programming, Norman Matloff

• Intro to R
• Vectors and Matrices
• Lists