Data Viz and Manipulation P1
This is the start of a tutorial series in which we will cover visualising and manipulating data in R. Along the way there will be a series of mini checkpoints that you can use as a guide to check your understanding.
To check the most basic functionality of R (you can use it as a calculator), what does 9+3 equal?
9+3
## [1] 12
Checkpoint 1: Were you able to get 12 as an answer?
Another cool thing we can do is store variables. For example, we can have a variable x that holds the sum 9+3 from earlier.
x = 9+3
After running x = 9+3, if we type x and hit enter as below, the value of x is printed.
x
## [1] 12
Now that we have our variable x stored, instead of typing 9+3+4 we can just type x + 4:
x + 4
Now that we have the very basics sorted, let's try something a little bit more interesting.
Remembering Tony Lockett's Career
Tony Lockett is the game's leading goal kicker, and is easily one of the best to lace them up.
When thinking about data, we can either enter it manually or get it in a pre-processed format, be it from an R package or elsewhere.
Let’s pretend for a second that we didn’t have such a good R package for AFL data. We would go to a site like afltables and enter his data manually in a csv file to analyse.
We can also do this in R. Instead of entering the data in cells as we would in Excel, we enter each column as a vector:
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110, 121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)
This would give us 3 variables:
* Year – Season that Tony Lockett played
* GL – Goals kicked in season by Tony Lockett
* GM – Total games played by Tony Lockett in season
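To make the spreadsheet analogy concrete, here is a minimal sketch that binds the three vectors into one table using base R's data.frame(), so that each vector becomes a column. The name lockett is an assumption chosen for illustration and is not used elsewhere in this tutorial.

```r
# Season-by-season figures, entered as vectors (same values as above)
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991,
         1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127,
       132, 53, 56, 110, 121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17,
       22, 10, 10, 19, 22, 12, 23, 19, 3)

# 'lockett' is a hypothetical name: one row per season, one column per vector
lockett = data.frame(Year = Year, GL = GL, GM = GM)

head(lockett, 3)   # look at the first three seasons
nrow(lockett)      # number of seasons entered
```

All three vectors must be the same length for this to line up row by row, which is a handy check that no season was missed while typing.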
We can view these just by typing in the variables once we have created them.
Checkpoint 2: Are you able to print the vectors you have created?
Year
## [1] 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996
## [15] 1997 1998 1999 2002
GL
## [1] 19 77 79 60 117 35 78 65 127 132 53 56 110 121 37 109 82
## [18] 3
GM
## [1] 12 20 21 18 22 8 11 12 17 22 10 10 19 22 12 23 19 3
Basic arithmetic is done element-wise in R. For example, let's say we wanted Tony Lockett's average goals per game, GL_GM:
GL_GM = GL / GM
GL_GM
## [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
## [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789 1.000000
Arithmetic operations involving a scalar (a single number applied to all values) and a vector (like Year) act element-wise as well. For example, the command below subtracts 1966 from each element of our Year vector. Because Lockett was born in 1966, this gives us his age in each season of his career.
age = Year - 1966
age
## [1] 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 36
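As a small sketch of the two cases side by side, the example below uses only the first three seasons so the results are easy to check by hand. The names goals and games are assumptions for illustration; they are just shortened copies of GL and GM.

```r
# First three seasons only (hypothetical short vectors for illustration)
goals = c(19, 77, 79)
games = c(12, 20, 21)

# Vector with vector: elements are paired up position by position,
# so this is 19/12, 77/20 and 79/21
goals / games

# Vector with scalar: the single number is applied to every element,
# so this is 19*2, 77*2 and 79*2
goals * 2
```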
Checkpoint 3: Are you able to get the graph below?
Next let’s plot Lockett’s goals per game by age:
plot(age, GL_GM, type="l", col="red", main="Tony Lockett's Average Goals Per Game by Age")
plot has a lot of options in R. To get a feel for them all, simply put a question mark before the function name and R will help you out!
?plot
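As a hedged sketch of a few of those options, the call below adds axis labels and switches the line type. Choosing type = "b" and these particular labels is an assumption for illustration, not the plot used above; the arguments themselves are described in ?plot.default and ?par.

```r
# Rebuild the vectors so this example runs on its own
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991,
         1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127,
       132, 53, 56, 110, 121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17,
       22, 10, 10, 19, 22, 12, 23, 19, 3)
age = Year - 1966
GL_GM = GL / GM

plot(age, GL_GM,
     type = "b",               # "b" draws both points and lines
     col  = "red",             # line and point colour
     xlab = "Age",             # x-axis label
     ylab = "Goals per game",  # y-axis label
     main = "Tony Lockett's Average Goals Per Game by Age")
```

Drawing the points as well as the line makes the gap between age 33 and age 36 (the comeback season) easier to spot.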
Indexing in R
Let's say we wanted Tony Lockett's goals per game for his first 3 years. We would do this using square brackets in R:
GL_GM[1:3]
## [1] 1.583333 3.850000 3.761905
We can also remove data we don’t want. For example, Tony Lockett retired and came back, so maybe we don’t want his comeback year as part of our analysis. We would remove it using a negative index:
GL_GM[-c(18)]
## [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
## [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789
We can compare this to the original GL_GM:
GL_GM
## [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
## [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789 1.000000
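Note that a negative index returns a shortened copy and leaves GL_GM itself unchanged. The sketch below stores the shortened copies under new names and drops the same position from age so the two vectors stay aligned. The names GL_GM_main and age_main are assumptions for illustration.

```r
# Rebuild the vectors so this example runs on its own
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991,
         1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127,
       132, 53, 56, 110, 121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17,
       22, 10, 10, 19, 22, 12, 23, 19, 3)
age = Year - 1966
GL_GM = GL / GM

# Hypothetical names: copies with the 18th element (the 2002 comeback) dropped
GL_GM_main = GL_GM[-18]
age_main   = age[-18]

length(GL_GM)        # the original still has all 18 seasons
length(GL_GM_main)   # the copy has 17
length(age_main)     # same length, so the two can still be plotted together
```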
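What if we wanted to find the seasons in which Tony Lockett played more than 10 games? We could just go GM>10, which gives TRUE or FALSE for each season, and then put that inside square brackets to keep only the TRUE values: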
GM>10
## [1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
## [12] FALSE TRUE TRUE TRUE TRUE TRUE FALSE
GM[GM>10]
## [1] 12 20 21 18 22 11 12 17 22 19 22 12 23 19
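The same TRUE/FALSE vector can be used to index any other vector of the same length, not just GM. The sketch below stores the condition once and reuses it. The name more_than_10 is an assumption for illustration.

```r
# Rebuild the vectors so this example runs on its own
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991,
         1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127,
       132, 53, 56, 110, 121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17,
       22, 10, 10, 19, 22, 12, 23, 19, 3)
GL_GM = GL / GM

# Hypothetical name: TRUE for each season with more than 10 games
more_than_10 = GM > 10

Year[more_than_10]     # which seasons those were
GL_GM[more_than_10]    # goals per game in those seasons only

sum(more_than_10)      # TRUE counts as 1, so this is the number of seasons: 14
which(!more_than_10)   # positions of the seasons left out: 6, 11, 12, 18
```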
So I gather at this point you are probably thinking “Hey mate this isn’t the cool tidyverse stuff I see online”
Well, that is true, so let's change tack and move to using tidyverse and fitzRoy for cool AFL things.