# Data Viz and Manipulation P1

May 19, 2018

This is the start of a tutorial series covering visualising and manipulating data in R. Along the way there are mini checkpoints that you can use as a guide to check your understanding.

At its most basic, R can be used as a calculator. To check this, what does `9+3` equal?

``9+3``
``## [1] 12``

# Checkpoint 1: Were you able to get 12 as an answer?

Another cool thing we can do is store values in variables. For example, we can create a variable `x` that holds the result of the `9+3` from earlier.

``x = 9+3``

After running `x = 9+3`, if we type `x` and hit enter like below, the value stored in `x` is printed.

``x``
``## [1] 12``

Now that we have our variable `x` stored, we can use `x` in place of `9+3`. So instead of typing `9+3+4` we can just type `x + 4`.

``x + 4``

Now that we have the very basics sorted, let's try something a little bit more interesting.

# Remembering Tony Lockett's Career

Tony Lockett is the game's leading goalkicker, and is easily one of the best to lace them up.

When working with data, we can either enter it manually or get it in a pre-processed format, be it from an R package or another source.

Let's pretend for a second that we didn't have such a good R package for AFL data. We would go to a site like afltables and enter his data manually into a csv file to analyse.

We can also do this in R. Instead of entering the data in cells as we would in Excel, we enter each column as a vector.

``````Year = c( 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19,77,  79,  60, 117,  35,  78,  65, 127, 132,  53,  56, 110,
121,  37, 109,  82,  3)

GM=c(12, 20, 21, 18, 22,  8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19,3)``````

This would give us 3 variables:

* Year: season that Tony Lockett played
* GL: goals kicked in season by Tony Lockett
* GM: total games played by Tony Lockett in season
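To see how the three vectors line up like spreadsheet columns, here is a minimal sketch that binds them together with base R's `data.frame()`. The name `lockett` is made up for illustration; the numbers are the ones entered above.

```r
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)

# Each vector becomes one column; all three must have the same length
lockett = data.frame(Year, GL, GM)

head(lockett, 3)   # first three seasons
nrow(lockett)      # 18 seasons entered
sum(lockett$GL)    # goals across these seasons: 1360
sum(lockett$GM)    # games across these seasons: 281
```

Summing a column like this is also a quick way to catch typos after entering data by hand.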

We can view these just by typing in the variables once we have created them.

# Checkpoint 2: Are you able to print the vectors you have created?

``Year``
``````##  [1] 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996
## [15] 1997 1998 1999 2002``````
``GL``
``````##  [1]  19  77  79  60 117  35  78  65 127 132  53  56 110 121  37 109  82
## [18]   3``````
``GM``
``##  [1] 12 20 21 18 22  8 11 12 17 22 10 10 19 22 12 23 19  3``

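Basic arithmetic is done element-wise in R: the first element of one vector is combined with the first element of the other, the second with the second, and so on. For example, let's say we wanted Tony Lockett's average goals per game in each season, `GL_GM`.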

``````GL_GM = GL /GM
GL_GM``````
``````##  [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
##  [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789 1.000000``````
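To make "element-wise" concrete, the sketch below compares the vectorised division with the same calculation done one season at a time. The loop and the name `by_hand` are only for illustration; in practice you would just write `GL / GM`.

```r
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)

GL_GM = GL / GM

# The same calculation, one season at a time
by_hand = numeric(length(GL))
for (i in seq_along(GL)) {
  by_hand[i] = GL[i] / GM[i]
}

all.equal(GL_GM, by_hand)     # TRUE: both give the same 18 values

# A single career average is total goals over total games,
# which is a different number from the mean of the season averages
round(sum(GL) / sum(GM), 2)   # 4.84
mean(GL_GM)
```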

Arithmetic operations involving a scalar (a single number applied to all values) and a vector (like `Year`) act element-wise as well. For example, the command below subtracts 1966 from each element of our `Year` vector. Because Lockett was born in 1966, this gives us his age in each season of his career.

``````age = Year - 1966
age``````
``##  [1] 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 36``
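One way to picture what happens with the scalar: R reuses the single number for every element of the vector. The sketch below checks that against an explicitly repeated vector built with `rep()`. The name `same` is just for illustration.

```r
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)

age = Year - 1966

# Equivalent to subtracting a vector of eighteen 1966s
same = Year - rep(1966, length(Year))
identical(age, same)   # TRUE

range(age)             # youngest and oldest season: 17 36
```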

# Next let’s plot Lockett’s goals per game by age:

``plot(age, GL_GM, type="l", col="red", main="Tony Lockett's Average Goals Per Game by Age")``

`plot` has a lot of options in R. To get a feel for them, simply put a question mark before the function name and R will open its help page.

``?plot``
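As a small taste of those options, here is a sketch of the same plot with axis labels, points on the line, and a dashed reference line. The label text and the choice of `type = "b"` are just one possible styling, not the only way to do it.

```r
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)

age = Year - 1966
GL_GM = GL / GM

plot(age, GL_GM,
     type = "b",              # "b" draws both points and lines
     col  = "red",
     xlab = "Age",
     ylab = "Goals per game",
     main = "Tony Lockett's Average Goals Per Game by Age")

# Dashed horizontal line at the mean of the season averages
abline(h = mean(GL_GM), lty = 2)
```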

# Indexing in R

Let's say we wanted Tony Lockett's goals per game for his first 3 years. We would get them using square brackets in R.

``GL_GM[1:3]``
``## [1] 1.583333 3.850000 3.761905``

We can also remove data we don't want. For example, Tony Lockett retired and then came back, so maybe we don't want his comeback year (the 18th element) as part of our analysis. We would remove it using a negative index.

``GL_GM[-c(18)]``
``````##  [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
##  [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789``````

We can compare this to the original `GL_GM`, which still holds all 18 values:

``GL_GM``
``````##  [1] 1.583333 3.850000 3.761905 3.333333 5.318182 4.375000 7.090909
##  [8] 5.416667 7.470588 6.000000 5.300000 5.600000 5.789474 5.500000
## [15] 3.083333 4.739130 4.315789 1.000000``````
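The comparison works because indexing returns a new vector and leaves `GL_GM` untouched, so to keep the trimmed version you need to store it. The sketch below does that, and also shows a way to drop the comeback season without hard-coding position 18. The names `no_comeback` and `by_year` are made up for illustration.

```r
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GL = c(19, 77, 79, 60, 117, 35, 78, 65, 127, 132, 53, 56, 110,
       121, 37, 109, 82, 3)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)
GL_GM = GL / GM

# Store the result; GL_GM itself is unchanged
no_comeback = GL_GM[-18]
length(GL_GM)         # 18
length(no_comeback)   # 17

# Same result, selecting by season instead of by position
by_year = GL_GM[Year != 2002]
identical(no_comeback, by_year)   # TRUE

# Last three seasons before the comeback
tail(no_comeback, 3)
```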

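What if we wanted to find the seasons in which Tony Lockett played more than 10 games? Typing `GM>10` gives a `TRUE` or `FALSE` for each season, and putting that inside the square brackets keeps only the values where it is `TRUE`.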

``GM>10``
``````##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
## [12] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE``````
``GM[GM>10]``
``##  [1] 12 20 21 18 22 11 12 17 22 19 22 12 23 19``
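The same `TRUE`/`FALSE` vector can be used to pick values out of a different vector of the same length, which is handy for finding which seasons those were. A minimal sketch, with `played_more_than_10` as an illustrative name:

```r
Year = c(1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
         1994, 1995, 1996, 1997, 1998, 1999, 2002)
GM = c(12, 20, 21, 18, 22, 8, 11, 12, 17, 22, 10, 10, 19, 22, 12, 23, 19, 3)

played_more_than_10 = GM > 10

sum(played_more_than_10)     # TRUE counts as 1, so this is the number of seasons: 14
which(played_more_than_10)   # positions of those seasons in the vector
Year[played_more_than_10]    # the seasons themselves
Year[!played_more_than_10]   # seasons with 10 or fewer games: 1988 1993 1994 2002
```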

So I gather at this point you are probably thinking, "Hey mate, this isn't the cool tidyverse stuff I see online."

Well, that is true, so let's change tack and move to using the tidyverse and fitzRoy for cool AFL things.
