**Fear and Loathing in Data Science**, and kindly contributed to R-bloggers)

I wanted to avoid advanced topics in this post and focus on some “blocking and tackling” with R in an effort to get novices started. This is some of the basic code I found useful when I began using R just over six weeks ago.

Reading in data from a .csv file is a breeze with this command.

> data = read.csv(file.choose())
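If you would rather not click through a file dialog each time, read.csv() also accepts a file path directly. Here is a minimal sketch, assuming a tiny made-up file written to a temporary location; the names `tmp` and `mydata` are mine and purely illustrative.

```r
# Write a tiny made-up CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("speed,dist", "4,2", "7,4"), tmp)

# Read it back by path instead of choosing it interactively with file.choose()
mydata <- read.csv(tmp)

# str() shows the column names and the type R assigned to each column
str(mydata)
```

In practice you would replace `tmp` with the path to your own file.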

There is no need to have your own data set, as R already comes with built-in datasets.

> data() #list the datasets available in R

> # load the dataset ‘cars’ and display the variables

> data(cars)

> head(cars)

  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

# the command head() shows the first 6 rows of data; we have two variables, car speed and stopping distance

# using attach() puts the data frame on R's search path, so its columns can be referred to by name; this avoids having to use what I feel is the pesky $

> attach(cars)
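attach() is handy at the console, but it can cause confusion when another object shares a name with one of the columns. As a rough sketch of the alternatives, the same columns can be reached with $ or with(), and neither requires attaching anything.

```r
data(cars)

# The $ operator pulls one column out of the data frame as a vector
mean_dollar <- mean(cars$speed)

# with() evaluates an expression using the data frame's columns by name
mean_with <- with(cars, mean(speed))

# Both approaches give the same answer
identical(mean_dollar, mean_with)
```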

# descriptive statistics of our two variables

> summary(cars)

     speed           dist
 Min.   : 4.0   Min.   :  2.00
 1st Qu.:12.0   1st Qu.: 26.00
 Median :15.0   Median : 36.00
 Mean   :15.4   Mean   : 42.98
 3rd Qu.:19.0   3rd Qu.: 56.00
 Max.   :25.0   Max.   :120.00
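The figures reported by summary() can also be computed one at a time, which helps when only a single number is needed. A small sketch for the dist column, using base R functions only:

```r
data(cars)

# Each statistic from summary() has its own function
dist_mean   <- mean(cars$dist)
dist_median <- median(cars$dist)

# quantile() returns the minimum, quartiles and maximum by default
dist_quart  <- quantile(cars$dist)

# sd() gives the standard deviation, which summary() does not report
dist_sd     <- sd(cars$dist)

dist_mean
dist_median
dist_quart
dist_sd
```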

> # scatterplot of speed and dist

> plot(speed, dist)

> # univariate plots: notched boxplots of speed and dist

> boxplot(speed, dist, notch=TRUE)
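The default plots come out without meaningful labels. Below is a sketch of the same two plots with axis labels, a title and named boxes; the label text is my own wording, and the units (mph and ft) are taken from the dataset's help page, ?cars.

```r
data(cars)

# Scatterplot with readable axis labels and a title
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "cars: speed vs. stopping distance")

# Boxplots labelled by variable name; boxplot() also returns the
# statistics it drew, so they can be inspected afterwards
bp <- boxplot(cars$speed, cars$dist,
              names = c("speed", "dist"), notch = TRUE)

# One column per box, five rows: the whisker ends, hinges and median
bp$stats
```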

# you can use [] to create a subset. Here is how to get rows 1 thru 10 of both variables

> subsetcars = cars[1:10, ]

> subsetcars

   speed dist
1      4    2
2      4   10
3      7    4
4      7   22
5      8   16
6      9   10
7     10   18
8     10   26
9     10   34
10    11   17

#rows 1 thru 5 of just speed

> subspeed = cars[1:5, 1]

> subspeed

[1] 4 4 7 7 8
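Columns can be picked by name as well as by position, which tends to be easier to read later on. A short sketch showing three equivalent ways to get the first five speeds, plus drop = FALSE for when you want a data frame back instead of a plain vector:

```r
data(cars)

# Three ways to get rows 1 to 5 of the speed column
by_position <- cars[1:5, 1]
by_name     <- cars[1:5, "speed"]
by_dollar   <- cars$speed[1:5]

# All three return the same numeric vector
identical(by_position, by_name)
identical(by_name, by_dollar)

# drop = FALSE keeps the result as a one-column data frame
speed_df <- cars[1:5, "speed", drop = FALSE]
speed_df
```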

# Observations where stopping distance is greater than 50 (named longstop so as not to mask R's built-in stop() function)

> longstop = cars[dist > 50, ]

> longstop

   speed dist
22    14   60
23    14   80
26    15   54
33    18   56
34    18   76
35    18   84
38    19   68
41    20   52
42    20   56
43    20   64
44    22   66
45    23   54
46    24   70
47    24   92
48    24   93
49    24  120
50    25   85
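The same filter can be written with subset(), which looks up column names inside the data frame, so it works without attach(). A minimal sketch; the object names `over50` and `over50_slow` are mine.

```r
data(cars)

# Rows where stopping distance exceeds 50
over50 <- subset(cars, dist > 50)
nrow(over50)   # number of matching rows

# Conditions can be combined with & (and) or | (or)
over50_slow <- subset(cars, dist > 50 & speed < 20)
over50_slow
```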

# and finally the correlation

> cor(speed, dist)

[1] 0.8068949
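By default cor() returns the Pearson correlation, which is the covariance of the two variables divided by the product of their standard deviations. A quick sketch that checks this by hand; the names `r_builtin` and `r_manual` are mine.

```r
data(cars)

# Built-in Pearson correlation
r_builtin <- cor(cars$speed, cars$dist)

# The same quantity from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(cars$speed, cars$dist) / (sd(cars$speed) * sd(cars$dist))

# The two agree up to floating-point rounding
all.equal(r_builtin, r_manual)
```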

