Getting started with R

June 25, 2013

(This article was first published on Fear and Loathing in Data Science, and kindly contributed to R-bloggers)

I wanted to avoid advanced topics in this post and focus on some “blocking and tackling” with R to help novices get started. This is some of the basic code I found useful when I began using R just over six weeks ago.
Reading in data from a .csv file is a breeze with this command.
> data = read.csv(file.choose())
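file.choose() opens an interactive file dialog, so it is less handy inside a script. As a rough sketch, the same read can be done with an explicit path. The temporary file, its contents, and the name mydata below are made up purely for illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained
# (file name and contents are hypothetical, not from the original post)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("speed,dist", "4,2", "7,4", "8,16"), csv_path)

# Read it back by path instead of picking it in a dialog
mydata <- read.csv(csv_path, header = TRUE)

# Check what was read: column names, types and first values
str(mydata)
```

With your own file you would replace csv_path with the path to that file.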
You don't need your own data set to practice, as R comes with a number of built-in datasets.
> data()  #list the datasets available in R
> # load the dataset ‘cars’ and display the variables
> data(cars)
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
#the command head() shows the first 6 rows of data; here we have two variables, car speed and stopping distance
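Besides head(), a few other base R functions give a quick overview of a data frame. This is only a small sketch of the ones I would reach for first; the variable names cars_dim and cars_names are just for illustration.

```r
data(cars)

cars_dim <- dim(cars)      # number of rows, then number of columns
cars_names <- names(cars)  # column names

print(cars_dim)
print(cars_names)
str(cars)       # type of each column plus its first few values
tail(cars, 3)   # the last three rows, the counterpart of head()
```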
#attach() puts the data frame on R's search path, so its columns can be referred to by name and you avoid what I feel is the pesky $
> attach(cars)
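One caveat with attach(): the attached columns are a copy, so later changes to cars are not reflected in them, and names can clash with other objects. As a hedged alternative, the sketch below gets the same result with $ and with with(), neither of which needs attach(). The variable names are only for illustration.

```r
data(cars)

# Refer to the column explicitly with $
mean_speed_dollar <- mean(cars$speed)

# Or evaluate an expression "inside" the data frame with with()
mean_speed_with <- with(cars, mean(speed))

print(mean_speed_dollar)
print(mean_speed_with)
```

If you do use attach(), calling detach(cars) when you are done removes the data frame from the search path again.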
# descriptive statistics of our two variables
> summary(cars)
     speed           dist
 Min.   : 4.0   Min.   :  2.00
 1st Qu.:12.0   1st Qu.: 26.00
 Median :15.0   Median : 36.00
 Mean   :15.4   Mean   : 42.98
 3rd Qu.:19.0   3rd Qu.: 56.00
 Max.   :25.0   Max.   :120.00
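The numbers reported by summary() can also be computed one at a time, which is useful when you need a single value in later code. A minimal sketch using base R functions (the variable names are mine):

```r
data(cars)

dist_mean <- mean(cars$dist)
dist_median <- median(cars$dist)
dist_sd <- sd(cars$dist)                              # not part of summary()
dist_quartiles <- quantile(cars$dist, c(0.25, 0.75))  # 1st and 3rd quartiles

# Apply the same function to every column at once
col_means <- sapply(cars, mean)

print(dist_mean)
print(dist_median)
print(dist_sd)
print(dist_quartiles)
print(col_means)
```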
> # univariate plots for speed

> plot(speed)

> hist(speed)

> #scatterplot for speed and dist

> plot(speed,dist)
> # notched boxplots of speed and dist, side by side
> boxplot(speed, dist, notch=TRUE)
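The default plots are fine for exploring, but titles and axis labels help when sharing them. Below is a rough sketch that labels the scatterplot and saves it to a PDF file. The file location is a temporary path chosen for illustration, and the units (mph and ft) are taken from the help page for the cars dataset.

```r
data(cars)

# Open a PDF graphics device; the path is a throwaway temporary file
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)

plot(cars$speed, cars$dist,
     main = "Stopping distance vs. speed",
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     pch = 19)   # solid points instead of open circles

# Close the device so the file is actually written
dev.off()
```

Leaving out the pdf() and dev.off() calls draws the same plot on screen instead.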
# you can use [] to create a subset.  Here is how to get rows 1 thru 10 of both variables
> subsetcars = cars[1:10, ]
> subsetcars
   speed dist
1      4    2
2      4   10
3      7    4
4      7   22
5      8   16
6      9   10
7     10   18
8     10   26
9     10   34
10    11   17
#rows 1 thru 5 of just speed
> subspeed = cars[1:5, 1]
> subspeed
[1] 4 4 7 7 8
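Columns can be picked by name as well as by position, which tends to be easier to read later. A small sketch follows; the variable names are for illustration only.

```r
data(cars)

by_index <- cars[1:5, 1]          # column chosen by position
by_name <- cars[1:5, "speed"]     # same column chosen by name

# drop = FALSE keeps the result as a one-column data frame
# instead of simplifying it to a plain vector
as_df <- cars[1:5, "speed", drop = FALSE]

print(by_index)
print(by_name)
print(as_df)
```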
# Observations where stopping distance is greater than 50
> longstops = cars[dist > 50, ]
> longstops
   speed dist
22    14   60
23    14   80
26    15   54
33    18   56
34    18   76
35    18   84
38    19   68
41    20   52
42    20   56
43    20   64
44    22   66
45    23   54
46    24   70
47    24   92
48    24   93
49    24  120
50    25   85
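The same kind of filter can be written without attach(), and conditions can be combined with & (and) or | (or). This is a hedged sketch; the names long_stops, same_rows and fast_and_long are mine, and the speed cutoff of 20 is an arbitrary example.

```r
data(cars)

# Logical subsetting with an explicit column reference
long_stops <- cars[cars$dist > 50, ]

# subset() does the same and looks up column names in the data frame
same_rows <- subset(cars, dist > 50)

# Two conditions at once: speed of at least 20 and distance over 50
fast_and_long <- cars[cars$speed >= 20 & cars$dist > 50, ]

print(nrow(long_stops))
print(fast_and_long)
```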
# and finally the correlation
> cor(speed, dist)
[1] 0.8068949
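cor() also accepts a whole data frame and returns a correlation matrix, and cor.test() adds a significance test and confidence interval for a single pair. A minimal sketch, with variable names chosen for illustration:

```r
data(cars)

# Pearson correlation for one pair of columns
r <- cor(cars$speed, cars$dist)

# Correlation matrix for every pair of columns in the data frame
cor_matrix <- cor(cars)

# Test whether the correlation differs from zero
r_test <- cor.test(cars$speed, cars$dist)

print(r)
print(cor_matrix)
print(r_test)
```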
