# ThinkStats … in R :: Example 1.3

March 7, 2012

(This article was first published on rud.is » R, and kindly contributed to R-bloggers)

With 1.2 under our belts, we now move to the example in section 1.3, which was designed to show us how to partition a larger set of data into subsets for analysis. In this case, we’re going to jump to example 1.3.2 to determine the number of live births.

While the Python loop is easy to write, the R code is even easier:

```r
livebirths <- subset(pregnancies, outcome == 1)
```

First: don’t let the `<-` throw you off. It's just a more mathematical presentation of "`=`" (the assignment operator). While later versions of R support using `=` for assignment operations, it's considered good form to continue to use the left arrow.
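One place where the two operators are not interchangeable is inside a function call. The short sketch below uses made-up vectors (not the NSFG data) to show that `=` inside a call names an argument, while `<-` inside a call still performs an assignment in the calling environment.

```r
# At the top level, both forms assign
x <- c(1, 2, 3)
y = c(1, 2, 3)

# Inside a call, `=` names the argument `x` of mean(); the global `x` is untouched
m1 <- mean(x = c(2, 4, 6))
x_after_equals <- x

# Inside a call, `<-` assigns to the global `x` and then passes the value along
m2 <- mean(x <- c(2, 4, 6))
x_after_arrow <- x
```

This side effect is one reason many style guides reserve `<-` for assignment and `=` for naming arguments.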

The `subset` function will traverse `pregnancies`, looking for records (rows) whose `outcome` field (variable) satisfies the boolean expression `outcome == 1`, and place all those records into `livebirths`.
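To see what `subset` is doing, here is a minimal sketch on a tiny, made-up data frame (the `toy` values are illustrative assumptions, not NSFG records). It also shows the roughly equivalent bracket-indexing form and how to count the matching records.

```r
# Hypothetical toy data: three pregnancies, two of them live births (outcome == 1)
toy <- data.frame(caseid    = c(1, 2, 3),
                  outcome   = c(1, 4, 1),
                  prglength = c(39, 9, 38))

# Keep only the rows where the condition holds
live_subset <- subset(toy, outcome == 1)

# The same selection written with logical row indexing
live_bracket <- toy[toy$outcome == 1, ]

# The number of live births is just the number of rows kept
n_live <- nrow(live_subset)   # 2
```

With the real data, `nrow(livebirths)` gives the count that example 1.3.2 asks for.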

You can apply any amount of field logic to the `subset` function, as asked for by example 1.3.3:

```r
firstbabies <- subset(pregnancies, birthord == 1 & outcome == 1)
```
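A small sketch of compound conditions, again on made-up values, may help here. It also illustrates one convenience of `subset`: rows where the condition evaluates to `NA` (for example, a missing `birthord`) are dropped, whereas plain logical indexing returns an all-`NA` row for them.

```r
# Hypothetical toy data; the NA stands in for a missing birth order
toy <- data.frame(birthord = c(1, 2, NA, 1),
                  outcome  = c(1, 1, 2, 4))

# Both conditions must hold for a row to be kept
first  <- subset(toy, birthord == 1 & outcome == 1)
others <- subset(toy, birthord > 1 & outcome == 1)

# subset() silently drops the row whose condition is NA ...
n_subset <- nrow(subset(toy, birthord == 1))      # 2

# ... while bracket indexing keeps it as a row of NAs
n_bracket <- nrow(toy[toy$birthord == 1, ])       # 3
```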

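Since R was built for statistical computing, it's no surprise that to solve example 1.3.4 all we have to do is ask R to return the mean of that portion of the data frame (`notfirstbabies` is built the same way as `firstbabies`, but with `birthord > 1`; it appears in the full listing at the end of this post):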

```r
mean(firstbabies$prglength)
mean(notfirstbabies$prglength)
```
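As a quick illustration of `mean` on a data frame column, here is a sketch with invented pregnancy lengths. One thing worth knowing: if a column contains `NA`, `mean` returns `NA` unless you pass `na.rm = TRUE`. Whether that matters for `prglength` in the real file is something to check rather than assume.

```r
# Hypothetical pregnancy lengths, in weeks
toy_first  <- data.frame(prglength = c(39, 40, 38))
toy_others <- data.frame(prglength = c(39, 38, NA))

mean_first <- mean(toy_first$prglength)                    # 39

# A missing value propagates through mean() by default
mean_others_raw <- mean(toy_others$prglength)              # NA

# na.rm = TRUE ignores missing values
mean_others <- mean(toy_others$prglength, na.rm = TRUE)    # 38.5
```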

(Here's a refresher on the basics of R data frame usage in case you skipped over that URL in the first post.)

Getting the roughly 13-hour difference the text states is simple math: subtract the two means (which are in weeks), multiply by 7 (days in a week), and then by 24 (hours in a day).
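The conversion can be wrapped in a small helper. This is a sketch only: `weeks_to_hours` is a name invented here, and the input vectors are made-up values chosen to make the arithmetic easy to follow, not the NSFG results.

```r
# Hypothetical helper: convert a duration in weeks to hours
weeks_to_hours <- function(weeks) {
  weeks * 7 * 24
}

one_week <- weeks_to_hours(1)   # 168

# Made-up group means: 39.5 weeks vs 39 weeks, a difference of 0.5 weeks
diff_weeks <- mean(c(39, 40)) - mean(c(38, 40))
diff_hours <- weeks_to_hours(diff_weeks)   # 84
```

With the real data frames, the same call would be `weeks_to_hours(mean(firstbabies$prglength) - mean(notfirstbabies$prglength))`.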

In the next post, we'll begin to tap into the more visual side of R, but for now, play around with the following source code as you finish working through chapter one of Think Stats (you can also download the book for free from Green Tea Press).

```r
# ThinkStats in R by @hrbrmstr
# Example 1.3
# File format info: http://www.cdc.gov/nchs/nsfg/nsfg_cycle6.htm

# set up a data frame that has the field start/end info

pFields <- data.frame(name  = c('caseid', 'nbrnaliv', 'babysex', 'birthwgt_lb', 'birthwgt_oz',
                                'prglength', 'outcome', 'birthord', 'agepreg', 'finalwgt'),
                      begin = c(1, 22, 56, 57, 59, 275, 277, 278, 284, 423),
                      end   = c(12, 22, 56, 58, 60, 276, 277, 279, 287, 440)
)

# calculate widths so we can pass them to read.fwf()

pFields$width <- pFields$end - pFields$begin + 1

# we aren't reading every field (for the book exercises),
# so compute the gap after each field as a negative width (read.fwf skips those)

pFields$skip <- (-c(pFields$begin[-1] - pFields$end[-nrow(pFields)] - 1, 0))

# interleave width/skip pairs and drop the zero-length gaps

widths <- c(t(pFields[, 4:5]))
widths <- widths[widths != 0]

# read in the file

pregnancies <- read.fwf("2002FemPreg.dat", widths)

# assign column names

names(pregnancies) <- pFields$name

# divide mother's age by 100

pregnancies$agepreg <- pregnancies$agepreg / 100

# convert weight at birth from lbs/oz to total ounces

pregnancies$totalwgt_oz <- pregnancies$birthwgt_lb * 16 + pregnancies$birthwgt_oz

# same approach for the respondents file (only caseid is needed)

rFields <- data.frame(name  = c('caseid'),
                      begin = c(1),
                      end   = c(12)
)

rFields$width <- rFields$end - rFields$begin + 1
rFields$skip <- (-c(rFields$begin[-1] - rFields$end[-nrow(rFields)] - 1, 0))

widths <- c(t(rFields[, 4:5]))
widths <- widths[widths != 0]

respondents <- read.fwf("2002FemResp.dat", widths)
names(respondents) <- rFields$name

# exercise 1
# not exactly the same, but even more info is provided in the summary from str()

str(respondents)
str(pregnancies)

# for exercise 2
# use subset() on the data frames
# again, lazy use of str() for output

livebirths <- subset(pregnancies, outcome == 1)

str(livebirths)

# exercise 3

firstbabies <- subset(pregnancies, birthord == 1 & outcome == 1)
notfirstbabies <- subset(pregnancies, birthord > 1 & outcome == 1)

str(firstbabies)
str(notfirstbabies)

# exercise 4

mean(firstbabies$prglength)
mean(notfirstbabies$prglength)

hours <- (mean(firstbabies$prglength) - mean(notfirstbabies$prglength)) * 7 * 24
hours
```
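The width/skip bookkeeping in the listing above is the least obvious part, so here is a minimal sketch that runs the same calculation on a toy two-field layout. The layout, the sample lines, and the temporary file are all invented for illustration. The key fact it relies on is that `read.fwf` treats a negative width as "skip this many characters".

```r
# Hypothetical layout: 'id' in columns 1-2, 'value' in columns 5-6
fields <- data.frame(name  = c("id", "value"),
                     begin = c(1, 5),
                     end   = c(2, 6))

fields$width <- fields$end - fields$begin + 1

# Gap between the end of each field and the start of the next, as a negative number
fields$skip <- -c(fields$begin[-1] - fields$end[-nrow(fields)] - 1, 0)

# Interleave (width, skip) pairs, then drop zero-length gaps
widths <- c(t(fields[, c("width", "skip")]))
widths <- widths[widths != 0]          # 2, -2, 2

# Write two made-up fixed-width records to a temporary file and read them back
tmp <- tempfile(fileext = ".dat")
writeLines(c("01xx42", "02yy17"), tmp)

parsed <- read.fwf(tmp, widths)
names(parsed) <- fields$name
unlink(tmp)
```

Here the two filler characters in columns 3 and 4 are skipped, leaving `id` and `value` as the only columns in `parsed`.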
