# Build a data frame from vectors

May 18, 2020

Tabular data is the most common format used by data scientists. In R, tables are represented through data frames. They can be inspected by printing them to the console.

• Understand why data frames are important
• Interpret console output created by a data frame
• Create a new data frame using the `data.frame()` function
• Define vectors to be used for single columns
• Specify names of data frame columns

```
data.frame(___ = ___,
           ___ = ___,
           ...)
```

## Introduction to Data Frames

In analysis and statistics, tabular data is the most important data structure. It is present in many common formats like Excel files, comma-separated values (CSV) files or databases. R integrates tabular data objects as first-class citizens into the language through data frames. Data frames allow users to easily read and manipulate tabular data within the R language.
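As a small illustration of reading tabular data, the sketch below parses CSV text into a data frame with base R's `read.csv()`. The inline string `csv_text` and the variable name `people` are assumptions made for this example; in practice the data would usually come from a file.

```r
# CSV content held in a string (a stand-in for a real file).
csv_text <- "id,name,female
1,Louisa,TRUE
2,Jonathan,FALSE
3,Luigi,FALSE
4,Rachel,TRUE"

# read.csv() parses the text and returns a data frame.
people <- read.csv(text = csv_text)

print(people)
print(class(people))
```

The first line of the CSV text supplies the column names, and every following line becomes one row.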

Let’s take a look at a data frame object named `Davis`, from the package carData, which includes height and weight measurements for 200 men and women:

```
Davis
```
```
  sex weight height repwt repht
1   M     77    182    77   180
2   F     58    161    51   159
3   F     53    161    54   158
[ reached 'max' / getOption("max.print") -- omitted 197 rows ]
```

From the printed output we can see that the data frame has 200 rows (3 printed, 197 omitted) and 5 columns. Each row describes one person through attributes, which correspond to the columns `sex`, `weight`, `height`, reported weight `repwt` and reported height `repht`.

For example, the first row in the table describes a `M`ale who weighs `77`kg and is `182`cm tall. His reported weight and height are very close, at `77`kg and `180`cm, respectively.

The rows in a data frame are further identified by row names on the left, which are simply the row numbers by default. In the `Davis` dataset above, the row names range from 1 to 200.
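The properties read off the printout can also be queried directly. The sketch below rebuilds only the three rows printed above by hand, so it runs without the carData package; the name `davis_head` is an assumption for this example and the object is not the full `Davis` dataset.

```r
# The three printed rows of Davis, rebuilt by hand for illustration.
davis_head <- data.frame(
  sex    = c("M", "F", "F"),
  weight = c(77, 58, 53),
  height = c(182, 161, 161),
  repwt  = c(77, 51, 54),
  repht  = c(180, 159, 158)
)

nrow(davis_head)      # number of rows
ncol(davis_head)      # number of columns
names(davis_head)     # column (attribute) names
rownames(davis_head)  # default row names are the row numbers
davis_head[1, ]       # the first row, i.e. the first person
```

Applied to the full `Davis` data frame, the same functions would report 200 rows and 5 columns, as described above.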

## Quiz: Data Frame Output

```
      rank discipline yrs.since.phd yrs.service  sex salary
1     Prof          B            19          18 Male 139750
2     Prof          B            20          16 Male 173200
3 AsstProf          B             4           3 Male  79750
[ reached 'max' / getOption("max.print") -- omitted 394 rows ]
```

The data frame above shows the nine-month academic salary for Assistant Professors, Associate Professors and Professors in a college in the U.S.

• The data frame has 3 rows.
• The data frame has 394 rows.
• The data frame has 397 rows.
• The data frame has 6 attributes.
• The attribute names contain `Prof` and `AsstProf`.


## Quiz: Data Frame Output (2)

```
      rank discipline yrs.since.phd yrs.service  sex salary
1     Prof          B            19          18 Male 139750
2     Prof          B            20          16 Male 173200
3 AsstProf          B             4           3 Male  79750
[ reached 'max' / getOption("max.print") -- omitted 394 rows ]
```

The data frame above shows the nine-month academic salary for Assistant Professors, Associate Professors and Professors in a college in the U.S.

• All three are male.
• The salaries of all three members are about the same.
• The Professor in row three is most probably the oldest.
• All shown professors are from the same discipline.
• The highest salary amongst the three Professors is \$139,750.


## Creating Data Frames

```
data.frame(___ = ___,
           ___ = ___,
           ...)
```

Data frames hold tabular data in columns, also called attributes. Each column is represented by a vector with a single data type, such as numeric or character, and different columns may have different types. The `data.frame()` function constructs a data frame object by combining vectors into a table. To form a table, the vectors are required to have equal lengths. A data frame can therefore be seen as a collection of equal-length vectors connected together to form a table.
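The two properties described above can be checked with a short sketch. The variable names `people`, `col_types` and `result` are assumptions for this example, and the `"character"` type of the `name` column assumes R 4.0 or later (older versions convert strings to factors by default).

```r
people <- data.frame(
  id   = c(1, 2, 3),
  name = c("Louisa", "Jonathan", "Luigi")
)

# Each column is a vector with its own single type.
col_types <- sapply(people, class)
print(col_types)

# Vectors of length 3 and 2 cannot form a table, so data.frame() signals
# an error. tryCatch() captures the message instead of stopping the script.
result <- tryCatch(
  data.frame(id = c(1, 2, 3), name = c("Louisa", "Jonathan")),
  error = function(e) conditionMessage(e)
)
print(result)
```

Catching the error with `tryCatch()` is only a convenience here so that the whole sketch runs from top to bottom.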

Let’s create our first data frame describing four persons, with their ids, names and an indicator of whether they are female. Each of these attributes is created from a vector of a different data type (numeric, character and logical). The attributes are finally combined into a table using the `data.frame()` function:

```
data.frame(
  c(1, 2, 3, 4),
  c("Louisa", "Jonathan", "Luigi", "Rachel"),
  c(TRUE, FALSE, FALSE, TRUE)
)
```
```
  c.1..2..3..4. c..Louisa....Jonathan....Luigi....Rachel..
1             1                                     Louisa
2             2                                   Jonathan
3             3                                      Luigi
4             4                                     Rachel
  c.TRUE..FALSE..FALSE..TRUE.
1                        TRUE
2                       FALSE
3                       FALSE
4                        TRUE
```

The resulting data frame stores the values of each vector in a different column. It has four rows and three columns. However, no column names were given, so R derived them from the code of each argument and replaced characters that are not valid in names, such as parentheses, commas and spaces, with dots. The result is a very strange naming scheme! Because the names are so long, the printout is also wrapped, and the third column appears below the first two.

Column names can be passed to `data.frame()` as argument names preceding the column vectors. To improve the column naming of the previous data frame, we can write:
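The dotted names can be reproduced with base R's `make.names()`, which turns an arbitrary string into a syntactically valid name. The sketch below is an illustration only; the variable names `generated` and `unnamed` are assumptions for this example.

```r
# make.names() replaces characters that are not allowed in names with dots.
generated <- make.names("c(1, 2, 3, 4)")
print(generated)

# The same kind of name appears when data.frame() gets no argument names.
unnamed <- data.frame(c(1, 2, 3, 4), c(TRUE, FALSE, FALSE, TRUE))
print(names(unnamed))
```

The first element of `names(unnamed)` matches `generated`, which is also the first column name in the printout above.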

```
data.frame(
  id = c(1, 2, 3, 4),
  name = c("Louisa", "Jonathan", "Luigi", "Rachel"),
  female = c(TRUE, FALSE, FALSE, TRUE)
)
```
```
  id     name female
1  1   Louisa   TRUE
2  2 Jonathan  FALSE
3  3    Luigi  FALSE
4  4   Rachel   TRUE
```

The resulting data frame includes the column names needed to see the actual meaning of the different columns.
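Meaningful names also make the columns easy to refer to later. The following sketch stores the data frame in a variable (the name `people` is an assumption for this example) and uses the column names to look at individual attributes.

```r
people <- data.frame(
  id = c(1, 2, 3, 4),
  name = c("Louisa", "Jonathan", "Luigi", "Rachel"),
  female = c(TRUE, FALSE, FALSE, TRUE)
)

names(people)                # the column names given as argument names
people$name                  # a single column, selected by its name
people$name[people$female]   # names of the persons marked as female
str(people)                  # compact overview of columns and their types
```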

## Exercise: Creating Your First Data Frame

| weekday   | temperature | hot   |
|-----------|-------------|-------|
| Monday    | 28          | FALSE |
| Tuesday   | 31          | TRUE  |
| Wednesday | 25          | FALSE |

Let’s create a data frame as shown above using the `data.frame()` function. The resulting data frame should consist of the three columns `weekday`, `temperature` and `hot`:

1. The first column named `weekday` contains the weekday names `"Monday"`, `"Tuesday"`, `"Wednesday"`.
2. The second column named `temperature` contains the temperatures (in degrees Celsius) as `28`, `31`, `25`.
3. The third column named `hot` contains the logical values `FALSE`, `TRUE`, `FALSE`.

Store the final data frame in the variable `temp` and print its output to the console.
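One possible solution, shown as a sketch that follows the three steps listed above (other ways of writing it are equally valid):

```r
# Build the data frame column by column and store it in `temp`.
temp <- data.frame(
  weekday = c("Monday", "Tuesday", "Wednesday"),
  temperature = c(28, 31, 25),  # degrees Celsius
  hot = c(FALSE, TRUE, FALSE)
)

# Typing the variable name prints the data frame to the console.
temp
```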


```
price <- c(28, 31, 25)
data.frame(
  weekday = c("Monday", "Tuesday", "Wednesday", "Thursday"),
  price = price,
  expensive = price > 30
)
```

Which statements are true about the data frame above?

• The `data.frame()` function will fail because the column `expensive` is not a vector.
• The `data.frame()` function will not fail.
• The `data.frame()` function fails because the lengths of the vectors are different.
• The command would work if `weekday` had the values `c("Monday", "Tuesday", "Wednesday")`.


Build a data frame from vectors is an excerpt from the course Introduction to R, which is available for free at https://www.quantargo.com
