In R, single-column data.frames are often converted to vectors when manipulated. For example:
    d <- data.frame(x = seq_len(3))
    print(d)
    #>   x
    #> 1 1
    #> 2 2
    #> 3 3

    # not a data frame!
    d[order(-d$x), ]
    #> [1] 3 2 1
We were merely trying to re-order the rows, and the result was converted to a vector. This happened because the rules for [ , ] change if there is only one result column. This happens even if there had been only one input column. Another example: d[, ] is also a vector in this case.
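As a rough illustration of the "one result column" rule, the sketch below checks a few indexing forms with is.data.frame(). The two-column data frame d2 is a made-up comparison case, not part of the original example.

```r
# Single-column data frame, as in the example above.
d <- data.frame(x = seq_len(3))
# Hypothetical two-column data frame, introduced only for comparison.
d2 <- data.frame(x = seq_len(3), y = c("a", "b", "c"))

is.data.frame(d[, ])          # FALSE: one result column, so the result drops to a vector
is.data.frame(d[c(1, 2), ])   # FALSE: same rule for a row subset
is.data.frame(d2[c(1, 2), ])  # TRUE: two result columns, nothing is dropped
is.data.frame(d2[, "x"])      # FALSE: picking one column out of two also drops
```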
The issue is: when writing reusable code, we are often programming before we know the complete contents of a variable or argument. For a data.frame named “g” supplied as an argument, g[vec, ] can be a data.frame or a vector (or even possibly a list). However, we do know that if g is a data.frame then g[vec, , drop = FALSE] is also a data.frame (assuming vec is a vector of valid row indices or a logical vector; note: NA induces some special cases).
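To make the reusable-code concern concrete, here is a minimal sketch. The helper names select_rows_unsafe() and select_rows() and the sample data are hypothetical, chosen only for illustration; the point is that the return type of the first depends on how many columns the caller's data happens to have.

```r
# Hypothetical helper: row selection without drop = FALSE.
select_rows_unsafe <- function(g, vec) {
  g[vec, ]                # may come back as a vector if g has one column
}

# Hypothetical helper: row selection with drop = FALSE.
select_rows <- function(g, vec) {
  g[vec, , drop = FALSE]  # stays a data.frame when g is a data.frame
}

one_col <- data.frame(x = seq_len(3))
two_col <- data.frame(x = seq_len(3), y = c("a", "b", "c"))
vec <- c(3, 1)            # valid row indices, no NA

class(select_rows_unsafe(two_col, vec))  # "data.frame"
class(select_rows_unsafe(one_col, vec))  # "integer"
class(select_rows(two_col, vec))         # "data.frame"
class(select_rows(one_col, vec))         # "data.frame"
```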
We care because vectors and data.frames have different semantics, so they are not fully substitutable in later code.
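A few of those differences can be seen directly. The sketch below reuses the small data frame from the earlier example; the variable names as_df and as_vec are illustrative only.

```r
d <- data.frame(x = seq_len(3))

as_df  <- d[order(-d$x), , drop = FALSE]  # stays a data.frame
as_vec <- d[order(-d$x), ]                # dropped to a plain vector

nrow(as_df)     # 3
nrow(as_vec)    # NULL: a plain vector has no dim attribute
length(as_df)   # 1: the length of a data.frame is its number of columns
length(as_vec)  # 3: the length of a vector is its number of elements

# Column access by name works on the data.frame but is an error on the vector.
dollar_fails <- inherits(try(as_vec$x, silent = TRUE), "try-error")
dollar_fails    # TRUE
```

Downstream code that calls nrow() or uses $ will therefore behave differently, or fail, depending on which of the two it receives.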
The fix is to include drop = FALSE as a third argument to [ , ].
    # is a data frame.
    d[order(-d$x), , drop = FALSE]
    #>   x
    #> 3 3
    #> 2 2
    #> 1 1
To pull out a column, I suggest using one of the many good extraction notations (all using the fact that a data.frame is officially a list of columns):
    d[["x"]]
    #> [1] 1 2 3
    d$x
    #> [1] 1 2 3
    d[[1]]
    #> [1] 1 2 3
My overall advice is: get in the habit of including drop = FALSE when working with [ , ] and data.frames. I say do this even when it is obvious that the result does in fact have more than one column.
For example, write “mtcars[, c("mpg", "cyl"), drop = FALSE]” instead of “mtcars[, c("mpg", "cyl")]”. It is clear that for data.frames both forms should work the same (either selecting a data frame with two columns, or throwing an error if we have mentioned a non-existent column). But the longer drop = FALSE form is safer (it goes further towards ensuring type-stable code) and, more importantly, documents intent (that you wanted a data.frame result).
One can also try base::subset(), as it has non-dropping defaults.
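As a sketch of why the habit pays off, consider column names that arrive in a variable. The helper pick_cols() is hypothetical, written only for this illustration; mtcars is the dataset that ships with R.

```r
# Hypothetical helper: select the columns named in `cols`,
# returning a data.frame however many names are given.
pick_cols <- function(df, cols) {
  df[, cols, drop = FALSE]
}

is.data.frame(mtcars[, c("mpg", "cyl")])           # TRUE
is.data.frame(mtcars[, "mpg"])                     # FALSE: one column drops to a vector
is.data.frame(pick_cols(mtcars, c("mpg", "cyl")))  # TRUE
is.data.frame(pick_cols(mtcars, "mpg"))            # TRUE

# base::subset() also keeps the data.frame class by default.
d <- data.frame(x = seq_len(3))
is.data.frame(subset(d, x >= 2))                   # TRUE
is.data.frame(subset(mtcars, select = mpg))        # TRUE
```

Note that subset() filters rows and selects columns but does not re-order rows, so it is an alternative only for some uses of [ , ].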