This post was kindly contributed by Ecología estadística - go there to comment and to read the full post.
- data types are integer, numeric (real numbers), logical (TRUE or FALSE), and character (alphanumeric strings)
- data frame is a table of data that combines vectors (columns) of different types (e.g. character, factor, and numeric data). It is a hybrid of two simpler data structures: lists, which can mix arbitrary types of data but have no other structure, and matrices, which have rows and columns but usually contain only one data type (typically numeric).
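
As a minimal sketch of the points above, the following builds a small data frame and shows its list-like and matrix-like sides. The data and the column names (`species`, `count`, `height`, `native`, `site`) are hypothetical and chosen only for illustration.

```r
# Hypothetical example data; the column names are illustrative only.
df <- data.frame(
  species = c("oak", "pine", "birch"),  # character
  count   = c(12L, 7L, 3L),             # integer
  height  = c(18.5, 22.1, 9.4),         # numeric
  native  = c(TRUE, FALSE, TRUE),       # logical
  stringsAsFactors = FALSE
)
df$site <- factor(c("A", "B", "A"))     # factor

# One class per column: a data frame can mix types across columns.
col_classes <- sapply(df, class)
print(col_classes)

# List-like: a data frame is a list of equal-length columns.
print(is.list(df))

# Matrix-like: it also has rows and columns.
print(dim(df))

# A matrix holds a single type, so mixing a character and a numeric
# column coerces everything to character.
m <- as.matrix(df[, c("species", "height")])
print(typeof(m))
```

The last step is the practical reason to prefer a data frame over a matrix for mixed data: converting to a matrix silently turns the numbers into strings.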
- Organization or shape
- stack and unstack are simple but basic functions: stack converts from wide to long format and unstack from long to wide; they aren't very flexible
- reshape is very flexible and preserves more information than stack/unstack, but its syntax is tricky
- library(reshape): melt, cast, and recast functions, which are similar to reshape but sometimes easier to use
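
The wide/long conversions described above can be sketched with base R as follows. This is an illustrative example only: the data frame `wide`, its columns (`site`, `y2001`, `y2002`), and the names `count` and `year` are assumptions made up for the sketch, not the data from the original post.

```r
# Hypothetical wide-format data: one row per site, one column per year.
wide <- data.frame(
  site  = c("A", "B", "C"),
  y2001 = c(10, 20, 30),
  y2002 = c(11, 21, 31)
)

# stack(): wide -> long. The result has only two columns, 'values' and
# 'ind' (the source column name); the site identifier is not carried along.
stacked <- stack(wide[, c("y2001", "y2002")])
print(stacked)

# unstack(): long -> wide, one column per level of 'ind'.
unstacked <- unstack(stacked, values ~ ind)
print(unstacked)

# reshape(): wide -> long, keeping the site identifier.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("y2001", "y2002"),  # columns to gather
  v.names   = "count",              # name of the gathered value column
  timevar   = "year",               # name of the new "which column" variable
  times     = c(2001, 2002),        # values of 'year', one per gathered column
  idvar     = "site"
)
print(long)

# reshape(): long -> wide, one 'count' column per year.
wide2 <- reshape(
  long,
  direction = "wide",
  idvar     = "site",
  timevar   = "year",
  v.names   = "count"
)
print(wide2)
```

Comparing `stacked` with `long` shows the trade-off: `stack` needs almost no arguments but drops the identifier column, while `reshape` keeps it at the cost of several arguments that must agree with each other.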
- Is there the right number of observations overall? Is there the right number of observations in each level for factors?
- Do the summaries of the numeric variables — mean, median, etc. — look reasonable? Are the minimum and maximum values about what you expected?
- Are there reasonable numbers of NAs in each column? If not (especially if you have extra mostly-NA columns), you may want to go back a few steps and look at using count.fields or fill=FALSE to identify rows with extra fields.
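
The checks in this list might look like the following in practice. This is a sketch under stated assumptions: the CSV text, the temporary file, and the data frame `dat` are all made up for illustration, and only the `count.fields` route to finding malformed rows is shown.

```r
# Hypothetical CSV text; the third line has one field too many.
csv_lines <- c(
  "site,count,height",
  "A,12,18.5",
  "B,7,22.1,oops",
  "A,NA,9.4"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# count.fields() reports the number of fields on each line of the file;
# a line whose count differs from the header's is suspect.
n_fields <- count.fields(path, sep = ",")
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)

# Hypothetical clean data for the remaining checks.
dat <- data.frame(
  site   = factor(c("A", "B", "A", "B")),
  count  = c(12, 7, NA, 5),
  height = c(18.5, 22.1, 9.4, 15.0)
)

n_obs       <- nrow(dat)             # observations overall
n_per_level <- table(dat$site)       # observations in each factor level
print(summary(dat))                  # min, median, mean, max per numeric column
na_per_col  <- colSums(is.na(dat))   # number of NAs in each column
print(na_per_col)
```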
- str: tells you about the structure of an R variable
- class: prints out the class (numeric, factor, Date, logical, etc.) of a variable.
- head: prints out the beginning of a data frame (the first six rows by default)
- table: command for cross-tabulation
- NAs: identify them
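
A short illustrative session with the functions listed above, using a hypothetical data frame `dat` (the columns `site`, `year`, and `count` are assumptions for the example). `is.na` and `complete.cases` are shown as one common way to identify missing values; the original notes do not name a specific function for that step.

```r
# Hypothetical data with two missing counts.
dat <- data.frame(
  site  = factor(c("A", "B", "A", "B", "C", "C", "A")),
  year  = c(2001, 2001, 2002, 2002, 2001, 2002, 2001),
  count = c(12, 7, NA, 5, 9, NA, 4)
)

str(dat)                     # structure: dimensions, column types, first values
site_class <- class(dat$site)
print(site_class)

first_rows <- head(dat)      # first six rows by default
print(head(dat, 3))          # or ask for a specific number of rows

# Cross-tabulation: number of rows for each site/year combination.
xt <- table(dat$site, dat$year)
print(xt)

# Identify NAs: which rows have a missing count, and which rows are complete.
na_idx   <- which(is.na(dat$count))
complete <- dat[complete.cases(dat), ]
print(na_idx)
```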