Tips & Tricks 3: Ordering Datasets Alphabetically

April 18, 2014
By

(This article was first published on geomorph, and kindly contributed to R-bloggers)

Today’s exercise is another nice and simple one, and allows you to get used to manipulating datasets in R.

Exercise 3 – How to reorder the dataset alphabetically by specimen name.

Say you have a 2D array dataset with specimen names (here species) as the row names of species data 
(I have jumbled up the species of geomorph’s data(plethspecies) for illustrative purposes)
                      [,1]         [,2]      [,3]        [,4]        [,5]        [,6]       [,7]       [,8]
P_cinereus       0.2170911 -0.000276374 0.2592660 -0.05280429 -0.01647032 -0.01611658 -0.2561081 -0.1222936
P_nettingi       0.2182337 -0.007565857 0.2550516 -0.07272572 -0.02249780 -0.02076196 -0.2448233 -0.1139709
P_hoffmani       0.2157877 -0.002494932 0.2538394 -0.05447151 -0.02241651 -0.01523319 -0.2502073 -0.1157858
P_virginia       0.2155726  0.001949692 0.2575607 -0.05119322 -0.03633579 -0.01912463 -0.2491259 -0.1158853
P_serratus       0.2086011  0.005599942 0.2474015 -0.05370812 -0.02172973 -0.01504136 -0.2538781 -0.1217033
P_electromorphus 0.2094443 -0.001604654 0.2502967 -0.05379854 -0.02843583 -0.01551027 -0.2536498 -0.1242497
P_shenandoah     0.2164737  0.003217350 0.2621750 -0.04501284 -0.01737607 -0.01863017 -0.2564934 -0.1198138
P_hubrichti      0.2154898 -0.000836868 0.2522478 -0.06473866 -0.03708750 -0.02215923 -0.2497272 -0.1155023

P_richmondi      0.2115516 -0.005225566 0.2499019 -0.06172998 -0.02749914 -0.01938572 -0.2553196 -0.1238423


and you want to order them alphabetically by name, this is simply:

> y <- y[order(rownames(y)),] # where y is a 2D array of your data

The equivalent for your 3D array is:
> Y <- Y[,,order(dimnames(Y)[[3]])] # where Y is a 3D array

Easy! In these two examples, the specimen names are in the data matrix itself. But what if you want to sort by other classifiers?

Using this construct, you can be smarter with ordering and sort by any other classifier, simply by adding in the classifying vector as x in Y[,,sort(x)]
for example,

> data(plethodon)
> Y <- plethodon$land
# the individual specimens in this dataset are in a random order. To sort the individuals by species names as given by the classifier $species:
> plethodon$species 
 [1] Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah Teyah Teyah
[19] Teyah Teyah Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah
[37] Teyah Teyah Teyah Teyah
Levels: Jord Teyah

> Y[,,order(plethodon$species)] # where Y is a 3D array

Manipulating datasets in R is challenging at first, but is easy once you know the tricks!

Emma

Update 19th April: The original version  of this post used the function sort() rather than order(). While they both came to the same conclusion in these examples, I should explain the difference between them, and why I changed them. sort() returns an ordered vector of the object within the parentheses, while order() returns the addresses of the elements within object if they would be ordered. order() therefore is more often used within Y[] than on its own.

To leave a comment for the author, please follow the link and comment on their blog: geomorph.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)