R function to reverse and complement a DNA sequence

November 13, 2008

(This article was first published on Fabio Marroni's Blog » R, and kindly contributed to R-bloggers)

This post is intended for documentation only. I would like to remind everyone (myself first of all!) that the comp() function of the seqinr package can complement a DNA sequence, and the rev() function of base R can reverse a character vector. Combining the two, you can reverse, complement, or reverse complement a sequence.
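A minimal sketch of that combination, assuming the seqinr package is installed. s2c() and c2s() are seqinr helpers that convert between a string and a vector of single characters. The example sequence is made up, and note that comp() returns lowercase letters by default.

```r
# Sketch only: needs the seqinr package; the sequence is an arbitrary example.
if (requireNamespace("seqinr", quietly = TRUE)) {
  dna <- seqinr::s2c("AACGT")            # split the string into single characters

  reversed   <- rev(dna)                 # reverse only (base R)
  complement <- seqinr::comp(dna)        # complement only, lowercase by default
  revcomp    <- rev(seqinr::comp(dna))   # reverse complement

  print(seqinr::c2s(reversed))
  print(seqinr::c2s(complement))
  print(seqinr::c2s(revcomp))
}
```

The requireNamespace() guard is only there so that the snippet does nothing, rather than fail, on a machine without seqinr.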

Complements (and optionally reverses) a DNA sequence, which has to be supplied as a character vector, in either lower or upper case.
1) Cannot work with RNA, only DNA.
2) Cannot reverse without complementing. You can complement and reverse complement, but not just reverse.
Author: Fabio Marroni (http://www.fabiomarroni.altervista.org/)
x: character vector, the DNA sequence.
rev: logical. If TRUE, the function returns the reverse complement; if FALSE, it returns the complementary sequence. The default value is TRUE.

The complemented (and optionally reversed) sequence, as a character vector.

There are several web sites which can easily complement and reverse a DNA sequence (and RNA as well).
The advantage of using this piece of code is that it can automatically reverse complement a whole series of sequences: I had several primers to reverse complement and I didn’t want to copy and paste them one at a time. Only now have I found a web site where you can paste the primers on separate lines and get the reverse complement of each primer on its own line. You may want to try it: http://arep.med.harvard.edu/cgi-bin/adnan/revcomp.pl.
However, the versatility of R allows you to automatically retrieve the reverse complement and (for example) save each primer in a different text file.
Also, there is a nice library in R (seqinr) which can reverse complement and perform several other tasks (http://cran.r-project.org/web/packages/seqinr/index.html).
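The batch use described above could look roughly like this in base R. The primer names and sequences, the helper revcomp_one(), and the choice of a temporary output directory are all made up for illustration.

```r
# Made-up primer names and sequences, for illustration only.
primers <- c(primer_fw = "ATGGCCATTG", primer_rv = "GGATCCTTAA")

# Hypothetical base-R helper: reverse complement of a single DNA string.
revcomp_one <- function(s) {
  chars <- strsplit(chartr("ACGT", "TGCA", toupper(s)), NULL)[[1]]
  paste(rev(chars), collapse = "")
}

out_dir <- tempdir()   # assumption: write into R's temporary directory
for (name in names(primers)) {
  out_file <- file.path(out_dir, paste0(name, "_revcomp.txt"))
  writeLines(revcomp_one(primers[[name]]), out_file)   # one file per primer
}

list.files(out_dir, pattern = "_revcomp\\.txt$")
```

Replace tempdir() with a real folder to keep the files after the R session ends.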

Since my R programming skills are “limited”, comments and suggestions are welcome!

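# The listing had lost the function header and the setup of xx, y and zz;
# those lines (and the name rev.comp) are reconstructed here as assumptions.
rev.comp <- function(x, rev = TRUE) {
	xx <- unlist(strsplit(toupper(x), NULL))	# one base per element
	y <- rep("N", length(xx))	# assumption: unrecognised characters become "N"
	for (bbb in seq_along(xx)) {
		if (xx[bbb] == "A") y[bbb] <- "T"
		if (xx[bbb] == "C") y[bbb] <- "G"
		if (xx[bbb] == "G") y[bbb] <- "C"
		if (xx[bbb] == "T") y[bbb] <- "A"
	}
	zz <- if (rev) base::rev(y) else y	# base::rev, since the argument is also named rev
	yy <- ""
	for (ccc in seq_along(zz)) yy <- paste(yy, zz[ccc], sep = "")
	yy
}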
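For comparison, the same idea can be sketched without explicit loops using chartr() from base R. This is a hypothetical alternative, not the author's code; the name rev_comp_chartr is made up. Unlike a lookup that flags unknown letters, it leaves characters other than A, C, G and T unchanged.

```r
# Hypothetical helper, not the author's code: x and rev as described earlier.
rev_comp_chartr <- function(x, rev = TRUE) {
  x <- toupper(x)
  # Swap A<->T and C<->G in one pass; any other character is left as it is.
  comp <- chartr("ACGT", "TGCA", x)
  if (!rev) return(comp)
  # Reverse each string: split into characters, reverse, paste back together.
  vapply(strsplit(comp, NULL),
         function(chars) paste(base::rev(chars), collapse = ""),
         character(1))
}

rev_comp_chartr("aacgt")                # reverse complement
rev_comp_chartr("aacgt", rev = FALSE)   # complement only
rev_comp_chartr(c("ATG", "GGC"))        # works on several sequences at once
```

Because chartr() and strsplit() are vectorised, this version accepts a vector of sequences, which is convenient for a list of primers.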

Thanks to rhi for providing code for complementing without reversing. I paste it below.

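# bbb holds a single uppercase base; the definition of bases was lost from
# the listing and is assumed here to be the four DNA letters.
bases <- c("A", "C", "G", "T")
if (bbb == "A") compString <- "T"
if (bbb == "C") compString <- "G"
if (bbb == "G") compString <- "C"
if (bbb == "T") compString <- "A"
if (!bbb %in% bases) compString <- "N"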
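The per-base lookup above, with "N" as the fallback for anything that is not A, C, G or T, can be wrapped into a complete function. The sketch below is one possible way to do it, using a named vector as the lookup table; the function name complement_dna is an assumption, not rhi's original.

```r
# Hypothetical wrapper around the same lookup; the name is an assumption.
complement_dna <- function(x) {
  pairs <- c(A = "T", C = "G", G = "C", T = "A")   # base -> complement
  chars <- strsplit(toupper(x), NULL)[[1]]         # one character per element
  out <- unname(pairs[chars])                      # NA for unrecognised characters
  out[is.na(out)] <- "N"                           # same fallback as above
  paste(out, collapse = "")
}

complement_dna("acgtx")
```

Indexing the named vector by the characters does all the lookups in one step, so no chain of if statements is needed.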

