Count different positions between two strings of equal length

[This article was first published on Fabio Marroni's Blog » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is another pretty simple function, written to help me solve the simplest representation of a trivial but tedious task. Most biologist are probably familiar with this task. How many nucleotide differences exist between two given sequences? I only faced the easiest part of the problem, i.e. I do not perform alignment, I just assume that the sequence have a perfect 1 to 1 correspndence, i.e. position 1 of sequence 1 is position 1 of sequence 2, and so on.
The problem is basically to compare two strings of equal length and count the number of positions in which they differ. To facilitate analysis of DNA sequences, I gave the opportunity to ignore case (so that an “a” is equivalent to “A”) and to ignore positions carrying special characters, such as “N” or “?”.

Input:
a: first DNA sequence (or string)
b: second DNA sequence (or string)
exclude: character (or vector of characters) to be excluded from comparison
ignore.case: logical. If TRUE consider “a” equal to “A”. If FALSE consider “a” different from “A”.

Output:
differences: Number of differences between the two DNA sequences (or strings)

As usual, feedback welcome!

Thanks to Larry for suggesting a nice improvement! It is now incorporated in the function.

string.diff.ex<-function(a="ATTCGAN",b="attTGTT",exclude=c("n","N","?"),ignore.case=TRUE)
{
if(nchar(a)!=nchar(b)) stop("Lengths of input strings differ. Please check your input.")
if(ignore.case==TRUE)
{
a<-toupper(a)
b<-toupper(b)
}
diff.a<-unlist(strsplit(a,split=""))
diff.b<-unlist(strsplit(b,split=""))
diff.d<-rbind(diff.a,diff.b)
for(ex.loop in 1:length(exclude))
{
diff.d<-diff.d[,!(diff.d[1,]==exclude[ex.loop]|diff.d[2,]==exclude[ex.loop])]
}
differences<-sum(diff.d[1,]!=diff.d[2,])
return(differences)
}

To leave a comment for the author, please follow the link and comment on their blog: Fabio Marroni's Blog » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)