# Count different positions between two strings of equal length

November 26, 2011
By

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is another pretty simple function, written to help me solve the simplest representation of a trivial but tedious task. Most biologist are probably familiar with this task. How many nucleotide differences exist between two given sequences? I only faced the easiest part of the problem, i.e. I do not perform alignment, I just assume that the sequence have a perfect 1 to 1 correspndence, i.e. position 1 of sequence 1 is position 1 of sequence 2, and so on.
The problem is basically to compare two strings of equal length and count the number of positions in which they differ. To facilitate analysis of DNA sequences, I gave the opportunity to ignore case (so that an “a” is equivalent to “A”) and to ignore positions carrying special characters, such as “N” or “?”.

Input:
a: first DNA sequence (or string)
b: second DNA sequence (or string)
exclude: character (or vector of characters) to be excluded from comparison
ignore.case: logical. If TRUE consider “a” equal to “A”. If FALSE consider “a” different from “A”.

Output:
differences: Number of differences between the two DNA sequences (or strings)

As usual, feedback welcome!

Thanks to Larry for suggesting a nice improvement! It is now incorporated in the function.

```string.diff.ex<-function(a="ATTCGAN",b="attTGTT",exclude=c("n","N","?"),ignore.case=TRUE)
{
if(ignore.case==TRUE)
{
a<-toupper(a)
b<-toupper(b)
}
diff.a<-unlist(strsplit(a,split=""))
diff.b<-unlist(strsplit(b,split=""))
diff.d<-rbind(diff.a,diff.b)
for(ex.loop in 1:length(exclude))
{
diff.d<-diff.d[,!(diff.d[1,]==exclude[ex.loop]|diff.d[2,]==exclude[ex.loop])]
}
differences<-sum(diff.d[1,]!=diff.d[2,])
return(differences)
}
```

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.