Count different positions between two strings of equal length

November 26, 2011

(This article was first published on Fabio Marroni's Blog » R, and kindly contributed to R-bloggers)

This is another pretty simple function, written to help me solve the simplest representation of a trivial but tedious task. Most biologist are probably familiar with this task. How many nucleotide differences exist between two given sequences? I only faced the easiest part of the problem, i.e. I do not perform alignment, I just assume that the sequence have a perfect 1 to 1 correspndence, i.e. position 1 of sequence 1 is position 1 of sequence 2, and so on.
The problem is basically to compare two strings of equal length and count the number of positions in which they differ. To facilitate analysis of DNA sequences, I gave the opportunity to ignore case (so that an “a” is equivalent to “A”) and to ignore positions carrying special characters, such as “N” or “?”.

a: first DNA sequence (or string)
b: second DNA sequence (or string)
exclude: character (or vector of characters) to be excluded from comparison logical. If TRUE consider “a” equal to “A”. If FALSE consider “a” different from “A”.

differences: Number of differences between the two DNA sequences (or strings)

As usual, feedback welcome!

Thanks to Larry for suggesting a nice improvement! It is now incorporated in the function.

if(nchar(a)!=nchar(b)) stop("Lengths of input strings differ. Please check your input.")
for(ex.loop in 1:length(exclude))

To leave a comment for the author, please follow the link and comment on their blog: Fabio Marroni's Blog » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Dommino data lab

Quantide: statistical consulting and training



CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)