Fast_statistical_alignment

FSA
Developer(s)	Robert Bradley (UC Berkeley), Colin Dewey (UW Madison), Lior Pachter (UC Berkeley)
Stable release	1.5.2
Operating system	UNIX, Linux, Mac
Type	Bioinformatics tool
Licence	Open source

Fast statistical alignment

FSA is a multiple sequence alignment program for aligning many proteins, RNAs, or long genomic DNA sequences. Along with MUSCLE and MAFFT, FSA is one of the few sequence alignment programs that can align datasets of hundreds or thousands of sequences. FSA uses a different optimization criterion, which allows it to identify non-homologous sequences more reliably than these other programs, although this increased accuracy comes at the cost of decreased speed.

FSA is currently being used for projects including sequencing new worm genomes and analyzing in vivo transcription factor binding in flies.

Algorithm

The algorithm for aligning the input sequences has four core components.

Pair Hidden Markov Model for generating posterior probabilities

The algorithm first determines posterior probabilities of alignment $\mathbb {P} (A|X,Y)$ for pairs of sequences drawn at random from the pool of sequences being aligned. The per-column posterior probabilities strengthen the estimate of alignment probability for a sequence pair and allow columns that cannot be reliably aligned to be filtered out. These probabilities also make it possible to estimate homology between any sequence pair. A standard five-state pair hidden Markov model (Pair HMM) is used to compute these posterior probabilities of alignment for any two input sequences. The Pair HMM uses two sets of Delete (D) and Insert (I) states to account for symbol deletions and insertions between two aligned sequences, but a three-state model can be used instead without a significant loss of accuracy.
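As a rough illustration of how posterior match probabilities can be obtained from a Pair HMM, the following sketch runs the forward and backward recursions on the simpler three-state variant (Match, Insert, Delete). It is not FSA's implementation: the function name `match_posteriors`, the transition parameters `delta`, `eps` and `tau`, and the toy emission model over `alphabet` are all assumptions chosen for illustration.

```python
def match_posteriors(x, y, delta=0.1, eps=0.4, tau=0.05,
                     p_same=0.2, alphabet="ACGT"):
    """Posterior probability that x[i] is aligned to y[j] under a
    three-state pair HMM (all parameter values are illustrative)."""
    n, m, k = len(x), len(y), len(alphabet)
    q = 1.0 / k                                   # emission in a gap state
    p_diff = (1.0 - k * p_same) / (k * (k - 1))   # mismatch emission

    def p(a, b):                                  # joint match emission
        return p_same if a == b else p_diff

    t_mm = 1.0 - 2.0 * delta - tau                # Match -> Match
    t_gm = 1.0 - eps - tau                        # gap state -> Match

    def zeros():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    # Forward pass: f*[i][j] covers the prefixes x[:i] and y[:j].
    fM, fX, fY = zeros(), zeros(), zeros()
    fM[0][0] = 1.0                                # start behaves like Match
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p(x[i - 1], y[j - 1]) * (
                    t_mm * fM[i - 1][j - 1]
                    + t_gm * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    total = tau * (fM[n][m] + fX[n][m] + fY[n][m])  # P(x, y)

    # Backward pass: b*[i][j] covers the suffixes x[i:] and y[j:].
    bM, bX, bY = zeros(), zeros(), zeros()
    bM[n][m] = bX[n][m] = bY[n][m] = tau
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            diag = p(x[i], y[j]) * bM[i + 1][j + 1] if i < n and j < m else 0.0
            down = q * bX[i + 1][j] if i < n else 0.0
            right = q * bY[i][j + 1] if j < m else 0.0
            bM[i][j] = t_mm * diag + delta * (down + right)
            bX[i][j] = t_gm * diag + eps * down
            bY[i][j] = t_gm * diag + eps * right

    # Combine: P(x[i] ~ y[j] | x, y) = fM * bM / P(x, y).
    return [[fM[i][j] * bM[i][j] / total for j in range(1, m + 1)]
            for i in range(1, n + 1)]
```

Calling `match_posteriors("ACGT", "ACGT")` returns a 4 x 4 matrix whose rows each sum to at most 1, since every character is either matched to exactly one character of the other sequence or placed in a gap. Entries with low posterior probability correspond to the unreliable columns that can be filtered out.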

Because the number of pairwise comparisons needed to compute the posterior probability distributions for all sequence pairs is quadratic in the number of sequences being aligned, and therefore computationally expensive, it is reduced using a randomized approach inspired by the Erdős-Rényi theory of random graphs. This significantly reduces the runtime and computational cost of multiple alignments on large datasets.
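The idea of replacing all pairwise comparisons with a random subset can be sketched as follows. The sketch relies on a standard result about Erdős-Rényi random graphs: if each of the possible edges among n nodes is kept independently with probability somewhat above ln(n)/n, the graph is connected with high probability. Treating sequences as nodes and compared pairs as edges, every sequence then remains linked to every other through a chain of comparisons. The function names, the constant `c`, and the resampling loop are assumptions made for illustration and are not taken from FSA.

```python
import itertools
import math
import random


def is_connected(n, pairs):
    """Return True if the graph on n nodes with the given edges is connected."""
    parent = list(range(n))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    components = n
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return components <= 1


def sample_pairs(n, c=2.0, seed=0):
    """Sample sequence pairs (i, j), i < j, each kept with probability
    c * ln(n) / n, resampling until the comparison graph is connected.
    The constant c is a hypothetical tuning parameter."""
    rng = random.Random(seed)
    prob = min(1.0, c * math.log(n) / n) if n > 1 else 1.0
    while True:
        pairs = [pair for pair in itertools.combinations(range(n), 2)
                 if rng.random() < prob]
        if is_connected(n, pairs):
            return pairs
```

For 200 sequences there are 19,900 possible pairs, while `sample_pairs(200)` keeps each pair with probability of about 0.05, so only a small fraction of the pairwise comparisons would need to be run. The expected number of sampled pairs grows roughly as n ln(n) rather than as n squared.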

This article uses material from the Wikipedia article Fast_statistical_alignment, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.
