A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation and prediction of molecular interactions. However sophisticated these applications are, they are generally highly sensitive to the alignment used, and neglecting non-homologous regions or uncertainty in the alignment can lead to significant bias in the subsequent inferences.

LEON-BIS is a new method that uses a robust Bayesian framework to estimate the homologous relations between sequences in a protein multiple alignment. Sequences are clustered into sub-families, and relations are predicted at different levels, including conserved 'core blocks', 'regions' and the full-length proteins. The accuracy and reliability of the predictions have been demonstrated in large-scale comparisons using well-annotated alignment databases, where the homologous sequence sections were detected with very high sensitivity and specificity.
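To make the idea of a conserved 'core block' more concrete, here is a deliberately simplified sketch in Python. It is not the LEON-BIS Bayesian method. It swaps in a plain per-column identity threshold instead of a probabilistic model and ignores sub-family clustering entirely. The function names `column_identity` and `core_blocks` and the thresholds `min_identity` and `min_length` are hypothetical, chosen only for illustration.

```python
from collections import Counter

GAP_CHARS = "-."  # assumption: '-' and '.' mark gaps in the aligned sequences


def column_identity(column: str) -> float:
    """Fraction of sequences sharing the most common residue (gaps count as mismatches)."""
    residues = [c for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(column)


def core_blocks(alignment: list[str], min_identity: float = 0.8,
                min_length: int = 3) -> list[tuple[int, int]]:
    """Toy heuristic (NOT LEON-BIS): return half-open (start, end) column ranges
    of contiguous, gap-free columns whose identity is at least min_identity."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    blocks = []
    start = None
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        gap_free = not any(c in GAP_CHARS for c in column)
        conserved = gap_free and column_identity(column) >= min_identity
        if conserved and start is None:
            start = i  # open a new candidate block
        elif not conserved and start is not None:
            if i - start >= min_length:  # keep only sufficiently long blocks
                blocks.append((start, i))
            start = None
    # close a block that runs to the last column
    if start is not None and length - start >= min_length:
        blocks.append((start, length))
    return blocks


if __name__ == "__main__":
    toy_alignment = [
        "MKVLAT--GHW",
        "MKVLST--GHW",
        "MKILATPQGHW",
    ]
    print(core_blocks(toy_alignment, min_identity=0.6))  # [(0, 6), (8, 11)]
```

On this toy alignment, the gapped columns 6 and 7 split the sequences into two gap-free blocks. Raising `min_identity` to its default of 0.8 discards the first block, because the partially conserved columns 2 and 4 break it into pieces shorter than `min_length`. A fixed threshold like this is exactly the kind of hard cut-off that a probabilistic treatment of alignment uncertainty is meant to avoid.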

LEON-BIS can be used to distinguish sections in multiple sequence alignments that are conserved across the whole family or only within subfamilies, and should be useful for automatic, high-throughput genome annotation, 2D/3D structure prediction, protein-protein interaction prediction, etc.

Download an archive of the test set of protein sequences.

Download an archive of the source code.

If you have any problems or questions, please feel free to contact us at thompson@unistra.fr