
2 Introduction

2.1 Context

Proteins, one of the fundamental building blocks of life, can be classified into hierarchical categories based on their structural and functional similarities. This classification helps scientists understand protein evolution, function, and relationships. The concept of protein family was established in the 1970s, when few protein sequences and structures were known and most of them were small and consisted of a single domain. Since then, the massive increase in the number of known protein sequences and 3D structures has led to finer-grained definitions, such as super-family and sub-family organizations.

These nested levels introduce granularity into the protein family concept: they provide several scales of analysis, which make it possible to identify the regions or residues responsible for that granularity.

2.2 Ordalie

Ordalie (ORDered ALignment Information Explorer) is an interactive tool designed for exploring the informational content of a Multiple Sequence Alignment (MSA) in a hierarchical manner and within different contexts, such as phylogeny or 3D structure.

Figure 1: Diagram of the Ordalie philosophy

The Ordalie philosophy (see fig. 1) lies in its capacity to perform a simultaneous multi-scale analysis along three axes: the amino acid sequence axis, the taxa axis, and the contexts axis.

The information running along the amino acid sequence (horizontal axis in figure 1) can be considered according to several scales:

A second analysis axis is the way the different taxa present in the alignment are handled. The study can be done at a global level (all taxa) to characterize the whole family through features such as conserved motifs or key signatures. It can also be done on a particular taxon to identify and characterize point mutation positions, or at an intermediate level to study the features that allow sub-family identification, such as residues that are differentially conserved between the sub-family and the other taxa.
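To make the difference between the global scale and the sub-family scale concrete, here is a minimal Python sketch on a toy alignment. It is not Ordalie's implementation: the sequences, the taxon names, the sub-family membership, the function names, and the conservation criterion (a column counts as conserved when its most frequent residue reaches a frequency threshold) are all assumptions made for illustration. In particular, "differentially conserved" is taken here to mean "conserved inside the sub-family and conserved among the other taxa, but with a different residue", which is only one possible definition.

```python
from collections import Counter

# Toy alignment (hypothetical data): one aligned sequence per taxon, '-' is a gap.
alignment = {
    "taxA1": "MKVLAG",
    "taxA2": "MKVLSG",
    "taxB1": "MRVIAG",
    "taxB2": "MRVIAG",
}
subfamily = {"taxB1", "taxB2"}  # hypothetical sub-family membership


def consensus(seqs, col):
    """Return (most frequent residue, its frequency) in one column, ignoring gaps."""
    residues = [s[col] for s in seqs if s[col] != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def globally_conserved(aln, threshold=1.0):
    """Columns (0-based) conserved across all taxa: the global scale."""
    seqs = list(aln.values())
    return [c for c in range(len(seqs[0])) if consensus(seqs, c)[1] >= threshold]


def differentially_conserved(aln, group, threshold=1.0):
    """Columns (0-based) conserved in `group` and in the remaining taxa,
    but with different residues: the intermediate, sub-family scale."""
    inside = [s for name, s in aln.items() if name in group]
    outside = [s for name, s in aln.items() if name not in group]
    columns = []
    for c in range(len(inside[0])):
        res_in, freq_in = consensus(inside, c)
        res_out, freq_out = consensus(outside, c)
        if freq_in >= threshold and freq_out >= threshold and res_in != res_out:
            columns.append(c)
    return columns


print(globally_conserved(alignment))                   # [0, 2, 5]
print(differentially_conserved(alignment, subfamily))  # [1, 3]
```

On this toy data, columns 0, 2 and 5 characterize the whole family, while columns 1 and 3 separate the sub-family from the other taxa. Column 4 falls in neither set because one taxon outside the sub-family carries a point substitution there.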

As a third analysis axis, Ordalie embeds tools that provide different analysis contexts: residue conservation computation, phylogenetic tree computation and rendering, external feature mapping, a 3D structure viewer, etc. All analyses can be done in a structural context, as all available features can be mapped onto and compared on the 3D structures present in the alignment.

In summary, the strength of Ordalie for protein family analysis lies in the cross-comparison of all information seen in different contexts and at different scales. By adjusting the coarseness of the scale (for example all taxa, a subgroup of taxa, or a single taxon), the user obtains information that helps decipher different aspects of the sequence-structure-function-evolution relationships for the protein family under study.