
2 Introduction

2.1 Context

Proteins, one of the fundamental building blocks of life, can be classified into various hierarchical categories based on their structural and functional similarities. This classification helps scientists understand protein evolution, function, and relationships. The concept of protein family was established in the 1970s, when few protein sequences and structures were known and most of them were small, single-domain proteins. Since then, the massive increase in the number of known protein sequences and 3D structures has led to more refined definitions, such as superfamily and subfamily organizations.

This introduces granularity into the protein family concept. It provides several scales of analysis, which make it possible to identify the regions or residues responsible for these finer subdivisions.
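To make this idea of subfamily-level granularity concrete, the minimal Python sketch below flags alignment columns that are strictly conserved within each predefined subfamily but vary across the alignment as a whole. The criterion is deliberately simple and generic, chosen only for illustration; it is not presented as Ordalie's method. The function names and the input layout (a dict of aligned sequences plus a dict of hypothetical subfamily groupings) are assumptions.

```python
from collections import Counter


def column(alignment, i):
    """Return the residues of column i across all sequences of the alignment."""
    return [seq[i] for seq in alignment.values()]


def is_conserved(residues):
    """Here, a column counts as conserved if it holds a single non-gap residue type."""
    counts = Counter(residues)
    return len(counts) == 1 and "-" not in counts


def subfamily_specific_columns(alignment, groups):
    """Return indices of columns conserved inside every group but not family-wide.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths)
    groups: dict mapping a (hypothetical) subfamily name -> list of sequence names
    """
    length = len(next(iter(alignment.values())))
    specific = []
    for i in range(length):
        if is_conserved(column(alignment, i)):
            continue  # conserved in the whole family: not subfamily-specific
        within = [
            is_conserved([alignment[name][i] for name in members])
            for members in groups.values()
        ]
        if all(within):
            specific.append(i)
    return specific


if __name__ == "__main__":
    alignment = {"A1": "MKVLE", "A2": "MKVLD", "B1": "MRVIE", "B2": "MRVIE"}
    groups = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
    print(subfamily_specific_columns(alignment, groups))  # [1, 3]
```

In this toy alignment, columns 1 (K vs R) and 3 (L vs I) separate the two groups, whereas column 4 is variable inside group A and is therefore not reported.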

2.2 Ordalie

Ordalie (ORDered ALignment Information Explorer) is an interactive tool designed to explore the informational content of a Multiple Sequence Alignment (MSA) in a hierarchical manner and within different contexts, such as phylogeny or 3D structure.

Figure 1: Diagram of the Ordalie philosophy

The Ordalie philosophy (see Figure 1) lies in its capacity to perform a concomitant multi-scale analysis along three axes: the amino acid sequence axis, the taxa axis, and the contexts axis.
The information distributed along the amino acid sequence (represented by the horizontal axis in Figure 1) can be analyzed across several scales:

The taxonomic depth of the study constitutes another key axis of analysis. The MSA can be exploited at three main scales:

The third analytical axis focuses on the diverse computational contexts integrated within Ordalie, ranging from residue conservation analysis and phylogenetic tree rendering to the mapping of external features and 3D structure visualization. A key strength of the platform is that all these analyses are unified within a structural framework, allowing sequence-based features to be spatially mapped and compared directly onto the 3D structures included in the alignment.
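Mapping a sequence-based feature onto a 3D structure requires translating alignment columns into residue numbers of the structure's sequence, skipping gap positions. The Python sketch below illustrates this generic step. The function names are hypothetical, and the sketch assumes consecutive residue numbering starting at a given value; real structure files can contain numbering gaps or insertion codes, which this simplification ignores.

```python
def column_to_residue(aligned_seq, first_residue_number=1, gap_chars="-."):
    """Map each alignment column to a residue number of one aligned sequence.

    Returns one entry per column: the residue number (assumed consecutive,
    starting at first_residue_number) or None where the sequence has a gap.
    """
    mapping = []
    number = first_residue_number
    for char in aligned_seq:
        if char in gap_chars:
            mapping.append(None)
        else:
            mapping.append(number)
            number += 1
    return mapping


def project_scores(aligned_seq, column_scores, first_residue_number=1):
    """Transfer per-column values (e.g. conservation scores) onto residue numbers."""
    if len(column_scores) != len(aligned_seq):
        raise ValueError("one score per alignment column is required")
    mapping = column_to_residue(aligned_seq, first_residue_number)
    return {res: score for res, score in zip(mapping, column_scores) if res is not None}


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.5, 0.2, 1.0]
    print(project_scores("-MK-V", scores, first_residue_number=10))
    # {10: 0.9, 11: 0.5, 12: 1.0}
```

Columns that fall on gaps of the chosen sequence have no residue in that structure, so their values are simply dropped.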

In summary, Ordalie provides a holistic framework in which data from different analytical dimensions can be cross-compared seamlessly. Whether the user adjusts the taxonomic scale to explore broad evolutionary patterns or zooms in on taxon-specific features, this integrative approach is essential for unraveling the multifaceted relationships that govern the sequence-structure-function-evolution paradigm.

2.3 Database and Snapshot Management

At the core of Ordalie lies a dedicated SQLite database engine. This architecture ensures high-performance data handling and persistent storage for both system-wide configurations (colors, thresholds, and default values) and alignment-specific data (sequences, annotations, and biological features).

One of Ordalie's most powerful features is its ability to manage multiple analytical iterations through a versioning system:

The Database:
Acts as the central repository, ensuring data integrity across sessions. The native .ord file format bundles all project metadata into a single, portable relational database.
Snapshots:
A snapshot represents a discrete state of the analysis at a given time. It captures a specific clustering configuration or a particular alignment variation. This allows users to:

Snapshots are managed via the dedicated Snapshot Bar (see section 3.5.2), providing a seamless way to switch between different views of the same biological dataset.
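The combination of a single-file SQLite database and discrete snapshots can be illustrated with Python's built-in sqlite3 module. The sketch below is hypothetical: the table names (settings, sequences, snapshots), their columns, and the JSON encoding of a snapshot's state are assumptions chosen for illustration and do not describe the actual layout of an .ord file. It only shows how one SQLite file can hold project data together with a list of stored, reloadable analysis states.

```python
import json
import sqlite3

# Hypothetical schema: not the real .ord layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS settings  (key TEXT PRIMARY KEY, value TEXT);
CREATE TABLE IF NOT EXISTS sequences (name TEXT PRIMARY KEY, aligned TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS snapshots (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    label   TEXT NOT NULL,
    created TEXT DEFAULT CURRENT_TIMESTAMP,
    state   TEXT NOT NULL  -- JSON-encoded clustering / display state
);
"""


def open_project(path):
    """Open (or create) a single-file project database and ensure the tables exist."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def save_snapshot(con, label, state):
    """Store one discrete analysis state and return its id."""
    with con:  # commits the transaction on success, rolls back on error
        cur = con.execute(
            "INSERT INTO snapshots (label, state) VALUES (?, ?)",
            (label, json.dumps(state)),
        )
    return cur.lastrowid


def load_snapshot(con, snapshot_id):
    """Return (label, state) for a stored snapshot, or None if it does not exist."""
    row = con.execute(
        "SELECT label, state FROM snapshots WHERE id = ?", (snapshot_id,)
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


def list_snapshots(con):
    """Return (id, label) pairs for all snapshots, oldest first."""
    return con.execute("SELECT id, label FROM snapshots ORDER BY id").fetchall()


if __name__ == "__main__":
    con = open_project(":memory:")  # a file path such as "project.ord" also works
    sid = save_snapshot(con, "two subfamilies", {"clusters": {"A": ["s1", "s2"], "B": ["s3"]}})
    print(list_snapshots(con))  # [(1, 'two subfamilies')]
    print(load_snapshot(con, sid))
```

Storing each state as a separate row keeps earlier analyses intact, so switching views amounts to loading a different row rather than recomputing or overwriting the current one.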