Help

PROBE is a fully automatic method for the analysis of protein family conservation and the identification of conserved regions within a protein multiple sequence alignment.

Input

Sequences can be input to the web server in two different formats:

(i) if a FASTA file is uploaded, the conserved sequence blocks will be calculated automatically.

(ii) the results of a previous PROBE analysis can also be uploaded, in which case the blocks in the uploaded XML file will be displayed.

In both cases, the input alignment must contain PROTEIN sequences (not DNA/RNA). There is no other restriction on the type of sequences: the family alignment can contain both orthologs and paralogs, and both full-length and truncated sequences. However, the accuracy of the alignment is important, since misaligned regions will not be identified as conserved blocks.
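The requirements above can be checked before uploading. The following is a minimal sketch, not part of PROBE: the function names `read_fasta` and `check_alignment` are hypothetical, and the nucleotide test is only a heuristic (a short peptide made up solely of A, C, G and T residues would be flagged by mistake).

```python
from typing import Dict, List

# Letters typical of DNA/RNA sequences, plus common gap symbols
NUCLEOTIDE_CHARS = set("ACGTUN-.")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, preserving input order."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name
            parts = line[1:].split()
            name = parts[0] if parts else f"seq{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def check_alignment(records: Dict[str, str]) -> List[str]:
    """Return a list of problems found; the list is empty if none were found."""
    problems = []
    # In an alignment, every row (gaps included) has the same length
    if len({len(seq) for seq in records.values()}) > 1:
        problems.append("sequences have different lengths (not an alignment)")
    for name, seq in records.items():
        # Heuristic: only nucleotide letters means the row is probably DNA/RNA
        if seq and set(seq.upper()) <= NUCLEOTIDE_CHARS:
            problems.append(f"{name} looks like a nucleotide sequence")
    return problems
```

For example, `check_alignment(read_fasta(open("family.fasta").read()))` returns an empty list when no problem is detected; the file name here is only illustrative. The check says nothing about the accuracy of the alignment itself.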

A phylogenetic tree is also required, but it is used for visualization purposes only. The tree can either be uploaded by the user or, if no tree is provided, calculated automatically.

Calculation

If the user uploads a FASTA file, the following steps are performed:

(i) Subfamilies in the alignment are calculated using the Secator method (Wicker et al., Secator: a program for inferring protein subfamilies from phylogenetic trees. Molecular Biology and Evolution 2001 18:1435-1441). Errors in the subfamilies inferred by Secator may lead to errors in the discovery of conserved sequence blocks. If this happens, it is possible to download the source code of PROBE and use a different sequence clustering algorithm (see Download page).

(ii) Sequence blocks corresponding to regions that are conserved either in the whole alignment or in specific subfamilies are then identified using the LEON-BIS method (Vanhoutreve et al., LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system. BMC Bioinformatics 2016 17:271).

(iii) If the user has not uploaded a tree, a phylogenetic tree for visualization is calculated automatically using the BioNJ method (Gascuel O. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution 1997 14:685-695).

(iv) Finally, the sequences in the alignment are re-ordered to match the order of the phylogenetic tree. This re-ordering does not affect the sequence alignment in any way.
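The re-ordering in step (iv) can be illustrated with a short sketch. This is not the PROBE implementation: it assumes the tree is available as a Newick string with unquoted leaf names, and the function names are hypothetical. Only the order of the rows changes; the aligned sequences themselves are copied unchanged.

```python
import re
from typing import Dict, List


def newick_leaf_order(newick: str) -> List[str]:
    """Return leaf names in the order they appear in a Newick string.

    A leaf name is a label that directly follows '(' or ','. Labels that
    follow ')' are internal node names or support values and are skipped.
    Assumes unquoted names that contain no whitespace.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def reorder_alignment(records: Dict[str, str], newick: str) -> Dict[str, str]:
    """Return the same aligned sequences, ordered like the tree leaves."""
    order = newick_leaf_order(newick)
    # Names present in only one of the two inputs
    mismatched = set(order) ^ set(records)
    if mismatched:
        raise ValueError(
            f"names differ between tree and alignment: {sorted(mismatched)}"
        )
    return {name: records[name] for name in order}
```

With an alignment given as `{"C": "MK-V", "A": "MKLV", "B": "MR-V"}` and the tree `((A:0.1,B:0.2)90:0.3,C:0.5);`, the rows come back in the order A, B, C with every sequence string intact.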

Results

When the calculation of the conserved blocks is complete, a new page is displayed, allowing the user either to save the PROBE results as an XML file or to visualize the results in the web browser.

The visualization page is divided into two sections:

(i) a phylogenetic tree is displayed with the conserved blocks organized in a matrix layout, where each row of the matrix represents a sequence and each column represents a conserved block. A color is assigned arbitrarily to each column (block) to facilitate the visualization. The tree is displayed using a modified version of the jsPhyloSVG JavaScript library. The modified version is available here.

(ii) the multiple alignment is displayed with the conserved blocks shown as colored features on the corresponding sequences. The multiple sequence alignment is displayed using a modified version of the MSAViewer JavaScript library. The modified version is available here.
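The matrix layout described in (i) can be pictured with the following sketch. It is an illustration under assumptions, not the server's code: the `Block` structure, its field names and the color palette are hypothetical and do not reflect the PROBE XML format.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Block(NamedTuple):
    start: int         # first alignment column of the block (hypothetical field)
    end: int           # last alignment column, inclusive (hypothetical field)
    members: Set[str]  # names of the sequences in which the block is conserved


# Arbitrary colors, reused cyclically when there are more blocks than colors
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def block_matrix(
    seq_order: List[str], blocks: List[Block]
) -> Dict[str, List[Optional[str]]]:
    """Build one row per sequence and one column per conserved block.

    A cell holds the color of its column if the block is conserved in that
    sequence, and None otherwise.
    """
    # Columns follow the position of the blocks along the alignment
    ordered = sorted(blocks, key=lambda b: b.start)
    return {
        name: [
            PALETTE[j % len(PALETTE)] if name in block.members else None
            for j, block in enumerate(ordered)
        ]
        for name in seq_order
    }
```

A block conserved in the whole alignment fills its column in every row, while a subfamily-specific block fills only the rows of that subfamily, which is what makes the pattern readable next to the tree.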