Home News Downloads Examples & Tutorial FAQ Collaborations and citations How to cite Contact

Index

VaRank introduction
VaRank requirements
Input data
Output data
VaRank initial filtering
VaRank barcode
VaRank scoring


VaRank introduction

VaRank is a simple and powerful tool designed for variant ranking from next generation sequencing data. It provides a comprehensive workflow for annotating and ranking SNVs and indels.
If you are interested by Structural Variation (SV) ANNOTATION and RANKING, please go to the AnnotSV software homepage.

Five modules create the strength of this workflow:
- Variant call quality summary (total and variant depth of coverage, phred like information), to filter out false positive calls
- Alamut Batch or SnpEff variant annotations, to integrate genetic and predictive information (functional impact, putative effects in the protein coding regions, population frequency...) from different sources, using HGVS nomenclature
- Barcode representing the presence/absence of variants (with homozygote/heterozygote status), to search for recurrence between families or group of individuals
- Phenotype-driven analysis, to score genes overlapped with a SNV/indel on biological relevance to the individual phenotype
- Prioritization score, to rank variants according to their predicted pathogenic status.




VaRank results aims at reducing the daily work of clinical geneticists and molecular biologists and will help to accelerate the progress in identifying disease causing variants.

VaRank requirements

a- You will need VaRank sources. The Source code is available here under the GNU GPL licence.
b- VaRank can run on any architecture with a standard Tcl/Tk installation. You can freely download it here for any architecture (e.g. AIX, Linux, Mac OS X, Solaris and Windows).
c- VaRank relies on 2 possible annotation engines to extract most of the data and offers the ability to score each variant:

  • Alamut Batch (Interactive Biosoftware). You can request a free, 30-day trial of Alamut Batch here.
  • SnpEff (http://snpeff.sourceforge.net).

  • Optional:
    d- PolyPhen-2 provides prediction of functional effects of human SNPs. Depending on the annotation engine PPH2 either needs to be installed separately (Alamut Batch) or is already integrated (SnpEff). Nevertheless one can still have SnpEff installed and a local installation of PPH2.
    You can freely download it here


    Input data

    VaRank supports the commonly used VCF (Variant Call Format) input format for variants analysis that allows the program to be easily integrated into NGS bioinformatics analysis pipelines.


    Output data

    VaRank provides 4 tsv output files (TAB separated values files) divided into 2 categories:

    A part from these 2 categories, each file is also available in 2 versions:

    The description of the VaRank annotation columns is available in section 7 (“ANNOTATION COLUMNS”) of the README.VaRank_*.pdf.


    VaRank initial filtering

    The default filters remove variants:

    VaRank barcode

    VaRank introduces a barcode that allows a quick overview of the presence/absence status of each variant and their zygosity status within the analyzed individuals.
    Together with the barcode, simple counts on the individuals (homozygous, heterozygous and total allelic counts) are also added and can easily be used to further filter variants not yet reported in dbSNP but present in the user’s cohort.
    The combination of barcode and counts is an extremely powerful filtering strategy.



    A. The barcode represents the SNV’s zygosity status in an ordered list of samples.
    Samples homozygotes for the reference allele are represented using “0”, heterozygous variants are represented using “1” and homozygous variants are represented with “2”.

    B. Selected annotations from the VaRank output representing 3 SNVs from a single patient.
    The barcode gives an overview of the presence/absence for one SNV in all other patients analyzed.
    Together with this, the total counts of alleles are given in the last 4 columns.
    In a cohort of 32 samples, the variant in BBS2 is present in 31/32 samples at the homozygous state as one can see from the barcode or the relevant counts.
    The variant in ALMS1 (disease causing mutation) is present once at the homozygous state.

    C: The barcode can be specifically ordered and used in family analysis such as trio exome sequencing.
    On the left, homozygous mutations in a consanguineous family could be highlighted by the “121” barcode indicating homozygous variants (“2”) in the proband inherited from heterozygous parents (“1”).
    On the right denovo variants in the proband could be highlighted with the proposed barcode “010”.

    HPO scoring

    To score genes overlapped with a SNV/indel on biological relevance to the individual phenotype, VaRank rely on Exomiser (Smedley et al., 2015) and HPO (Köhler et al., 2019).

    For a given phenotype, a HPO-based score corresponding to a damaging probability is provided for each gene overlapped with an SNV/indel so that:

  • Genes previously associated with disease can be highlighted easily
  • Genes not previously associated with disease can be highlighted
  • Genes associated with diseases that have little or no similarity to the observed phenotypes can be removed along

  • HPO:
    VaRank uses the Human Phenotype Ontology (version reported in the VaRank output). Find out more at the Human Phenotype Ontology website.

    Please cite the 2 following articles if you use these phenotype data in your work:
  • Next-generation diagnostics and disease-gene discovery with the Exomiser. Smedley D., et al, Nature Protocols (2015) doi:10.1038/nprot.2015.124
  • Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Köhler S., et al, Nucleic Acids Research (2019) doi:10.1093/nar/gky1105

  • VaRank scoring

    The description of the VaRank scores is detailed in section 5 (“SCORING”) of the README.VaRank_*.pdf.


    If you have any problem or question, please, feel free to contact us at jeanmuller@unistra.fr or veronique.geoffroy@inserm.fr