


Index

VaRank introduction
VaRank requirements
Input data
Output data
VaRank initial filtering
VaRank barcode
VaRank scoring


VaRank introduction

VaRank is a simple and powerful tool for ranking variants from next-generation sequencing (NGS) data. It provides a comprehensive workflow for annotating and ranking SNVs and indels.
If you are interested in structural variations (SVs), which also play a key role in human diseases, please go to the dbSTAR project homepage.
For SV annotation, please go to the AnnotSV software homepage.

The workflow is built on four modules:
- Variant call quality summary (total and variant depth of coverage, Phred-like information), to filter out false-positive calls.
- Variant annotation with Alamut Batch or SnpEff, to integrate genetic and predictive information (functional impact, putative effects in protein-coding regions, population frequency, etc.) from different sources, using HGVS nomenclature.
- A barcode representing the presence/absence of each variant (with homozygous/heterozygous status), to search for recurrence between families or groups of individuals.
- A prioritization score, to rank variants according to their predicted pathogenicity.
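The quality summary keeps the total depth and the variant depth of each call. As a rough illustration of how these two numbers can be combined to flag likely false positives, here is a minimal Python sketch; the thresholds (10 total reads, 3 variant reads, 15% variant fraction) and the function names are hypothetical and are not VaRank's documented defaults.

```python
# Hypothetical thresholds for illustration only; not VaRank's defaults.
MIN_TOTAL_DEPTH = 10
MIN_VARIANT_DEPTH = 3
MIN_VARIANT_FRACTION = 0.15


def variant_fraction(total_depth: int, variant_depth: int) -> float:
    """Fraction of reads supporting the variant allele (0.0 when uncovered)."""
    if total_depth <= 0:
        return 0.0
    return variant_depth / total_depth


def passes_quality(total_depth: int, variant_depth: int) -> bool:
    """True if the call meets all three illustrative depth thresholds."""
    return (
        total_depth >= MIN_TOTAL_DEPTH
        and variant_depth >= MIN_VARIANT_DEPTH
        and variant_fraction(total_depth, variant_depth) >= MIN_VARIANT_FRACTION
    )
```

For example, a call with 40 total reads and 18 variant reads passes (45% variant fraction), while one with 40 total reads and only 2 variant reads is flagged.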




VaRank aims to reduce the daily workload of clinical geneticists and molecular biologists and to accelerate the identification of disease-causing variants.

VaRank requirements

a- You will need the VaRank sources. The source code is freely available under the GNU GPL licence.
b- VaRank runs on any architecture with a standard Tcl/Tk installation. Tcl/Tk is freely available for most platforms (e.g. AIX, Linux, Mac OS X, Solaris and Windows).
c- VaRank relies on one of two annotation engines to extract most of the annotation data used to score each variant:

  • Alamut Batch (Interactive Biosoftware). A free 30-day trial of Alamut Batch can be requested from Interactive Biosoftware.
  • SnpEff (http://snpeff.sourceforge.net).

d- Optional: PolyPhen-2 (PPH2) predicts the functional effects of human SNPs. Whether it must be installed depends on the annotation engine: with Alamut Batch, PPH2 needs to be installed separately, whereas with SnpEff, PPH2 predictions are already integrated. A local PPH2 installation can nevertheless be used alongside SnpEff.
PolyPhen-2 is freely available for download.


Input data

VaRank takes as input the widely used VCF (Variant Call Format), which allows the program to be easily integrated into NGS bioinformatics analysis pipelines.
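VaRank's own VCF parser is not shown on this page. As a minimal sketch of the per-sample information a VCF provides, the Python code below pulls each sample's genotype (GT) out of a standard tab-delimited record, assuming the FORMAT column lists GT; the function names are hypothetical.

```python
from __future__ import annotations


def sample_names(header_line: str) -> list[str]:
    """Sample columns start after the 9 fixed #CHROM ... FORMAT columns."""
    return header_line.rstrip("\n").split("\t")[9:]


def sample_genotypes(record_line: str, samples: list[str]) -> dict[str, str]:
    """Map each sample name to its GT value for one VCF data line."""
    fields = record_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    genotypes = {}
    for name, sample_field in zip(samples, fields[9:]):
        values = sample_field.split(":")
        # Treat a truncated sample column as a missing call.
        genotypes[name] = values[gt_index] if gt_index < len(values) else "./."
    return genotypes
```

For a record whose FORMAT is GT:DP and whose three sample columns (S1, S2, S3) are 0/1:30, 1/1:25 and 0/0:40, sample_genotypes returns {"S1": "0/1", "S2": "1/1", "S3": "0/0"}.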


Output data

VaRank provides four TSV (tab-separated values) output files, divided into two categories:

Apart from these two categories, each file is also available in two versions:

The VaRank annotation columns are described in section 7 (“ANNOTATION COLUMNS”) of the README.VaRank_*.pdf.


VaRank initial filtering

The default filters remove variants:

VaRank barcode

VaRank introduces a barcode that gives a quick overview of the presence/absence of each variant, and of its zygosity, across the analyzed individuals.
Alongside the barcode, simple counts across individuals (homozygous, heterozygous and total allelic counts) are also reported; they make it easy to further filter variants that are absent from dbSNP but present in the user's cohort.
The combination of barcode and counts is a very powerful filtering strategy.
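Assuming the barcode coding described for panel A (“0” homozygous reference, “1” heterozygous, “2” homozygous variant), the counts can be derived directly from the barcode string. The Python sketch below is illustrative only: the BarcodeCounts type is hypothetical, and reading the total allelic count as the number of variant alleles (two per homozygote, one per heterozygote) is an assumption, not VaRank's documented definition.

```python
from typing import NamedTuple


class BarcodeCounts(NamedTuple):
    homozygous: int    # samples coded "2"
    heterozygous: int  # samples coded "1"
    alleles: int       # assumed: variant alleles, 2 per homozygote + 1 per heterozygote


def count_barcode(barcode: str) -> BarcodeCounts:
    """Derive per-variant cohort counts from a 0/1/2 barcode string."""
    hom = barcode.count("2")
    het = barcode.count("1")
    return BarcodeCounts(homozygous=hom, heterozygous=het, alleles=2 * hom + het)
```

For the barcode "0121002", this gives 2 homozygous samples, 2 heterozygous samples and 6 variant alleles.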



A. The barcode represents the SNV's zygosity status in an ordered list of samples.
Samples homozygous for the reference allele are coded “0”, heterozygous samples are coded “1” and samples homozygous for the variant are coded “2”.
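This coding amounts to counting variant alleles per sample. A minimal Python sketch for biallelic sites, assuming the genotypes come from the VCF GT field; the "." used here for missing, multi-allelic or non-diploid calls is a hypothetical choice, not necessarily how VaRank encodes them.

```python
from __future__ import annotations


def genotype_code(gt: str) -> str:
    """Convert a biallelic diploid VCF GT value to a barcode digit."""
    alleles = gt.replace("|", "/").split("/")  # treat phased like unphased
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return "."  # hypothetical placeholder: missing, multi-allelic or non-diploid
    return str(alleles.count("1"))  # number of variant alleles: 0, 1 or 2


def build_barcode(genotypes: list[str]) -> str:
    """Concatenate one digit per sample, in the given sample order."""
    return "".join(genotype_code(gt) for gt in genotypes)
```

With genotypes ["0/0", "0/1", "1|1", "./."] in that sample order, build_barcode returns "012.".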

B. Selected annotations from the VaRank output showing 3 SNVs from a single patient.
The barcode gives an overview of the presence/absence of each SNV in all other analyzed patients.
The corresponding allele counts are given in the last 4 columns.
In a cohort of 32 samples, the variant in BBS2 is present in the homozygous state in 31/32 samples, as shown by both the barcode and the counts.
The variant in ALMS1 (the disease-causing mutation) is present only once, in the homozygous state.
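The panel B reasoning (in a rare-disease setting, a variant carried by almost the whole cohort is unlikely to be the cause, whereas one seen once deserves attention) can be expressed as a simple carrier-frequency filter. This Python sketch uses toy barcodes built from the description above, not real data, and the 5% cutoff and function names are hypothetical.

```python
def carrier_count(barcode: str) -> int:
    """Number of samples carrying at least one variant allele."""
    return sum(1 for code in barcode if code in ("1", "2"))


def is_rare_in_cohort(barcode: str, max_fraction: float = 0.05) -> bool:
    """True if at most max_fraction of the cohort carries the variant."""
    if not barcode:  # no samples: nothing to assess
        return False
    return carrier_count(barcode) / len(barcode) <= max_fraction


# Toy 32-sample barcodes mirroring the panel B description (not real data).
bbs2 = "2" * 31 + "0"   # homozygous in 31/32 samples
alms1 = "2" + "0" * 31  # homozygous in a single sample
```

Here is_rare_in_cohort(bbs2) is False (31/32 carriers) and is_rare_in_cohort(alms1) is True (1/32 carriers).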

C. The barcode can be ordered to suit family analyses such as trio exome sequencing.
On the left, homozygous mutations in a consanguineous family can be highlighted by the “121” barcode, indicating a homozygous variant (“2”) in the proband inherited from heterozygous parents (“1”).
On the right, de novo variants in the proband can be highlighted with the barcode “010”.
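With the samples ordered as a trio, the two patterns in panel C become exact string matches on the barcode. A minimal Python sketch; the sample order (parent, proband, parent) is inferred from the figure description, and the variant identifiers and the TRIO_PATTERNS name are hypothetical.

```python
from __future__ import annotations

# Assumed sample order: parent 1, proband, parent 2 (proband in the middle).
TRIO_PATTERNS = {
    "121": "homozygous in proband, heterozygous parents",
    "010": "candidate de novo in proband",
}


def trio_candidates(barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Group variant IDs by the trio pattern their barcode matches exactly."""
    hits: dict[str, list[str]] = {pattern: [] for pattern in TRIO_PATTERNS}
    for variant_id, barcode in barcodes.items():
        if barcode in hits:
            hits[barcode].append(variant_id)
    return hits
```

For {"varA": "121", "varB": "010", "varC": "111"}, trio_candidates returns {"121": ["varA"], "010": ["varB"]}; varC, shared by all three samples, matches neither pattern.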

VaRank scoring

The VaRank scores are described in detail in section 5 (“SCORING”) of the README.VaRank_*.pdf.


If you have any problems or questions, please feel free to contact us at jeanmuller@unistra.fr or veronique.geoffroy@inserm.fr