VaRank 1.0

VaRank is a program for genetic Variant Ranking from NGS data

Copyright (C) 2014 GEOFFROY Veronique, MULLER Jean

Please feel free to contact us for any suggestions or bug reports
email: veronique.geoffroy@inserm.fr; jeanmuller@unistra.fr

COMMAND LINE USAGE:
-------------------
$VARANK/bin/VaRank -vcfDir 'Path of your study directory containing your vcf input file' >& VaRank.log &

OPTIONS:
--------
-help			More information on the arguments

-vcfDir			Path of your study directory containing your vcf input file

-vcfInfo		To extract the INFO column from the .vcf file and insert its data in the output file (last columns).
                     	Range values: yes or no (default)

-rsfromvcf		To extract the rsID and validation status from the .vcf file and insert them in the output file.
                     	Range values: yes or no (default)

-nowebsearch		To disable (yes) or enable (no) web access for downloading the FASTA sequences of proteins missing from the UniProt and/or RefSeq files (only relevant when used with PolyPhen-2). Note that the web search can be very time consuming, since sequences are retrieved one by one.
                     	Range values: yes (default) or no

-Homstatus    	     	To force the determination of the homozygous or heterozygous state of a variant. If set to yes, the Homcutoff value is used to decide.
                     	Range values: yes or no (default)

-Homcutoff    	     	Cutoff used to determine the homozygous or heterozygous state of a variant. Setting a value forces the status provided in the input data to be re-evaluated.
                     	Range values: [0,100] default: 80 (active only if Homstatus=yes or when no status is given)
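
A minimal Python sketch of how a Homcutoff-style decision could work. It assumes (this README does not say so explicitly) that the cutoff is compared with the percentage of reads carrying the variant allele, and that a value equal to the cutoff counts as homozygous. `zygosity` is a hypothetical helper, not part of VaRank.

```python
def zygosity(variant_reads: int, total_reads: int, hom_cutoff: float = 80.0) -> str:
    """Classify a call as homozygous ("hom") or heterozygous ("het").

    Assumption: hom_cutoff is a percentage of variant-supporting reads,
    and reaching the cutoff exactly is treated as homozygous.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    percent_variant = 100.0 * variant_reads / total_reads
    return "hom" if percent_variant >= hom_cutoff else "het"


print(zygosity(45, 50))  # 90% variant reads -> hom
print(zygosity(20, 50))  # 40% variant reads -> het
```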

-MEScutoff    	     	MaxEntScan cutoff, to determine the impact of the variant on splicing. Expressed as the % difference between the variant and the WT score.
                     	Range values: [-100,0], default: -15

-SSFcutoff       	Splice Site Finder cutoff, to determine the impact of the variant on splicing. Expressed as the % difference between the variant and the WT score.
                     	Range values: [-100,0], default: -5

-NNScutoff       	NNSplice cutoff, to determine the impact of the variant on splicing. Expressed as the % difference between the variant and the WT score.
                     	Range values: [-100,0], default: -10
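
The three splice cutoffs can be read as thresholds on a relative variation, 100 * (variant - WT) / WT. The sketch below assumes that formula, assumes positive WT scores, treats a variation at or below the cutoff as an effect, and also illustrates the "at least 2 out of the 3 programs" rule used by the splice score categories. The function names, the MES/SSF/NNS keys and the example scores are hypothetical.

```python
from typing import Dict, Mapping

# Default cutoffs from the options above (% difference, variant vs WT).
DEFAULT_CUTOFFS: Dict[str, float] = {"MES": -15.0, "SSF": -5.0, "NNS": -10.0}


def relative_variation(wt_score: float, mut_score: float) -> float:
    """Percent difference of the variant score relative to the WT score.

    Assumption: the WT score is positive; other cases are rejected here.
    """
    if wt_score <= 0:
        raise ValueError("this sketch assumes a positive WT score")
    return 100.0 * (mut_score - wt_score) / wt_score


def significant_splice_effect(wt: Mapping[str, float],
                              mut: Mapping[str, float],
                              cutoffs: Mapping[str, float] = DEFAULT_CUTOFFS) -> bool:
    """True when at least 2 of the 3 programs reach their cutoff."""
    hits = sum(
        1
        for program, cutoff in cutoffs.items()
        if relative_variation(wt[program], mut[program]) <= cutoff
    )
    return hits >= 2


wt = {"MES": 8.0, "SSF": 80.0, "NNS": 0.9}
mut = {"MES": 6.0, "SSF": 78.0, "NNS": 0.7}
print(relative_variation(wt["MES"], mut["MES"]))  # -25.0
print(significant_splice_effect(wt, mut))         # True (MES and NNS reach their cutoffs)
```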

-phastConsCutoff     	To determine whether a genomic position is conserved. Positions with a score above the cutoff are considered conserved.
                     	Range values: [0,1], default: 0.95

-readFilter          	Minimum number of reads supporting the variant
                     	Range values: [0,-], default: 10

-depthFilter         	Minimum depth for the variants
                     	Range values: [0,-], default: 10

-readPercentFilter   	Minimum percentage of variant reads required to consider a variant
                     	Range values: [0,100], default: 10

-freqFilter          	Filters out variants whose MAF (minor allele frequency) in the SNV databases (dbSNP and EVS) is above this value
                     	Range values: [0.0,1.0], default: 0.01
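
The four read and frequency thresholds above can be combined as in the following Python sketch, using the documented defaults. It assumes that readFilter counts variant-supporting reads, that variants with a MAF above freqFilter are removed, and that variants absent from dbSNP/EVS are kept; the `Call` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    """Hypothetical per-sample call; field names are illustrative."""
    variant_reads: int    # reads carrying the variant allele
    depth: int            # total read depth at the position
    maf: Optional[float]  # MAF from dbSNP/EVS, None when not reported


def passes_filters(call: Call,
                   read_filter: int = 10,
                   depth_filter: int = 10,
                   read_percent_filter: float = 10.0,
                   freq_filter: float = 0.01) -> bool:
    """Apply the four thresholds with the documented default values."""
    if call.variant_reads < read_filter:
        return False
    if call.depth < depth_filter or call.depth == 0:
        return False
    if 100.0 * call.variant_reads / call.depth < read_percent_filter:
        return False
    # Assumption: variants more frequent than freq_filter are removed,
    # and variants absent from the databases (maf is None) are kept.
    if call.maf is not None and call.maf > freq_filter:
        return False
    return True


print(passes_filters(Call(variant_reads=25, depth=60, maf=0.001)))  # True
print(passes_filters(Call(variant_reads=25, depth=60, maf=0.05)))   # False: too common
print(passes_filters(Call(variant_reads=5, depth=60, maf=None)))    # False: too few reads
```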

-rsFilter            	Filtering of variants based on dbSNP information
                     	Values: removeNonPathoRS = remove variants without a "probable-pathogenic" or "pathogenic" annotation (see the clinical significance field on the dbSNP website). Only variants with at least 2 validations are filtered.
                             none = keep all variants, no filtering on rsID
                     	Default: removeNonPathoRS
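
One possible reading of the removeNonPathoRS rule, as a Python sketch. It assumes that variants without an rsID are never removed and that only variants with at least 2 validations are checked against the clinical significance annotation. `keep_variant` is a hypothetical helper and "rs0001" is a placeholder identifier.

```python
from typing import Optional, Set

PATHOGENIC = {"pathogenic", "probable-pathogenic"}


def keep_variant(rs_id: Optional[str], validations: int,
                 clinical_significance: Set[str],
                 rs_filter: str = "removeNonPathoRS") -> bool:
    """Return True if the variant survives the rsFilter rule (sketch)."""
    if rs_filter == "none":
        return True  # no filtering on rsID
    if rs_filter != "removeNonPathoRS":
        raise ValueError(f"unknown rsFilter value: {rs_filter}")
    if rs_id is None or validations < 2:
        # Assumption: no rsID, or fewer than 2 validations -> not filtered.
        return True
    return bool(clinical_significance & PATHOGENIC)


print(keep_variant("rs0001", 3, {"pathogenic"}))  # True
print(keep_variant("rs0001", 3, set()))           # False
print(keep_variant("rs0001", 1, set()))           # True: fewer than 2 validations
print(keep_variant("rs0001", 3, set(), "none"))   # True
```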

-extann              	Tab-separated file containing annotations to add to the final output files. Format requirements: the 1st line is a header and the 1st column is the gene name.
                     	Typical use would be a gene file containing specific annotations such as transmission mode, disease, expression...
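
A Python sketch of reading a file in the -extann format (header line, gene name in the 1st column) and looking up the extra columns for one gene. How VaRank itself merges the columns is not described here; the helper names, gene names (GENE1, GENE2, GENE3) and column titles are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_extann(handle: TextIO) -> Tuple[List[str], Dict[str, List[str]]]:
    """Read a tab-separated annotation file: header line, gene name in column 1."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    annotations: Dict[str, List[str]] = {}
    for row in reader:
        if row:  # skip blank lines
            annotations[row[0]] = row[1:]
    return header, annotations


def annotate(gene: str, header: List[str], annotations: Dict[str, List[str]]) -> List[str]:
    """Extra columns for one output row; empty strings when the gene is absent."""
    return annotations.get(gene, [""] * (len(header) - 1))


example = "Gene\tTransmission\tDisease\nGENE1\tAR\tDisorder A\nGENE2\tAD\tDisorder B\n"
header, ann = load_extann(io.StringIO(example))
print(annotate("GENE2", header, ann))  # ['AD', 'Disorder B']
print(annotate("GENE3", header, ann))  # ['', '']
```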

-metrics             	Decimal separator used for numerical values such as frequencies: us or fr metrics (ex: 0.2 or 0,2)
                     	Range values: us (default) or fr
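
The difference between the two metrics is only the decimal separator. A minimal Python sketch, assuming plain `str()` rendering (the precision VaRank uses in its output is not specified here); `format_number` is a hypothetical helper.

```python
def format_number(value: float, metrics: str = "us") -> str:
    """Render a number with a '.' (us) or ',' (fr) decimal separator."""
    text = str(value)
    if metrics == "us":
        return text
    if metrics == "fr":
        return text.replace(".", ",")
    raise ValueError("metrics must be 'us' or 'fr'")


print(format_number(0.2))        # 0.2
print(format_number(0.2, "fr"))  # 0,2
```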

-DB          		Changes the directory where the UniProt and RefSeq files are stored (optional, use only if PPH2 is installed)
                     	Ex: $VARANK/Databases (default)

-uniprot             	Name of the UniProt sequence file (optional, use only if PPH2 is installed)
                     	Ex: HUMAN.fasta.gz (default)

-refseq              	Name of the RefSeq sequence file (optional, use only if PPH2 is installed)
                     	Ex: human.protein.faa.gz (default)

-hgmdUser            	HGMD User login (optional, only use if you have an HGMD license)

-hgmdPasswd          	HGMD User password (optional, only use if you have an HGMD license)


The following options are provided to allow the user to modify the VaRank score corresponding to each category defined by the program:
-S_Known     		Known mutation as annotated by HGMD and/or dbSNP (rsClinicalSignificance="pathogenic/probable-pathogenic").
		     	default: 110

-S_Nonsense          	A single-base substitution in DNA resulting in a STOP codon (TGA, TAA or TAG).
		     	default: 100

-S_Fs            	Exonic insertion/deletion of a non-multiple of 3bp, often resulting in a premature stop codon in the reading frame of the gene.
		  	default: 100

-S_EssentialSplice  	Mutation in one of the canonical splice sites resulting in a significant effect on splicing, i.e. at least 2 of the 3 programs (MaxEntScan, SSF, NNSplice) report a score variation relative to the wild-type sequence that reaches their respective cutoff.
			default: 90

-S_StartLoss         	Mutation leading to the loss of the initiation codon (Met).
		     	default: 80

-S_StopLoss       	Mutation leading to the loss of the STOP codon.
			default: 80

-S_CloseSplice       Mutation outside of the canonical splice sites (donor site: -3 to +6, acceptor site: -12 to +2) resulting in a significant effect on splicing, i.e. at least 2 of the 3 programs (MaxEntScan, SSF, NNSplice) report a score variation relative to the wild-type sequence that reaches their respective cutoff.
		     	default: 70

-S_Missense          	A single-base substitution in DNA resulting in a change in the amino acid.
		     	default: 50

-S_Inframe           	Exonic insertion/deletion of a multiple of 3bp.
		     	default: 40

-S_DeepSplice        	Intronic mutation resulting in a significant effect on splicing, i.e. at least 2 of the 3 programs (MaxEntScan, SSF, NNSplice) report a score variation relative to the wild-type sequence that reaches their respective cutoff.
		     	default: 25

-S_Synonymous        	A single-base substitution in DNA not resulting in a change in the amino acid.
		     	default: 10
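
A Python sketch of how the category scores and the -S_* overrides could drive a ranking. It assumes each variant carries a single category and is ranked by that category's score in decreasing order. VaRank may combine further information, so this is an illustration, not the program's algorithm. `rank_variants` and the variant IDs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Default category scores listed above.
DEFAULT_SCORES: Dict[str, int] = {
    "Known": 110, "Nonsense": 100, "Fs": 100, "EssentialSplice": 90,
    "StartLoss": 80, "StopLoss": 80, "CloseSplice": 70, "Missense": 50,
    "Inframe": 40, "DeepSplice": 25, "Synonymous": 10,
}


def rank_variants(variants: List[Tuple[str, str]],
                  overrides: Optional[Dict[str, int]] = None) -> List[Tuple[str, int]]:
    """Sort (variant_id, category) pairs by decreasing category score.

    `overrides` plays the role of the -S_* options, e.g. {"Missense": 60}.
    """
    scores = {**DEFAULT_SCORES, **(overrides or {})}
    scored = [(vid, scores[category]) for vid, category in variants]
    return sorted(scored, key=lambda item: item[1], reverse=True)


variants = [("var1", "Missense"), ("var2", "Nonsense"), ("var3", "Synonymous")]
print(rank_variants(variants))
# [('var2', 100), ('var1', 50), ('var3', 10)]
print(rank_variants(variants, {"Synonymous": 60}))
# [('var2', 100), ('var3', 60), ('var1', 50)]
```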


