[Welcome to the homepage of Hoan Nguyen]
Dr Hoan Nguyen
Integrated Structural Biology (IGBMC)
Phone: +33 753481806
Fax: +33 3 88 65 32 01
Email: nguyen@igbmc.fr, bmhoan@gmail.com
Summary
· A bioinformatician and data scientist with many years of experience in integrative bioinformatics, dedicated to understanding and interpreting the molecular consequences of mutations involved in human disease.
· A data infrastructure and workflow specialist (next-generation sequencing and protein family analysis) dedicated to computational biology and science.
· Ability to multitask, with strong organization/management, planning and problem-solving skills.
Keywords: NGS, SNP, SNV, Mutation Interpretation and Prediction; Gene/Variant
Ranking, Complex data fusion; Large Data Management
[Experience]
Since 2006: Data scientist/bioinformatician, Integrated Structural Biology, Institute of Genetics and Molecular and Cellular Biology (IGBMC), Strasbourg, France
I work as a bioinformatician and data scientist on understanding and predicting the effects of disease-related mutations on protein function and structure, at the Department of Integrated Structural Biology, Institute of Genetics and Molecular and Cellular Biology (IGBMC), Illkirch, France.
I am responsible for the development of the SM2PH-Central (from Structural Mutation to Pathology Phenotypes in Human) knowledgebase, a transversal data and computational infrastructure for better understanding and describing the networks of causality linking a particular phenotype to one or more genes or networks. SM2PH-Central also provides access to systematic annotation tools, including sequence database searches, multiple alignment and 3D model exploitation, and physico-chemical, functional, structural and evolutionary characterization of variants/SNPs.
In this framework, I developed numerous tools (KD4v prediction, http://decrypthon.igbmc.fr/kd4v/cgi-bin/home, and the MSV3d database, http://decrypthon.igbmc.fr/msv3d/cgi-bin/home) to characterize the structural and functional impact of mutations, as well as various types of data linking genotype to phenotype or supporting gene prioritization, via many standard web services. I used Inductive Logic Programming (ILP) to automatically extract Prolog rules characterizing deleterious mutations. These rules can be interpreted by biologists and can thus guide human expert discovery of new correlations between mutation, sequence/3D structure and phenotypic severity.
My work has been demonstrated and applied in a number of recent studies devoted to specific human diseases, including common multifactorial diseases (Age-related Macular Disease (AMD Gene Consortium), complete congenital stationary night blindness (Zeitz et al., 2013; Audo et al., 2013; Audo 2014)).
Main tasks:
- Development of numerous tools and methods to characterize the structural and functional impact of missense mutations, through the new KD4v prediction tool and the associated MSV3d database, as well as various types of data linking genotype to phenotype via many standard web services.
- Design of a new cloud-based pipeline for protein family and 3D-structure analysis (http://decrypthon.igbmc.fr/neopipe/).
- Design of a new system for heterogeneous data integration with a high-level biological query language (IBM Res. and Dev. Journal).
- Management of the development of the integrated computational infrastructure (SM2PH-Central).
- Supervision of 1 PhD student, 2 engineers and 10 Master students.
2002-2005: PhD Student/Software Engineer at Observatoire Astronomique de Strasbourg, France
[Education]
2006: PhD in Large Scale Data Management, Strasbourg Astronomical Observatory, University of Strasbourg, France
2002: Master in Computational Science and Applied Mathematics, University of La Rochelle, France
1997: B.S. in Mathematics and Computer Sciences, University of Hue, Vietnam
[Bioinformatics and NGS skills]
NGS data analysis: Partek (RNA-seq), FastQC, SNAP, SAMtools, BWA, GATK, TopHat, ANNOVAR
Structural characterization of mutants and homology search: I-Mutant, CSU, MODELLER
Mutation prediction: PolyPhen-2, SIFT, KD4v, VEP (Variant Effect Predictor)
Protein family analyses: BLAST, DbClustal, MAFFT, MACSIMS, T-Coffee, Kalign, LEON
Databases and repositories: Ensembl, UCSC/ENCODE, UniProt, Pfam, GenBank, PDB, 1000g, EVS, dbSNP, ClinVar
Ontology and phenotype: GO, GOA, DAVID database, HPO
Biological network analysis: KEGG, STRING, Cytoscape JS
Libraries and APIs: NCBI API, Ensembl API, R, BioPerl, BioJava, Biopython.
[Statistical analysis and Machine Learning skills]
Statistical learning: Basic statistical analyses, PCA, Bayesian inference.
Machine learning: Support Vector Machine, K-Means, Decision trees (Random Forest, J48), SOM (Self-Organizing Map, Kohonen neural network).
Logic learning: Inductive Logic Programming, Logic Programming.
Tools: R/Bioconductor, Java-R, Python-R, Weka, IBM Intelligent, Prolog/ALEPH.
[Computer Science skills]
Languages and methods: Prolog, Java/J2EE, Python, Perl, C++, PHP.
Semantic integration: OWL, RDF, Logic Programming.
Database design and implementation: IBM DB2, MySQL, PostgreSQL, JDBC, ODBC, MongoDB.
Data mining: IBM Cognos, Weka, R, Aleph/Prolog, SVM-Light, SVMPACK.
Web application development: Tomcat, IBM WAS, uPortal, JBoss, Python, PHP, SOAP, REST, JBoss jBPM, HTML5, Ajax, JSON, JSP/Servlet.
Software architecture: UML, Spring Framework, JBoss jBPM, IBM RAD.
Distributed computing and Big Data architecture: IBM InfoSphere, Hadoop MapReduce, NoSQL (HBase, MongoDB), Qsub.
Project and database manager. Good analytical skills and well able to work under stress.
[Publications]
Publications in Journals
1. Carlos Bermejo-Das-Neves, Hoan Nguyen, Olivier Poch and Julie D Thompson. (2014) A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). BMC Bioinformatics.
2. Nguyen H, Laurent M, Thompson JD, Poch O. (2014) Heterogeneous Biological Data Integration with High Level Query Language. IBM Journal of Research and Development, vol. 58, no. 2/3, April 1, 2014.
3. Nguyen H, Luu TD, Poch O, Thompson JD. (2013) Knowledge Discovery from a Variant Database using Inductive Logic Programming. Bioinformatics and Biology Insights.
4. Luu TD, Rusu AM, Walter V, Linard B, Poidevin L, Ripp R, Moulinier L, Muller J, Raffelsberger W, Wicker N, Lecompte O, Thompson JD, Poch O, Nguyen H. (2012) KD4v: Comprehensible Knowledge Discovery System For Missense Variant. Nucleic Acids Res; W71-75.
5. Luu TD, Rusu AM, Walter V, Ripp R, Moulinier L, Muller J, Toursel T, Thompson JD, Poch O, Nguyen H. (2012) MSV3d: database of human MisSense variants mapped to 3D protein structure. Database (Oxford); bas018.
6. Linard B, Nguyen H, Prosdocimi F, Poch O, Thompson JD. (2012) EvoluCode: evolutionary barcodes as a unifying framework for multilevel evolutionary data. Evol Bioinform Online; 8:61-77.
7. Zeitz C, Jacobson SG, Hamel CP, Bujakowska K, Orhan E, Zanlonghi X, Lancelot ME, Michiels C, Schwartz SB, Bocquet B, CSNB consortium, Antonio A, Audier C, Letexier M, Saraiva JP, Luu TD, Sennlaub F, Nguyen H, Poch O, Dollfus H, Lecompte O, Kohl S, Sahel JA, Bhattacharya SS, Audo I. (2013) Whole exome sequencing identifies mutations in LRIT3 as a cause for autosomal recessive complete congenital stationary night blindness. Am J Hum Genet.
8. Audo I, Bujakowska K, Orhan E, Sennlaub F, Guillonneau X, Antonio A, Michiels C, Lancelot ME, Letexier M, Saraiva JP, Nguyen H, Luu TD, Leveillard T, Poch O, Paques M, Saddek MS, Bhattacharya S, Sahel JA, Zeitz C. (2013) The familial dementia gene revisited: whole exome sequencing identifies a missense mutation in ITM2B underlying a novel autosomal dominant retinal dystrophy in a large family. Hum Mol Genet. 2014 Jan 15;23(2):491-501.
9. Audo I, Bujakowska K, Orhan E, Poloschek CM, Defoort-Dhellemmes S, Drumare I, Kohl S, Luu TD, Lecompte O, Zrenner E, Lancelot ME, Antonio A, Germain A, Michiels C, Audier C, Letexier M, Saraiva JP, Leroy BP, Munier FL, Mohand-Said S, Lorenz B, Friedburg C, Preising M, Kellner U, Renner AB, Moskova-Doumanova V, Berger W, Wissinger B, Hamel CP, Schorderet DF, De Baere E, Sharon D, Banin E, Jacobson SG, Bonneau D, Zanlonghi X, Le Meur G, Casteels I, Koenekoop R, Long VW, Meire F, Prescott K, de Ravel T, Simmons I, Nguyen H, Dollfus H, Poch O, Leveillard T, Nguyen-Ba-Charvet K, Sahel JA, Bhattacharya SS, Zeitz C. (2012) Whole-exome sequencing identifies mutations in GPR179 leading to autosomal-recessive complete congenital stationary night blindness. Am J Hum Genet; 90:321-330.
10. Nguyen H, Wicker N, Kieffer D, Poch O. (2010) A new projection method for biological
semantic map generation. J. Biomedical Science and Engineering; 3:13-19.
11. Friedrich A, Garnier N, Gagniere N, Nguyen H, Albou LP, Biancalana V, O, Muller J, Moras D, Mandel JT, Toursel T, Moulinier L, Poch O. (2009) SM2PH-db: an interactive system for the integrated analysis of phenotypic consequences of missense mutations in proteins involved in human genetic diseases. Hum Mutat; 31:127-135.
12. Bard N, Bolze R, Caron E, Desprez F, Heymann M, Friedrich A, Moulinier L, Nguyen NH, Poch O, Toursel T. (2010) Decrypthon grid - grid resources dedicated to neuromuscular disorders. Stud Health Technol Inform; 159:124-133.
Publications related to my PhD work
13. Michel L, Motch C, Nguyen H, Pineau FX. (2014) Building an Archive with Saada. Astronomy and Computing, 09/2014.
14. Michel L, Motch C, Nguyen H, Pineau FX. (2009) A Guided Tour of Saada. Astronomical Data Analysis Software and Systems XVIII, ASP Conference Series, Vol. 411, proceedings of the conference held 2-5 November 2008 at Hotel Loews Le Concorde, Québec City, QC, Canada. Edited by David A. Bohlender, Daniel Durand, and Patrick Dowler. San Francisco: Astronomical Society of the Pacific, 2009, p.563. http://adsabs.harvard.edu/abs/2009ASPC..411..563M
15. Michel L, Motch C, Pineau FX, Nguyen H. (2010) Building Astronomical Databases with Saada (Update 2010). Astronomical Data Analysis Software and Systems XIX, ASP Conference Series, Vol. 434, proceedings of the conference held 4-8 October 2009 in Sapporo, Japan. Edited by Yoshihiko Mizumoto, Koh-Ichiro Morita, and Masatoshi Ohishi. San Francisco: Astronomical Society of the Pacific, 2010, p.491. http://adsabs.harvard.edu/abs/2010ASPC..434..491M
16. Nguyen H, Michel L, Motch C. (2006) Building an Astronomical Database with Saada. Astronomical Data Analysis Software and Systems XV, ASP Conference Series, Vol. 351, proceedings of the conference held 2-5 October 2005 in San Lorenzo de El Escorial, Spain. Edited by Carlos Gabriel, Christophe Arviset, Daniel Ponz, and Enrique Solano. San Francisco: Astronomical Society of the Pacific, 2006, p.15. http://adsabs.harvard.edu/abs/2006ASPC..351...15N
17. Michel L, Nguyen H, Motch C. (2006) How to Publish Local Data Into the VO with Saada. Astronomical Data Analysis Software and Systems XV, ASP Conference Series, Vol. 351, proceedings of the conference held 2-5 October 2005 in San Lorenzo de El Escorial, Spain. Edited by Carlos Gabriel, Christophe Arviset, Daniel Ponz, and Enrique Solano. San Francisco: Astronomical Society of the Pacific, 2006, p.25. http://adsabs.harvard.edu/abs/2006ASPC..351...25M
18. Michel L, Nguyen H, Motch C. (2005) SAADA: Astronomical Databases Made Easier. Astronomical Data Analysis Software and Systems XIV, ASP Conference Series, Vol. 347, proceedings of the conference held 24-27 October 2004 in Pasadena, California, USA. Edited by P. Shopbell, M. Britton, and R. Ebert. San Francisco: Astronomical Society of the Pacific, 2005, p.71. http://adsabs.harvard.edu/abs/2005ASPC..347...71M
19. Nguyen H, Michel L, Motch C. (2004) SAADA: An Automatic Archival System for Astronomy Data. Astronomical Data Analysis Software and Systems (ADASS) XIII, ASP Conference Proceedings, Vol. 314, proceedings of the conference held 12-15 October 2003 in Strasbourg, France. Edited by Francois Ochsenbein, Mark G. Allen and Daniel Egret. San Francisco: Astronomical Society of the Pacific, 2004, p.121. http://adsabs.harvard.edu/abs/2004ASPC..314..121N
Book Chapter
1. Hoan Nguyen, Julie D. Thompson, Patrick Schutz and Olivier Poch. Intelligent Integrative knowledge bases: bridging genomics, integrative biology and translational medicine. In: Andreas Holzinger and Igor Jurisica (eds.), Interactive Knowledge Discovery and Data Mining: State-of-the-Art and Future Challenges in Biomedical Informatics. Springer LNCS, Volume 8401.
Conference papers
1. Benabderrahmane S, Devignes MD, Smail-Tabbone M, Poch O, Napoli A, Raffelsberger W, Guenot D, Nguyen H, Guerin E. (2011) Benchmarking a new semantic similarity measure using fuzzy clustering and reference sets: Application to cancer expression data. French International Conference on Knowledge Extraction, 11ème Conférence Internationale Francophone sur l'Extraction et la Gestion des Connaissances (EGC 2011); 01/2011.
2. Bard N, Bolze R, Caron E, Desprez F, Heymann M, Friedrich A, Moulinier L, Nguyen H, Poch O, Toursel T. Décrypthon grid - grid resources dedicated to neuromuscular disorders. Studies in Health Technology and Informatics, 01/2010; 159:124-33.
[International Conferences, Oral presentations]
1. Big-Data fusion and effects of disease-related mutations on protein structure and function (MSV3d Database). The 5th International Biennial Meeting of the Human Variome Project Consortium (HVP5), UNESCO, Paris, May 2014.
2. Towards a Big Data Ecosystem for Translational Research. An application in Genetic variant. BINGI DAY 2014, Strasbourg.
3. Gepetto (GEne Prioritization ExTended Tool): An Open Source Framework for Gene Prioritization. 14th Annual Bioinformatics Open Source Conference (BOSC 2013), Berlin, Germany, July 19-20, 2013.
4. SM2PH-Central: An Integrative knowledgebase to investigate the genotype to phenotype relationships involved in human genetic diseases. Integrative Biology-2013, LA, USA, 5-8 August 2013.
5. Intelligent Integrative KnowledgeBase: Perspective of new Translational Research. OncoTrans, 28 June 2013, Reims.
6. Comprehensible Knowledge Discovery System for Missense Variant. Oral presentation at the 12th International Symposium on Mutation in the Genome, Lake Louise, AB, Canada, April 2013.
7. Extracting Knowledge from a Mutation Database Related to Human Monogenic Disease Using Inductive Logic Programming. International Conference on Bioinformatics, Computational Biology and Biomedical Engineering, Singapore, 2011.
Author: Hoan Nguyen, last update 02.12.2014