[Welcome to the homepage of Hoan Nguyen]


 


Dr Hoan Nguyen

Integrated Structural Biology (IGBMC)

 

Phone: +33 753481806
Fax: +33 3 88 65 32 01

Email: nguyen@igbmc.fr, bmhoan@gmail.com

Summary

 

·         A bioinformatician and data scientist with many years of experience in integrative bioinformatics, dedicated to understanding and interpreting the molecular consequences of mutations involved in human disease.

·         A data infrastructure and workflow specialist (next-generation sequencing and protein family analysis) dedicated to computational biology and science.

·         Ability to multitask, with strong organization, management, planning and problem-solving skills.

 

Keywords: NGS, SNP, SNV, Mutation Interpretation and Prediction; Gene/Variant Ranking, Complex data fusion; Large Data Management



[Experience]

Since 2006: Data scientist/bioinformatician, Integrated Structural Biology, Institute of Genetics and Molecular and Cellular Biology (IGBMC), Strasbourg, France

 

I work as a bioinformatician and data scientist on understanding and predicting the effects of disease-related mutations on protein function and structure, at the Department of Integrated Structural Biology, Institute of Genetics and Molecular and Cellular Biology (IGBMC), Illkirch, France.

 

I am responsible for the development of the SM2PH-Central (from Structural Mutation to Pathology Phenotypes in Human) knowledgebase, a transversal data and computational infrastructure for better understanding and describing the networks of causality linking a particular phenotype to one or more genes or networks. SM2PH-Central also provides access to systematic annotation tools, including sequence database searches, multiple alignment and 3D model exploitation, as well as physico-chemical, functional, structural and evolutionary characterization of variants/SNPs.

 

In this framework, I developed numerous tools (KD4v prediction, http://decrypthon.igbmc.fr/kd4v/cgi-bin/home, and the MSV3d database, http://decrypthon.igbmc.fr/msv3d/cgi-bin/home) to characterize the structural and functional impact of mutations, as well as various types of data linking genotype to phenotype or supporting gene prioritization, via many standard web services. I used Inductive Logic Programming (ILP) to automatically extract Prolog rules characterizing deleterious mutations. These rules can be interpreted by biologists and can thus guide human expert discovery of new correlations between mutation, sequence/3D structure and phenotypic severity.

 

My work has been applied in a number of recent studies devoted to specific human diseases, including common multifactorial diseases (age-related macular degeneration, AMD Gene Consortium) and complete congenital stationary night blindness (Zeitz et al., 2013; Audo et al., 2013; Audo et al., 2014).

 

Main tasks:

-Development of numerous tools and methods to characterize the structural and functional impact of missense mutations, through the new KD4v prediction tool and the associated MSV3d database, as well as various types of data linking genotype to phenotype via many standard web services.

-Design of a new cloud-based pipeline for protein family and 3D-structure analysis (http://decrypthon.igbmc.fr/neopipe/)

-Design of a new system for heterogeneous data integration with a high-level biological query language (IBM Journal of Research and Development).

-Management of the development of the integrated computational infrastructure (SM2PH-Central)

-Supervision of 1 PhD student, 2 engineers and 10 Master's students

 

 

2002-2005: PhD Student/Software Engineer at the Observatoire Astronomique de Strasbourg, France

- Design and development of the SAADA system (Automatic Archival System for Astronomical Data, http://amwdb.u-strasbg.fr/saada), which was used to allow systematic exploitation of the data catalogue of the European satellite XMM-Newton (ESA/ESO/CNES): http://xcatdb.unistra.fr/3xmm/

 

 

 

[Education]

2006: PhD in Large-Scale Data Management, Strasbourg Astronomical Observatory, University of Strasbourg, France

2002: Master in Computational Science and Applied Mathematics, University of La Rochelle, France

1997: B.S. in Mathematics and Computer Science, University of Hue, Vietnam

 

[Bioinformatics and NGS skills]

 

NGS data analysis: Partek (RNA-seq), FastQC, SNAP, SAMtools, BWA, GATK, TopHat, ANNOVAR

Structural characterization of mutants and homology search: I-Mutant, CSU, Modeller

Mutation prediction: PolyPhen-2, SIFT, KD4v, VEP (Variant Effect Predictor)

Protein family analyses: BLAST, DbClustal, MAFFT, MACSIMS, T-Coffee, Kalign, LEON

Databases and repositories: Ensembl, UCSC/ENCODE, UniProt, Pfam, GenBank, PDB, 1000g, EVS, dbSNP, ClinVar

Ontology and phenotype: GO, GOA, DAVID database, HPO

Biological network analysis: KEGG, STRING, Cytoscape JS

Libraries and APIs: NCBI API, Ensembl API, R, BioPerl, BioJava, Biopython.

 

[Statistical analysis and Machine Learning skills]

 

Statistical learning: Basic statistical analyses, PCA, Bayesian inference.

Machine learning: Support Vector Machines, K-means, decision trees (Random Forest, J48), Self-Organizing Maps (SOM, Kohonen neural network)

Logic learning:  Inductive Logic Programming, Logic Programming.

Tools: R/Bioconductor, Java-R, Python-R, Weka, IBM Intelligent, Prolog/ALEPH

 

[Computer Science skills]

 

Languages and methods: Prolog, Java/J2EE, Python, Perl, C++, PHP.

Semantics integration: OWL, RDF, Logic Programming.

Database design and implementation: IBM DB2, MySQL, PostgreSQL, JDBC, ODBC, MongoDB

Data mining: IBM Cognos, WEKA, R, Aleph/Prolog, SVM-Light, SVMPACK.

Web application development: Tomcat, IBM WAS, uPortal, JBoss, Python, PHP, SOAP, REST, JBoss jBPM, HTML5, Ajax, JSON, JSP/Servlet.

Software architecture: UML, Spring Framework, JBoss jBPM, IBM RAD

Distributed computing and Big Data architecture: IBM InfoSphere, Hadoop MapReduce, NoSQL (HBase, MongoDB), qsub.

Project and database management. Good analytical skills and able to work well under pressure.


[Publications]

Journal publications

 

1.      Carlos Bermejo-Das-Neves, Hoan Nguyen, Olivier Poch and Julie D Thompson. (2014) A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i), BMC Bioinformatics.

 

2.       Nguyen H, Laurent M, Thompson JD, Poch O. (2014) Heterogeneous Biological Data Integration with High Level Query Language. IBM Journal of Research and Development, vol. 58, no. 2/3, April 1, 2014.

 

3.      Nguyen H, Luu TD, Poch O, Thompson JD. (2013) Knowledge Discovery from a Variant Database using Inductive Logic Programming. Bioinformatics and Biology Insights.

 

4.      Luu TD, Rusu AM, Walter V, Linard B, Poidevin L, Ripp R, Moulinier L, Muller J, Raffelsberger W, Wicker N, Lecompte O, Thompson JD, Poch O, Nguyen H. (2012). KD4v: Comprehensible Knowledge Discovery System For Missense Variant. Nucleic Acids Res; W71-75

 

5.      Luu TD, Rusu AM, Walter V, Ripp R, Moulinier L, Muller J, Toursel T, Thompson JD, Poch O, Nguyen H. (2012). MSV3d: database of human MisSense variants mapped to 3D protein structure. Database (Oxford); bas018.

 

6.      Linard B, Nguyen H, Prosdocimi F, Poch O, Thompson JD (2012). EvoluCode: evolutionary barcodes as a unifying framework for multilevel evolutionary data. Evol. Bioinform Online; 8:61-77

 

7.      Zeitz C, Jacobson SG, Hamel CP, Bujakowska K, Orhan E, Zanlonghi X, Lancelot ME, Michiels C, Schwartz SB, Bocquet B, CSNB consortium, Antonio A, Audier C, Letexier M, Saraiva JP, Luu TD, Sennlaub F, Nguyen H, Poch O, Dollfus H, Lecompte O, Kohl S, Sahel JA, Bhattacharya SS, Audo I. (2013) Whole exome sequencing identifies mutations in LRIT3 as a cause for autosomal recessive complete congenital stationary night blindness. Am J Hum Genet;

 

8.      Audo I, Bujakowska K, Orhan E, Sennlaub F, Guillonneau X, Antonio A, Michiels C, Lancelot ME, Letexier M, Saraiva JP, Nguyen H, Luu TD, Leveillard T, Poch O, Paques M, Saddek MS, Bhattacharya S, Sahel JA, Zeitz C. (2013) The familial dementia gene revisited: whole exome sequencing identifies a missense mutation in ITM2B underlying a novel autosomal dominant retinal dystrophy in a large family. Hum Mol Genet. 2014 Jan 15;23(2):491-501

 

9.      Audo I, Bujakowska K, Orhan E, Poloschek CM, Defoort-Dhellemmes S, Drumare I, Kohl S, Luu TD, Lecompte O, Zrenner E, Lancelot ME, Antonio A, Germain A, Michiels C, Audier C, Letexier M, Saraiva JP, Leroy BP, Munier FL, Mohand-Said S, Lorenz B, Friedburg C, Preising M, Kellner U, Renner AB, Moskova-Doumanova V, Berger W, Wissinger B, Hamel CP, Schorderet DF, De Baere E, Sharon D, Banin E, Jacobson SG, Bonneau D, Zanlonghi X, Le Meur G, Casteels I, Koenekoop R, Long VW, Meire F, Prescott K, de Ravel T, Simmons I, Nguyen H, Dollfus H, Poch O, Leveillard T, Nguyen-Ba-Charvet K, Sahel JA, Bhattacharya SS, Zeitz C. (2012). Whole-exome sequencing identifies mutations in GPR179 leading to autosomal-recessive complete congenital stationary night blindness. Am J Hum Genet; 90: 321-330.

 

10.  Nguyen H, Wicker N, Kieffer D, Poch O. (2010) A new projection method for biological semantic map generation. J. Biomedical Science and Engineering; 3:13-19.

 

11.  Friedrich A, Garnier N, Gagniere N, Nguyen H, Albou LP, Biancalana V, O, Muller J, Moras D, Mandel JT, Toursel T, Moulinier L, Poch O. (2009) SM2PH-db: an interactive system for the integrated analysis of phenotypic consequences of missense mutations in proteins involved in human genetic diseases. Hum Mutat. 31: 127-135

 

 

12.  Bard N, Bolze R, Caron E, Desprez F, Heymann M, Friedrich A, Moulinier L, Nguyen NH, Poch O, Toursel T. (2010) Decrypthon grid - grid resources dedicated to neuromuscular disorders. Stud Health Technol Inform 2010, 159:124-133.

 

 

Publications related to my PhD work

 

13.  Michel L, Motch C, Nguyen H, Pineau FX. (2014) Building an Archive with Saada. Astronomy and Computing. 09/2014

 

14.  Michel L, Motch C, Nguyen H, Pineau FX. (2009) A Guided Tour of Saada. Astronomical Data Analysis Software and Systems XVIII ASP Conference Series, Vol. 411, proceedings of the conference held 2-5 November 2008 at Hotel Loews Le Concorde, Québec City, QC, Canada. Edited by David A. Bohlender, Daniel Durand, and Patrick Dowler. San Francisco: Astronomical Society of the Pacific, 2009., p.563. http://adsabs.harvard.edu/abs/2009ASPC..411..563M.

 

15.  Michel L, Motch C, Pineau FX, Nguyen H. (2010) Building Astronomical Databases with Saada (Update 2010). Astronomical Data Analysis Software and Systems XIX. Proceedings of a conference held October 4-8, 2009 in Sapporo, Japan. Edited by Yoshihiko Mizumoto, Koh-Ichiro Morita, and Masatoshi Ohishi. ASP Conference Series, Vol. 434. San Francisco: Astronomical Society of the Pacific, 2010. p.491. http://adsabs.harvard.edu/abs/2010ASPC..434..491M.

 

16.  Nguyen H, Michel L, Motch C. (2006). Building an Astronomical Database with Saada, Astronomical Data Analysis Software and Systems XV ASP Conference Series, Vol. 351, Proceedings of the Conference Held 2-5 October 2005 in San Lorenzo de El Escorial, Spain. Edited by Carlos Gabriel, Christophe Arviset, Daniel Ponz, and Enrique Solano. San Francisco: Astronomical Society of the Pacific, 2006., p.15. http://adsabs.harvard.edu/abs/2006ASPC..351...15N.

 

17.  Michel L, Nguyen H, Motch C. (2006) How to Publish Local Data Into the VO with Saada. Astronomical Data Analysis Software and Systems XV ASP Conference Series, Vol. 351, Proceedings of the Conference Held 2-5 October 2005 in San Lorenzo de El Escorial, Spain. Edited by Carlos Gabriel, Christophe Arviset, Daniel Ponz, and Enrique Solano. San Francisco: Astronomical Society of the Pacific, 2006., p.25. http://adsabs.harvard.edu/abs/2006ASPC..351...25M

 

18.  Michel L, Nguyen H, Motch C. (2005). SAADA: Astronomical Databases Made Easier. Astronomical Data Analysis Software and Systems XIV ASP Conference Series, Vol. 347, Proceedings of the Conference held 24-27 October, 2004 in Pasadena, California, USA. Edited by P. Shopbell, M. Britton, and R. Ebert. San Francisco: Astronomical Society of the Pacific, 2005., p.71. http://adsabs.harvard.edu/abs/2005ASPC..347...71M .

 

19.  Nguyen H, Michel L, Motch C. (2004) SAADA: An Automatic Archival System for Astronomy Data. Astronomical Data Analysis Software and Systems (ADASS) XIII, Proceedings of the conference held 12-15 October, 2003 in Strasbourg, France. Edited by Francois Ochsenbein, Mark G. Allen and Daniel Egret. ASP Conference Proceedings, Vol. 314. San Francisco: Astronomical Society of the Pacific, 2004., p.121. http://adsabs.harvard.edu/abs/2004ASPC..314..121N

 

 

 

Book Chapter

 

1. Hoan Nguyen, Julie D. Thompson, Patrick Schutz and Olivier Poch. Intelligent Integrative Knowledge Bases: Bridging Genomics, Integrative Biology and Translational Medicine. In: Andreas Holzinger and Igor Jurisica. Interactive Knowledge Discovery and Data Mining: State-of-the-Art and Future Challenges in Biomedical Informatics. Springer LNCS, Volume 8401

 

Conference papers

 

1.      Benabderrahmane S., Devignes MD., Smail-Tabbone M., Poch O., Napoli A., Raffelsberger W., Guenot D., Nguyen H., Guerin E. (2011). Benchmarking a new semantic similarity measure using fuzzy clustering and reference sets: Application to cancer expression data. French International Conference on Knowledge Extraction. 11ème Conférence Internationale Francophone sur l'Extraction et la Gestion des Connaissances-EGC 2011; 01/2011.

 

2.      Bard N., Bolze R., Caron E., Desprez F., Heymann M., Friedrich A., Moulinier L., Nguyen H., Poch O., Toursel T. Décrypthon grid - grid resources dedicated to neuromuscular disorders. Studies in health technology and informatics 01/2010; 159:124-33.

 


[International Conferences, Oral presentations]

 

1.    Big-Data fusion and effects of Disease-related mutations on protein structure and function (MSV3d Database). The 5th International Biennial Meeting of the Human Variome Project Consortium (HVP5), UNESCO Paris, May 2014.

2.    Towards a Big Data Ecosystem for Translational Research: An Application to Genetic Variants. BINGI DAY 2014, Strasbourg.

3.    Gepetto (GEne Prioritization ExTended Tool): An Open Source Framework for Gene Prioritization. 14th Annual Bioinformatics Open Source Conference (BOSC 2013), Berlin, Germany, July 19-20, 2013.

4.    SM2PH-Central: An Integrative knowledgebase to investigate the genotype to phenotype relationships involved in human genetic diseases. Integrative Biology-2013. LA, USA, 5-8 August 2013.

5.    Intelligent Integrative KnowledgeBase: Perspective of New Translational Research. OncoTrans, 28 June 2013, Reims.

6.    Comprehensible Knowledge Discovery System for Missense Variant. Oral presentation at the 12th International Symposium on Mutation in the Genome, Lake Louise, AB, Canada, April 2013.

7.    Extracting Knowledge from a Mutation Database Related to Human Monogenic Disease Using Inductive Logic Programming. International Conference on Bioinformatics, Computational Biology and Biomedical Engineering, Singapore, 2011.

 

 

 

 

 


Author: Hoan Nguyen, last update 02.12.2014