Adam Yao, Research Scientist, Academia Sinica
The Bio-IT group of the National Center for Genome Medicine (NCGM) consists of three major teams, the Data Analysis Team (DAT), the Data Management Team (DMT), and the Technical Support and Computing Team (TSCT), which provide IT support for precision medicine in the big data era. NCGM started next-generation sequencing (NGS) in 2007 using the Roche 454. In 2010, the Illumina HiSeq 2000 was introduced to the center. These sequencers have generated enormous amounts of data, reaching hundreds of terabytes. To support the increasing demand, DAT and TSCT together built an efficient infrastructure dedicated to NGS data analysis. Our teams have made fundamental NGS data analysis quick and easy by defining appropriate analysis workflows, utilizing and optimizing high-performance computing, and designing an intuitive graphical user interface.
Scalable Data Management and Analysis Infrastructure for Clinical Sequencing – From Mapping to Interpretation
Thursday, 1 December 2016 at 11:30
There is an emerging need for the management, integration, analysis, and interpretation of data generated by next-generation sequencing (NGS) technologies. Our multidisciplinary group has built a scalable infrastructure that handles these tasks effectively. The infrastructure contains a bioinformatics pipeline to process NGS data, an analysis pipeline to annotate all genetic variants detected in patient samples, and a genome variant database to store known genetic variants and their clinical interpretations. It is easily adaptable to a range of research settings, from small research labs to large sequencing centers.