AUTHOR=Lee Garam , Kang Byungkon , Nho Kwangsik , Sohn Kyung-Ah , Kim Dokyoon TITLE=MildInt: Deep Learning-Based Multimodal Longitudinal Data Integration Framework JOURNAL=Frontiers in Genetics VOLUME=Volume 10 - 2019 YEAR=2019 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2019.00617 DOI=10.3389/fgene.2019.00617 ISSN=1664-8021 ABSTRACT=As large amounts of heterogeneous biomedical data become available, numerous methods for integrating such datasets have been developed to extract complementary knowledge from multiple domains of sources. Recently, a deep learning approach has been proposed with promising results in a variety of research areas. However, applying such deep learning approaches to heterogeneous dataset involves the strenuous design of deep architectures. Thus, in this paper, we present a deep learning-based python package that can efficiently perform data integration. The python package we propose is called MildInt (Deep learning-based Multimodal longitudinal data integration framework), which provides a pre-trained deep learning architecture for classification tasks. MildInt contains two learning phases: 1) learning feature representation from each modality of the data and 2) training a classifier for the final decision. Adopting a deep architecture in the first phase leads to learning more task-relevant feature representation than a linear model. In the second phase, a linear classifier is used for detecting and investigating biomarkers from multimodal data. By combining the linear model and the deep learning model, higher accuracy and better interpretability can be achieved at the same time. We validate the performance of our package using simulation data and real data. For the real data, as a pilot study, we used clinical and multimodal neuroimaging datasets in Alzheimer’s Disease to predict the disease progression. In such experiments, we show that MildInt is capable of integrating multiple forms of numerical data, including time series data, for extracting complementary features from the multimodal dataset.