AUTHOR=Hsieh Ai-Ru , Li Yi-Mei Aimee TITLE=Using Image Recognition to Process Unbalanced Data in Genetic Diseases From Biobanks JOURNAL=Frontiers in Genetics VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.822117 DOI=10.3389/fgene.2022.822117 ISSN=1664-8021 ABSTRACT=With precision medicine as the goal, each country's human biobank should be analyzed to obtain comprehensive research results on genetic diseases. In addition, as the volume of medical imaging data grows, automatic image processing based on image recognition has been widely studied and applied in biomedicine. However, case-control data imbalance often occurs in human biobanks and is usually addressed with the statistical method SAIGE. Because human biobanks contain huge amounts of genetic data, applying SAIGE directly often runs into insufficient computer memory and excessive computation time. An alternative is to resample the data to balance the case-control ratio, for example with the Synthetic Minority Oversampling Technique (SMOTE). Our study used Manhattan plots and genetic disease information from the Taiwan Biobank (TWB) and adjusted the imbalanced case-control ratio with SMOTE, an approach we call “TWB-SMOTE”. We then used a deep learning image recognition system to identify the TWB-SMOTE results. We found that TWB-SMOTE achieves the same results as SAIGE and the UK Biobank (UKB). Processing the data with this technique can be equivalent to using the data plots from the much larger UKB sample and can achieve the same effect as the statistical method SAIGE in addressing data imbalance.
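The abstract names SMOTE but does not say how it rebalances a case-control set. The sketch below illustrates the generic SMOTE interpolation step only, not the authors' TWB-SMOTE pipeline: each synthetic case is placed at a random point on the segment between a minority-class (case) sample and one of its k nearest case neighbours. The function name `smote`, the toy feature vectors, and the parameters `k` and `seed` are hypothetical and chosen for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_synthetic minority-class points using
    basic SMOTE interpolation between a sample and a random one of its
    k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)           # seeded for reproducibility
    k = min(k, len(minority) - 1)       # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x; keep the k closest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: _euclidean(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()              # uniform in [0, 1)
        # New point lies on the segment from x towards the chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy example (made-up data): 3 cases vs. 10 controls; add 7 synthetic
    # cases so both classes have 10 samples.
    cases = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote(cases, n_synthetic=7, k=2, seed=42))
```

Because every synthetic point is an interpolation between two real cases, oversampling adds no information beyond the existing cases; it only changes the case-control ratio that a downstream method sees. For real analyses, an established implementation such as the `SMOTE` class in the imbalanced-learn package is usually preferable to a hand-written loop like this one.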