AUTHOR=Yang Si , Li Chenxi , Mei Yang , Liu Wen , Liu Rong , Chen Wenliang , Han Donghai , Xu Kexin TITLE=Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined With Machine Learning Methods JOURNAL=Frontiers in Nutrition VOLUME=Volume 8 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/nutrition/articles/10.3389/fnut.2021.680627 DOI=10.3389/fnut.2021.680627 ISSN=2296-861X ABSTRACT=Different geographical origins can lead to great variance in coffee quality, taste and commercial value. Hence, controlling the authenticity of the origin of coffee beans is of great importance for producers and consumers worldwide. In this study, terahertz (THz) spectroscopy combined with machine learning methods was investigated as a fast and nondestructive method to classify the geographical origin of coffee beans. Popular machine learning methods, including convolutional neural network (CNN), linear discriminant analysis (LDA) and support vector machine (SVM), were compared to obtain the best model. The curse of dimensionality makes it difficult for some classification methods to train effective models. Thus, principal component analysis (PCA) and genetic algorithm (GA) were applied for LDA and SVM to create a new, smaller set of features. The first nine principal components (PCs), with a cumulative contribution rate of 99.9%, extracted by PCA and twenty-one variables selected by GA were the inputs of the LDA and SVM models. The results demonstrated that excellent classification (90% accuracy in the prediction set) could be achieved using the CNN method. The results also indicate that variable selection is an important step in creating an accurate and robust discrimination model. The performance of the LDA and SVM algorithms could be improved with spectral features extracted by PCA and GA. GA-SVM achieved 75% accuracy in the prediction set, while SVM and PCA-SVM achieved 50% and 65% accuracy, respectively.
These results demonstrate that THz spectroscopy together with machine learning methods is an effective and satisfactory approach for classifying the geographical origin of coffee beans, suggesting that these techniques can tap the potential of deep learning for authenticating agricultural products while expanding the applications of THz spectroscopy.