AUTHOR=Li Yizhang , Liu Lingyu , Wang Zhongmin , Chang Tianying , Li Ke , Xu Wenqing , Wu Yong , Yang Hua , Jiang Daoli TITLE=To Estimate Performance of Artificial Neural Network Model Based on Terahertz Spectrum: Gelatin Identification as an Example JOURNAL=Frontiers in Nutrition VOLUME=Volume 9 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/nutrition/articles/10.3389/fnut.2022.925717 DOI=10.3389/fnut.2022.925717 ISSN=2296-861X ABSTRACT=Important foods and traditional Chinese medicines need to be identified at low cost, and these materials are likely candidates for highly accurate identification by terahertz time-domain spectroscopy (THz-TDS). In this study, feedforward neural networks based on terahertz spectra are used to predict the animal origin of gelatins. Their suitability for this task is examined with parallel models built using random sample partitioning and random initialization. The generalization performance of feedforward ANNs on the original data is found to be unsatisfactory, although predictions on the training samples can be accurate. Multivariate scattering correction (MSC) is applied to improve prediction accuracy, and 20 additional models verify the effectiveness of this preprocessing. A special partition of the total dataset is then constructed from the statistics of the parallel models, and its influence on ANN performance is investigated with another 20 models. These models perform poorly because, according to principal component analysis (PCA), the training and test sets differ notably. By comparing the distribution of the first two principal components before and after MSC, we found that the reciprocal of the minimum number of line segments required for error-free classification in the 2-D feature space can serve as an index of the linear separability of the data. Higher linear separability reduces the need for painstaking parameter tuning of the ANN model and makes the model more tolerant of random initialization.
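The MSC preprocessing step mentioned above can be illustrated with a short example. This is a minimal sketch assuming the standard formulation (often called multiplicative scatter correction): each spectrum x is fitted by least squares to the mean spectrum m as x ≈ a + b·m and then corrected as (x - a)/b. The function name `msc` and the list-of-lists input format are illustrative assumptions, not the authors' code.

```python
from statistics import fmean


def msc(spectra):
    """Scatter correction of spectra against their mean spectrum (sketch).

    spectra: list of equal-length lists of floats, one list per sample.
    Each spectrum x is fitted by ordinary least squares as x ~ a + b * ref,
    where ref is the point-wise mean spectrum, and corrected as (x - a) / b.
    Assumes the mean spectrum is not constant and every slope b is nonzero.
    """
    n_points = len(spectra[0])
    # Reference spectrum: point-wise mean over all samples.
    ref = [fmean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = fmean(ref)
    ref_var = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # OLS slope (b) and intercept (a) of this spectrum against ref.
        b = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        a = s_mean - b * ref_mean
        # Remove the additive offset and the multiplicative scale.
        corrected.append([(x - a) / b for x in s])
    return corrected
```

In this sketch, spectra that differ only by an additive offset and a multiplicative scale collapse onto the mean spectrum after correction, which is the kind of scatter-induced variation this preprocessing is meant to remove.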
The difference in the principal components of samples between the training set and the test set determines whether a partition is acceptable and whether the model will generalize. A rapid way to estimate the performance of an ANN on a classification task, before extensive tuning, is to compare the difference between groups with the difference within groups. Because the gelatin THz spectra discussed in this paper are representative curves lacking characteristic absorption peaks, this analysis may be helpful for studies on other species whose spectra are similarly featureless.
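The between-group versus within-group comparison suggested above could be computed, for example, on the first two principal component scores of each sample. The abstract does not specify the exact measure, so the index below is a hypothetical one chosen for illustration: the mean distance between group centroids divided by the mean distance of samples to their own group centroid. The function name `group_separation_ratio` is likewise an assumption.

```python
from math import dist
from statistics import fmean


def group_separation_ratio(points, labels):
    """Hypothetical between-group / within-group index (sketch).

    points: sequence of coordinate tuples (e.g. PC1/PC2 scores).
    labels: group label for each point; labels must be mutually comparable
    and at least two groups are required. Raises ZeroDivisionError if every
    point coincides with its group centroid.
    """
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)

    # Centroid of each group, coordinate by coordinate.
    centroids = {g: tuple(fmean(c) for c in zip(*ps)) for g, ps in groups.items()}

    # Within-group spread: mean distance of each point to its own centroid.
    within = fmean(dist(p, centroids[g]) for p, g in zip(points, labels))

    # Between-group spread: mean pairwise distance between centroids.
    keys = sorted(centroids)
    between = fmean(
        dist(centroids[a], centroids[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )
    return between / within
```

A large ratio means groups are far apart relative to their internal spread; under the assumption stated above, a small ratio would be an early warning that a classifier may need heavy tuning or may not generalize.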