AUTHOR=Deng Yihan, Denecke Kerstin TITLE=Classification of user queries according to a hierarchical medical procedure encoding system using an ensemble classifier JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 5 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2022.1000283 DOI=10.3389/frai.2022.1000283 ISSN=2624-8212 ABSTRACT=The Swiss classification of surgical interventions (CHOP) is used by physicians in daily practice to classify clinical procedures. Its purpose is to encode the delivered healthcare services for quality assurance and billing. To encode a procedure, a code of at most six digits has to be selected from the classification system. This is currently done through a rule-based process carried out by encoding experts, supported by manual searches in the CHOP catalog. In this paper, we investigate the possibility of generating CHOP codes automatically from a short query in order to support manual classification. The wide and deep hierarchy of CHOP and the differences between the text used in queries and the catalog descriptions are two obvious obstacles to training and deploying a learning-based algorithm. These challenges call for an appropriate classification approach. We evaluate different strategies (multi-class classification at non-terminal nodes and per-node binary classification) with different configurations in order to provide a flexible, modular solution with high accuracy and efficiency. The results clearly show that per-node binary classification outperforms non-terminal multi-class classification, with a micro-averaged F1 score between 92.6% and 94%. Hierarchical prediction based on per-node binary classifiers achieved a high exact-match rate for single-code assignment in five-fold cross-validation. In conclusion, the hierarchical context of the CHOP encoding can be exploited in both classifier training and representation learning. 
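The per-node strategy and the hierarchical prediction summarized above can be illustrated with a minimal sketch. It assumes that each node of the hierarchy has its own binary classifier returning a score in [0, 1] plus a per-node threshold, and that a single code is assigned by descending from the root along the best-scoring child that passes its threshold until no child qualifies. The function `predict_top_down`, the toy codes, and the keyword "classifiers" are hypothetical stand-ins for trained models, not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: parent code -> child codes, and a per-node binary scorer.
Hierarchy = Dict[str, List[str]]
Scorer = Callable[[str], float]


def predict_top_down(
    query: str,
    hierarchy: Hierarchy,
    scorers: Dict[str, Scorer],
    thresholds: Dict[str, float],
    root: str = "ROOT",
    default_threshold: float = 0.5,
) -> List[str]:
    """Return the predicted path of codes, from the first level down to the
    deepest node whose binary classifier accepted the query."""
    path: List[str] = []
    node = root
    while True:
        # Score every child with its own binary classifier; keep those at or above threshold.
        accepted: List[Tuple[float, str]] = []
        for child in hierarchy.get(node, []):
            score = scorers[child](query)
            if score >= thresholds.get(child, default_threshold):
                accepted.append((score, child))
        if not accepted:
            return path  # no child fires: stop at the current depth
        # Single-code assignment: follow only the highest-scoring accepted child.
        _, node = max(accepted)
        path.append(node)


# Toy usage with made-up codes and keyword scorers in place of trained models.
toy_hierarchy = {"ROOT": ["A", "B"], "A": ["A1", "A2"], "A1": ["A11"]}
keywords = {"A": "vessel", "B": "bowel", "A1": "angioplasty", "A2": "bypass", "A11": "balloon"}
toy_scorers = {code: (lambda q, kw=kw: 1.0 if kw in q.lower() else 0.0) for code, kw in keywords.items()}

print(predict_top_down("Balloon angioplasty of a vessel", toy_hierarchy, toy_scorers, {}))
# ['A', 'A1', 'A11']
print(predict_top_down("Balloon angioplasty of a vessel", toy_hierarchy, toy_scorers, {"A11": 1.5}))
# ['A', 'A1']
```

The second call shows why per-node thresholds matter in this kind of scheme: a threshold set too high at a deep node stops the descent early, so deeper codes are never reached.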
Each hierarchical feature improved classification performance under its respective configuration: the stacked autoencoder, the aggregation of training examples using the true path rule, and the unified vocabulary space substantially increased the utility of the hierarchical features. In addition, threshold adaptation through Bayesian aggregation substantially increased the vertical reachability of the per-node classification: after threshold adaptation, every trainable node can be triggered, and the F1 measure at code levels 3 to 6 rose from 6% to 89%.
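The training-example aggregation under the true path rule can be sketched as follows: an example assigned to a code also counts as a positive example for every ancestor of that code, so each node's positive set is the union of the examples in its subtree. In this sketch the negatives for a node are drawn from the subtrees of its siblings; that "siblings" policy is an assumption for illustration, since the abstract does not state how negatives were selected. The names `subtree` and `node_training_sets` and the toy codes are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hierarchy type: parent code -> child codes.
Hierarchy = Dict[str, List[str]]


def subtree(node: str, hierarchy: Hierarchy) -> Set[str]:
    """Return the node together with all of its descendants."""
    nodes: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        nodes.add(current)
        stack.extend(hierarchy.get(current, []))
    return nodes


def node_training_sets(
    examples: List[Tuple[str, str]],
    hierarchy: Hierarchy,
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Build (positives, negatives) for each node's binary classifier.

    examples: (query text, assigned code) pairs; the code may be at any level.
    Positives follow the true path rule: an example assigned to a descendant
    also counts for the node. Negatives come from the subtrees of the node's
    siblings (an assumed policy chosen for this sketch).
    """
    sets: Dict[str, Tuple[List[str], List[str]]] = {}
    for children in hierarchy.values():
        for child in children:
            own = subtree(child, hierarchy)
            siblings: Set[str] = set()
            for other in children:
                if other != child:
                    siblings |= subtree(other, hierarchy)
            positives = [text for text, code in examples if code in own]
            negatives = [text for text, code in examples if code in siblings]
            sets[child] = (positives, negatives)
    return sets


toy_hierarchy = {"ROOT": ["A", "B"], "A": ["A1", "A2"]}
toy_examples = [("q1", "A1"), ("q2", "A2"), ("q3", "B"), ("q4", "A")]
for node, (pos, neg) in sorted(node_training_sets(toy_examples, toy_hierarchy).items()):
    print(node, pos, neg)
```

Here the query `q4`, coded only at the inner node `A`, becomes a positive for `A` but is not used as a negative for `A1` or `A2`, so the children are not penalised for queries that were only coded at the parent level.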