Acupuncture and tuina knowledge graph with prompt learning

Introduction: Acupuncture and tuina, acknowledged as ancient and highly efficacious therapeutic modalities within the domain of Traditional Chinese Medicine (TCM), have provided pragmatic treatment pathways for numerous patients. To address the ambiguity of concepts in TCM acupuncture and tuina treatment protocols, the lack of accurate quantitative assessment of those protocols, and the diversity of TCM systems, we have established a graph-filling technique for modern literature to support personalized medical recommendations. Methods: (1) Extensive acupuncture and tuina data were collected, analyzed, and processed to establish a concise TCM domain knowledge base. (2) A template-free joint training method for Chinese-text NER (TemplateFC) was proposed, which enhances the EntLM model with BiLSTM and CRF layers. Appropriate rules were set for ERE. (3) A comprehensive knowledge graph (KG) comprising 10,346 entities and 40,919 relationships was constructed based on modern literature. Results: A robust TCM KG with a wide range of entities and relationships was created. The template-free joint training approach significantly improved NER accuracy, especially in Chinese text, addressing issues related to entity identification and tokenization differences. The KG provided valuable insights into acupuncture and tuina, facilitating efficient information retrieval and personalized treatment recommendations. Discussion: The integration of KGs in TCM research is essential for advancing diagnostics and interventions. Challenges in NER and ERE were effectively tackled using hybrid approaches and innovative techniques. The comprehensive TCM KG we built helps bridge the gap in TCM knowledge and serves as a valuable resource for specialists and non-specialists alike.


Introduction
Acupuncture and tuina are foundational therapies within Traditional Chinese Medicine (TCM), with centuries of practice and theoretical development. Their origins trace back to ancient China, where detailed theories and practical methodologies were documented in medical classics such as the Huangdi Neijing over two millennia ago (Unschuld, 2016). Since the Neolithic era, these practices have gradually become integrated into human life (Cao, 2008) and are widely employed in clinical treatment due to their simplicity and rapid effectiveness. In recent years, acupuncture and tuina have gained increased recognition in various countries and regions, including the United States and Europe. In 2020, a research article in the journal Neuron, authored by a team led by Professor Qiufu Ma from Harvard Medical School, demonstrated that acupuncture, through targeted stimulation of specific acupoints on the body's surface, can activate a range of "somatic-sensory-autonomic-target organ" reflex pathways. This activation is capable of effecting immune-inflammatory modulation (Liu et al., 2020). In 2021, the team led by Qiufu Ma further affirmed the scientific validity of acupuncture treatments by shedding light on the neuroanatomy involved in acupuncture's activation of specific signaling pathways (Liu et al., 2021). Moreover, the World Health Organization (WHO) acknowledges acupuncture's efficacy in addressing a wide range of conditions, including but not limited to chronic pain, nausea, and certain neurological disorders. Such recognition further underscores the significance and broad applicability of acupuncture within contemporary medical practice.
With the evolution of the Internet, the pervasive use of deep learning in daily life has become evident. Its applications range from essay recommendation to text data analysis (Wang et al., 2018; Guan et al., 2020). The integration of deep learning into the medical domain is equally noteworthy, combining neural networks and medical technology to advance science and technology. The surge in regional health informatization and medical technology has produced substantial amounts of medical data. Extracting and effectively utilizing this information is vital for intelligent medical support (Hou et al., 2018). Knowledge graph (KG) technology, as an emerging technology for information organization and processing, can efficiently integrate and analyze vast amounts of data and information. By systematically consolidating both ancient and modern literature into a knowledge base, knowledge graphs offer decision-making support for medical practitioners. In the realm of TCM, this technology presents new opportunities for development.
The construction of knowledge graphs is a fundamental concern within the field of knowledge graph research (Liu et al., 2016). The knowledge graph architecture encompasses both its logical structure and its technical components. In this paper, we concentrate on two vital aspects of the technical architecture of knowledge graph construction: entity extraction and relationship extraction.
Nevertheless, constructing high-quality medical knowledge graphs faces significant challenges, particularly because medical terminology has distinct characteristics and requirements compared to common terms. Such endeavors often demand substantial human and material resources. In particular, the development of TCM knowledge graphs has received less research attention than general medical knowledge graphs. Furthermore, detailed research specific to each branch of TCM remains limited.
In the field of acupuncture and tuina, there are still many problems.
• Many acupuncture and tuina treatment plans contain ambiguous concepts, such as the location of acupuncture, the strength of tuina, etc.
• There is a lack of specific quantitative assessment in the implementation of treatment protocols.
• The extant TCM systems are heterogeneous. In the case of acupuncture alone, 26 mainstream schools exist (Zhang and Xia, 2018).
All of the above problems pose a major obstacle to the development of personalized treatment plans. In our previous work, a knowledge graph of acupuncture and tuina was constructed using ancient Chinese medical literature, but there is still no graph-filling technique for modern literature (Han et al., 2021).
In the context of prompt tuning for few-shot Named Entity Recognition (NER), the majority of existing methods have focused on English text, leaving relatively few methods designed specifically for Chinese text. Chinese sentences tend to be longer than English sentences, often spanning tens or even hundreds of Chinese characters. This increased sentence length significantly expands the search space for templates and makes it harder to find suitable templates for Chinese text. Moreover, Chinese entities typically consist of at least two characters, and in domains like Chinese medicine, entities can even comprise nearly ten characters. The diverse meanings of Chinese characters introduce the possibility of unintended, partially nested entities within longer entities. As a result, previous methods do not fit Chinese text well.
To address the above problems, this paper aims to achieve structured storage and retrieval of acupuncture and tuina knowledge from modern literature with the help of knowledge graphs, and at the same time to provide a basis for recommending personalized medical solutions. The contributions of this work are summarized as follows:
• We have collected a large amount of knowledge related to the field of acupuncture and tuina and have built a small domain knowledge base from it. This knowledge base supports the later construction of the KG and can also serve the needs of other researchers.
An overview of this study is shown in Figure 1.

Related work
. Named Entity Recognition (NER)
Entity extraction, also known as Named Entity Recognition (NER), holds significant importance in the field of natural language
Frontiers in Big Data frontiersin.org
. Prompt tuning for few-shot NER
Prompt learning refers to changing the downstream task to the text generation task by adding some hints to the input

. Knowledge graph of TCM
The construction of TCM knowledge graphs has attracted the involvement of numerous domestic and international internet companies. Yu et al. leveraged the linguistic system of TCM as a foundation and integrated a series of TCM-related databases to develop a comprehensive TCM knowledge graph. This knowledge graph was subsequently embedded and utilized in a TCM knowledge service platform. Furthermore, Yu et al. (2015) capitalized on the digital resources accumulated in the field of TCM to construct a knowledge graph specifically tailored to TCM healthcare. Currently, one of the most notable TCM knowledge graphs is the one developed by the Institute of Traditional Chinese Medicine Information at the Chinese Academy of Traditional Chinese Medicine. This knowledge graph encompasses 127 semantic types and 58 semantic relationships, establishing it as a valuable resource in the field (Cui et al., 2014). Global acupuncture clinical trial research is booming, and clinical evidence for acupuncture is emerging. Nenggui Xu's team applied artificial intelligence analysis technology to link original research with 332 systematic reviews of evidence in 20 disease areas, comprehensively improved the clinical evidence matrix of acupuncture therapy in the Epistemonikos database, and formulated the world's first clinical evidence atlas for acupuncture (Lu et al., 2022).

Construction of the knowledge graph schema layer
Before the knowledge graph is constructed, it is critical to have a deep understanding of domain requirements. In the field of acupuncture and tuina, the core problem concerns disease and treatment, and the main questions to be addressed are as follows.
• What are the possible symptoms of a disease? What treatment techniques should be used and which acupoints should be targeted?
• What are the possible diseases associated with certain symptoms? What treatment techniques should be used and which acupoints should be targeted to relieve symptoms?
• What are the common therapies of acupuncture and tuina? Which acupoints are commonly used for a particular therapy? What functions does it have?
• What are the common acupoints used for acupuncture and tuina? What are the effects of stimulating a particular acupoint?
After analyzing these questions, we identified five main categories of entities that are crucial to the field of acupuncture and tuina: disease, symptom, acupoint, therapy, and function. The specific descriptions of each category are summarized in Table 1. Section 5 of this paper specifically addresses how these questions are answered.
The boundary between diseases and symptoms is not clear-cut. For instance, the entity "headache" can be regarded both as a disease and as a symptom of a disease. To ensure consistency and prevent conflicts during the construction of the knowledge graph, we restrict the scope of diseases. Specifically, all disease entities are required to align with either the International Classification of Diseases, 10th Revision (ICD-10) for [Classification in Health (Australia), 2004] or the Clinical Terminology for Chinese Medicine, Disease Section (GB/T 16751.1-1997). Consequently, a candidate entity that could be read as either a disease or a symptom is classified as a disease entity only if it satisfies this alignment requirement; otherwise, it is categorized as a symptom entity.
Both acupoints and meridians play crucial roles in TCM, representing significant theoretical relationships. Meridians are characterized by their linear distribution throughout the body, while acupoints are specific points or zones along these meridians. Acupoints can therefore be considered as belonging to meridians, although extra-meridian points also exist. Given the limited number of meridians, for the purpose of this paper, we consider meridians and acupoints as entities of the same type.
Then, this paper also defines six relationships according to the requirements, as described in Table 2.
After the above definition, the specific framework design of the acupuncture and tuina knowledge graph schema layer is shown in Figure 2.

Construction of the domain knowledge base
The cold-start problem poses a significant challenge in the construction of domain knowledge graphs at the current stage. Both entity extraction and relationship extraction rely heavily on annotated data, necessitating the creation of comprehensive datasets. Furthermore, annotators are required to possess substantial domain-specific knowledge.
Hence, prior to constructing the data layer, this paper advocates the establishment of a comprehensive domain knowledge base. This knowledge base comprises a domain dictionary, a disease information base, an acupoint information base, and an acupuncture and tuina literature base. The integration of these resources aims to compensate for the limited domain expertise of personnel and to reduce the complexity of data annotation. We also hope to improve the structure of the graph with the domain knowledge base, including increasing the types of relationships and supplementing entity attributes.

. Structured knowledge base
Following the design above, we mainly collected data for two types of entities: disease and acupoint. For the disease entities, information on aliases, onset sites, registered departments, and typical symptoms of common diseases was crawled. A total of 2,011 common diseases were crawled, and the sample data (only some attributes of some records were extracted) are shown in Table 3. For the acupoint entity, we crawled the data related to acupoints and meridians from the Chinese Medicine Network. A total of 366 common acupoints and meridians were crawled, and the sample data (only some attributes of some records were extracted) are shown in Table 4.
To make the knowledge base scalable and more efficient for retrieval, the obtained data are stored in a database in the form of tables to establish a structured knowledge base.
After crawling the structured data, we directly populate it into the knowledge graph as entities or their corresponding relationships. This approach enables us to include some data directly in the initial graph, effectively addressing the cold-start problem. As an illustration, the second row of data in Table 3 shows that the symptoms associated with rhinitis include runny nose, stuffy nose, and loss of smell. This information was directly incorporated into the KG, together with the relevant relationships, resulting in a node graph representing the interconnections among entities linked to rhinitis.
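The direct population step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout and field names (`disease`, `symptoms`) are assumptions, and only the relation name DIS-SYM is taken from the relationship types reported later in the paper.

```python
# Hypothetical record shaped like one row of the disease information base.
# The field names are assumptions made for this sketch.
record = {
    "disease": "rhinitis",
    "symptoms": ["runny nose", "stuffy nose", "loss of smell"],
}


def record_to_triples(rec, relation="DIS-SYM"):
    """Turn one structured disease record into (head, relation, tail) triples."""
    return [(rec["disease"], relation, symptom) for symptom in rec["symptoms"]]


triples = record_to_triples(record)
for triple in triples:
    print(triple)
```

Each triple can then be written to the graph store as two nodes and one edge, which is what lets the initial graph exist before any model-based extraction has been run.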

. Domain dictionary
Before proceeding with formal entity extraction, it is necessary to collect a representative sample of each entity type in order to build a comprehensive entity dictionary. The entities were primarily sourced from Chinese medicine websites and other relevant sources. An example dataset for the dictionary is presented in Table 5. To maintain consistency, the length of all entities was limited to 10 characters. Note that not all entities in the knowledge base or domain dictionary were added to the knowledge graph, in order to avoid creating a large number of "isolated nodes" and separate connected components during entity extraction. An entity is added to the knowledge graph only if it appears in the corpus. Conversely, if it appears only in the domain dictionary and not in the corpus, it is likely to have no relational triples and to remain isolated, so it is not added to the knowledge graph. In the end, only about 48% of the entities were added to the graph.
. Unstructured literature base
The corpus data used for information extraction were mainly obtained from three sources: China National Knowledge Infrastructure (CNKI), Chinese Medicine and Chinese Herbs Network, and Chinese Medicine Network (see text footnote ).
CNKI is a comprehensive repository of knowledge resources in China, covering a wide range of subject areas. From this website, we were able to obtain valuable and authoritative literature related to acupuncture and tuina. However, due to copyright restrictions, we could download only a limited number of relevant documents.
To further analyze the literature, we performed optical character recognition (OCR) on the PDF documents, converting them into TXT format to obtain editable text. This allowed us to extract and process the textual content of the literature for subsequent analysis and knowledge graph construction.
Chinese Medicine and Chinese Herbs Network and Chinese Medicine Network are similar open websites for Chinese medicine, with a large number of medical posts related to acupuncture and tuina. These posts serve as valuable sources for extracting the entities and relationships required for constructing the knowledge graph. To compensate for the limited corpus available from CNKI, we collected web texts from these medical websites. In total, we crawled 3,236 posts relevant to acupuncture and tuina.
The collected literature underwent basic preprocessing, including the filtering of special symbols. The title, content, and source information of the literature were then stored in the database, establishing a comprehensive library of acupuncture and tuina literature.

. Entity extraction based on Trie tree
A Trie tree is also known as a dictionary tree or prefix tree. Figure 3 shows a Trie tree and the list of words it contains. For this tree, a query for the entity "rhinitis" would follow the path "1-2-5".
In this study, Trie trees were implemented using Python's built-in dictionary type. The keywords of the Trie tree are strings, and each keyword is stored along a path rather than in a single node. Different node paths therefore yield different keywords, i.e., entities. In addition, two keywords with a common prefix share the same path for that prefix. We built five Trie trees in total, one per entity type, for entity extraction. The algorithm for constructing the Trie trees is described in Algorithm 1.
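A dictionary-backed Trie of the kind described here can be sketched in a few lines. This is an illustrative sketch rather than a reproduction of Algorithm 1: the end-of-word marker `END` and the function names are assumptions, and the three sample entities are made up for the example.

```python
END = "__end__"  # marker key meaning "a keyword ends here" (an assumed convention)


def build_trie(words):
    """Build a prefix tree from entity strings using nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            # Reuse the existing child for shared prefixes, else create one.
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains(trie, word):
    """Return True only if `word` was inserted as a complete keyword."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


# Sample disease entities: rhinitis, sinusitis, headache.
disease_trie = build_trie(["鼻炎", "鼻窦炎", "头痛"])
print(contains(disease_trie, "鼻炎"), contains(disease_trie, "鼻"))
```

Because "鼻炎" and "鼻窦炎" start with the same character, they share a single child under the root, which is the prefix-sharing property the text refers to.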
After obtaining the Trie trees, entity extraction and data annotation were performed on the 819 literature samples. The data annotation followed the BIO format. Each document was divided into sentences, and Algorithm 2 was applied to generate candidate words and match them against the Trie trees. Successfully matched entities were annotated with the corresponding B and I tags, while unmatched candidates were annotated with the O tag. Entity extraction was conducted for all 819 samples. The number of entity samples obtained for each entity type is presented in Table 6, and annotation samples can be seen in Figure 4.
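The matching and BIO annotation step can be illustrated with the sketch below. Algorithm 2 is not reproduced in this text, so the sketch assumes a greedy longest-match strategy over characters; the helper names, the `END` marker, and the sample dictionaries are all hypothetical. The tag names DIS and SYM follow the labels used elsewhere in the paper.

```python
END = "__end__"  # assumed end-of-keyword marker


def build_trie(words):
    """Nested-dict prefix tree, one per entity type."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def longest_match(trie, text, start):
    """Length of the longest keyword in `trie` beginning at text[start], or 0."""
    node, best = trie, 0
    for offset in range(start, len(text)):
        node = node.get(text[offset])
        if node is None:
            break
        if END in node:
            best = offset - start + 1
    return best


def bio_tag(sentence, tries):
    """Tag each character as B-<type>, I-<type> or O; `tries` maps type -> trie."""
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        # Choose the entity type with the longest match at position i.
        length, etype = max(
            (longest_match(trie, sentence, i), name) for name, trie in tries.items()
        )
        if length == 0:
            i += 1
            continue
        tags[i] = f"B-{etype}"
        for j in range(i + 1, i + length):
            tags[j] = f"I-{etype}"
        i += length
    return tags


# Sample dictionaries: rhinitis (disease); stuffy nose, runny nose (symptoms).
tries = {"DIS": build_trie(["鼻炎"]), "SYM": build_trie(["鼻塞", "流涕"])}
tags = bio_tag("鼻炎常见鼻塞", tries)
print(tags)
```

Preferring the longest match is one simple way to keep a long entity from being split into shorter nested ones, although the authors' candidate generation may differ.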

. . TemplateFC based entity extraction
. . . Dataset pre-processing
BERT embedding models have a maximum input sequence length of 512 tokens. Allowing for the two special tokens [CLS] and [SEP], the original sequence can contain at most 510 tokens. Since most of the literature in the previous section exceeds this length, we needed to divide it.
We opted to divide the literature into blocks while aiming to preserve as much information as possible about the headings at each level. The blocks were made as large as possible without exceeding the 510-token limit. Additionally, we retained the requirement of dividing the literature at sentence boundaries. As a result, a total of 2,505 literature pieces were obtained after the block division. Subsequently, Trie tree-based entity extraction was conducted on the divided literature to obtain the initial training dataset for the deep learning model.
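One way to realize this block division is a greedy packing of whole sentences, sketched below. The sketch makes two simplifying assumptions that are not stated in the paper: one Chinese character is counted as one token, and the heading is simply prepended to every block. The function names and the `heading` parameter are hypothetical.

```python
import re

MAX_TOKENS = 510  # 512 minus the [CLS] and [SEP] special tokens


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation mark."""
    return re.findall(r"[^。！？!?]+[。！？!?]?", text)


def pack_blocks(sentences, max_tokens=MAX_TOKENS, heading=""):
    """Greedily pack whole sentences into blocks of at most `max_tokens` characters.

    A single sentence longer than the limit is kept whole and would need
    further splitting before being fed to the model.
    """
    blocks, current = [], heading
    for sentence in sentences:
        if len(current) + len(sentence) > max_tokens and current != heading:
            blocks.append(current)
            current = heading
        current += sentence
    if current != heading:
        blocks.append(current)
    return blocks


text = "一二三。四五六。七八九。"
sentences = split_sentences(text)
blocks = pack_blocks(sentences, max_tokens=8)
print(blocks)
```

Greedy packing keeps blocks as large as possible while never cutting a sentence, which matches the two constraints named in the text.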

. . . TemplateFC model
In the template-free prompt tuning method, NER is reformulated as a language modeling (LM) task. However, instead of relying on predefined templates, as earlier template-based approaches do, the method introduces a new objective called Entity-oriented LM (EntLM) for fine-tuning on NER (Ma et al., 2021). Given the input text, the LM is trained to predict a label word at each entity position, which indicates the entity's label. For non-entity words such as "was", the LM continues to predict the original word.
However, the template-free prompt tuning method also has some disadvantages when applied to the NER task, particularly for Chinese text. Chinese entities often consist of continuous characters, which can lead to a single entity being identified as multiple entities. For instance, in the case of "gastrointestinal neurosis", the term "gastrointestinal" might be identified as an acupoint entity, while "neurosis" is recognized as a disease entity. This highlights the challenge of accurately identifying and disambiguating Chinese entities due to their structure.
Another limitation is that the EntLM model is primarily designed for English text and may not be well suited to Chinese. In English, most entities are typically represented by a single token, whereas Chinese entities often span multiple tokens. This difference in tokenization and entity representation increases the likelihood of inaccurate identification when applying the EntLM model to Chinese text.
In this paper, we propose an NER model, TemplateFC, that combines the benefits of template-free prompt learning with improved applicability to Chinese text. We continue to employ a template-free entity-level LM fine-tuning process, but we introduce additional components, namely Bi-directional Long Short-Term Memory (BiLSTM) and Conditional Random Field (CRF) layers, to improve the accuracy of Chinese entity recognition. By incorporating these layers into the training process, the model can learn useful constraints that facilitate better fine-tuning of the pre-trained model.
First, we selected the appropriate label words for the NER model. Due to the limited availability of few-shot data, the label words were chosen randomly. However, to improve the generalizability of the model, we selected label words from a pool of 1,060 documents in the literature base. The selection of label words was performed using the best method from the EntLM model, which combines both Data and LM search. The process of selecting label words for the B-DIS label is illustrated in Figure 5.
Figure 6 shows the general structure of the model. The model takes the input text $X = \{x_1, \ldots, x_n\}$, whose corresponding label sequence is $Y = \{y_1, \ldots, y_n\}$. We denote the set of label words by $V_l$, which is connected to the set of task labels by a mapping function $M: Y \rightarrow V_l$. The target sequence we want to obtain is then $X^{Ent} = \{x_1, \ldots, M(y_i), \ldots, x_n\}$ (here, we assume that the word at position $i$ is an entity). The loss function is shown in Equation 1. (1)
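The construction of the target sequence can be illustrated with a small sketch. It only shows how entity positions are replaced by label words while other tokens are kept; the label words chosen here are placeholders for illustration and are not the ones selected in the paper, and the function name is hypothetical.

```python
def build_entlm_target(tokens, labels, label_words):
    """Return the target sequence: M(y_i) at entity positions, x_i elsewhere."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must align one-to-one")
    return [
        label_words[label] if label != "O" else token
        for token, label in zip(tokens, labels)
    ]


tokens = list("鼻炎常见")  # "rhinitis is common", one token per character
labels = ["B-DIS", "I-DIS", "O", "O"]
# Placeholder mapping M from task labels to label words (illustrative only).
label_words = {"B-DIS": "病", "I-DIS": "症"}

target = build_entlm_target(tokens, labels, label_words)
print(target)
```

During fine-tuning the language model is then asked to predict `target` from `tokens`, so no template is needed.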

FIGURE. BIO labeling example.
Next, let $W = \{w_1, \ldots, w_n\}$ denote the embedding vectors obtained from the pre-trained BERT model. The BiLSTM layer maps $W$ to the emission score matrix $X_{r,l} \in \mathbb{R}^{n \times l}$ ($n$ is the number of tokens, $l$ is the number of label types), which gives the score of each label at each token position. Since each token can take any of the $l$ labels, there are $l^n$ possible paths, and only one true path exists among them. Therefore, after passing through the CRF layer, a new loss function is obtained, as shown in Equation 2.
$$L_{entity} = \frac{P_{realpath}}{\sum_{i=1}^{N} P_i} \tag{2}$$
Here $N$ is the number of candidate paths, and $S_i$, from which $P_i$ is computed, is the score of the $i$-th path. The exact calculation of $S_i$ is shown in Equation 3.
EmissionScore can be obtained from the emission score matrix, and TransitionScore is composed of parameters in the CRF layer. Finally, we combined the two loss functions into a new loss value and performed joint training; the new loss value is shown in Equation 4.
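The relationship between path scores and the probability of the true path can be checked on a toy example. The sketch below is not the authors' code: it assumes the common CRF convention that a path's weight is the exponential of its score, omits start and end transitions, and enumerates every path by brute force, which is only feasible for tiny inputs. The score values are made up.

```python
import itertools
import math


def path_score(path, emission, transition):
    """Score of one label path: emission scores plus transition scores."""
    score = sum(emission[t][label] for t, label in enumerate(path))
    score += sum(transition[a][b] for a, b in zip(path, path[1:]))
    return score


def path_probability(real_path, emission, transition):
    """exp(score of real path) over the sum of exp(score) for all paths."""
    n, l = len(emission), len(emission[0])
    all_paths = list(itertools.product(range(l), repeat=n))
    total = sum(math.exp(path_score(p, emission, transition)) for p in all_paths)
    prob = math.exp(path_score(real_path, emission, transition)) / total
    return prob, len(all_paths)


# Toy scores for n = 2 tokens and l = 3 labels.
emission = [[2.0, 0.5, 0.1], [0.3, 1.8, 0.2]]
transition = [[0.0, 1.0, -1.0], [0.2, 0.5, 0.1], [0.0, -2.0, 0.3]]

prob, num_paths = path_probability((0, 1), emission, transition)
loss = -math.log(prob)  # negative log-likelihood, the form usually minimized
print(num_paths, round(prob, 4), round(loss, 4))
```

With 2 tokens and 3 labels the enumeration visits 3 ** 2 = 9 paths. Practical CRF layers compute the same normalizer with dynamic programming instead of enumeration.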
Once the pre-trained model has produced the embedding vectors $W$ for the input text $X$, the label corresponding to the current token can be obtained directly with the argmax function, as shown in Equation 5. (5) Alternatively, the corresponding label can be obtained by decoding through the CRF layer.
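The bodies of Equations 1, 3, 4, and 5 are not legible in this copy of the text. A plausible reconstruction, consistent with the surrounding definitions and with the usual EntLM and CRF formulations, is given below. It is an assumption rather than the authors' exact notation: in particular, the unweighted sum in Equation 4, the exponential link between $P_i$ and $S_i$, and the form of the argmax in Equation 5 are guesses.

```latex
\begin{align}
L_{EntLM} &= -\sum_{i=1}^{n} \log P\left(x_i = x_i^{Ent} \mid X\right) \tag{1} \\
S_i &= \mathrm{EmissionScore} + \mathrm{TransitionScore},
  \qquad P_i = e^{S_i} \tag{3} \\
L &= L_{EntLM} + L_{entity} \tag{4} \\
\hat{y}_i &= \arg\max_{y} \; P\left(x_i = M(y) \mid X\right) \tag{5}
\end{align}
```

Here $x_i^{Ent}$ denotes the $i$-th element of the target sequence $X^{Ent}$, and the argmax in Equation 5 is taken over the task labels, with the original word standing in for the O label.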

. Fusion of entity extraction results
When the entity extraction based on the Trie tree and ALBERT-CRF was completed, the results of the two extractions needed to be fused. Drawing on the literature and on Meituan's design of the entity extraction model, we used dictionary matching to mine known entities and deep learning models to mine potential entities. The following fusion rules were developed.

Rule-based relationship extraction
Acupuncture and tuina constitute an ancient and specialized medical domain characterized by stable terminology and a stable conceptual framework. Given the absence of an initial relational dataset and the difficulty of acquiring extensive labeled data, a rule-based approach is advantageous. This approach extracts relational patterns based on expert experience and existing literature, which fits the specialized nature of acupuncture and tuina and also reduces the burden of data labeling. To establish an expression paradigm for the relationships, a substantial amount of text was analyzed and summarized. We then devised a set of coherent matching rules to extract relationships. The rules primarily cover subject word-based extraction, entity location-based extraction, and keyword-based extraction strategies, and were built from a large amount of text and facts. For example, given the sentence "Body acupuncture and massage with acupuncture operation: Quchi, Hegu and matching acupoints", our model identified body acupuncture and massage as OPE entities, identified Quchi and Hegu as XW entities, and extracted the corresponding OPE-XW relationships.
This paper combines three common rule-based approaches with an analysis of literature data to develop the rules shown in Table 7.
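An entity location-based rule of this kind can be sketched as follows. The specific rule shown, pairing every OPE entity before a colon with every XW entity after it, is a hypothetical example constructed from the sentence quoted above; it is not taken from Table 7, and the function name and signature are assumptions.

```python
def extract_ope_xw(sentence, ope_entities, xw_entities, sep=":"):
    """Pair each OPE entity before `sep` with each XW entity after it."""
    if sep not in sentence:
        return []
    head, tail = sentence.split(sep, 1)
    heads = [entity for entity in ope_entities if entity in head]
    tails = [entity for entity in xw_entities if entity in tail]
    return [(h, "OPE-XW", t) for h in heads for t in tails]


sentence = (
    "Body acupuncture and massage with acupuncture operation: "
    "Quchi, Hegu and matching acupoints"
)
triples = extract_ope_xw(
    sentence,
    ope_entities=["Body acupuncture", "massage"],
    xw_entities=["Quchi", "Hegu"],
)
for triple in triples:
    print(triple)
```

The entity lists would come from the preceding entity extraction step, so the rule only has to decide which already-recognized entities are related.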
A total of 40,919 relationships were extracted from all the literature. Among them, 5,566 are DIS-SYM relationships, 6,705 are DIS-OPE relationships, 6,412 are DIS-XW relationships, 6,808 are OPE-XW relationships, 6,824 are OPE-FUN relationships, and 8,604 are XW-FUN relationships.

. Datasets and implementation details
To account for the limited resources in the TCM domain, we conducted two experiments: one in a resource-rich setting and another in a few-shot setting. To evaluate our approach, we used two datasets from distinct domains: the CoNLL2003 dataset (Sang and De Meulder, 2003), sourced from the newswire domain, and a self-built dataset specific to the acupuncture and tuina domain within TCM. The details of these experiments are provided below. Note that, because public information is limited and access to acupuncture and tuina resources in Chinese medicine is restricted, the self-built dataset is limited in size.
Consequently, for the few-shot experiment, we selected a subset of this data. Specifically, the few-shot experiment used 10 training texts and 498 test texts. Finally, we selected the F1 score as the evaluation metric for our experiments. The F1 score is the harmonic mean of precision and recall and therefore reflects both in a single, balanced measure.
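For reference, the F1 computation can be sketched as below. The sketch assumes entity-level exact matching on (start, end, type) spans, which is a common convention for NER but is not confirmed by the text; the sample spans are made up.

```python
def precision_recall_f1(predicted, gold):
    """Entity-level precision, recall and F1 over sets of (start, end, type)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 2, "DIS"), (4, 6, "SYM"), (9, 11, "XW")}
pred = {(0, 2, "DIS"), (4, 6, "XW")}  # second span has the wrong type

precision, recall, f1 = precision_recall_f1(pred, gold)
print(precision, recall, f1)
```

Under exact matching, a span with correct boundaries but the wrong type counts as both a false positive and a false negative, as the second predicted span does here.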

. Baselines and proposed models
In our experiments, we chose a broad set of baselines, detailed below.
Bert + HMM: A classical model that combines BERT and HMM for NER. It learns the representation of the input text using BERT, which converts each word or subword into a high-dimensional vector representation capturing its semantic information, and then models the sequence of BERT representations with an HMM to perform NER.
TemplateNER (Cui et al., 2021): A prompt learning approach that uses generative BART models for sequence annotation tasks and explores the potential of BART models in few-shot scenarios using a template-based approach.
Two-tower (Ma et al., 2022): Addresses the few-shot NER problem using a two-tower model. The model comprises two BERT encoders: one encodes the representation of each token, while the other encodes the natural language form of the BIO label to obtain the label representation. The model then computes the similarity between each token and all the label representations, and the label with the highest similarity is assigned to the token.
EntLM (Ma et al., 2021): An approach that abandons the template and treats NER as a language modeling task. At entity positions the model predicts a label word, and at non-entity positions it predicts the original word.
EntLM-CRF: Adds CRF layer decoding on top of the EntLM model.

Implementation of the knowledge graph
After the completion of entity extraction, the obtained set of entities and relational triples was stored in Neo4j, a graph database. To facilitate querying and exploration, graph queries were executed using the Cypher language. Additionally, to cater to the needs of non-specialists, a Python query interface and a web-based query user interface (UI) were developed.
Considering practical applicability in real-life scenarios, queries were primarily conducted for four main types of entities: diseases, symptoms, acupoints, and therapies. The queries were limited to a step size of 1, which corresponds to single-hop queries. This approach enabled efficient retrieval of related information within a single query operation.
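A single-hop query of this kind might be issued from Python roughly as follows. This is a sketch under stated assumptions: the node label `Disease`, the `name` property, the connection URI, and the credentials are all hypothetical and are not taken from the paper's schema. The runnable part only builds the query text; the commented lines show how it could be sent with the official `neo4j` driver.

```python
def one_hop_query(label):
    """Build a parameterised Cypher query returning direct neighbours of one entity."""
    # Cypher cannot take a node label as a query parameter, so the label is
    # validated before being interpolated into the query text.
    if not label.isidentifier():
        raise ValueError("label must be a plain identifier")
    return (
        f"MATCH (e:{label} {{name: $name}})-[r]-(n) "
        "RETURN type(r) AS relation, labels(n) AS types, n.name AS neighbour"
    )


query = one_hop_query("Disease")
params = {"name": "coronary artery disease"}
print(query)

# With the official neo4j Python driver, the query could be run for example as:
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#   with driver.session() as session:
#       rows = session.run(query, **params).data()
```

Passing the entity name as a parameter rather than formatting it into the string avoids injection problems when the name comes from a web form.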
After completing the development phase, the website was deployed locally on a personal computer. Local testing was then conducted to verify its functionality.
For the entity search, we entered the query "coronary artery disease", and the results are displayed in Figure 7. The diagram shows that coronary artery disease is associated with symptoms such as shortness of breath and sweating. Moreover, commonly employed treatments include pointing, pressing, and kneading. Lastly, specific acupoints such as Zhimen and Zhiyang can be targeted for treatment. This effectively addresses question one as outlined in section two.
To substantiate the credibility of our research, we compared our results against a reputable medical platform, Hua Tuo GPT. We designed a set of three targeted questions to be posed and answered on this platform:
• What are the main symptoms of coronary heart disease?
Answer: The principal manifestations encompass angina or discomfort in the chest, sometimes accompanied by secondary symptoms like dyspnea and diaphoresis. In severe instances, symptoms can escalate to myocardial infarction. Urgent medical attention is imperative upon experiencing any of these signs, ensuring prompt mitigation of health risks. Furthermore, proactive lifestyle adjustments, such as smoking cessation, alcohol moderation, weight maintenance, and increased physical activity, can collectively serve as preventative measures against disease onset and progression. Commonly used acupuncture techniques include tonifying deficiency, clearing the heart, calming asthma, and warming yang, while tuina focuses on regulating qi and blood, promoting blood circulation for removing obstruction in collaterals, and other aspects. However, due to the variability of each individual's body and different disease states, the choice of acupoints and techniques may vary. Therefore, it is recommended that you consult with a professional practitioner and obtain the necessary assessment and guidance before undergoing acupuncture and tuina treatment.
Analysis of the three questions and their corresponding answers supports the following conclusions:
• Enhanced Detail and Precision: Our responses are more detailed and specific. For instance, in describing the symptoms of coronary heart disease, our answers cover a broader spectrum of symptoms, including pallor and bruising, which the responses from Hua Tuo GPT did not include. This broader symptom coverage makes our answers more comprehensive.
• Domain-Specific Focus: Our answers show focused domain expertise centered on acupuncture and tuina. This reflects the knowledge graph presented in this paper, which is tailored to the acupuncture and tuina domain. Conversely, the responses from Hua Tuo GPT take a more generalized medical perspective and lack the nuances specific to acupuncture and tuina. This divergence underscores the strength of our approach in providing in-depth insights within the acupuncture and tuina domain.
In summary, our website excels in the sphere of acupuncture and tuina, holding a distinct advantage over the general medical advice provided by Hua Tuo GPT. Our work leverages its domain-specific focus to furnish detailed and precise counsel, aligning with the depth and expertise inherent in the acupuncture and tuina field. This makes our website a valuable source of tailored advice within this specialized domain.

Discussion
This paper presents the establishment of a novel knowledge base in the domain of acupuncture and tuina, using modern literature as the foundation. Subsequently, the schema layer of the acupuncture and tuina knowledge graph was designed, considering the requirements for practical applications in TCM. The NER task was accomplished by fusing Trie extraction and model extraction, while the ERE task was completed using rule-based methods. Additionally, in the context of few-shot learning, we proposed the TemplateFC model, which adds a BiLSTM layer and a CRF layer for joint training and thereby yields a template-free prompt tuning method that is better adapted to Chinese text. Lastly, the work encompassed graph storage and querying, enabling the KG of acupuncture and tuina to facilitate doctors' understanding of relevant knowledge and to give diagnostic and therapeutic advice.
The method proposed in this paper aims to organize and integrate various knowledge and concepts within the acupuncture and tuina domain, culminating in a structured knowledge graph. This facilitates knowledge sharing and communication among experts and researchers from diverse fields, fostering cross-disciplinary collaboration and discourse. Nonetheless, certain limitations persist in this study. For instance, within ERE, the conventional rule-based approach struggles to cover all text features adequately, which makes it difficult to transfer to other corpora. In future work, we intend to explore deep learning-based methods for relationship extraction, tailored to the nuances of Chinese text, thereby advancing TCM development. Concurrently, we aim to investigate the capabilities of Large Language Models (LLMs) to improve the precision of entity and relationship extraction from text. The ultimate objective of this exploration is to attain zero-shot learning capabilities, thereby improving the efficacy and adaptability of our models in understanding and processing complex textual information.

FIGURE Flow chart of knowledge graph construction. (A) The flow of knowledge graph construction. (B) Entity extraction. (C) Relationship extraction.

FIGURE Acupuncture and Tuina knowledge graph entity relationship diagram.
We used the above corpus for entity extraction. The approach combines the Trie tree model and a deep learning model, as depicted in Figure 1B. The Trie tree construction relied on the previously built domain dictionary and primarily extracted known entities. The extraction results from the Trie tree served as the dataset for training the deep learning model. The trained model was then used for a second round of entity extraction, focusing on identifying potential entities that were not initially discovered. Finally, the extraction results from both methods were merged to obtain the final entity set.

Search-China Knowledge Network. https://kns.cnki.net/kns/brief/default_result.aspx. Accessed: --.
TCM.com_TCM.com_Excellent Chinese Medicine Portal. https://www.zhzyw.com/. Accessed: --.
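The dictionary-driven first pass can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes character-level BIO tagging and greedy longest-match lookup, neither of which is specified in the text, and the two dictionary entries (Chinese renderings of coronary heart disease and the Xinshu acupoint) are illustrative stand-ins for the domain dictionary. The labels `DIS` and `XW` are the entity types named in this paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.label = None    # entity type if a dictionary term ends at this node


def build_trie(dictionary):
    """Build a character-level Trie from a {term: entity_type} dictionary."""
    root = TrieNode()
    for term, label in dictionary.items():
        node = root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.label = label
    return root


def trie_tag(text, root):
    """Tag each character with BIO labels using greedy longest-match lookup.

    Longest-match is an assumption of this sketch.
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        node, j = root, i
        best_end, best_label = -1, None
        # Walk the Trie as far as the text allows, remembering the longest hit.
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.label is not None:
                best_end, best_label = j, node.label
        if best_label is None:
            i += 1
            continue
        tags[i] = "B-" + best_label
        for k in range(i + 1, best_end):
            tags[k] = "I-" + best_label
        i = best_end
    return tags


if __name__ == "__main__":
    # Illustrative entries only; the real domain dictionary is much larger.
    trie = build_trie({"冠心病": "DIS", "心俞": "XW"})
    print(trie_tag("冠心病取心俞", trie))
```

Tags produced this way can serve as silver-standard training data for the model, which is the role the text assigns to the Trie extraction results.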

FIGURE The example of a Trie Tree.
• When the Trie tree extraction results agree with the model extraction results, the results are fused directly.
• When the Trie tree extraction result is "O" and the model extraction result is "B-" or "I-", the fusion result will be based on the model extraction result.
• When the Trie tree extraction result is "B-" or "I-" and the model extraction result is "O", the fusion result will be based on the Trie tree extraction result.
The specific fusion algorithm is shown in Algorithm 3. Finally, a total of 10,346 entities were extracted from all the literature by the fusion model. Among them, 3,671 are DIS entities, 3,252 are SYM entities, 149 are XW entities, 1,330 are OPE entities, and 602 are FUN entities.
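The three fusion rules above can be sketched per token as follows. This is a hedged illustration rather than a reproduction of Algorithm 3: the function name `fuse_tags` is hypothetical, and the case in which both extractors emit different non-"O" tags is not covered by the rules in the text, so the sketch assumes the dictionary-based Trie result is preferred there.

```python
def fuse_tags(trie_tags, model_tags):
    """Fuse two BIO tag sequences of equal length, token by token."""
    if len(trie_tags) != len(model_tags):
        raise ValueError("tag sequences must have the same length")
    fused = []
    for trie_tag, model_tag in zip(trie_tags, model_tags):
        if trie_tag == model_tag:
            # Rule 1: both extractors agree, so the result is taken directly.
            fused.append(trie_tag)
        elif trie_tag == "O":
            # Rule 2: the Trie found nothing, the model found an entity token.
            fused.append(model_tag)
        elif model_tag == "O":
            # Rule 3: the model found nothing, the Trie found an entity token.
            fused.append(trie_tag)
        else:
            # ASSUMPTION: conflicting non-"O" tags are not described in the
            # text; this sketch keeps the Trie result.
            fused.append(trie_tag)
    return fused


if __name__ == "__main__":
    trie_out = ["B-DIS", "I-DIS", "O", "O"]
    model_out = ["B-DIS", "O", "B-SYM", "O"]
    print(fuse_tags(trie_out, model_out))
```

Because the fusion is applied per token, the output may occasionally contain an "I-" tag without a preceding "B-" tag; a post-processing pass to repair such sequences would be a reasonable addition but is not described in the source.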

FIGURE Label words selection.

FIGURE TemplateFC model architecture diagram.

FIGURE Coronary heart disease text extraction results.
TABLE Examples of various types of entities.
TABLE Example of each type of entity relationship.
TABLE Sample from disease data.
TABLE Sample from acupoint data.
TABLE Trie tree extraction entity sample statistics.

TABLE Relationship extraction rules.
TABLE Comparison of F1 score results for different resources under two data domains.
Table 8 presents the results of the proposed method and the baseline approaches across different settings. Here, the bold values represent the highest experimental results in the current dataset. Based on the findings from the table, the following observations can be made. Even in scenarios with abundant resources, our model maintains an advantage of 0.72% over the highest-scoring baseline model. This highlights the versatility of our approach, which is applicable across both few-shot and multi-shot scenarios. Despite the potential risk of overfitting in multi-shot scenarios, our experimental findings indicate that our model continues to improve performance and stability. It is important to note that for the last three baselines, we conducted four training runs to obtain four sets of results and selected the highest F1 score among them for comparison. Conversely, for the experiments conducted in this paper, we directly decoded the outputs in the multi-shot scenario. In the few-shot resource scenario, we employed CRF decoding.

• What are the common therapies for the treatment of coronary heart disease? Acupoints include the Xinshu and Danzhong.