Important Medicinal Plant Families in Thailand

Throughout the world, surveys have been conducted at the country level to answer research questions pertaining to ethnomedicinal usage patterns. This study is focused on Thailand, which has never been surveyed systematically in this way. We mined 16,000 records of medicinal plant use from 64 scientific reports, which were published from 1990 to 2014. In total, 2,187 plant species were cited as being useful for medicinal purposes. The overall aim was to reveal the relative importance of the plant families for pharmacological research. To determine the most important medicinal plant families, we used a combination of three statistical approaches: linear regression, binomial analysis, and Bayesian analysis. At the regional level, 19 plant families repeatedly stood out as being the most important from an ethnomedicinal perspective.


INTRODUCTION
It is well-documented in the scientific literature that plants have been used for medicinal purposes for the past 60,000 years (Solecki, 1975). Still today, millions of people around the world depend on medicinal plants for their well-being (WHO, 2002). In the tropics, medicinal plants are often used on a regular basis in rural communities where pharmaceuticals are hard to obtain or even unavailable. This is in contrast to westernized societies where medicinal plants are typically used as an alternative or supplement to prescribed medicine (WHO, 2002). Medicinal plants are important for people, not only as a primary source of medicines but also as phytochemical building blocks for development of new drugs (Fabricant and Farnsworth, 2001). It is estimated that 67% of drugs used in chemotherapy are derived from natural products (Wangkheirakpam, 2018). This applies to the discoveries of active compounds such as vincristine (Raviña, 2011), taxol (Fischer et al., 2010), and artemisinin (Tu, 2011). Moreover, medicinal plants also offer an opportunity for rural dwellers to generate a cash income (EL-Hilaly et al., 2003).
There are several factors that influence how people select plants for medicinal purposes: tradition, efficacy, abundance, accessibility, doctrine of signatures, as well as taxonomic affiliation (Bennett and Husby, 2008). Interestingly, some plant families comprise a higher proportion of medicinally useful species than expected from the null model of a linear relationship between species diversity and number of medicinally useful species. Traditional use of plants for medicines is not random but determined, to a certain degree, by taxonomic affiliation (Moerman, 1996; Bennett and Husby, 2008). Various statistical tools have been applied to test the relationship between species richness and number of medicinally useful species. In an ethnomedicinal study conducted in North America at the family level, Moerman (1996) plotted the relationship between the number of medicinally useful plant species and the number of native species. Although the classic regression model is a simple and common way to test patterns of medicinal knowledge (Moerman, 1996; Leonti et al., 2003; Bourbonnas-Spear et al., 2005; Douwes et al., 2008; Saslis-Lagoudakis et al., 2011), it suffers from a bias toward large families (Weckerle et al., 2011). Bennett and Husby (2008) used binomial analysis to overcome this bias in their test of the medicinal importance of Ecuadorian plant families. Bayesian analysis was recently introduced as an alternative to reveal over- and under-represented medicinal families (Weckerle et al., 2011). The technique performed particularly well for small data sets and showed similar results to those of the binomial analysis. Here, we combine all three above-mentioned statistical techniques and only refer to a plant family as important if all three statistical methods confirm that it includes a higher number of medicinal species than expected under the null model.
Studies around the world reveal that the medicinal importance of plant families overlaps only partly across space. In North America, Moerman (1996) showed that Asteraceae, Apiaceae, Ericaceae, Rosaceae, and Ranunculaceae included higher numbers of medicinal species than expected based on their species richness. Likewise, in Ecuador medicinal plant species were over-represented in Zingiberaceae, Piperaceae, Lamiaceae, Amaranthaceae, Apiaceae, and Costaceae (Bennett and Husby, 2008). In Campania, Italy, a study revealed that medicinal species were over-represented in Lamiaceae, Rosaceae, and Malvaceae. In contrast, the families Orchidaceae, Caryophyllaceae, Poaceae, and Leguminosae included fewer medicinally useful species than suggested by their total species numbers (Weckerle et al., 2011). A recent study from Hawaii revealed that not only Leguminosae, Ericaceae, Malvaceae, Zingiberaceae, and Apocynaceae but also Poaceae and Cyperaceae were the most important medicinal families (Ford and Gaoue, 2017). Similar studies were conducted in Belize (Bourbonnas-Spear et al., 2005), Mexico (Leonti et al., 2003), New Zealand (Saslis-Lagoudakis et al., 2011), Nepal (Saslis-Lagoudakis et al., 2011), and South Africa (Saslis-Lagoudakis et al., 2011). Recently, Leguminosae, Lamiaceae, Euphorbiaceae, Apocynaceae, Malvaceae, Apiaceae, and Ranunculaceae were listed as being the most important medicinal plant families on a worldwide scale (Kew, 2017).
In this study, we analyze and compare the medicinal usefulness of plant families across Thailand. The country boasts a high diversity of both plants and ethnic groups (Pooma and Suddee, 2014). Thailand has traditionally been divided into seven phytogeographic regions in accordance with Tem Smitinand's classification (Smitinand, 1958). Based on a meta-analysis of plant species distribution records, van Welzen et al. (2011) concluded that the country should be divided into four phytogeographic regions defined as areas with a "typical, unique, and distinct plant composition": the southern, northern, eastern, and central regions. The overall plant diversity increases toward the Malay Peninsula, which has a less seasonal climate. The biodiversity in Thailand is generally under pressure from human activities, especially farming and urban development.
Within Thailand, more than 80 dialects are spoken, which belong to five linguistic families: Austronesian, Hmong-Mien, Sino-Tibetan, Tai-Kadai, and Austro-Asiatic (Premsrirat, 2004). The majority of the population in Thailand is referred to as local Thai. However, the population also comprises a number of ethnic minorities, many of which have their main distribution outside Thailand in areas such as Myanmar (Moken, Kachin, and Taiyai), Yunnan (Mien, Haw, and Lue), and Tibet (Karen, Musue, and Lahu) (Phumthum and Balslev, 2018). It should be noted that the comprehensive ethnomedicinal data underlying this study are biased toward the ethnic minorities, which have been visited most frequently by ethnobotanical researchers. Although the data may not reflect the average situation at the village level, they do capture all common medicinal plant uses and provide insights into the diversity of plants used for medicinal purposes, not only in Thailand but also in the neighboring countries.
We will address the following research questions: which are the most important medicinal plant families locally and across Thailand? How do these compare with medicinally important families found elsewhere in the world?

Plant Use Records
Plant use records were extracted from 64 studies reported in the scientific literature (1 book, 2 reports, 29 journal articles, and 32 Master's and PhD theses). For more details on how we avoided data replication, criteria for including or excluding references, and a list of references, we refer to our previous publications (Phumthum and Balslev, 2018). Data sources such as pocketbooks, local knowledge, plant labels in parks and gardens, and news articles were avoided since they are often based on inaccurate vernacular names, and they are difficult to reproduce due to lack of metadata. We only included records that cited scientific names and plants that had been identified to species level. To be considered, the data had to comply with recognized ethnobotanical collection standards. Most of the scientific reports underlying this study cited voucher specimens deposited in herbaria across Thailand. In cases where journal articles had been published based on Master's and PhD studies, e.g., Srithi (2012), Srithi et al. (2012a), and Srithi et al. (2012b), we extracted the data strictly from the original thesis. We used data from studies conducted from 1990, when systematic ethnobotanical exploration began in Thailand, until 2014. All Master's and PhD theses were accessed via the website of the Thai Library Integrated System (https://www.thailis.or.th/tdc), which includes all Thai institutes of higher education. To gain insight into the variation of the most important medicinal plant families (MIMFs) at the subregional level, we divided the dataset into seven subsets based on geographic location of the villages in accordance with Phumthum and Balslev (2018).

Plant Diversity
The Thai Plant Names book provides a list of all plant species in Thailand (Pooma and Suddee, 2014). We updated the taxonomy using The Plant List website (www.theplantlist.org) and followed the family names used on the Angiosperm Phylogeny website (www.mobot.org). We identified no discrepancies except for the families Lamiaceae and Leguminosae on The Plant List website, which were named Labiatae and Fabaceae on the Angiosperm Phylogeny website. We followed the former. The complete version of the list after updating included 9,793 plant species representing a total of 276 families in Thailand. To avoid bias and random noise, we removed from the dataset all families with fewer than 10 species in Thailand. This reduced the number of families included in our analysis to 115 (Table 1) and the number of species to 9,097.
To investigate the geographic variation in MIMFs we used estimates of regional family diversity. In cases where the species diversity within a family was unknown for a specific region, we used instead the corresponding figure for the entire country as a conservative approach.

Linear Regression and R Values
We used regression residuals to analyze a simple plot of the number of medicinally useful species against the total number of species in a family. It should be noted that this analytical approach, when used for analyzing medicinal floras, typically violates the statistical assumptions of homoscedasticity and normality (Bennett and Husby, 2008).
In this study, we coined a new metric that we refer to as relative regression residual (R). It is defined as the relative difference between the observed number of medicinally useful species (O) and the predicted number of medicinally useful species under the model (P): $R = (O - P)/P$. The R value equals zero (0) when there is a perfect fit between predicted and observed data; values below zero (-R) imply that a given family had fewer medicinal species than expected; and values above zero (+R) imply that a family had more medicinally useful species than expected. To filter random noise, which could be problematic in data sets assembled from many sources, we introduced a critical range of R values from -0.5 to 0.5 within which we considered deviation from the model of no consequence. Families with R values above 0.5 we refer to here as MIMFs.
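The R value calculation can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes an ordinary least-squares fit of medicinal species against total species per family and takes R as (O - P)/P, which matches the definition above and the score of -1 for families without medicinal species. The function names and the toy family counts are hypothetical and are not taken from Table 1.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_residuals(total, medicinal):
    """Return R = (O - P) / P for each family.

    total     -- total number of species per family (T)
    medicinal -- observed number of medicinal species per family (O)
    Assumes every predicted value P is positive.
    """
    slope, intercept = fit_line(total, medicinal)
    r_values = []
    for t, o in zip(total, medicinal):
        p = slope * t + intercept  # predicted number of medicinal species (P)
        r_values.append((o - p) / p)
    return r_values


def flag_mimfs(families, r_values, threshold=0.5):
    """Families whose R exceeds the critical value (0.5 in the text)."""
    return [f for f, r in zip(families, r_values) if r > threshold]


# Hypothetical toy data, not taken from Table 1.
families = ["A", "B", "C", "D"]
total = [10, 20, 30, 40]
medicinal = [2, 12, 6, 8]

r_values = relative_residuals(total, medicinal)
mimfs = flag_mimfs(families, r_values)
```

In this toy data set only family B lies more than 50% above its predicted value, so it is the only one flagged.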

Binomial Analysis
We used the BINOMDIST function in Microsoft Excel to conduct a binomial analysis in accordance with the method described by Bennett and Husby (2008). The dataset was identical to the one used for linear regression. Families with higher numbers of medicinal species than expected from the model were identified as MIMFs. Based on the probability that the actual number of medicinal species is equal to or lower than the number of expected medicinal species (a) and the probability that the number of medicinal species is equal to the number of expected medicinal species (b), we calculated a 95% interval probability (p) that the number of medicinal species is more than the number of expected medicinal species as: $p = 1 - a + b$. We considered all plant families with 95% interval probabilities less than 0.05 as MIMFs.
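A minimal Python sketch of such an upper-tail binomial test is given below. It rests on one reading of the description above, which is an assumption: a is taken as the cumulative probability up to the observed count k (as returned by BINOMDIST with the cumulative flag set to TRUE), b as the point probability at k (cumulative flag FALSE), and p as 1 - a + b, i.e., the probability of observing k or more medicinal species in a family of n species. The function names and the flora-wide proportion of 0.24 are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, prob):
    """P(X = k) for X ~ Binomial(n, prob), with 0 < prob < 1.

    Computed on the log scale so that large families do not overflow.
    """
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(prob) + (n - k) * log(1 - prob))


def binom_cdf(k, n, prob):
    """P(X <= k) for X ~ Binomial(n, prob)."""
    return sum(binom_pmf(i, n, prob) for i in range(k + 1))


def upper_tail_p(k, n, prob):
    """P(X >= k), written as 1 - a + b to mirror the description in the text."""
    a = binom_cdf(k, n, prob)  # probability of k or fewer medicinal species
    b = binom_pmf(k, n, prob)  # probability of exactly k medicinal species
    return 1 - a + b


# Hypothetical example: overall proportion of medicinal species assumed to be 0.24.
overall = 0.24
p_rich = upper_tail_p(10, 20, overall)  # family of 20 species, 10 medicinal
p_poor = upper_tail_p(5, 20, overall)   # family of 20 species, 5 medicinal
```

Under these assumptions the first family would fall below the 0.05 cut-off and the second would not.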

Bayesian Analysis
We conducted a Bayesian analysis according to the principles laid down by Weckerle et al. (2014) using the BETA.INV function in Microsoft Excel. We considered all plant families for which the lower bound of the 95% credible interval was higher than the corresponding value calculated for all species in Thailand (0.2362) as MIMFs in this analysis.
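The credible-interval comparison can be sketched with the Python standard library as shown below. Several points are assumptions rather than details given in the text: a uniform Beta(1, 1) prior, so that a family with n species of which k are medicinal has the posterior Beta(k + 1, n - k + 1), and the 2.5% posterior quantile as the lower bound of the interval. Because both parameters are then integers, the Beta distribution function can be written as a binomial tail sum and inverted by bisection. All function names are hypothetical; only the threshold 0.2362 comes from the text.

```python
from math import exp, lgamma, log


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Y >= a) with Y ~ Binomial(a + b - 1, x).
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    m = a + b - 1
    total = 0.0
    for j in range(a, m + 1):
        log_term = (lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
                    + j * log(x) + (m - j) * log(1 - x))
        total += exp(log_term)
    return min(total, 1.0)


def beta_quantile_int(q, a, b, tol=1e-10):
    """Invert the Beta(a, b) CDF by bisection on the interval [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lower_credible_bound(k, n, level=0.95):
    """Lower bound of the central credible interval under a uniform prior."""
    return beta_quantile_int((1 - level) / 2, k + 1, n - k + 1)


FLORA_THRESHOLD = 0.2362  # value reported in the text for all species in Thailand

# Hypothetical family: 20 species, 10 of them medicinal.
bound = lower_credible_bound(10, 20)
is_mimf = bound > FLORA_THRESHOLD
```

With a different prior the posterior parameters change accordingly, and non-integer parameters would require a general incomplete beta routine instead of the tail sum used here.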

Regression analysis
At the national scale, we modeled a linear relationship between the number of medicinal species (P) and all species (T) within a given family. The model explained only 37% of the variability in the data (R² = 0.37) (Figure 1).
Although there was a positive relationship overall between the number of medicinally useful plants within a family and the number of species it comprises, it should be noted that some of the larger families actually contained fewer medicinal plants than some of the smaller families. A number of families had no record of medicinally useful species at all and consequently scored -1 for their R values. This applied to Thelypteridaceae, Hymenophyllaceae, Podostemaceae, Eriocaulaceae, Hydrocharitaceae, Burmanniaceae, and Cupressaceae (Table 1). Using the relative regression residual (R), we identified a total of 22 MIMFs across Thailand. The families with the highest R score were in decreasing order: Asteraceae (R = 2.7), Leguminosae (R = 1.5), Annonaceae (R = 1.3), Lamiaceae (R = 1.2), Rutaceae (R = 1.2), and Cucurbitaceae (R = 1.1).

Binomial analysis and Bayesian analysis
The binomial and Bayesian analyses identified the same 27 MIMFs (Table 1). The families Rhamnaceae, Ranunculaceae, Dilleniaceae, Apiaceae, Connaraceae, Oxalidaceae, Sapindaceae, and Thymelaeaceae were not identified by linear regression analysis as MIMFs. Conversely, the families Rubiaceae, Zingiberaceae, and Moraceae, which were included in the MIMFs resulting from the linear regression analysis, were not among the MIMFs identified by the binomial and Bayesian analyses (Table 1).

Combined analyses
At the national level, 19 plant families were identified as MIMFs by all three statistical approaches (Table 1). Fifteen of these were shared across all villages studied in Thailand: Asteraceae, Leguminosae, Combretaceae, Cucurbitaceae, Rutaceae, Menispermaceae, Lamiaceae, Amaranthaceae, Malvaceae, Solanaceae, Phyllanthaceae, Apocynaceae, Euphorbiaceae, Vitaceae, and Acanthaceae. The families Annonaceae, Polygonaceae, Araliaceae, and Anacardiaceae were included among the MIMFs at the national level but not shared across all villages.
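The combination step amounts to a set intersection, which the short Python sketch below uses as a consistency check on the reported counts. The family names are those listed in the text. The per-method sets are reconstructed from those lists (the 19 shared families plus the families reported as unique to one side), which is an assumption about the content of Table 1, and the variable names are hypothetical.

```python
# Families identified by all three approaches, as listed in the text.
shared = {
    "Asteraceae", "Leguminosae", "Combretaceae", "Cucurbitaceae", "Rutaceae",
    "Menispermaceae", "Lamiaceae", "Amaranthaceae", "Malvaceae", "Solanaceae",
    "Phyllanthaceae", "Apocynaceae", "Euphorbiaceae", "Vitaceae", "Acanthaceae",
    "Annonaceae", "Polygonaceae", "Araliaceae", "Anacardiaceae",
}
# Families reported for the regression analysis only.
regression_only = {"Rubiaceae", "Zingiberaceae", "Moraceae"}
# Families reported for the binomial and Bayesian analyses only.
binomial_bayesian_only = {
    "Rhamnaceae", "Ranunculaceae", "Dilleniaceae", "Apiaceae",
    "Connaraceae", "Oxalidaceae", "Sapindaceae", "Thymelaeaceae",
}

# Reconstructed per-method sets (an assumption, see the note above).
regression = shared | regression_only
binomial = shared | binomial_bayesian_only
bayesian = set(binomial)  # both analyses identified the same families

# A family counts as a MIMF only if all three approaches agree.
combined = regression & binomial & bayesian
print(len(regression), len(binomial), len(combined))  # prints: 22 27 19
```

The three sizes agree with the 22, 27, and 19 families reported for the regression, the binomial and Bayesian, and the combined analyses, respectively.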

Variation in MIMFs across Regions
The regression-residual analysis revealed substantial variation in the MIMFs across the seven regions as defined by Phumthum and Balslev (2018) (Table 2). The highest number of MIMFs was found in the northeastern region (27) whereas the lowest number was recorded in the southeastern region (20). In the southwestern, northern, central, peninsular, and eastern regions, we recorded 25, 24, 23, 23, and 21 MIMFs, respectively. The families Leguminosae, Euphorbiaceae, Apocynaceae, Malvaceae, and Amaranthaceae were shared among the MIMFs of all the regions. The families Rutaceae, Lamiaceae, Menispermaceae, Asteraceae, Combretaceae, Acanthaceae, Phyllanthaceae, and Rubiaceae appeared among the MIMFs in six regions, whereas Zingiberaceae, Moraceae, and Sapindaceae appeared in five regions. Nineteen families were recorded as MIMFs in at least four regions. The total list of MIMFs across the seven regions combined included 51 plant families.
The binomial and Bayesian analyses identified 44 MIMFs in total at the regional level. None of these families were shared across all seven regions. Rutaceae, Leguminosae, Menispermaceae, Euphorbiaceae, and Combretaceae were identified as MIMFs in six regions; Sapindaceae, Amaranthaceae, and Solanaceae in five regions; and Lamiaceae, Malvaceae, and Phyllanthaceae in four regions (Table 3).

Combining Statistical Tools in the Evaluation of the Medicinal Importance of Plant Families
The three statistical approaches used here to evaluate MIMFs at the national and regional levels in Thailand have all been used previously for analyzing medicinal floras in other parts of the world, but never in combination. Similar to Bennett and Husby (2008) and Weckerle et al. (2014), we found that the binomial and Bayesian analyses are more sensitive to small families. By using these two statistical methods, we added several plant families that the regression analysis did not identify (Table 1). Interestingly, the R values were biased toward the larger families and identified Zingiberaceae (290 spp.) and Rubiaceae (416 spp.) as MIMFs, whereas the binomial and Bayesian analyses did not (Table 1). This result confirms previous findings that these two families are high in numbers of medicinal use reports (691 and 608, respectively) and in use value scores (5.71 and 5.02, respectively).
Although the three statistical approaches applied here to estimate the medicinal importance of plant families all have their strengths and weaknesses, they show remarkable overlap in their results (Table 1). This applies especially to the national scale, where we obtained robust estimates of the MIMFs due to large sample sizes. In cases where sample sizes are moderate, e.g., at smaller spatial scales, it is particularly important to combine statistical approaches to avoid inherent methodological biases.
The number of MIMFs identified by the three statistical approaches varied between 20 and 27 across the regions (Table 2). Of the three methodological approaches, the binomial analysis generally identified fewer MIMFs. The highest number was recovered in the northern region, the lowest in the southeastern region. The fact that none of the families were estimated as MIMFs throughout all regions (Table 3) is an artifact of missing information, viz., when the species diversity of a family was unknown for a specific region, we substituted it with the family diversity for the entire country. Since we used the same datasets throughout the analyses, this cannot, however, explain the differences between the three analytical approaches. Interestingly, the use of R values at the regional scale gave uniform results across scales, as far as both the number and the overlap of MIMFs are concerned (Tables 1 and 2). (Table 4).
Within Thailand, it is noticeable that the lists of MIMFs from the different regions are so similar despite the fact that the regional floras are composed of different phytogeographic elements. This is particularly obvious when the northern and the peninsular regions are compared. The northern region contains an Indo-Burmese phytogeographic element, whereas the peninsular region has a pronounced Malayan element (van Welzen et al., 2011). Fifteen plant families were shared between the lists of MIMFs identified from the northern region (27) and the peninsular region (23): Leguminosae, Euphorbiaceae, Apocynaceae, Malvaceae, Amaranthaceae, Rutaceae, Lamiaceae, Menispermaceae, Asteraceae, Phyllanthaceae, Rubiaceae, Zingiberaceae, Moraceae, Solanaceae, and Vitaceae. This corresponds to a 60% overlap (Table 2). Our results indicate that although there are pronounced regional differences in the flora throughout Thailand, the MIMFs remain remarkably similar.
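The text does not state how the 60% overlap was computed. One reading that reproduces the figure, offered here only as an assumption, is the Sørensen-Dice index, i.e., twice the number of shared families divided by the sum of the two list sizes. The Python sketch below shows the arithmetic and, for comparison, the Jaccard index, which gives a lower value for the same counts. The function names are hypothetical.

```python
def dice_overlap(shared, size_a, size_b):
    """Sørensen-Dice index: 2 * shared / (size_a + size_b)."""
    return 2 * shared / (size_a + size_b)


def jaccard_overlap(shared, size_a, size_b):
    """Jaccard index: shared / size of the union of the two lists."""
    return shared / (size_a + size_b - shared)


# Counts from the text: 15 shared families, 27 northern and 23 peninsular MIMFs.
dice = dice_overlap(15, 27, 23)        # 30 / 50 = 0.6
jaccard = jaccard_overlap(15, 27, 23)  # 15 / 35, about 0.43
```

The Dice value equals the number of shared families divided by the mean list size (15 / 25), which is why it matches the 60% reported above.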

CONCLUSION
This study fills a gap in our ethnomedicinal knowledge about Thailand and the Southeast Asian region in general. By using three different analytical approaches, we recovered the MIMFs in Thailand and revealed substantial differences at the regional level. Ten of the MIMFs in Thailand are also listed as important elsewhere in the world: Asteraceae, Leguminosae, Rutaceae, Lamiaceae, Malvaceae, Solanaceae, Apocynaceae, Euphorbiaceae, Araliaceae, and Anacardiaceae. We consider these families particularly promising for the pharmacological industry because their widespread use is most probably due to their content of physiologically active compounds.
We recommend that statistical approaches be combined, especially when dealing with small to moderate sample sizes, to avoid inherent methodological biases. It should be noted that regression residuals as used in this study to identify MIMFs apparently are less sensitive to artifacts caused by unequal family sizes. Differences in the MIMFs identified across the regions in Thailand are determined by both floristic and cultural factors. Recovering ethnomedicinal usage patterns in Thailand and elsewhere will inform natural resource management and guide pharmacologists in their constant search for chemical compounds with hitherto unknown physiological effects, which can be used as a starting point for developing new drugs.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the manuscript/supplementary files.

AUTHOR CONTRIBUTIONS
MP conceived the idea, collected and analyzed data, and wrote the manuscript. HB supervised the research and wrote the manuscript. AB supported the analyses, wrote the manuscript, and supervised the research.

FUNDING
The study was supported by a grant to HB from the Carlsberg Foundation to the Flora of Thailand research project.

ACKNOWLEDGMENTS
We gratefully acknowledge the great devotion of our Thai colleagues who conducted the ethnobotanical fieldwork and assembled the data underlying this study.