Host-pathogen relationship in retreated tuberculosis with major rifampicin resistance–conferring mutations

Introduction It is assumed that host defense systems eliminating the pathogen and regulating tissue damage make a strong impact on the outcome of tuberculosis (TB) disease and that these processes are affected by rifampicin (RIF) resistance–conferring mutations of Mycobacterium tuberculosis (Mtb). However, the host responses to the pathogen harboring different mutations have not been studied comprehensively in clinical settings. We analyzed clinico-epidemiological factors and blood transcriptomic signatures associated with major rpoB mutations conferring RIF resistance in a cohort study. Methods Demographic data were collected from 295 active pulmonary TB patients with treatment history in Hanoi, Vietnam. When recruited, drug resistance–conferring mutations and lineage-specific variations were identified using whole-genome sequencing of clinical Mtb isolates. Before starting retreatment, total RNA was extracted from the whole blood of HIV-negative patients infected with Mtb that carried either the rpoB H445Y or rpoB S450L mutation, and the total RNA was subjected to RNA sequencing after age-gender matching. The individual RNA expression levels in the blood sample set were also measured using real-time RT-PCR. Logistic and linear regression models were used to assess possible associations. Results In our cohort, rpoB S450L and rpoB H445Y were major RIF resistance–conferring mutations [32/87 (36.8%) and 15/87 (17.2%), respectively]. H445Y was enriched in the ancient Beijing genotype and was associated with nonsynonymous mutations of Rv1830 that has been reported to regulate antibiotic resilience. H445Y was also more frequently observed in genetically clustered strains and in samples from patients who had received more than one TB treatment episode. According to the RNA sequencing, gene sets involved in the interferon-γ and-α pathways were downregulated in H445Y compared with S450L. The qRT-PCR analysis also confirmed the low expression levels of interferon-inducible genes, including BATF2 and SERPING1, in the H445Y group, particularly in patients with extensive lesions on chest X-ray. Discussion Our study results showed that rpoB mutations as well as Mtb sublineage with additional genetic variants may have significant effects on host response. These findings strengthen the rationale for investigation of host-pathogen interactions to develop countermeasures against epidemics of drug-resistant TB.


Introduction
Management of tuberculosis (TB) has been challenged by increased resistance to anti-TB drugs, especially rifampicin (RIF), which has long been the backbone of the standard short-course chemotherapy against Mycobacterium tuberculosis (Mtb). In 2021, the estimated proportion of people with TB who had RIF-resistant TB with or without isoniazid-resistance was 3.6% among new cases and 18% among those previously treated, which continues to be public health threat. 1 Globally, it is estimated that transmission of multi-drug resistant (MDR) TB strains is responsible for over 70% of MDR-TB cases (Kendall et al., 2015).
RIF binds to the beta-subunit of RNA polymerase (RpoB) and kills bacteria by inhibiting their critical RNA synthesis (Chauhan et al., 2021). In Mtb, a variety of RIF resistance-conferring mutations are located in the 81-bp RIF resistance-determining region (RRDR) of the rpoB gene, corresponding to codons 426-452 of the betasubunit. Within the RRDR, amino acid substitutions at codons 450 and 445 are enriched and are associated with high-level resistance to RIF (Unissa and Hanna, 2017).
Vietnam is one of the 30 countries with high TB burden, with an estimated incidence rate of 173 (112-247) per 100,000 population, and also one of the 30 countries with high MDR/RIF-resistance TB burden, with 8,900 patients estimated to have MDR-TB in 2021 (see footnote 1). In Vietnam, the most prevalent RIF-resistant mutations were found at codon 450 (37.8%), followed by codon 445 (23.0%) in the RRDR of the rpoB gene of 74 strains (Minh et al., 2012). Regarding the genetic background of Mtb in Vietnam, Beijing strains belonging to lineage 2 of Mtb are mainly spreading in the urban areas (Maeda et al., 2014).
The rpoB mutations affect bacterial biological processes via the functional modulation of RNA polymerase differently, and thus may regulate the host response. The rpoB H445Y mutation in Mtb is associated with an increased transcription elongation rate (Stefan et al., 2018), which is also the case for equivalent Escherichia coli mutations (Rasouly et al., 2021). In the earlier reports, a panel of clinically common rpoB mutations has been reported to upregulate the phthiocerol dimycocerosate biosynthetic pathway (Bisson et al., 2012) and to change the Mtb cell wall lipid composition (Lahiri et al., 2016). According to a recent report (Howard et al., 2018), macrophage activation occurs via aerobic glycolysis when infected with Mtb strains harboring rpoB S450L mutation in mice, whereas H445Ycarrying strains preferentially induce interferon (IFN)-β, driving less effective aerobic glycolysis, presumably through the mutationmediated modulation of mycobacterial lipid components. It is widely accepted that during the early phase of Mtb infection, the pathogen provokes IL-1β production and macrophage glycolytic reprogramming with decreased oxidative phosphorylation that act toward bacterial elimination (Collins et al., 2021;Silvério et al., 2021). However, Mtb escapes into cytosol using the mycobacterial ESX-1 system, induces IFN-β production and inhibits aerobic glycolysis with IL-1β induction (Howard and Khader, 2020;Olson et al., 2021). If the bacteria survive and persist by migrating into the lung interstitial tissue that leads to the granuloma formation, maintenance of aerobic glycolysis in surrounding macrophages may rather contribute to immune pathology with detrimental effects (Sheedy and Divangahi, 2021). It is thus assumed that host defense systems eliminating the pathogen and regulating tissue damage make a strong impact on the outcome of TB disease and that these processes are affected by different RIF mutations in clinical settings. In this context, it is imperative to investigate the overall effects of drug-resistant Mtb strains in TB patients (Allué-Guardia et al., 2021).
For this approach, we should also consider that the effects of drug resistance mutations are complicated by their secondary genomic change, including well-known compensatory mutations (Alifano et al., 2015;Stefan et al., 2018) and other mutations that regulate the early resumption of bacterial growth after drug exposure. The latter ones have recently been reported as "antibiotic resilience" mutations, and associated with poor treatment outcome .
In addition, gene expression signatures induced by type I and/or II IFNs are considered as blood-based biomarkers of host response in TB. Among these, basic leucine zipper ATF-like transcription factor 2 (BATF2) is frequently identified as a key gene (Singhania et al., 2018). BATF2 induces inflammatory responses in lung recruited macrophages, which in turn leads to deleterious inflammation and contributes to TB disease progression in a murine model (Guler et al., 2019).
In this study, we thus hypothesized that Mtb strains with different rpoB mutations may result in different host responses, with the possible involvement of Mtb genetic factors or IFN signaling pathway. We described the characteristics of the RIF resistance-conferring mutations observed in sputum smear-positive active TB patients who Frontiers in Microbiology 03 frontiersin.org had previously received TB treatment in Hanoi, Vietnam, and assessed the relationship between clinico-epidemiological and pathogen factors. We further investigated transcriptomic signatures in peripheral blood obtained from patients infected with Mtb harboring major rpoB mutations, H445Y or S450L, using differentially expressed genes (DEG) analysis for significantly up-or downregulated genes and gene set enrichment analysis (GSEA) to characterize differences in genome-wide expression profiles between the groups (Subramanian et al., 2005), and discussed the clinical relevance of these genomic variations of Mtb to the host response.

Ethics statement
This study was approved by the Ethical Committees of the Hanoi Department of Health, Vietnam, the National Center for Global Health and Medicine, and the Research Institute of Tuberculosis, Japan Anti-Tuberculosis Association, Japan (RITIRB29-35). Written informed consent was obtained from all patients.

Study sites, patient recruitment, and sample collection
In our prospective study on recurrent TB, we included 11 of 12 urban districts in Hanoi city and Hanoi Lung Hospital, a referral hospital for all TB patients in Hanoi, as the catchment area, where more than 60% of TB patients in the city were diagnosed and treated during the study period. Patients were considered eligible if they were aged 18 years or older, resided in the catchment area, self-declared that they had suffered from TB accompanied with TB treatment courses, currently diagnosed with smear-positive pulmonary TB, and agreed to participate in this study. Eligible patients who visited the local TB care units and Hanoi Lung Hospital were recruited consecutively from December 2012 to November 2015.
Information about previous TB treatment, including completion of the standard regimen and registration to the National TB Program in Hanoi city, was based on interviews conducted by pre-trained healthcare staff. Before initiating anti-TB treatment, sputum specimens were cultured and subjected to identification of Mtb, DNA extraction for molecular typing, and whole-genome sequencing. The bacterial load in sputum smear and affected lesions on chest X-ray were assessed to estimate the severity of the disease.
Whole-genome sequencing of Mycobacterium tuberculosis DNA samples DNA samples were extracted from Mtb strains using an Isoplant kit (Nippon Gene, Tokyo, Japan) and analyzed using Illumina HiSeq and MiSeq systems (Illumina, San Diego, CA, United States). For HiSeq 2500, libraries were prepared using a TruSeq DNA PCR Free Sample Prep Kit (Illumina), and paired-end (2 × 150 bp) sequencing was performed. For Miseq, libraries were prepared with a QIAseq FX DNA Library Kit (QIAGEN GmbH, Hilden, Germany), and paired-end (2 × 250 bp) sequencing was used.
Sequence reads were trimmed and mapped to the H37Rv genome (AL123456.3) using BWA-MEM 0.7.15. 2 After excluding severely contaminated samples, variant calling was performed with a Genome Analysis Toolkit (GATK version 3.7). 3 Drug resistance-conferring mutations, including small indels and lineage-specific variations, and major sublineages were identified using TB-Profiler version 4.3. 4 Ancient and modern Beijing sublineages were indicated by the absence and presence of the C1477596T (ogt12) mutation, a surrogate for a copy of IS6110 in the NTF region (Mestre et al., 2011), and further classified using a recently proposed SNV-based genotyping scheme for lineage 2 strains (Thawornwattana et al., 2021). Pairwise single-nucleotide variation (SNV) differences were calculated using a list of concatenated variants with an in-house Python script described in a previous study (Maeda et al., 2020).

Host RNA sequencing and downwards analyses
A volume of 2.5 mL of peripheral blood was collected from each patient into a PAXgene blood RNA tube (PreAnalytiX, Hombrechtikon, Switzerland) before starting retreatment, and total RNA was extracted using a PAXgene Blood RNA Kit (QIAGEN). RNA sequencing was performed for six HIV-negative age-gender matched pairs of patients infected with Mtb strains that harbored either the rpoB S450L or rpoB H445Y mutation. Libraries were prepared with TruSeq Stranded mRNA Library Prep (Illumina), and a QIAseq FastSelect Globin Kit (QIAGEN) was combined to avoid generations of reads from globin mRNAs. Sequencing (2 × 75 bp) was performed in NextSeq500 (Illumina). Pair-end reads were aligned to human reference genome GRCh38.primary_assembly.genome.fa with gencode.v38.primary_assembly.annotation.gtf 5 using STAR aligner v2.7.9a. 6 Mapped reads associated with each gene were counted using featureCounts v2.0.1 (Liao et al., 2014).

Principal component and hierarchical clustering analysis
To identify possible groups within our dataset of patients' blood mRNA signatures, hierarchical clustering on principal components was performed with the Ward's criterion on the selected principal components and K-means for further clustering to improve the initial partition obtained from hierarchical clustering using the FactoMineR v2.4 and factoextra v1.0.7 packages (Lê et al., 2008).
Extraction of differentially expressed genes from RNA-seq data DEGs were analyzed using an R package, DESeq2 v1.26.0 (Love et al., 2014) with adjusted p values of <0.1 set as the default. Comparison of host mRNA expression in peripheral blood between patient groups infected with Mtb harboring rpoB S450L and rpoB H445Y was made using generalized linear models, assuming a negative binomial distribution in DESeq2, which enabled us to analyze pairs of samples before and after controlling for major clusters generated from gene expression patterns using the hierarchical clustering on principal components analysis method. Gene expression levels were visualized with heatmaps drawn with the pheatmap v1.0.12 or heatmap.2 packages in gplots v3.1.1.1. 8

Functional enrichment analysis
To further determine the characteristics of each cluster or of a gene set possibly showing differential expression between H445Y and S450L groups, functional enrichment analysis was performed using up-to-date annotation systems, Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes. Significant GO terms and signal pathways were screened when the p value was <0.05.

Gene Set enrichment analysis
GSEA was performed using an R function in clusterProfiler v4.2.0 (Wu et al., 2021). Expression datasets by whole genes were ranked using a metric: −log10(p value)*sign(log2FC). The pre-ranked gene list and phenotype information (rpoB S450L or rpoB H445Y) were uploaded to GSEA for enrichment analysis with default parameters. The pre-defined hallmark gene sets in the Molecular Signatures Database (MSigDB, Broad Institute, Cambridge, MA, United States) were selected for analysis (Subramanian et al., 2005). The false discovery rate was used to correct for multiple comparisons (Benjamini et al., 2001).

Measurement of RNA expression levels of functional genes by real-time RT-PCR
Total RNA was reverse-transcribed with SuperScript III reverse transcriptase (Thermo Fisher Scientific, Waltham, MA, United States) with random nonamers (TaKaRa, Shiga, Japan), and cDNA was subjected to qPCR using TaqMan ® GeneExpression Assays (Thermo Fisher; Supplementary Table S1) using StepOne Plus (Thermo Fisher). GAPDH was used to normalize the values of the target gene expression using the ΔΔCt method, and the relative expression level of each gene was calculated to a fixed control complementary DNA throughout the study.

Statistical analyses
Chi-squared or Fisher's exact test was performed to compare the frequencies of events, and Wilcoxon's rank-sum test was used to compare 8 https://github.com/talgalili/gplots non-parametric distributions between the groups. Possible associations between Mtb rpoB mutations and other factors were studied using logistic and linear regression models. Changes in host gene expression following Mtb genetic variants were studied using univariate and multivariate linear regression models. The analyzes were performed using STATA version 16 (StataCorp LLC, College Station, TX, United States). All factors used in the univariate analysis, including Mtb sublineages, were added to the multivariate model for adjustment due to their biological importance. p values less than 0.05 were considered statistically significant.

Characteristics of the study population
In total, 295 adult patients with one or more episodes of treatment for active pulmonary TB were recruited. Their median age was 46.8 years (interquartile range IQR 37.1-54.8), and 87.8% were male. HIV-positive patients accounted for 3.1% (9/295) of the cohort. Of the recruited patients, 83.7% (247/295) had received a single treatment episode, and 16.3% (48/295) had received more than one episode of treatment in the past. The median interval between the initial and current episode was 3 years (IQR 1-6), and treatment discontinuation during the initial episode was observed in 14.2% (42/295) of patients due to individual reasons, including patient's self-discontinuation or drug adverse effects (Supplementary Table S2).

Epidemiological findings associated with different rpoB mutations rpoB mutations and Mycobacterium tuberculosis genotypic clusters
The proportions of genotypic clusters defined by <6 pairwise SNV differences were different depending on the types of rpoB mutations (p = 0.020; Supplementary Table S5). When using logistic regression models, H445Y was significantly associated with clustered strains defined by <6 SNVs in both univariate and multivariate analysis after adjustment for patient age, gender, BMI, and Mtb lineages/sublineages ) and adjusted OR = 11.53 (2.39-55.65)], whereas S450L and other rpoB mutations were not ( Table 3). The association of H445Y remained significant, and ancient Beijing strains were also associated with the clusters when defined by <12 SNVs (Table 3).

rpoB mutations and host background, including episodes of previous tuberculosis treatment
We further investigated patient characteristics associated with the two major rpoB mutations, S450L and H445Y. The proportions of patients with multiple episodes of anti-TB treatment were significantly different between these groups; 46.7% (7/15) for H445Y, 21.9% (7/32) for S450L, 12.5% (5/40) for others, and 10.2% (13/128) for wild-type rpoB (p = 0.001; Supplementary Table S6). The proportion of patients with a BMI <17.0, indicating moderate or severe underweight, were also significantly different between these groups: 60.0% (9/15) for rpoB H445Y, 18.8% (6/32) for S450L, 35.0% (14/40) for others, and 32.0% (41/128) for wild-type (p = 0.045), whereas there were no significant differences in patients' age, history of BCG vaccination, clinical manifestations assessed by chest X-ray, sputum smear and culture grades, HIV status, and the interval between the initial and current TB episodes between these groups (Supplementary Table S6). When using logistic regression models, H445Y, but not S450L, was associated with multiple episodes of TB treatment, either by univariate analysis (OR = 6.13, 95% CI 1.54-24.37) or multivariate analysis (adjusted OR = 6.74, 95% CI 1.53-29.72) after adjustment for patients' age, gender, BMI, Mtb lineage/sublineage, or completion of the initial treatment, when "other rpoB mutations" was set as a reference group (Table 4). Similar results were obtained when "no rpoB mutations" was set as a reference group (Table 4). To assess whether different rpoB mutations in Mtb isolates are associated with host gene expression, transcriptomic analysis was performed for blood samples from six patients with Mtb harboring H445Y and six age-gender-matched patients with Mtb harboring S450L. Initially, individual sample diversity, which may act as a confounder for the target effects, was estimated using principal component analysis (PCA) of the expression levels of whole genes in the whole blood (Supplementary Figure S1A). The distribution of samples on these two dimensions are shown in Supplementary Figure S1B. Hierarchical clustering further performed on PCA revealed three major clusters (Supplementary Figures S1B,C). A heatmap of log-CPM values for the top 100 genes showing the largest variance indicated the same three clusters (Supplementary Figure S2).
To characterize the gene expression diversity of the blood samples, which may be a confounder in the further analysis, the proportions of 22 blood cell types were estimated by CIBERSORTx, as shown in Supplementary Figure S3. The scores of neutrophils and activated memory CD4 T cells in the absolute mode were highest in cluster 3, followed by clusters 1 and 2 (Supplementary Table S7). No difference in cell type was observed between the H445Y and S450L groups (data not shown). Neutrophils directly measured in the whole blood were also predominant in cluster 3 over clusters 1 and 2 [median = 10.1 G/L (IQR 9.0-10.8) vs. 5.9 (5.5-6.6) vs. 5.8 (5.6-6.3), p = 0.0431; Supplementary Table S7].
We initially compared the expression profile of genes in patients infected with Mtb harboring the H445Y mutation with that of the S450L group, without considering aforementioned individual diversity. A heatmap of log-CPM values and read counts in bar graphs for the top 20 DEGs were constructed (Figure 1; Supplementary Figure S4, respectively). Supplementary Table S8 shows the detailed list of DEGs: S100P, FUNDC2, and TRAV1-2 were the top three DEGs ranked with p values (adjusted p values = 0.0022, 0.0022, and 0.0364; log2FC = −2.52, −1.91, and − 1.98, respectively). After adjustment for the PCA-based clusters 1, 2, and 3 as a possible confounding factor, using generalized linear models in the DESeq2 analysis, 50 genes showed differences, of which 33 genes were downregulated in H445Y or upregulated in S450L. Of these, S100P was the only gene that had an adjusted p value of <0.05 (adjusted p values = 0.0396; log2FC = −2.91), followed by TRAV1-2, with adjusted p value of <0.1 despite the individual diversities.
Pathway enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes system did not show significant results to characterize the above DEGs, whereas GO analysis to identify biological processes indicated that the downregulated genes in the H445Y group were constituent genes of pyrimidine metabolism and/or antiviral response pathways, many of which are also known as IFN-stimulated genes (Supplementary Figure S5; Supplementary Table S9).

Differences in genome-wide expression profiles between different rpoB mutations
To identify gene sets showing differences in expression between the H445Y and S450L groups, GSEA was performed on the whole set of expressed genes ranked by their differential expression. GSEA showed that two hallmark gene sets, "hallmark_interferon_gamma_ response" and "hallmark_interferon_alpha_response, " had the lowest normalized enrichment scores with significant p values in H445Y compared with S450L after the adjustment for the PCA-based clusters (Figure 2; Supplementary Table S10) and before the adjustment (data not shown). Figure 3 shows the networking between these gene sets. Enrichment plots of the top two gene sets are shown in Supplementary Figure S6. By leading edge analysis, the proportion of core genes that accounted for the enrichment signals of these two gene sets were 107/200 (53.5%) and 60/97 (61.9%), respectively, of which 47 genes were shared by the two sets (Supplementary Table S10). Frontiers in Microbiology 08 frontiersin.org

Differences in the individual gene expression levels between different rpoB mutations
Because the GSEA indicated that the expression levels of IFN-α and-γ response gene sets were the lowest in H445Y compared with S450L, and because elevated expression levels of IFN response genes are associated with active TB disease (Singhania et al., 2018), six IFN-responsive genes, BATF2, SERPING1, UBE2L6, VAMP5, IFI44, and STAT1 in the hallmark gene sets (Supplementary Table S10), reported to be associated with TB (Gong et al., 2021), were chosen for validation using qRT-PCR of individual blood samples. S100P showed the best p value, as the DEG, was also included for the measurements. Among these, the expression levels of BATF2 and SERPING1 were significantly lower in the H445Y group than in the S450L group [medium and IQR = 0.48 (0.32-0.58) and 0.73 (0.39-0.82) for BATF2, p = 0.0328; and 0.60 (0.43-0.86) and 1.14 (0.71-1.70) for SERPING1, p = 0.0378; Supplementary Figure S7]. The expression levels of the rest of the genes also tended to be lower in the H445Y group compared with the S450L group, although the differences did not reach the significance levels (Supplementary Figure S7).

Epidemiological findings associated with interferon-inducible genes in the blood
Associations with genotypic clusters and multiple treatment episodes

Associations with disease severity
All patients were divided into two groups of disease severity, categorized by the distribution of infiltrates on chest X-ray: mild disease, with lesions limited to <3/6 lung zones, and severe disease  Heatmap of log-CPM values for the top 20 genes differentially expressed between the H445Y and S450L groups. Expression across each gene (each row) has been scaled so that the mean expression is zero. Samples with relatively high expression of a given gene are marked in red; samples with relatively low expression are marked in blue. DEG, Differentially expressed gene.

FIGURE 2
Bar plot showing the 20 most significantly enriched gene sets after adjustment for clustering, obtained by gene set enrichment analysis. Colored bars represent adjusted p values. NES, normalized enrichment score.  Figure S8]. In the mild disease group, such significant differences were not observed in all markers tested (Supplementary Figure S8). Other indicators of severity, cavity, smear grade, or culture grade were not significantly different between the two mutation groups (data not shown).

Discussion
In our cohort of sputum smear-positive pulmonary TB patients with history of TB treatment, rpoB mutations, particularly rpoB S450L and rpoB H445Y, were accumulated with high frequencies. rpoB H445Y was associated with the ancient Beijing genotype, being clustered and more frequently observed in patients who had received more than one TB treatment episode, whereas S450L and other rpoB mutations were not. H445Y was also associated with Rv1830 mutations. By transcriptome analysis of the whole blood from patients before starting retreatment, gene sets involved in the IFN-γ and IFN-α pathways were significantly downregulated in H445Y compared with S450L. The qRT-PCR analysis of the whole blood sample set confirmed that H445Y was inversely associated with IFN-inducible genes, BATF2 and SERPING1, as compared with S450L, even after adjustment for Mtb lineages/sublineages and other possible confounding factors. These findings represented a clinical extension of a recent observation in mouse models, which showed that Mtb strains harboring RIF resistance-conferring mutations modulate host defense systems (Howard et al., 2018).
In our study, a majority of rpoB mutations were identified at codons 450 and 445, similar to those reported worldwide (Zaw et al., 2018) and in Vietnam (Caws et al., 2006;Shah et al., 2009;Nguyen et al., 2017;Hang et al., 2019), whereas their distribution differed depending on the Mtb genetic background: modern Beijing strains harbored H445Y much less frequently than S450L, and ancient Beijing strains had H445Y and S450L equally. These findings were consistent with another study that reported extensively drug-resistant TB in China (Zhao et al., 2015).
Among the two major sublineages of Beijing genotype strains, the modern Beijing genotype is predominant in East Asia, widespread in other regions, and frequently associated with genotypic clusters (Emane et al., 2021), indicating recent spread, possibly reflecting higher transmission capability and/or host adaptation (Walker et al., 2013), whereas the ancient Beijing sublineage is limited to smaller Networking of three out of five hallmark gene sets with strongest significant adjusted p values obtained by gene set enrichment analysis after adjustment for clustering.
Frontiers in Microbiology 11 frontiersin.org regions or countries such as Japan (Seto et al., 2015), Korea (Kang et al., 2010), or Vietnam (Maeda et al., 2020). In our current cohort of TB patients with treatment history in the northern area of Vietnam, the ancient Beijing sublineage, particularly L2.2.AA.3.1, was prevalent and was significantly associated with genetic clusters defined by <12 SNVs, but not <6 SNVs, presumably indicating that this sublineage has spread widely for a long time in the study area, rather than very recently. This tendency was observed even in our previous study on new TB patients in Hanoi (Hang et al., 2019;Maeda et al., 2020). The H445Y mutation was an independent risk factor for genotypic clusters defined by <6 SNVs and more than one TB treatment episode in our study, indicating patients infected with Mtb strains carrying this mutation are not easily cured and tend to transmit to others. Although the low BMI observed in patients carrying Mtb with the H445Y mutation might be explained by exhaustion of the host caused by repeated TB episodes, the causeeffect relationship is not clear from this study. The success of RIF-resistant strains seems to depend upon the level of resistance, the initial fitness cost, the propensity to accumulate compensatory mutations to alleviate the fitness costs, and recently proposed mutations to regulate recovery from bacterial damage caused by drug exposure, designated as "antibiotic resilience" . In our study, Mtb strains harboring S450L had a high frequency of compensatory mutations in rpoA, rpoB or rpoC, consistent with another report (Emane et al., 2021), although compensatory mutations do not increase the risk of MDR transmission according to recent reports (Liu et al., 2018;Chen et al., 2022). H445Y-carrying strains were not associated with those compensatory mutations in our study, while coexistent nonsynonymous mutations in Rv1830, also named resR or mcdR, may promote "antibiotic resilience" and result in the prolonged survival of transmissible bacteria under stress conditions including antibiotic treatment Zhou et al., 2022). Indeed, Liu et al. (2022) reported that poor treatment outcome was associated with Rv1830 mutations. In addition, the independent association between Rv1830 mutations and H445Y may imply a possible epistatic interaction, which could play a role in the emergence and spread of drug-resistant Mtb (Borrell and Gagneux, 2011). A future study design to assess the treatment response to Rv1830 mutations and the contribution of epistatic mutations to competitive fitness could shed light on their impact on patient prognosis and drug resistance TB.
Before analyzing blood mRNA signatures in TB patients, we considered individual diversities. PCA of whole gene expression patterns exhibited three hierarchical clusters. In addition, the proportions of neutrophil population estimated by CIBERSORTx and actually counted values in the whole blood were both predominant in cluster 3 over clusters 1 and 2. Although predominance of the neutrophil-driven IFN-inducible gene profile is known in tuberculosis (Berry et al., 2010), there was no difference in this cell type population between the H445Y and S450L groups in our cohort.
Total blood RNA sequencing and downward analyzes showed that S100P was the most promising DEG between the H445Y and S450L groups in any conditions tested, but did not reach significance levels by qRT-PCR expression analysis of the whole sample set. Nevertheless, S100P was also observed in the TB-related whole blood mRNA signature in another study (Verhagen et al., 2013). S100P is a member of S100 protein family and functions as a regulator of diverse cellular processes, especially tumor cell lines and tissues. Transcriptional regulation of S100P is complicated (Gibadulinova et al., 2011). TRAV1-2 was another DEG candidate between the H445Y and S450L groups despite individual diversities, although adjusted p values below 0.1 and T cell receptor (TCR) repertoires were not analyzed in this study. TRAV1-2 is used as a set of TRAV1-2/TRAJ12/20/33 (Napier  , comprising the invariant TCRα chain of mucosalassociated invariant T cells. Notably, recent TCR sequencing analysis revealed that TRAV1-2 was part of mycobacteria-reactive TCRα, and its expression was associated with TB disease progression or control (Musvosvi et al., 2023). GSEA after adjusting for PCA-based clusters revealed that two gene sets involved in the type I and II IFN pathways were significantly enriched not in H445Y but in S450L, irrespective of individual diversities, while the majority of these genes are activated in both types of IFN signaling pathway. GO enrichment analysis also indirectly supported the contribution of IFN pathways. Predominance of type I IFN-inducible blood transcriptional signatures correlates with TB disease severity and progression (Moreira-Teixeira et al., 2018;Singhania et al., 2018), and type II IFN (IFN-γ) is an important immunomodulator in the pathogenesis of TB (Chin et al., 2017). Thus IFN-inducible genes have been considered as promising blood-based biomarkers for TB (Singhania et al., 2018). The four genes, BATF2, SERPING1, UBE2L6, and VAMP5, which were reported to have high diagnostic value for active TB (Gong et al., 2021), are involved in these two enriched pathways. Of these, BATF2 has often been nominated as one of the top DEGs in active TB (Gupta et al., 2020;Tabone et al., 2021). In our study, the expression levels of three of these four genes in the whole blood were inversely associated with H445Y's presence even after adjustment for confounding factors, including Mtb lineages/ sublineages, and validated the result obtained from RNA sequencing, suggesting that blood transcriptional signatures are affected by pathogenic variants in a clinical setting. Howard et al. (2018) demonstrated that the infection of strains carrying rpoB S450L or wild-type rpoB resulted in exacerbated pulmonary inflammation, but rpoB H445Y modulated bacterial cell wall lipids, and thus, based on the reprogramming of macrophage metabolism, infection with Mtb harboring H445Y induced host defense pathways differently from infection with wild-type or S450L strains in murine models. Our study results partly supported their findings because host gene expression profiles were different between H445Y and S450L. Downregulation of IFN-inducible genes in the H445Y group was shown in the subgroup with extensive lung lesions and this negative association was observed even after adjustment for Mtb lineage/sublineage and disease severity. The host immune response can be suppressed depending on Mtb lineages/sublineages, which may contribute to increased virulence of Mtb and more efficient transmission (Coscolla and Gagneux, 2014). A similar mechanism may underlie the H445Y group. Further investigation would be necessary to determine whether low levels of IFN response genes in the H445Y group was a result of exhaustion due to multiple TB episodes or a cause of altered immune response reported in murine models and the subsequent recurrence and transmission to others.
Our study has some limitations. First, the sample numbers for previously treated TB patients in our study were relatively small. However, the catchment area was limited to one city; therefore, we were able to assess local transmission of TB, analyzing genotypic clusters and drug resistance intensively. Second, our clinical and epidemiological findings have not been corroborated by an in vitro growth model. It has not been established due to insufficient resources to assure biosafety in our local setting, and was beyond the scope of this study. Third, the treatment history of active TB was obtained from a structured questionnaire. Although healthcare workers had been trained and carefully conducted face-to-face interview and any ambiguous information was reconfirmed by direct contact or further checking registration logbooks in district health centers, the recall bias may not be negligible. Fourth, Mtb isolates from past episodes were not available, and we thus had no information whether the infection represented relapse or reinfection, or exactly when the drug resistance mutations were generated. Nevertheless, to our understanding, this is the first report on the independent association between repeated TB treatment and rpoB mutations in a clinical setting.
In conclusion, the distribution of rpoB mutations was different depending on the Beijing sublineages isolated from TB patients with treatment history. It is conceivable that epistatic interactions with rpoB mutations and alteration of host responses, including downregulation of the IFN response genes, influence multiple treatment episodes or genotypic clusters indicating recent spread, possibly reflecting higher transmission capability and/or host adaptation. These findings strengthen the importance of further research into host-pathogen relationships in drug-resistant TB for better TB control, including the development of new vaccines or therapeutic agents.

Data availability statement
The data presented in the study are deposited in the DDBJ repository, https://www.ddbj.nig.ac.jp/, accession numbers PRJDB15583 and PRJDB15892.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethical Committees of the Hanoi Department of Health, Vietnam, the National Center for Global Health and Medicine, and the Research Institute of Tuberculosis, Japan Anti-Tuberculosis Association, Japan (RITIRB29-35). The patients/ participants provided their written informed consent to participate in this study.

Author contributions
NTLH planned and supervised on-site implementation, performed the data analyzes, wrote and finalized the manuscript. MH performed sequencing and real-time RT-PCR, analyzed sequencing results, wrote and finalized the manuscript. SM coordinated and supervised microbiological data collection and quality control. PT, HH, PA, NTH, and VC supervised on-site implementation. NPH supervised pathogen data collection. DT supervised host data collection. NKo read chest X-ray films. KW and AM provided technical support for sequencing. SS provided support for microbiological analyzes. NKe conceptualized the project, wrote in-house scripts for data analyzes, performed the analyzes, corrected and finalized the manuscript. All authors read and agreed to the published version of the manuscript.