Predicting Non-Alcoholic Fatty Liver Disease Progression and Immune Deregulations by Specific Gene Expression Patterns

Non-alcoholic fatty liver disease (NAFLD) is the most common liver disease worldwide with rising rates in parallel to obesity, type 2 diabetes, and metabolic syndrome. NAFLD includes pathologies ranging from simple steatosis (NAFL) to non-alcoholic steatohepatitis and cirrhosis (NASH), which may eventually develop into hepatocellular carcinoma (HCC). Mechanically, lipids accumulation and insulin resistance act as the first hit, inflammation and fibrosis serve as the second hit. Currently, the diagnosis of NAFLD mainly depends on pathology examination and medical imaging, whereas proper gene signature classifiers are necessary for the evaluation of disease status. Here, we developed three signature classifiers to distinguish different NAFLD disease states (NAFL and NASH). Moreover, we found that B cells, DCs, and MAIT cells are key deregulated immune cells in NAFLD, which are associated with NAFLD and NAFLD-HCC progression. Meanwhile, AKR1B10 and SPP1 are closely related to the above three immune cell infiltrations and immunosuppressive cytokines expressions in NAFLD and NAFLD-HCC. Subsequently, we screened out AKR1B10 and SPP1 sensitive molecules TGX-221, which may provide a possible therapy for NAFLD and NAFLD-HCC.


INTRODUCTION
During the past century, Non-alcoholic Fatty Liver disease (NAFLD) has become one of the most important causes of liver disease and it may become the leading cause of end-stage liver disease in the next few decades (1)(2)(3). Currently, the global prevalence of NAFLD is estimated to be 24%, the highest rates are reported in South America (31%) and the lowest in Africa (14%) (4). NAFLD is strongly associated with metabolic syndromes, such as obesity, type 2 diabetes mellitus, dyslipidemia, and hypertension (5,6). Abnormal lifestyle (Caloric excess and sedentary lifestyle) is the major cause of NAFLD. As the rates of obesity continue to rise, the prevalence of NAFLD is constantly increasing in the past decade from 15% in 2005 to 25% in 2010 worldwide (4,5).
NAFLD is considered as a complex disease trait, and interactions between the environment and a susceptible polygenic host background determine disease phenotype and influence progression (4). Lots of genome-wide association and large candidate gene studies indicated that PNPLA3, TM6SF2, MBOAT7, LYPLAL1, APOB, and GCKR variants were important genetic and epigenetic modifiers of NAFLD progression in specific populations and races (7,8). However, the universal mechanism of NAFLD occurrence and progression remains elusive (6). So far, the most recognized theory is the 'two-hit' theory, namely, abnormal lipid metabolism and inflammatory storm (9,10). The first-hit is abnormal liver lipid metabolism, resulting in excessive lipid influx or/and decreased lipid clearance. In this progress, steatosis may be reversible and does not necessarily cause permanent liver damage (11). The second-hit is the inflammatory storm process, which may be caused by oxidative stress, lipid peroxidation and cytokine action. Although the secondhit occurs less frequently, it is more toxic and irreversible, as lobular inflammation directly leads to ballooning degeneration and perisinusoidal fibrosis, which promotes apoptosis and liver cell death, and finally leads to scarring and progression to nonalcoholic steatohepatitis (NASH) (10). Therefore, understandings about two 'hits' molecular mechanisms and prognostic biomarkers are essential to NAFLD prevention and treatment.
In our study, we developed three classifiers to classify different NAFLD states, exhibiting NAFLD associated genes to reflect NAFLD immune microenvironment deregulation and progression, and predicting potential therapeutic targets and drugs.

Data Processing
Forty-four patients in GSE33814, 73 patients in GSE48452 and 63 patients in GSE89632 were downloaded from Gene Expression Omnibus (GEO), 45 patients in E-MEXP-3291 were from the ArrayExpress database. We used the "SVA" package in R for batch correction (12). HCC data were contained from The Cancer Genome Atlas database (TCGA), including 374 HCC samples and 50 normal samples.

Diagnostic Methods to Diagnose Different States of NAFLD
The diagnoses of all samples (E-MEXP-3291, GSE33814, GSE48452, and GSE89632) used were histologically validated by a boardcertified pathologist before molecular analysis. For histological analysis, hematoxylin and eosin (H&E) and chromotrope aniline blue (CAB) stained sections were used. Histological slides were diagnosed using criteria from a scoring system for human NAFLD (14). Besides, in the TCGA database, HCC samples used were histologically validated by a board-certified pathologist before molecular analysis. The standards were defined by the American Association for the Study of Liver Diseases (AASLD) (15). Information on donors, including age and gender, can be seen in Table 1.
Differential Analysis, GO\KEGG Analysis, and Gene Set Enrichment Analyses (GSEA) The genes differentially expressed (DEGs) were calculated and labeled using the "Limma" package. Subsequently, DEGs were analyzed by gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. GO analysis consists of three parts, including molecular function (MF), biological process (BP), and cell component (CC). We determined that results are statistically significant at a level of less than 0.05 using a p-value. The GSEA was performed by GSEA software (version 4.0.3). We utilized it to detect the difference in the set of genes expressed between the high-risk and low-risk group in the enrichment of the MSigDB Collection (h.all.v7.1.symbols.gmt). For each analysis, gene set permutations were performed 1000 times.

Weighted Gene Co-Expression Network Analysis (WGCNA)
To find modules highly correlated with NAFLD, WGCNA was performed using the WGCNA R package and carried out on all genes (16,17). The Pearson correlation coefficient was used to establish an unsupervised co-expression relationship based on the connection strength adjacency matrix for gene pairs. This matrix was increased to b = 7 based on the scale-free topology criterion. Then the topological overlap matrix was used to analyze the adjacency matrix of clustering GC patient gene expression data. Finally, the dynamic tree cut algorithm was applied to the dendrogram for module identification with the mini-size of module gene numbers set as 50 and a cut height of 0.9. In the module-trait analysis, GS value>0.3 and MM value>0.55 were defined as a threshold. The WGCNA algorithm was described in detail by Zhang Bin et al. (18).
high-dimensional data, the LASSO method that prevents overfitting with strong predictive values and low mutual correlation was used (21). Principal component analysis (PCA) was performed before/after feature selection using the expression profiles of 9 core genes. The cores genes, as optimal genes, which were identified with non-zero regression coefficients, were used to establish an mRNA-based signature classifier for predicting different NAFLD states. A classifier index for each sample was created with the following formula: index = Exp gene1 *Coef1 + Exp gene2 *Coef2 + Exp gene3*Coef3+ … The efficiency of the classifier was assessed by mean squared errors (MSE), accuracy, sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic (ROC) curve. These ROC curves were drawn and compared using the "pROC" package in R (22).

Identification of DEGs in NAFLD Patients
The whole research design was illustrated in Figure 1. Firstly, we identified genes which were expressed differently at different stages of NAFLD. The data of NAFLD obtained from GSE33814 and E-MEXP-3291, including 32 normal samples, 29 non-alcoholic fatty liver (NAFL) samples, and 28 NASH samples. 174 DEGs from Normal-NASH group and 117 DEGs from NAFL-NASH group were included in the analysis, which met the screening standard of our study (p<0.05, |log2FC) | >1.0). The expression distributions of these DEGs were displayed in Figure 2A-B. Subsequently, we searched for the common genes of these two groups. A total of 111 DEGs were included, consisting of 91 up-regulated and 20 down-regulated DEGs ( Figure 2C).

GO and KEGG Pathway Enrichment Analysis and Gene Set Enrichment Analysis (GSEA)
To investigate the mechanism of NAFLD progression, we performed GO/KEGG analysis among Normal-NASH and NAFL-NASH group. In the up-regulated group, intersected GO terms were significantly enriched in extracellular matrix organization, extracellular structure organization and cell chemotaxis, which were closely related to tumorigenesis. (Figures 3A, B To further explore the mechanism of NAFLD progression, we performed GSEA which took c7.1 as a reference gene set. The overlapping pathways between Normal-NASH group and NAFL-NASH group were shown in Table 2. The results recognized that pathways were significantly associated with tumorigeneses, such as epithelial-mesenchymal transition (EMT), angiogenesis and p53 pathway ( Figures 3M, N). Therefore, the above results indicated that the extracellular matrix may play an extremely important role in the progression of NAFLD.
The yellow module showed a highly negative correlation with ballooning (R 2 = 0.41, p=1e -6 ) ( Figure 4C). The positive modules were shown in Figure 4D and the negative modules in Figure 4E.
In the module-trait analysis, GS value > 0.3 and MM value > 0.55 were defined as trait-related genes. Subsequently, we intersected the trait-related genes obtained from WGCNA analysis and 111 DEGs obtained from expression difference analysis and finally obtained 29 trait-expression-related genes.

Identification of Core Genes and Construction of Protein-Protein Interaction (PPI) Network
As inflammation plays a central role in NAFLD, we constructed a PPI network with 29 trait-expression-related genes and 5 elevated inflammatory factors obtained from differential analysis. In the STRING database, we defined the removal standard to 0.4 and removed the independent genes. Later, 19 filtered genes were divided into four modules (Ballooning Associated, Inflammation Associated, Fibrosis Associated, and Glycolysis Associated) ( Figure 5A). The interactive relationship of 19 filtered genes in the whole network was determined using the Cytoscape software (CytoHubba plugin and MCODE plugin). Firstly, the data were imported into CytoHubba plugin, which helped to identify key genes through 5 different calculation methods, namely, EPC, MCC, DMNC, MNC, and Degree (Table 3). Subsequently, the data were imported into another plugin, MCODE, which helped to identify different clusters. The results showed that MT1G, MT1X, MT1F, MT1H and MT1M were in cluster 1 while FABP4, SPP1, MMP7 and CCL2 were in cluster 2 (cutoff k-score = 2). Finally, we performed a correlation analysis of these 9 core genes (MT1G, MT1X, MT1F, MT1H, MT1M, FABP4, SPP1, MMP7 and CCL2) and found that they were closely related to each other ( Figure 5B).

Construction of LASSO Logistic Regression Classifiers
To develop classifiers to distinguish different NAFLD states, we performed LASSO logistic regression based on the expression of the 9 core genes (MT1G, MT1X, MT1F, MT1H, MT1M, FABP4,  SPP1, MMP7 and CCL2). Our calculation followed the 10-fold cross-validation method and randomly divided the patients into groups-training group: testing group = 6:4.  Figure 5D presented the results of PCA before feature selection and Figure 5E presented the results of PCA with genes identified by LASSO methods, which indicated that samples with different NAFLD states (Normal/NASH) are more easily distinguished by Normal-NASH classifier. The accuracy of the Normal-NASH classifier was 0.8878 in the training group, 0.9219 in the testing group, and 0.9012 in the total group. Based on the accuracy, Se, Sp, PPV, NPV, and AUC values showed that sample recognition efficiency of the classifier was high ( Table 4). The ROC curve showed that AUC was 0.944 in the training set, 0.965 in the testing group, and their difference was not significant (Delong method (23) P = 0.5125, Figure 5F).  Figure 5H presented the results of PCA before feature selection and Figure 5I presented the results of PCA with genes identified by LASSO methods, which indicated that samples with different NAFLD states (Normal-NAFL) are more easily distinguished by Normal-NAFL classifier. The accuracy of the Normal-NAFL classifier was 0.7292 in the training group, 0.7031 in the testing group, and 0.7188 in the total group. Based on the accuracy, Se, Sp, PPV, NPV, and AUC values showed that sample recognition efficiency of the classifier was high ( Table 4). The ROC curve showed that the AUC was 0.720 in the training set, 0.730 in the testing group, and their difference was not significant (Delong method (23) P = 0.9051, Figure 5J).
For the NAFL-NASH group, 8 genes (MT1G, MT1F, MT1H, MT1M, FABP4, SPP1, MMP7, and CCL2) were identified as optimal features with non-zero regression coefficients ( Figure  5K Figure   5L presented the results of PCA before feature selection and Figure  5M presented the results of PCA with genes identified by LASSO methods, which indicated that samples with different NAFLD states (NAFL/NASH) are more easily distinguished by NAFL-NASH classifier. The accuracy of the NAFL-NASH classifier was 0.7662 in the training group, 0.7647 in the testing group, and 0.7656 in the total group. Based on the accuracy, Se, Sp, PPV, NPV, and AUC values showed that the sample recognition efficiency of the classifier was high ( Table 4). The ROC curve showed that the AUC was 0.915 in the training group, 0.915 in the testing group, and their difference was not significant (Delong method (23) P = 0.9956, Figure 5N).
Besides, many studies showed that PNPLA3 is one of the genetic risk factors with more evidence in the NAFLD progression (7,24,25), so we performed ROC curves to compare the ability of PNPLA3/classifiers to distinguish different NAFLD states. The results showed that classifiers have better accuracy and reliability (AUC=0.953, 0.730, 0.910) than PNPLA3 (AUC=0.579, 0.639, 0.565) to distinguish NAFLD states (Normal/NAFL/NASH) ( Figures 5O-Q).

Correlation of the Classifier Index and Clinical Characteristics
Considering the correlation among classifiers and NAFLD states, we tried to explore the relationships between classifier index and clinical characteristics. In the Normal-NASH group, the classifier index was significantly associated with ballooning grade (p=5.853e -05 ), inflammation (p=9.136e -10 ), steatosis (p=1.049e -09 ), fibrosis (p=1.399e -07 ) and NAS (p=2.06e -10 ). As the classifier index increased, the level of ballooning/inflammation/steatosis/ fibrosis/NAS elevated ( Figure 6A). In Normal-NAFL group, classifier index was significantly associated with steatosis (p=2.836e -06 ) and NAS (p=0.002). However, classifier index showed no significant relationship with ballooning/fibrosis. As the classifier index increased, the level of steatosis/NAS elevated ( Figure 6B). In NAFL-NASH group, results showed classifier index was significantly associated with ballooning (p=7.079e -04 ), inflammation (p=4.707e -06 ), steatosis (p=0.029), fibrosis (p=6.971e -05 ), and NAS (p=1.076e -06 ) ( Figure 6C). Meanwhile, the level of ballooning/inflammation/steatosis/fibrosis/NAS elevated as classifier index increased. The above results showed that our classifier can also be used to predict the NAFLD patients with different clinical characteristics.

Relationship Between Classifiers and Immune Score
To explore the relationship between the classifier and immune microenvironment, we performed correlation analysis among the classifier index and three indexes (StromalScore, ImmuneScore and ESTIMATEScore, which were acquired from "estimate" package in R software). In the Normal-NASH group, StromalScore, ImmuneScore, and ESTIMATEScore were higher than those of the normal group ( Figure 6D). Besides, classifier index showed a medium correlation with StromalScore (Pearson R = 0.46, p = 7.6e -10 ), ImmuneScore (Pearson R = 0.31, p = 7.7e -05 ) and ESTIMATEScore (Pearson R = 0.39, p = 2.9e -07 ) ( Figure 6E). There was no difference in StromalScore, ImmuneScore and ESTIMATEScore in the Normal-NAFL group. In the NAFL-NASH group, StromalScore, ImmuneScore, and ESTIMATEScore of the NASH group were higher than those of the NAFL group ( Figure 6F), and the classifier index showed a high correlation with StromalScore (Pearson R = 0.57, p < 2.2e -16 ), low correlation with ImmuneScore (Pearson R = 0.27, p = 0.0024) and medium correlation with ESTIMATEScore (Pearson R = 0.4, p = 3.3e -06 ) ( Figure 6G). Therefore, our classifier can also be used to predict the immune microenvironment of patients with different disease states.

Correlation of Six Progress-Related Genes With HCC
Previous studies showed that up to one-third of patients with NASH might progress to HCC (26,27), so we explored the role of six progress-related genes (AKR1B10/SPP1/CD24/UBD/ FABP4/STMN2) in HCC. In previous research, we found that these progress-related genes were closely related to tumorigenesis and HCC progression ( Figure 8A). Moreover, based on the TCGA database, we found that progress-related genes were enriched in tumor-related pathways, such as apoptosis, cell cycle, and EMT ( Figure 8B).
In the TCGA database, we found that six progress-related genes were up-regulated in the HCC ( Figure 8C). To explore the relationship between six progress-related genes and prognosis, we performed survival analysis and the results showed that high expression of SPP1 (p=0.00011) and AKR1B10 (p=0.011) were associated with a low survival rate ( Figure 8D). In HPA database, the expression of SPP1/AKR1B10 was also abnormally elevated in HCC ( Figure 8E). Therefore, SPP1/AKR1B10 may be closely related to progress and prognosis in Normal-NAFL-NASH-HCC progression.

The Immune Landscape of NAFLD and HCC Patients
Subsequently, we explored the immune landscape of NAFLD and HCC. In NAFLD, the results showed that the immune  Figure 8F). To investigate the underlying relationships among these immune cells, we evaluated the correlation between each other. We found that Neutrophils and Tfh cells appeared to have the most positive correlation (R = 0.46), while Neutrophils and CD8+ T cells had the most negative correlation (R = -0.62) ( Figure 8G). The above results indicated that in the course of Normal-NAFL-NASH, the immune microenvironment was gradually activated and reached its peak at the time of NASH, which was consistent with the conclusion that NASH is an inflammatory disease (28).

, Th1 cells, Central memory cells, Dendritic Cells (DCs), B cells, NK cells, CD4 T cells, and CD8 T cells), while the immune infiltration of Tfh cells and Neutrophil decreased (
In HCC, the immune infiltration of the following cells increased (CD8 naive cells, Tr1 cells, nTreg cells, iTreg cells, Th1 cells, Tfh cells, Central memory cells, DC cells, B cells, and CD4 T cells), while the immune infiltration of these cells decreased (Cytotoxic cells, Exhausted T cells, Th2 cells, Th17 cells, MAIT cells, Monocyte, Macrophage, NK cells, and Neutrophil)( Figure  8H). Besides, we found that Exhausted T cells and Cytotoxic cells appeared to have the most positive correlation (R = 0.67), while Neutrophils and Th1 cells had the most negative correlation (R = -0.72) ( Figure 8I). These results showed that the immune microenvironment of HCC was in a suppressed state.
Therefore, these results indicated that immune status changed from immune-activation to immune-suppression in the process of NASH to HCC. Besides, our study may provide therapeutic targets (Cytotoxic cells, MAIT cells, Tfh cells, Th2 cells, B cells and DCs) for NAFLD to slow down progression and improve prognosis.
Later, we explored the relationship between NAFLD progression and immunosuppressive cytokines' expressions. Based on previous research, we downloaded Cancer-Immunity Cycle associated immunosuppressive cytokines from the Tracking Tumor Immunophenotype website (29). We defined p<0.05 and R≥0.3 as the threshold to screen out co-expressed immunosuppressive cytokines of SPP1/AKR1B10 (Figures S1C-E). The results showed that the ratio of genes involving the negative regulation of the Cancer-Immunity Cycle increased, indicating activities of the Cancer-Immunity Cycle gradually decreased in progress of NAFL-NASH-HCC ( Figures 8K, L).
The above results indicated that SPP1/AKR1B10 might play a vital role in the change of the immune microenvironment. Besides, based on the GDSC database, we performed a drug sensitivity analysis on six progress-related genes. It found that TGX-221 was a common sensitive drug of AKR1B10 and SPP1, which may play a role in inhibiting or delaying the progression of NAFLD and improve the prognosis of NAFLD-HCC ( Figure 8M).

DISCUSSION
Non-alcoholic Fatty Liver disease (NAFLD) is currently considered as the first cause of chronic liver disease accounting for 25% of cases worldwide (4). NAFLD is generally divided into two stages: the early stage is non-alcoholic fatty liver (NAFL) with pathological features of isolated steatosis, no or minimal inflammatory activity, and no evidence of cell damage. In the progress of NAFL to non-alcoholic steatohepatitis (NASH), namely second stage, inflammation and liver cell damage characterized by hepatocyte swelling appear in the liver, accompanied by various degrees of fibrosis (30,31). Lipid accumulation, liver cell damage, immune system dysfunction and fibrosis are all involved in NAFLD, which may eventually progress to HCC (32). Early detection and diagnosis are of great importance in NAFLD treatment. Up to now, NAFLD diagnosis mainly relies on imaging examination and liver biopsy, which lacks ability to precisely assess disease status and predict NAFLD progression.
Based on 111 intersected DEGs between Normal-NASH group and NAFL-NASH group, we performed GO and KEGG pathway analysis to explore underlying mechanism of NAFLD. The results showed that enriched pathways were involved in tumorigenesis, such as extracellular matrix organization, extracellular structure organization and PI3K-AKT signaling pathway. Meanwhile, most overlapping pathways in GSEA were also related to tumorigenesis among Normal-NASH group and NAFL-NASH group, such as EMT, angiogenesis and p53 pathway (Figure 2, 3). These results indicated that extracellular matrix may play an important role in the development of NAFLD Subsequently, we constructed a PPI network of NAFLD. Based on specific gene functions, we divided PPI network into four modules: Ballooning Associated, Inflammation Associated, Fibrosis Associated, and Glycolysis Associated. This was roughly the same as the development process of NAFLD (26,27). Although patients in "first-hit" can reverse histological steatosis to a normal state, when progressing to NASH (second-hit), patients have been in an irreversible state. Thus, early detection and timely treatment are of great importance to stop and reverse NAFLD progression. So far, the diagnosis of the NAFLD state still mainly depends on pathology, whereas it is expensive and inconvenient to operate (33)(34)(35). In past years, some models, such as hepatic-portal venous pressure gradient (HVPG), computed tomography (CT), MRI, and MR elastography (MRE), were built to diagnose NAFLD disease state (36). However, low sensitivity and high false-positive rates limit their clinical use. Given the important link between genes and NAFLD progression (37), proper gene signature classifiers may provide simple and accurate evaluation methods for NAFLD status.
Later, we proved classifiers could effectively distinguish clinical characteristics (ballooning, inflammation, steatosis, fibrosis and NAS) in Normal/NASH and NAFLD/NASH group. In Normal/NAFL group, the classifier score was associated with steatosis and NAS, while no inflammation and fibrosis may be due to little inflammation and fibrosis occurring in patients in first-hit. Meanwhile, the classifier score was closely associated with an immune microenvironment score in Normal-NASH/NAFL-NASH, which indicated classifiers could also predict states of the immune microenvironment ( Figure 6).
To clarify the relationship between genes and NAFLD progress, we performed an upset-plot between pathological-related DEGs and 19 genes obtained from STRING database. Six progressrelated genes (AKR1B10/SPP1/CD24/UBD/FABP4/STMN2) were found remarkably correlated with steatosis, ballooning, inflammation, fibrosis, and NAS in the progress of Normal/ NAFL/NASH. Subsequently, we explored the relationship between these six progress-related genes and NAFLD-HCC prognosis. Previous studies showed that these six progressrelated genes play a vital role in tumorigenesis, such as PI3K-AKT pathway (38),p53 pathway (39), and STAT pathway (40). Meanwhile, in the TCGA database, the six progress-related gene expressions were increased in HCC patients, and SPP1/AKR1B10 was negatively related to overall survival. AKR1B10 was reported to metabolize a variety of substrates, such as retinal, lipid peroxidation products, and exogenous biological agents (41)(42)(43)(44). As storage of retinyl in cytoplasmic lipid droplets is the most distinctive feature of hepatic stellate cell (45), therefore, the abnormal expression of AKR1B10 may lead to the activation of HSC and promote the progression of NAFLD (46). Furthermore, studies have demonstrated that AKR1B10 was also related to tumor growth and metastasis, which may explain the lasting role in NAFL-NASH-HCC progression (47)(48)(49). Secreted phosphoprotein 1 (SPP1), also known as osteopontin (OPN), is a secreted glycoprotein that has multiple functions and affects proliferation, differentiation, migration and inflammation (50)(51)(52). Abnormal SPP1 expression was related to various cancer progressions [colorectal cancer (53,54), lung cancer (55), and HCC (56)] via inducing inflammation and reshaping the microenvironment. As SPP1 and AKR1B10 were closely related to the Norma-NAFL-NASH process and the prognosis of HCC, these two genes may be key genes in the progression of Normal-NAFL-NASH-HCC.
As deregulated immune microenvironment was proved to have a profound effect on the progression of NAFLD progression via inflammation, we tried to explore immune microenvironment changes in Normal-NAFL-NASH-HCC. The results showed that immune activated cells (CD4 T, CD8 T and NK) infiltrations gradually increased, indicating an immunostimulatory microenvironment remodeling during Normal-NAFL-NASH progress. On the contrary, immune activated cell infiltration decreased and immunosuppressive cell infiltrations gradually increased in HCC, indicating a suppressive immune microenvironment. The deregulated ratio of immune cells and cytokines reshaped microenvironment, which may be a key factor in the conversion of NAFL/NASH/HCC (57)(58)(59). Therefore, we performed a survival analysis of immune cells to look for prognosis-related immune cells and six immune cells (Cytotoxic cells, MAIT cells, Tfh cells, Th2 cells, B cells and DCs) were identified as being survival-related.
Later, we explored the relationship between AKR1B10/SPP1 and six immune cells. The results showed that AKR1B10 was positively related to DCs and MAIT cells. And SPP1 was positively related to B cells, DCs, and MAIT cells. In B cells, which account for half of the total number of lymphocytes in the liver, the infiltration gradually increases during the progression of NAFLD. Previous studies showed CCl4-induced B-cell deficient mice have reduced liver fibrosis, which indicated that B cells had the pro-fibrotic capability (60,61). As mentioned earlier, the key character of NASH was the loss of the liver's tolerance to the microenvironment, which turned the liver into a pro-inflammatory immune phenotype and subsequently released pro-inflammatory cytokines to induce DCs maturation, and finally increased adaptive immune responses by CD4+/CD8+ T cells activation and infiltration (62,63). Therefore, DCs serve as a central cell bridge connecting the innate immune system and adaptive immune system responses, which work as antigenpresenting cells in the liver. Besides, consuming CD11c + DCs or CD103 + DCs reduced proinflammatory cytokine and liver fibrosis in MCD-induced NASH or thioacetamide-diet-induced liver fibrosis models proved DCs also played a pro-inflammatory role in the NAFLD process (28,64,65).
Human mucosal-associated invariant T cells (MAIT cells) are highly enriched in liver and highly conservative at the evolutionary level. They express semi-constant T cell receptors (TCR), which can specifically recognize microbial-derived vitamin B metabolites, and then release a large number of inflammatory cytokines and granzymes (66,67). The role of MAIT cells in NASH process is not yet clear. Previous studies showed that MAIT cells increase during acute injury or infection, which promoted inflammation and protected the body (68). However, when disease progresses or turns into a chronic disease, MAIT cells start to decrease. Meanwhile, the gene expression pattern of MAIT cells will change. MAIT cells in Autoimmune liver disease (AILD) patients had evolved into an exhausted, pro-fibrotic phenotype, and promoted the development of HSC-mediated liver fibrosis (69,70). Compared to normal samples, although MAIT cells reduced in HCC, tumor-derived MAIT cells down-regulated genes enriched in the cytokine secretion and cytolytic effect pathways (NFKB1 and STAT5B) and up-regulated genes such as IL8, CXCL12 and HAVCR2 (TIM -3) promote tumor development (71). Therefore, MAIT cells may harm the individual's immune defense in chronic diseases, especially in NAFLD/HCC. The correlation between AKR1B10, SPP1, B cells, DCs, and MAIT cells indicated the possible interactive network structure that may explain and control metabolic disorders and fibrotic phenotypes from NASH to HCC. Therefore, we selected AKR1B10 and SPP1 co-sensitive drugs TGX-221 as a possible therapy to inhibit or delay the progression of NAFLD and improve the prognosis of NAFLD-HCC.

CONCLUSION
In conclusion, we established 3 gene-based signature classifiers that may serve as biomarkers to predict disease state in NAFLD. In our analysis, we also discovered changes in the immune microenvironment, the key immune cells (B cells, DCs, MAIT cells) and genes (AKR1B10, SPP1) in the progression of NAFLD. Besides, TGX-221 may be a potential therapeutic drug for the treatment of NAFLD and NAFLD-HCC.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.