Edited by: Arun Kumar Sangaiah, VIT University, India
Reviewed by: Wenbo Zhang, Qiqihar University, China; Zhongyuan Li, First Affiliated Hospital of Qiqihar Medical University, China; Henry Mildredl, Federal Institute for Drugs and Medical Devices, Germany
This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The objective was to explore the functions of genes differentially expressed between lung cancer tissues and the interactions between the proteins they encode, and thereby to identify important genes closely related to lung cancer. A total of 120 samples from the GEO database (including two groups, i.e., 60 lung cancer
Lung cancer is one of the malignant tumors with the fastest-growing incidence rate (
Bioinformatics uses computers to mine and analyze the large volumes of information held in biological databases. It focuses on genomic and proteomic analysis and is widely used in molecular genetics and genomics. In tumor research, bioinformatics combines candidate tumor genes with known biological data through network analysis of tumor-related pathways and biological processes, identifies tumor-related functional categories, and maps tumor networks. It also predicts potential pathogenic proteins and thus plays an important role in the study of tumor pathogenesis, diagnosis, and treatment. As gene chip technology continues to develop, how to process and analyze the resulting volume of data and extract more useful information has become a topic of active interest. At present, gene chip technology is mainly used to study tumor-related gene information, for example to screen tumor-related genes, detect mutated tumor genes, study tumor gene expression profiles, and diagnose tumors. In this way, it can be used to explore how strongly genetic, environmental, and pharmaceutical factors influence the expression of related genes during the occurrence and development of tumors.
The rapid development of high-throughput technologies such as MeDIP-seq, methylation microarrays, and RNA-seq has provided technical support for identifying biomarkers of diseases such as lung cancer and has made many data sets publicly available. Using a publicly available lung cancer gene expression dataset, this study explores the network of lung cancer target genes through gene expression analysis across different databases, with the aim of clarifying the molecular driving mechanism of lung cancer and providing a reference for clinical molecular drug treatment and nursing guidance.
A total of 120 samples of lung cancer mRNA sample GSE19408 (including two groups: 60 lung cancer
First, the samples were downloaded and the CEL-format files (Affymetrix raw probe-intensity files) were imported into R. The limma package was used to test for differences in gene expression between the lung cancer and normal samples. Differentially expressed genes were then selected according to the false discovery rate (FDR) and the fold change (FC, the ratio of gene expression between the two groups): a gene was retained only if the comparison between the two groups satisfied FDR < 0.01 and |log2 FC| ≥ 1.
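The selection rule above can be illustrated with a minimal Python sketch. It does not reproduce limma's moderated statistics; it assumes that a raw p-value and a log2 FC are already available for each gene, and it assumes Benjamini-Hochberg adjustment, since the text does not state which FDR procedure was applied. The gene labels and numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum over all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fdr_cutoff=0.01, lfc_cutoff=1.0):
    """Split genes into up- and down-regulated lists using the
    FDR < fdr_cutoff and |log2 FC| >= lfc_cutoff thresholds."""
    fdr = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, fdr):
        if q < fdr_cutoff and abs(lfc) >= lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical input, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
log2fc = [2.5, -1.8, 3.0, 0.4, -2.0]
pvalues = [1e-6, 2e-5, 0.2, 1e-4, 0.5]
up_genes, down_genes = select_degs(genes, log2fc, pvalues)
```

Both thresholds must hold at once: in this toy input, GENE_D has a small adjusted p-value but is dropped because its fold change is too small, and GENE_C has a large fold change but is dropped because it is not significant.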
Signaling pathway analysis and biological function enrichment of the screened NSCLC differentially expressed genes were performed using the Functional Annotation Chart tool of the DAVID platform. First, the differentially expressed genes were uploaded to DAVID as a list of gene symbols, with human (Homo sapiens) selected as the species. GO (Gene Ontology) analysis and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis were then performed on both the up-regulated and the down-regulated genes (
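The idea behind this kind of enrichment analysis can be sketched with a plain one-sided hypergeometric test, which asks how likely it is to see at least the observed number of list genes in a term by chance. This is an illustration only: DAVID reports a modified Fisher exact p-value (the EASE score), which is more conservative than the test below. The function name and all numbers in the example are hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(hits, list_size, term_size, background_size):
    """One-sided hypergeometric p-value P(X >= hits).

    hits            -- genes in the submitted list annotated to the term
    list_size       -- number of genes in the submitted list
    term_size       -- genes annotated to the term in the background
    background_size -- total number of background genes
    """
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers, for illustration only.
p = enrichment_p_value(hits=11, list_size=291, term_size=124,
                       background_size=20000)
print(f"enrichment p-value: {p:.3e}")
```

In practice the resulting p-values are computed for every term and then corrected for multiple testing before terms are reported as enriched.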
Gene expression data can be applied to gene regulatory network analysis, which relates the differential expression of genes to that of their target genes and to the processes that shape organisms, such as organ formation, embryo development, and disease pathogenesis. When networks of relationships are compared between cell types or states and analyzed further, the specific molecular features and functional modules that underlie state transitions can be identified. To identify key target genes related to lung cancer, this study built a protein interaction network model to explore the regulatory relationships of the differentially expressed genes at the protein level. The differentially expressed genes obtained from the DAVID platform underwent identifier (ID) conversion and were entered into the STRING 9.1 (Search Tool for the Retrieval of Interacting Genes) database to build a protein-protein interaction (PPI) network of the proteins they encode. Proteins at the center of a PPI network often play a relatively important role in the development of the disease. The selection criterion for the PPI analysis was a combined score > 0.4 (medium confidence). The PPI data were then imported into the visualization tool Cytoscape, and an analysis plug-in was used to count the edges of each node in the network, giving the number of interactions per protein (Degree). The analysis steps in Cytoscape were as follows: first, the node attribute file was imported via File > Import > Table > File (node.txt) (note that this is Table rather than Network); then, the format of the simple network diagram was set under Style; finally, the file was exported. The export can be a network file, a table file, or a picture file; picture files cover a variety of image formats as well as PDF, which can be selected in the toolbar.
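The Degree calculation described above amounts to counting, for each protein, the interactions that pass the score threshold. The following Python sketch shows that counting step outside Cytoscape. It assumes an edge list of (protein, protein, combined score) tuples with scores on a 0 to 1 scale and each undirected pair listed once; the edges and scores in the example are hypothetical, although the gene names are taken from this article.

```python
from collections import Counter


def node_degrees(edges, min_score=0.4):
    """Count interactions per protein, keeping only edges whose
    combined score exceeds min_score (medium confidence)."""
    degree = Counter()
    for protein_a, protein_b, score in edges:
        if score > min_score and protein_a != protein_b:
            degree[protein_a] += 1
            degree[protein_b] += 1
    return degree


def top_hubs(edges, n=5, min_score=0.4):
    """Return the n proteins with the highest degree."""
    return node_degrees(edges, min_score).most_common(n)


# Hypothetical edges and scores, for illustration only.
example_edges = [
    ("TOP2A", "CCNB1", 0.90),
    ("TOP2A", "CDK1", 0.80),
    ("TOP2A", "TTK", 0.30),   # below the threshold, ignored
    ("CCNB1", "CDK1", 0.95),
    ("CDK1", "CCNA2", 0.70),
]
degrees = node_degrees(example_edges)
hubs = top_hubs(example_edges, n=1)
```

Proteins with the highest degree are the candidates for "central" nodes in the sense used in the text.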
Total protein extraction: The cells were taken out, the culture medium was discarded, and the cells were washed with PBS. Then, 70 μL of cell lysis buffer was added to each well. After 5 min, the cell suspension was transferred to an Eppendorf (EP) tube (TIANGEN Biochemical Technology (Beijing) Co., Ltd., China) and shaken once every 5 min, six times in total. The suspension was centrifuged at 4°C and 1,000 rpm for 15 min. The supernatant was collected for quantitative protein determination by the bicinchoninic acid (BCA) assay, and the standard curve was drawn.
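As an illustration of how a standard curve is used for quantification, the sketch below fits a straight line to the absorbance of protein standards by least squares and then inverts it to estimate the concentration of a sample. A linear fit is an assumption made here for simplicity (BCA curves can bend at high concentrations), and the standard concentrations and absorbance readings are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to estimate protein concentration."""
    return (absorbance - intercept) / slope


# Hypothetical standards: concentration (mg/mL) and absorbance readings.
standard_conc = [0.0, 0.5, 1.0, 2.0]
standard_abs = [0.10, 0.35, 0.60, 1.10]
slope, intercept = fit_line(standard_conc, standard_abs)
sample_conc = concentration_from_absorbance(0.85, slope, intercept)
```

The estimated concentration is then used to load equal amounts of protein per lane.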
Preparation of stacking gel and separation gel: The reagents (purchased from TIANGEN Biochemical Technology (Beijing) Co., Ltd., China) were summarized in
Configuration of stacking gel and separation gel.
Ingredients | Stacking gel | 10% Separation gel |
Double distilled water | 2.6 × 10³ | 3 × 10³ |
30% polyacrylamide liquid | 0.64 × 10³ | 3 × 10³ |
1.0 mol/L tromethamine (Tris, pH 8.8) | 2.3 × 10³ | 2.3 × 10³ |
10% sodium dodecyl sulfate (SDS) | 0.03 × 10³ | 0.1 × 10³ |
10% ammonium persulfate | 0.04 × 10³ | 0.1 × 10³ |
Tetramethylethylenediamine (TEMED) | 0.004 × 10³ | 0.004 × 10³ |
Electrophoresis and image development: The glass plates were cleaned thoroughly with distilled water and ethanol, aligned, and clamped vertically on the gel-casting rack. Distilled water was added to the glass plates to a suitable level, and the device was left to stand for 8 min to check whether the plates leaked. A 10% separation gel was prepared according to the formula in
After the stacking gel had solidified, the glass plate and the plastic replacement plate were clamped in the rack with electrodes; the device was then placed in the electrophoresis tank, and the comb was pulled out. Next, 30 μL of the protein supernatant was mixed evenly with 10 μL of 5× loading buffer and boiled at 100°C for 10 min.
Finally, 40 μL of sample was loaded into each well of the electrophoresis gel. Electrophoresis was run at 80 V until the bromophenol blue formed a straight line in the gel, and the voltage was then raised to 120 V. When the bromophenol blue reached the lower edge of the gel, the power supply was disconnected and the proteins were transferred to a membrane. The transfer proceeded as follows: the gel was soaked in transfer buffer for 10 min; six pieces of membrane and filter paper were cut to the size of the gel and soaked in transfer buffer for 10 min; the layers were stacked in the order sponge / 3 layers of filter paper / gel / membrane / 3 layers of filter paper / sponge, and air bubbles were rolled out with a test tube. The transfer tank was then placed in an ice bath, the sandwich was inserted, transfer buffer was added, the electrodes were connected, and the transfer was run at 100 V for 1 h.
After the membrane transfer was completed, a gel image processing system (UVP, Germany) was used to analyze the molecular weight and net optical density of the target band. The relative expression of the target protein was calculated as the gray value (OD) of the target band divided by the gray value (OD) of the internal reference band.
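The normalization above is a simple ratio, shown here as a small Python sketch so that the handling of each lane is explicit. The function name and the optical density readings are hypothetical.

```python
def relative_expression(target_od, reference_od):
    """Relative expression = target band OD / internal reference band OD."""
    if reference_od <= 0:
        raise ValueError("internal reference OD must be positive")
    return target_od / reference_od


# Hypothetical lanes: (target band OD, internal reference band OD).
lanes = {"control": (0.60, 1.20), "lung cancer": (1.35, 1.25)}
relative = {name: relative_expression(t, r) for name, (t, r) in lanes.items()}
```

Dividing by the internal reference corrects for differences in loading between lanes, so the ratios rather than the raw gray values are compared between groups.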
Influences of patients’ clinical characteristics on their quality of life (1: Stage I-II; 2: Stage III-IV; 3: Chemotherapy less than 3 times; 4: Chemotherapy more than 3 times).
A total of 875 differentially expressed genes, comprising 291 up-regulated genes and 584 down-regulated genes, were obtained with the criteria FDR ≤ 0.05 and |log2 FC| ≥ 1. Among these genes, the top 10 up-regulated genes and the top 10 down-regulated genes were shown in
The log2FC values of first 10 up-regulated genes and first 10 down-regulated genes with relatively large differences.
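Gene group | Gene | log2FC | |
Up-regulated genes | COL10A1 | 3.9864 | 9.69e-32 |
Up-regulated genes | COL11A1 | 3.7236 | 5.95e-22 |
Up-regulated genes | CST1 | 2.9661 | 2.90e-23 |
Up-regulated genes | CTHRC1 | 3.2530 | 9.06e-26 |
Up-regulated genes | GREM1 | 2.8852 | 2.75e-15 |
Up-regulated genes | HS6ST2 | 3.4452 | 7.12e-22 |
Up-regulated genes | MMP1 | 3.0086 | 5.06e-13 |
Up-regulated genes | MMP12 | 3.1327 | 1.54e-17 |
Up-regulated genes | SPINK1 | 3.3138 | 1.06e-14 |
Up-regulated genes | TOX3 | 2.8138 | 2.69e-20 |
Down-regulated genes | AGER | −3.8451 | 3.62e-35 |
Down-regulated genes | CLDN18 | −3.2612 | 9.98e-19 |
Down-regulated genes | FCN3 | −3.4334 | 1.01e-22 |
Down-regulated genes | GKN2 | −3.2586 | 3.43e-20 |
Down-regulated genes | GPM6A | −3.6053 | 2.97e-31 |
Down-regulated genes | TMEM100 | −3.5239 | 6.07e-21 |
Down-regulated genes | SCGB1A1 | −3.3605 | 4.04e-10 |
Down-regulated genes | SFTPC | −3.3127 | 3.23e-15 |
Down-regulated genes | SOSTDC1 | −3.3652 | 5.19e-19 |
Down-regulated genes | WIF1 | −3.7317 | 2.09e-17 |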
The up-regulated gene COL11A1 was taken as an example; the types of its signal transduction molecules were counted (
According to
COL11A1 upstream and downstream network signal transduction in different tissues (1 and 3: normal human tissue; 2 and 4: lung cancer tissue).
Metastatic molecular types of COL11A1 in different tissues (1 and 3: normal human tissue; 2 and 4: lung cancer tissue).
As shown in
The obtained 291 up-regulated genes and 584 down-regulated genes were input into the DAVID platform for signal pathway analysis and biological function enrichment. The results showed that the differentially expressed genes were enriched in 435, with statistically significant differences (
The GO analysis results of the first 10 up-regulated genes and first 10 down-regulated genes.
Gene group | GO category | GO ID | GO term | Number of genes | |
Up-regulated genes | BP | GO:0030574 | Collagen catabolic process | 14 | 1.06e-10 |
Up-regulated genes | BP | GO:0006508 | Proteolysis | 26 | 8.45e-6 |
Up-regulated genes | BP | GO:0030199 | Collagen fibril organization | 7 | 1.62e-6 |
Up-regulated genes | BP | GO:0000281 | Mitotic cytokinesis | 6 | 4.63e-5 |
Up-regulated genes | CC | GO:0005615 | Extracellular space | 53 | 9.56e-11 |
Up-regulated genes | CC | GO:0005576 | Extracellular region | 55 | 7.45e-9 |
Up-regulated genes | CC | GO:0070062 | Extracellular exosome | 83 | 1.56e-7 |
Up-regulated genes | CC | GO:0005581 | Collagen trimer | 12 | 9.63e-7 |
Up-regulated genes | MF | GO:0004252 | Serine-type endopeptidase activity | 19 | 1.45e-8 |
Up-regulated genes | MF | GO:0004556 | Alpha-amylase activity | 4 | 2.74e-7 |
Down-regulated genes | BP | GO:0030593 | Neutrophil chemotaxis | 19 | 1.07e-13 |
Down-regulated genes | BP | GO:0006954 | Inflammatory response | 38 | 3.14e-11 |
Down-regulated genes | BP | GO:0006955 | Immune response | 37 | 1.01e-11 |
Down-regulated genes | BP | GO:0001525 | Angiogenesis | 26 | 9.45e-8 |
Down-regulated genes | BP | GO:0050729 | Positive regulation of inflammatory response | 14 | 4.06e-8 |
Down-regulated genes | CC | GO:0005615 | Extracellular space | 106 | 9.88e-17 |
Down-regulated genes | CC | GO:0005576 | Extracellular region | 108 | 1.04e-18 |
Down-regulated genes | CC | GO:0005578 | Proteinaceous extracellular matrix | 33 | 1.16e-13 |
Down-regulated genes | CC | GO:0005886 | Plasma membrane | 178 | 6.05e-8 |
Down-regulated genes | MF | GO:0008201 | Heparin binding | 23 | 4.33e-8 |
KEGG pathway analysis examines the biological functions of genes at the system level using extensive pathway information covering many complex biological functions, such as genetic information processing, metabolic pathways, and cellular processes. Moving from the annotation of a single gene to the annotation of a gene set, it assesses whether a group of genes is concentrated on a functional node. KEGG pathway analysis identifies the biological processes most relevant to the phenomenon under study and thereby improves the reliability of the analysis. The results of the KEGG pathway analysis of the differentially expressed up-regulated and down-regulated genes were shown in
The KEGG pathway analysis results of differentially expressed up-regulated genes and differentially expressed down-regulated genes.
Gene group | Pathway ID | Number of genes | Genes | Pathway description | |
Up-regulated genes | 00500 | 6 | AMY1A, AMY1B, AMY1C, AMY2A, AMY2B, PGM2L1 | Starch and sucrose metabolism | 2.34e-4 |
Up-regulated genes | 04110 | 11 | BUB1B, CCNB1, CDC20, CDK1, CDKN2A, MAD2L1, MCM2, ORC6, PTTG1, SFN, TTK | Cell cycle | 1.06e-5 |
Up-regulated genes | 04115 | 7 | CCNB1, CDK1, CDKN2A, IGFBP3, RRM2, SFN, STEAP3 | p53 signaling pathway | 9.95e-4 |
Up-regulated genes | 04512 | 10 | COL1A1, COL1A2, COL3A1, COL5A1, COL5A2, COL11A1, COMP, HMMR, SPP1, THBS2 | ECM-receptor interaction | 2.01e-5 |
Up-regulated genes | 04974 | 10 | ACE2, COL1A1, COL1A2, COL3A1, COL5A1, COL5A2, COL11A1, DPP4, KCNN4, KCNK5 | Protein digestion and absorption | 2.78e-6 |
Down-regulated genes | 04062 | 16 | ARRB1, CCL2, CCL4, CCL14, CCL21, CCL23, CXCL3, CXCL12, CXCR2, CX3CL1, ELMO1, FGR, GNG11, PLCB4, PPBP, PREX1 | Chemokine signaling pathway | 1.37e-5 |
Down-regulated genes | 04514 | 12 | CADM1, CDH5, CD274, CLDN5, CLDN18, CLDN22, ESAM, ICAM1, ICAM2, PECAM1, PTPRM, SELP | Cell adhesion molecules | 1.46e-3 |
Down-regulated genes | 04668 | 13 | CCL2, CXCL3, CX3CL1, EDN1, FOS, ICAM1, IL1B, IL18R1, IL6, JUNB, MAP3K8, PTGS2, TNFAIP3 | TNF signaling pathway | 1.69e-6 |
Down-regulated genes | 05143 | 7 | ICAM1, IL1B, IL6, HBA1, HBA2, HBB, PLCB4 | African trypanosomiasis | 1.54e-4 |
Down-regulated genes | 05144 | 14 | CCL2, CD36, CSF3, GYPC, HBA1, HBA2, HBB, ICAM1, IL1B, IL6, KLRB1, PECAM1, SELE, SELP | Malaria | 1.01e-11 |
Using the STRING tool, a network of 292 nodes and 1,425 interactions was obtained from the 291 up-regulated genes, and a network of 529 nodes and 1,624 interactions was obtained from the 584 down-regulated genes. After processing with visualization software, the significant module in the protein-protein interaction relationship network in
Significant modules in the protein-protein interaction network
Protein expressions of the high-expressed genes CCNB1 and TOP2A were illustrated in
Protein expressions of CCNB1 and TOP2A in lung cancer patients.
Afterward, the expression of messenger RNA corresponding to CCNB1 and TOP2A proteins was analyzed, and the results were shown in
The expression of messenger RNA corresponding to CCNB1 and TOP2A proteins in the control and experimental groups (1: CCNB1; 2: TOP2A).
As shown in
Protein expressions of the low-expressed genes IL6 and IL1B were illustrated in
Protein expressions of IL6 and IL1B in patients with lung cancer.
Then, the expression of messenger RNA corresponding to the IL6 and IL1B proteins was analyzed (
The messenger RNA expressions of IL6 and IL1B proteins in the control and experimental groups (1: IL6; 2: IL1B).
As shown in
This study attempted to reveal the molecular driving mechanism of lung cancer through signaling pathway analysis, biological function enrichment, protein interaction analysis, and gene target network analysis. A total of 875 differentially expressed genes were obtained from the samples. These genes are mainly involved in biological processes such as protein metabolism, proteolysis, mitosis, and cell division. TOP2A, CCNB1, CCNA2, CDK1, and TTK may be key target genes in lung cancer. Exploring the changes of genes and pathways in the pathogenesis of lung cancer provides a reference for its molecular driving mechanism and a theoretical basis for molecular-targeted drug therapy and clinical nursing guidance. However, the study has some shortcomings. The number of up-regulated and down-regulated genes selected was limited and is insufficient for analysis of the full molecular network; in later work, the number of genes screened will be increased. The investigation of the molecular driving mechanism of lung cancer is also still at a preliminary stage. In subsequent research, TOP2A, which has many interactions among the key lung cancer target genes identified by screening, will be examined for drug resistance to assist the development of its inhibitors.
Publicly available datasets were analyzed in this study. These data can be found in the GEO (Gene Expression Omnibus) database.
RH: writing – original draft and conceptualization. XX: data curation and software. KZ: supervision and resources. YZ: formal analysis. CW: validation. GH: writing, review, editing, and methodology. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
This work was supported by the Natural Science Foundation of Zhejiang Province of China (LY21H160011).