High-Throughput Strategies for the Discovery of Anticancer Drugs by Targeting Transcriptional Reprogramming

Transcriptional reprogramming contributes to the progression and recurrence of cancer. However, the mechanisms of transcriptional reprogramming in tumors remain poorly elucidated, which makes the development of effective drugs difficult; gene-expression signatures help to connect genetic information with pharmacologic treatment. So far, two gene-expression signature-based high-throughput drug discovery approaches have been described: L1000, which measures the mRNA transcript abundance of 978 “landmark” genes, and high-throughput sequencing-based high-throughput screening (HTS2). Both are suitable for anticancer drug discovery by targeting transcriptional reprogramming. L1000 uses ligation-mediated amplification and hybridization to Luminex beads, and reports gene expression changes by detecting bead colors and the fluorescence intensity of the phycoerythrin signal. HTS2 combines RNA-mediated oligonucleotide annealing, selection, and ligation with high-throughput sequencing to quantify gene expression changes by directly reading gene sequences. This article summarizes the technological principles and applications of L1000 and HTS2, and discusses their advantages and limitations in anticancer drug discovery.


INTRODUCTION
Transcriptional reprogramming is a cause of cancer progression and recurrence. Gurdon first confirmed that differentiated somatic cells are plastic in nature and can be reprogrammed into other cell fates (1). A cancer cell may present multiple phenotypes by reprogramming and changing its identity, inducing heterogeneity among tumor cells (2). Tumor heterogeneity is the major cause of drug resistance in cancer. The cancer stem cell (CSC) model and the clonal evolution model can be used to explain tumor heterogeneity. It was proposed that CSCs are derived from genetically and epigenetically altered stem cells or progenitor cells and possess self-renewal potential to sustain tumor mass, immune escape, and drug resistance (3,4). In the clonal evolution model, heterogeneity results from the inherent genomic instability of cancer cells, which leads to genetic and epigenetic changes (4). Epigenetic changes, such as DNA methylation and histone acetylation, are vital for cancer progression (5). Transcriptional reprogramming involves almost all of these regulatory processes.
Transcriptional reprogramming drives the diversification of tumor cells and causes tumor deterioration, such as hyperproliferation, invasion, metastasis, immune evasion, and drug resistance, and eventually causes cancer progression and recurrence. Hence, transcriptional reprogramming has emerged as a promising drug target for cancer therapy.
The detailed regulatory mechanisms of transcriptional reprogramming are still poorly understood, which makes effective drug discovery against this process difficult. Genomic instability (6), transcription factors (7-9), DNA methylation of tumor suppressor genes (10), unbalanced histone modifications (11,12), aberrant Wnt signaling (13), PI3K signaling (14), and TGF-β and Erk/MAPK signaling (15) have been reported as causes of cell reprogramming and malignant transformation. Wang et al. established a principle for cell type-specific transcriptional reprogramming: cell type-specific factors couple with general transcription factors to form a new cell-specific enhancer network that other regulatory factors can activate, which may promote tumor cell progression (16). However, these discoveries explain only a limited part of transcriptional reprogramming. Thus, further elucidation of the transcriptional reprogramming mechanisms in normal and cancer cells may help develop cancer therapy strategies.
The gene-expression signature might be a suitable readout for high-throughput drug discovery targeting transcriptional reprogramming, because transcriptional reprogramming changes the expression of a defined group of genes. This review summarizes two published gene-expression signature-based high-throughput drug discovery strategies targeting transcriptional reprogramming, L1000 and high-throughput sequencing-based high-throughput screening (HTS2), introducing their technological principles and discussing their applications in drug discovery.

L1000 AS A LUMINEX BEAD-BASED HIGH-THROUGHPUT SCREENING STRATEGY
L1000 was developed to generate the next-generation Connectivity Map (CMap) at higher throughput (17). CMap, which connects small molecules, genes, and diseases through gene signatures, was first piloted in 2006 (18). In the pilot, MCF7, PC3, HL60, and SKMEL5 cells were treated with 164 distinct compounds and mRNA expression was analyzed using Affymetrix microarrays, generating 564 datasets (18). The small scale of the pilot CMap datasets limits its use as a resource. Therefore, a low-cost approach, L1000, was proposed to produce large-scale gene signatures through a reduced representation of the transcriptome (17).
The procedure of L1000 technology includes the following steps (17). Cells treated with distinct perturbations in 384-well plates are lysed, and their mRNAs are captured on oligo-dT-coated plates and then reverse-transcribed to cDNA. Each oligonucleotide probe comprises a locus-specific sequence, a unique 24-mer barcode sequence, and universal primer sites. The probes are annealed to the cDNA, and the juxtaposed upstream and downstream probe pairs are ligated; the upstream probe carries the unique barcode sequence. The ligated products then serve as templates for PCR amplification with the universal 5′-biotinylated T7 primer and T3 primer pair, so that the final amplicons are gene-specific, barcoded, and biotinylated. Next, the barcode of each amplification product hybridizes by complementary pairing to a polystyrene microsphere (a bead with a fluorescent color), and the beads are stained with a streptavidin R-phycoerythrin conjugate. Because beads are available in a maximum of 500 colors, two transcripts are hybridized to each bead color. Finally, the hybridized beads are detected and analyzed using a Luminex FlexMap 3D flow cytometer. The color of a bead indicates gene identity, whereas the fluorescence intensity of the phycoerythrin signal reflects transcript abundance (Figure 1).
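The readout step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each bead event is a (bead color, phycoerythrin intensity) pair and that the beads for the two transcripts sharing a color are mixed at unequal ratios, so that the larger intensity cluster belongs to the transcript coupled to more beads. The unequal mixing, the two-cluster split, and all names (`split_two_peaks`, `deconvolute`, `color_to_genes`) are assumptions made for illustration; they are not described in the procedure above, and this is not the published L1000 analysis pipeline.

```python
from statistics import median


def split_two_peaks(intensities, iterations=20):
    """Split 1-D intensities into (low, high) clusters with a two-center k-means."""
    lo_center, hi_center = min(intensities), max(intensities)
    lo, hi = list(intensities), []
    for _ in range(iterations):
        # Assign each event to the nearer center (ties go to the low cluster).
        lo = [x for x in intensities if abs(x - lo_center) <= abs(x - hi_center)]
        hi = [x for x in intensities if abs(x - lo_center) > abs(x - hi_center)]
        if not hi:  # all events look alike: only one peak can be seen
            break
        new_centers = (sum(lo) / len(lo), sum(hi) / len(hi))
        if new_centers == (lo_center, hi_center):
            break  # converged
        lo_center, hi_center = new_centers
    return lo, hi


def deconvolute(events, color_to_genes):
    """Estimate per-gene abundance from bead events.

    events: iterable of (bead_color, pe_intensity) pairs.
    color_to_genes: bead_color -> (gene_on_more_beads, gene_on_fewer_beads);
        this 'more beads / fewer beads' convention is an assumption of the sketch.
    """
    by_color = {}
    for color, pe in events:
        by_color.setdefault(color, []).append(pe)

    expression = {}
    for color, values in by_color.items():
        major_gene, minor_gene = color_to_genes[color]
        lo, hi = split_two_peaks(values)
        if not hi:
            # Peaks cannot be separated; report the same median for both genes.
            expression[major_gene] = expression[minor_gene] = median(values)
            continue
        # The cluster with more bead events is attributed to the major gene.
        big, small = (lo, hi) if len(lo) >= len(hi) else (hi, lo)
        expression[major_gene] = median(big)
        expression[minor_gene] = median(small)
    return expression
```

In this sketch the bead color only narrows the identity down to a pair of genes; the pair is resolved from the relative sizes of the two intensity clusters, and the median intensity of each cluster stands in for transcript abundance.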

The Application of L1000 in Cancer Drug Discovery
L1000 was used to discover synergistic anti-glioblastoma drugs. Glioblastoma is a fatal brain cancer that contains highly heterogeneous cell populations. These cell populations have various gene signatures; therefore, both radiation and chemotherapy for glioblastoma often meet inherent or acquired resistance (20). To overcome resistance, combination therapies are considered. Glioblastoma patient-specific genes, analyzed using the TCGA database and L1000 transcriptional profiling data, were therefore used to predict drugs that act synergistically against glioblastoma. As a result, the combinations of GSK-1070916 with JQ1, alisertib with JQ1, gemcitabine with mitoxantrone, and gemcitabine with imatinib were predicted and verified to be synergistic in inhibiting glioblastoma (21).
L1000 was also applied to find a drug against renal cell carcinoma (RCC). DDX3X is involved in RNA metabolism (22-24) and is epigenetically downregulated in RCC (25). According to transcriptomic analysis, lower levels of DDX3X promote gene expression in the SPINK1-metallothionein pathway, leading to tumor growth, metastasis, and poorer prognosis of RCC patients (25,26). Based on the DDX3X gene signature and L1000 datasets, digoxin was identified as a compound that reverses the gene signature generated by low DDX3X, thus inhibiting cell proliferation and metastasis (25).
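The idea of finding a compound that reverses a disease signature can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each compound profile is a hypothetical mapping from gene name to a differential-expression value (for example a z-score), and the score is a simple difference of means, not the enrichment statistic used by CMap or by the studies cited above. The function names, gene names, and values are invented for the example.

```python
def reversal_score(disease_up, disease_down, drug_profile):
    """Mean drug effect on disease-up genes minus mean effect on disease-down genes.

    A strongly negative score means the compound lowers genes that the disease
    raises and raises genes that the disease lowers, i.e. it opposes the signature.
    """
    up = [drug_profile[g] for g in disease_up if g in drug_profile]
    down = [drug_profile[g] for g in disease_down if g in drug_profile]
    if not up or not down:
        raise ValueError("drug profile must cover genes from both signature sets")
    return sum(up) / len(up) - sum(down) / len(down)


def rank_by_reversal(disease_up, disease_down, profiles):
    """Return compound names ordered from strongest reversal to weakest."""
    scores = {
        name: reversal_score(disease_up, disease_down, profile)
        for name, profile in profiles.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical signature and compound profiles (illustrative values only).
signature_up = {"G1", "G2"}
signature_down = {"G3"}
compound_profiles = {
    "compound_A": {"G1": -2.0, "G2": -1.0, "G3": 1.5},  # opposes the signature
    "compound_B": {"G1": 1.0, "G2": 1.0, "G3": -1.0},   # mimics the signature
}
ranking = rank_by_reversal(signature_up, signature_down, compound_profiles)
```

Compounds at the top of such a ranking would be candidates for follow-up functional assays; the ranking itself is only a hypothesis generator.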
Some FDA-approved drugs could be repurposed using gene-expression signatures and L1000 datasets. HMGA2 encodes a chromatin protein that promotes tumor progression and is associated with poor treatment outcome (27-30). To discover a specific inhibitor that targets HMGA2, L1000 data were analyzed in combination with the GEO database. According to the analysis, the approved antifungal drug ciclopirox is a novel potential inhibitor of HMGA2, and molecular docking further showed that ciclopirox directly interacts with the AT-hook motif of HMGA2. Functional assays also showed that ciclopirox represses colorectal cancer cell growth by inducing cell cycle arrest and apoptosis (31).
L1000 was applied to discover drugs against quiescent spheroids. Cells residing within the center of solid tumors lack nutrients and oxygen, and most of these cells are transcriptionally reprogrammed, quiescent, and unresponsive to antiproliferative therapy (32). Senkowski et al. used L1000 technology to generate gene-expression signatures from monolayer cultures and tumor cell spheroids treated with 22 drugs. Spheroids were cultured in fresh culture medium or in media that resembled hypoxic tumor parenchyma. According to the analysis of L1000 expression profiles, the mevalonate pathway is upregulated as a result of oxidative phosphorylation (OXPHOS) inhibition in quiescent cells. Thus, this study indicated that OXPHOS inhibitors (such as salinomycin, nitazoxanide, and antimycin A) and mevalonate pathway inhibitors (such as simvastatin) synergistically inhibit quiescent spheroids (32).

Pros and Cons of L1000
By establishing causal links among drugs, genes, and diseases, L1000 helps to reveal the mechanism of action of compounds or gene perturbations and can systematically predict the function or possible side effects of compounds (17). L1000 can detect the expression of as many as 1,000 genes simultaneously. DNA oligos for about 1,000 genes are designed and amplified, and the biotin-labeled amplicons are then hybridized to Luminex beads. The colors of the beads and the fluorescence signals attached to them are detected. Based on hybridization, L1000 can amplify the signal of low-abundance transcripts and measure their expression (17). L1000 is inexpensive, rapid, and flexible when used to profile gene expression on a large scale (19,33). L1000 uses ligation-mediated amplification to measure gene expression, detecting 40nt gene-specific sequences rather than sequencing full-length transcripts. Therefore, compared to RNA-seq technology, the reduced representation of the transcriptome makes L1000 a cost-effective method for gene expression profiling. Furthermore, several datasets generated by L1000 have been published. Owing to these advantages, L1000 has generated 1,319,138 profiles from 42,080 perturbations on nine cell types, covering 473,647 signatures (17). These datasets are open to the public and should be helpful for researchers seeking to discover drugs against cancer and other human diseases.
However, L1000 also has limitations. First, no more than about 1,000 genes can be detected. Only 500 bead colors are commercially available, so with one gene per color a maximum of 500 genes can be identified. L1000 detects two transcripts with a single bead color, which doubles the number of identifiable genes, but the total still cannot exceed 1,000 (17). In addition, the L1000 assay uses polystyrene beads. Polystyrene beads are the first generation of beads for Luminex assays; their accuracy and precision are reduced, and they are prone to leaking and clogging during plate washing (34).
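The gene-number ceiling follows directly from the arithmetic of 500 bead colors times two transcripts per color. A small Python sketch makes the constraint explicit; the function name, the round-robin assignment, and the placeholder gene names are hypothetical and are not taken from the L1000 protocol.

```python
def assign_genes_to_bead_colors(genes, n_colors=500, genes_per_color=2):
    """Distribute genes over bead colors, at most `genes_per_color` per color.

    Raises ValueError when the panel exceeds the capacity
    n_colors * genes_per_color (1,000 with the defaults).
    """
    capacity = n_colors * genes_per_color
    if len(genes) > capacity:
        raise ValueError(
            f"{len(genes)} genes requested, but {n_colors} colors x "
            f"{genes_per_color} genes per color allow only {capacity}"
        )
    assignment = {}
    for index, gene in enumerate(genes):
        # Round-robin: fill every color once before reusing any color.
        assignment.setdefault(index % n_colors, []).append(gene)
    return assignment


# Placeholder gene names for illustration.
panel = [f"gene_{i}" for i in range(1000)]
layout = assign_genes_to_bead_colors(panel)
```

With the defaults, a panel of 1,000 genes fills every color with exactly two genes, and one additional gene cannot be placed without more colors or more transcripts per color.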
The protocol of L1000 is also complicated. Before an L1000 assay begins, 1,000 pairs of gene-specific sequences and 1,000 barcode sequences need to be designed, and Luminex beads need to be coupled to 500 barcodes. After these preparations, the cells are lysed to release mRNA, which is captured on the oligo-dT plate and reverse-transcribed into cDNA. The cDNA serves as a template for annealing the barcode-labeled gene-specific sequences, and the upstream and downstream gene-specific sequences are ligated using T4 ligase. The ligated products are then used as templates for amplification with biotinylated universal primers. The amplicons are hybridized to beads and then stained with phycoerythrin-labeled streptavidin. Finally, the bead color (gene identity) and the phycoerythrin signal (gene abundance) are detected. The technical characteristics of L1000 are summarized in Table 1.

HTS2: HIGH-THROUGHPUT SEQUENCING-BASED HIGH-THROUGHPUT SCREENING
Another high-throughput approach to discover drugs by targeting transcriptional reprogramming is HTS2 (35). The procedure of HTS2 is as follows (Figure 2). Cells are treated with various perturbations in 384-well plates. The cells are then lysed, and the mRNA in the lysate is bound to biotin-labeled oligo-dT, which is captured on streptavidin-coated magnetic beads. After that, upstream oligos (consisting of a 5′ universal primer site and a 20nt gene-specific sequence) and downstream oligos (containing another 20nt gene-specific sequence adjacent to the upstream one and a 3′ universal primer site) are annealed to the mRNA template and ligated with T4 ligase. The ligated products, which carry 40nt gene-specific sequences, are used as templates for PCR amplification. The PCR primers contain a barcode site that identifies the sample; different genes from the same sample share the same barcode. Finally, the amplicons, including the barcode and the 40nt ligated oligo region, are sequenced using next-generation sequencing technology (35).
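The final step, turning sequencing reads into a sample-by-gene count table, can be sketched as follows. This is a minimal Python illustration under simplifying assumptions that are not part of the published protocol: each read is taken to start with a fixed-length sample barcode immediately followed by the 40nt ligated region, and matching is exact. The barcode length, the read layout, the function name, and all sequences are hypothetical.

```python
from collections import Counter, defaultdict


def count_hts2_reads(reads, barcode_to_sample, probe_to_gene,
                     barcode_len=6, probe_len=40):
    """Count reads per (sample, gene).

    reads: iterable of read sequences laid out as <barcode><ligated region>.
    barcode_to_sample: barcode sequence -> sample (well) name.
    probe_to_gene: 40nt ligated sequence -> gene name.
    Reads with an unknown barcode or an unknown ligated region are skipped.
    """
    counts = defaultdict(Counter)
    for read in reads:
        barcode = read[:barcode_len]
        probe = read[barcode_len:barcode_len + probe_len]
        sample = barcode_to_sample.get(barcode)
        gene = probe_to_gene.get(probe)
        if sample is None or gene is None:
            continue
        counts[sample][gene] += 1
    return dict(counts)


# Hypothetical sequences for illustration only.
probe_x = "ACGT" * 10   # 40nt ligated region assigned to GENE_X
probe_y = "TTGCA" * 8   # 40nt ligated region assigned to GENE_Y
samples = {"AAAAAA": "well_A1", "CCCCCC": "well_A2"}
probes = {probe_x: "GENE_X", probe_y: "GENE_Y"}
example_reads = [
    "AAAAAA" + probe_x,
    "AAAAAA" + probe_x,
    "AAAAAA" + probe_y,
    "CCCCCC" + probe_y,
    "GGGGGG" + probe_x,   # unknown barcode, skipped
    "AAAAAA" + "N" * 40,  # unknown ligated region, skipped
]
table = count_hts2_reads(example_reads, samples, probes)
```

Because the barcode identifies the sample and the ligated region identifies the gene, amplicons from many wells can be pooled into one sequencing run and separated again at this step.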
The Application of HTS2 in Cancer Drug Discovery by Targeting Transcriptional Reprogramming
HTS2 technology is suitable for pathway-centric discovery of anticancer drugs. Androgen receptor (AR) overexpression may lead to androgen resistance and the development of incurable prostate cancer (36). HTS2 was applied to identify drugs that block the expression of AR-regulated signature genes in prostate cancer cells, on the premise that such drugs may inhibit the AR pathway. According to this study, cardiac glycosides block the expression of AR target genes and inhibit the proliferation of androgen-sensitive and androgen-resistant prostate cancer cells by causing AR destabilization (35).
HTS2 facilitates the discovery of anti-metastasis drugs. Tumor metastasis is the movement of tumor cells from a primary site to distant organs that they progressively colonize (8). Tumor metastasis is the cause of death for 90% of cancer patients, and no currently available therapies target this multi-step process. Metastasis may be regulated by transcriptional reprogramming. It was reported that the transcription factor FOXA1 is upregulated and drives transcriptional reprogramming that promotes pancreatic ductal adenocarcinoma cell metastasis (8). Teng et al. discovered that liver metastatic colorectal cancer (CRC) cells acquire higher expression of liver-specific genes than primary colorectal tumor cells. This transcriptional reprogramming is driven by the liver-specific factors FOXA2 and HNF1A, which bind to reshaped enhancers of liver metastatic CRC cells and promote CRC liver metastasis (9).
Gene-expression signatures are used to characterize cancer metastasis (37-39). To discover drugs against breast cancer lung metastasis (BCLM), HTS2 technology was combined with a BCLM-associated gene signature. It was found that ponatinib represses the expression of BCLM signature genes through the inhibition of JUN transcription and degradation of the c-Jun protein, ultimately inhibiting BCLM (40).
HTS2 can also be used to explore the mechanisms of action of anticancer herbs. Combined with network pharmacology, HTS2 was used to unveil the biological basis of medicines with complex ingredients, such as traditional Chinese medicine (TCM), in cancer therapy (41,42) as well as in other diseases (43). Zheng et al. utilized HTS2 to measure the effects of 166 compounds derived from TCM on 420 antitumor or immune-related genes. The gene signature results showed that compounds from health-strengthening herbs enhance immune effects in the tumor immune microenvironment and in tumor prevention (41). Guizhi Fuling Decoction (GFD) is a classic TCM prescription used in treating gynecological tumors with an unclear mechanism. Dai et al. applied HTS2 technology along with systemic pharmacology to clarify the mechanisms of GFD in treating breast cancer; this revealed that GFD represses breast cancer through the inhibition of the PI3K and MAPK signaling pathways (42).
HTS2 contributes to the discovery of combination immunotherapy agents against triple-negative breast cancer (TNBC). Low objective response rates (ORRs) in solid tumors mean that immune checkpoint blockade therapy fails in some aggressive cancers (44-46). ORRs are associated with the tumor immunological phenotype (TIP), which reflects the extent of immune cell infiltration (47). Hot tumors are readily infiltrated by immune cells, a process regulated by tumor genes, including those encoding T helper 1 (TH1)-type chemokines. Conversely, cold tumors are those that are not infiltrated by immune cells or are immune-ignorant (48). Due to epigenetic alteration and transcriptional reprogramming, a hot tumor may be converted to a cold tumor, leading to tumor immunosuppression (49). Small molecules can also epigenetically convert cold tumors to hot tumors by altering the gene expression of TH1 chemokines (50). To increase the ORRs of checkpoint blockade immunotherapy, combination targets or compounds are desirable. Wang et al. first determined the difference in TIP gene signature between cold and hot tumors. Using this gene signature, HTS2 technology was applied to identify immunotherapy combination agents in TNBC. The results showed that aurora kinase inhibitors reprogram the expression of the TIP gene signature and thus promote effective T-cell infiltration into the tumor microenvironment, significantly improving anti-programmed cell death 1 (PD-1) efficacy in preclinical models (51).

Pros and Cons of HTS2
First, the number of genes that HTS2 can detect is, in principle, unlimited. It was reported that the expression of >3,000 genes was directly examined in one reaction by HTS2 (52). In principle, all human genes (~22,000 genes) can be detected by HTS2 in one reaction, since it takes advantage of powerful high-throughput sequencing technologies. More importantly, these sequencing technologies continue to develop and improve quickly, which should increase the detection capability of HTS2 in the future.
Second, HTS2 directly detects gene expression. HTS2 detects gene signatures using high-throughput sequencing technology, identifying and quantifying transcripts by reading out their sequences directly. As a result, the probability of misreading should be low. Third, the experimental scheme of HTS2 is fully amenable to direct transcript analysis in cell lysate and to automation, which are two critical parameters for high-throughput applications. The annealing step of HTS2 is fully compatible with cell lysate containing detergent and high salt. After the mRNA is captured by streptavidin-coated magnetic beads, all subsequent washing and ligation steps are conducted on the solid phase. Furthermore, the HTS2 strategy can be fully implemented on an automated robot (35).
However, there are also some challenges for the HTS2 strategy. First, even though HTS2 could in principle detect the expression of an unlimited number of genes, the number of detected genes reported so far is no more than 4,000. It would be much better if the full transcriptome could be examined in one reaction using HTS2 in the future. Second, only a few studies applying this technology have been published so far; more studies are needed to demonstrate the broad utility of HTS2 in both basic and translational research. The technical characteristics of HTS2 are shown in Table 1.

CONCLUSIONS
Transcriptional reprogramming is involved in cancer initiation, progression, and metastasis; thus, it is a potential target for anticancer drug development. L1000 and HTS2 are gene-expression signature-based high-throughput approaches suitable for drug discovery targeting transcriptional reprogramming. Both show advantages as well as limitations. Notably, both technologies are based on bulk RNA. Recently, measurements of gene expression changes in single cells have had a significant impact on the understanding of almost all life processes, and single-cell RNA sequencing has also been reported to facilitate drug discovery (53-55). So far, single-cell RNA sequencing technologies are limited by high cost or low multiplexing capacity in high-throughput drug discovery. Nevertheless, measuring gene expression changes in single cells represents another potential high-throughput approach for drug discovery targeting transcriptional reprogramming. Undoubtedly, L1000 and HTS2 offer powerful and effective platforms for large-scale genetics and chemical genetics studies, and for anticancer drug discovery.

AUTHOR CONTRIBUTIONS
DW and XB designed and supervised the article. LH summarized the literature on transcriptional reprogramming, L1000, and HTS2 technology, and drafted the manuscript. XHY, XKY, YW, CZ, LQ, DG, SZ, GZ, and YD contributed to the writing of this manuscript. All authors contributed to the article and approved the submitted version.