REVIEW article

Front. Plant Sci., 15 November 2023

Sec. Plant Biotechnology

Volume 14 - 2023 | https://doi.org/10.3389/fpls.2023.1252166

Artificial intelligence-driven systems engineering for next-generation plant-derived biopharmaceuticals

  • 1. Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India

  • 2. Tecnologico de Monterrey, School of Engineering and Sciences, Centre of Bioengineering, Queretaro, Mexico

Abstract

Recombinant biopharmaceuticals, including antigens, antibodies, hormones, cytokines, single-chain variable fragments, and peptides, have been used as vaccines, diagnostics and therapeutics. Plant molecular pharming is a robust platform that uses plants as an expression system to produce simple and complex recombinant biopharmaceuticals on a large scale. Plant systems have several advantages over other host systems, such as humanized expression, glycosylation, scalability, reduced risk of human or animal pathogenic contaminants, and rapid, cost-effective production. Despite these advantages, the expression of recombinant proteins in plant systems is hindered by factors such as non-human post-translational modifications, protein misfolding, conformational changes and instability. Artificial intelligence (AI) plays a vital role in various fields of biotechnology, and in plant molecular pharming a significant increase in yield and stability can be achieved by applying AI-based multi-approach strategies to overcome these hindrances. Current limitations of plant-based recombinant biopharmaceutical production can be circumvented with the aid of synthetic biology tools and AI algorithms in plant-based glycan engineering for protein folding, stability, viability, catalytic activity and organelle targeting. AI models, including but not limited to neural networks, support vector machines, linear regression, Gaussian processes and regressor ensembles, learn from training and experimental data sets to design and validate protein structures, thereby optimizing properties such as thermostability, catalytic activity, antibody affinity, and protein folding. This review focuses on integrating systems engineering approaches and AI-based machine learning and deep learning algorithms in protein engineering and host engineering to augment protein production in plant systems and meet the demands of the ever-expanding therapeutics market.

1 Introduction

Plant molecular pharming refers to the recombinant expression of biologics, including vaccines, hormones, therapeutics and diagnostic reagents, in plant-based systems. The field is gaining attention because biologics produced in plants are efficacious and similar to products from conventional systems, with the added advantage of a eukaryotic host that performs post-translational modifications. Recombinant biologics produced in plant systems include SARS-CoV2 virus-like particles (VLPs), spike antigen, anti-SARS-CoV2 mAbs H4 and B38, anti-EBOV (Ebola virus) mAb 6D8, 4H2 IgG and IgM (against Coccidioides), antimicrobial peptide (AMP) LL-37 and human apolipoprotein A-I Milano (Apo A-I Milano) (Fulton et al., 2015; Holásková et al., 2018; Ali and Kim, 2019; Shanmugaraj et al., 2020; Jugler et al., 2022; Zhao et al., 2023). Various model plant systems have been used as stable or transient heterologous expression hosts for recombinant protein production, including tobacco (Nicotiana benthamiana and Nicotiana tabacum), Arabidopsis, tomato, potato, rice, maize and soybean (Ghag et al., 2021; Lobato Gómez et al., 2021). Plant host systems offer cost-effectiveness, multimeric protein assembly, scale-up and safety (minimal or no risk of human pathogen contamination). Despite these advantages, plants have a few limitations as expression systems: they lack humanized N-glycosylation, a post-translational modification needed for antibody production, and the stability of plant-produced proteins is still a concern (Sethi et al., 2021). Recombinant biologics production depends on several factors, such as vector construction, codon optimization, regulatory components, protein localization and glycosylation (Amack and Antunes, 2020; Jin et al., 2022; Mirzaee et al., 2022; Moon et al., 2022; Zhao et al., 2023).

Systems Engineering in biology can be defined as a holistic approach that analyzes, models, alters, optimizes, and regulates the complex processes of biological systems to obtain desired functions. Artificial Intelligence (AI) refers to the development of machines and systems that use algorithms and statistical models to analyze data and identify patterns, and that can perform or outperform tasks demanding human intelligence in learning, reasoning, planning, communicating, and problem-solving (Russell, 2010). Machine Learning (ML) is a subset of AI that enables systems to learn from abundant training datasets and is classified into supervised, unsupervised and semi-supervised learning algorithms. Supervised algorithms are the most widely used of the three; they are developed from labelled datasets drawn from databases with minimal data redundancy, and involve feature extraction, analysis and selection of the main traits, prediction methods, and performance evaluation. They offer biologists an excellent means of identifying patterns of gene expression and relevant features, and of understanding how different combinations of the responsible factors act together (Singh et al., 2016; Silva et al., 2019). Deep Learning (DL) is a network-based, typically supervised, learning method in which multiple layers of simple modules are pooled and arrayed to learn, compute, and map a large dataset through each layer. It has an advantage over other AI-based ML algorithms in exploring complex structures in high-dimensional data, building them up from the simplest layers (Lecun et al., 2015). Industry 4.0 is revolutionizing traditional manufacturing practices in industrial settings through the integration of digital technologies, automation, and data exchange, which brings physical and digital systems together and leads to increased efficiency, productivity and innovation. The introduction of automation, cyber-physical systems, the internet of things (IoT) and big data analytics could make plant-based biologics production more efficient and robust (Dubey et al., 2018; Chen et al., 2020).

AI has been used in recombinant biologics production in host systems such as mammalian cells (CHO and HEK293), yeast (Saccharomyces cerevisiae and Pichia pastoris) and bacteria (Escherichia coli and Bacillus subtilis) (Van Brempt et al., 2020; Smiatek et al., 2021; Feng et al., 2022a; Li et al., 2022a; Packiam et al., 2022). Applications of AI or ML algorithms include protein engineering and the prediction of protein-protein interactions, stability, localization, solubility, functional motifs and catalytic activity, which increase the production and functionality of recombinant proteins (Han et al., 2019; Jiang et al., 2021; Feng et al., 2022a; LaFleur et al., 2022; Masson et al., 2022; Kalemati et al., 2023). To date, AI has seen little or no application in plant molecular pharming. In this review, we discuss systems biology concepts together with the introduction of AI, as shown in Figure 1, in different aspects of recombinant biologics production to increase stability and functionality, and the application of AI-based ML algorithms in engineering systems to overcome the challenges and enhance the production of next-generation plant-based biologics.

Figure 1

2 Advantages of plant expression system

The market for plant-based biologics was valued at $116.1 million in 2021 and, with a compound annual growth rate (CAGR) of 4.8%, is estimated to reach $182.9 million by 2031. Major plant-based production firms include Leaf Expression Systems, Zea Biosciences, Plant Biotechnology Inc., InVitria, Mapp Biopharmaceutical and PlantForm (Allied Market Research, 2023). Very few plant-based recombinant therapeutics have been commercialized following development, and many are in clinical trials (He et al., 2021; Lobato Gómez et al., 2021). Elelyso (taliglucerase alfa), produced in carrot cell culture by Protalix BioTherapeutics, was approved by the FDA in 2012 to treat Gaucher disease and has been commercialized (Mor, 2015). ZMapp, an antibody cocktail produced in N. benthamiana by Leaf Biopharmaceutical (the commercialization arm of Mapp Biopharmaceutical), was used under emergency use authorization during the 2014 Ebola outbreak in Africa (Qureshi, 2016). Recombinant growth factors produced in the endosperm of barley grain by ORF Genetics have been commercialized as skincare products (ORF Genetics, 2023). Covifenz, a plant-based SARS-CoV2 VLP vaccine against COVID-19 developed by Medicago, was authorized by Health Canada in 2022 (Hager et al., 2022).
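
As a quick arithmetic check on the market figures quoted above, the following minimal sketch compounds the 2021 valuation forward and also backs out the growth rate implied by the two endpoints. The ten-year span (2021 to 2031) is an assumption made here; the market report may use a different base year or rounding, which could account for the small gap between the two results.

```python
def project_value(start_value, annual_rate, years):
    """Compound a starting value forward at a fixed annual growth rate."""
    return start_value * (1.0 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Back out the constant annual rate that links two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text (USD millions); the 10-year span is an assumption.
start_2021 = 116.1
end_2031 = 182.9
years = 10

projected = project_value(start_2021, 0.048, years)
rate = implied_cagr(start_2021, end_2031, years)
print(f"Projected 2031 value at 4.8% CAGR: {projected:.1f} million USD")
print(f"CAGR implied by the two endpoints: {rate * 100:.2f}%")
```

Under this assumption the 4.8% rate projects a value slightly above the quoted 2031 estimate, and the endpoints imply a rate slightly below 4.8%, so the quoted numbers are consistent only approximately.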

Protein-based pharmaceutical products have grown rapidly in recent years, and most are produced in mammalian and microbial expression systems. Plant systems have now emerged as an alternative platform for large-scale production of recombinant proteins because they require no capital-intensive infrastructure, bioreactors, or expensive culture media, and can be scaled quickly in low-cost greenhouses using simple reagents (Chen and Davis, 2016). Compared with prokaryotic and other host systems, plants offer an alternative bioreactor system for recombinant expression owing to their glycan profile and cost-effective management (Schillberg et al., 2019). In addition, plant systems are free of human pathogens, do not require sterile conditions during production, and are scalable through open-field cultivation (Buyel, 2019). For all these reasons, the plant expression system has been established as a prominent bioreactor for the production of biologics such as vaccines, therapeutic proteins and growth hormones (Limkul et al., 2016; Moon et al., 2022).

Each expression host has its advantages and limitations. For instance, mammalian cell systems inherently produce recombinant biologics in humanized form, but it is difficult to keep cell lines free from human pathogens and contaminants (Sethi et al., 2021). Plant systems have many advantages over other systems, including speed (production of recombinant protein starts 2-3 days post infiltration), cost-effectiveness (a 3 mg dose of recombinant AMP produced at a cost of $0.27), scale-up (plant biomass, and thereby protein yield, can be increased as required), purity (up to 99%), safety (production without contaminant interference and functionally safe in humans) and post-translational modifications (N-glycosylation in engineered tobacco plants, which prokaryotic host systems lack). Each of these advantages can be illustrated with an example from the N. benthamiana transient expression host system. A SARS-CoV2 RBD (receptor binding domain) Fc fusion vaccine candidate was expressed in N. benthamiana and extracted 4 days post infiltration, giving a yield of 25 µg/g FW (Siriwattananon et al., 2021). Alam et al. (2018) produced the antiviral compound Griffithsin at 99% purity from tobacco plants. Two mAb isotypes, 4H2 IgG and 4H2 IgM, against the Coccidioides CTS1 (Valley Fever) antigen were expressed in N. benthamiana plants and showed a homogeneous N-glycosylation profile with a dominant GnGn/GnM structure, highly similar to that of mammals. A techno-economic analysis by McNulty et al. (2020) of N. benthamiana-based recombinant protein production indicates that the plant can produce up to 4 g of protein per kg FW, with a yield of up to 300 kg of recombinant protein per year through transient expression.

3 Systems engineering approaches to produce recombinant biopharmaceuticals in plants

Plant-based biologics have emerged as a promising alternative for therapeutics production due to their low-cost and scalable nature. This is critical for meeting the demand for immunizations during pandemics. Production of recombinant therapeutics in plants can be achieved by either stable or transient expression. Stable expression systems are developed by nuclear transformation or chloroplast transformation through Agrobacterium-mediated or biolistic gene transfer (Gelvin, 2003; Tien et al., 2019; Bolaños-Martínez et al., 2020; Heenatigala et al., 2020; Kumar and Ling, 2021). Transient expression systems, by contrast, are developed using plant virus-based vectors or agroinfiltration. Stable expression systems offer advantages including scale-up, low storage costs, glycosylation patterns and reduced cross-contamination with animal-borne agents, whereas transient expression systems are known for being rapid and cost-effective and for their increased protein accumulation and commercialization potential (Moon et al., 2019). Transient expression of recombinant biopharmaceuticals in plant systems is the preferred mode of production because the system accumulates large quantities of protein quickly. Different immunogens and therapeutic agents have been produced through transient expression in leaves by agroinfiltration (Iyappan et al., 2018; Page et al., 2019; Rattanapisit et al., 2020).

Proteins reach their functional state through proper folding, disulphide bond formation, subunit assembly and post-translational modifications. Prokaryotic host systems have limitations such as the lack of post-translational modifications (glycosylation and sialylation), signal peptide cleavage and pro-peptide processing (Gomord and Faye, 2004). Glycosylation is the most prevalent and diverse type of post-translational modification of proteins shared by all eukaryotic cells. A complex metabolic network and many glycosylation pathways are used during the enzymatic glycosylation of proteins to produce a wide variety of proteoforms (Schjoldager et al., 2020). For instance, in humans, the N-acetylglucosaminyltransferases IV and V present in the Golgi function in galactosylation, branch elongation and sialic acid capping, steps that are not found in plants (Strasser, 2022; Strasser, 2023). To produce therapeutic proteins of interest in plants with the desired glycosylation pattern, co-expression of β-1,4-galactosyltransferase and its sub-cellular localization to the Golgi are preferred (Navarre et al., 2017; Strasser, 2022). Recombinant glycoproteins produced in plants carry α1,3-fucose and β1,2-xylose residues linked to the same core N-glycan. These two sugar residues could be immunogenic since they are absent from human glycoproteins (Margolin et al., 2020a). In Arabidopsis, tobacco, and rice, multiplex CRISPR-Cas9 technology was used to knock out two glycosyltransferases, β1,2-xylosyltransferase and α1,3-fucosyltransferase, in order to humanize the glycosylation patterns of plants and of the biopharmaceuticals they produce. Complete suppression of these two sugar residues was reported in Arabidopsis and tobacco, while the presence of the Lewis structure in rice shows that the glycosylation pattern differs between dicots such as Arabidopsis and tobacco and monocots such as rice (Jansing et al., 2019; Jung et al., 2021).
Many glycosylated therapeutic proteins ultimately need to be sialylated to fully activate their biological functions; however, unlike mammals, plants are not capable of N-glycan sialylation. The ability to perform N-glycan sialylation is much sought after in the plant-based biopharmaceutical industry since sialic acids are a frequent terminal modification on human N-glycans. Plants can be engineered with α2,6-sialylation or α2,3-sialylation pathways, which has yielded active IgG with anti-inflammatory properties and improved the pharmacokinetics of plant-produced therapeutics (Strasser, 2023). N-glycan sialylation is highly desirable because of its role in extending half-life and in stability, solubility, and receptor binding (Bohlender et al., 2020; Chia et al., 2023). An entire mammalian biosynthetic pathway, including the coordinated expression of the genes for (i) biosynthesis, (ii) activation, (iii) transport, and (iv) transfer of Neu5Ac to terminal galactose, has been introduced into N. benthamiana to achieve in planta protein sialylation (Izadi et al., 2023).

Recombinant biologics expressed in plants are designed as fusion proteins containing an N-terminal or C-terminal tag (His, FLAG, HA, CBM3, etc.) for easy purification and analysis. Immobilized metal-ion affinity chromatography is widely used to purify hexahistidine-tagged proteins (Vafaee and Alizadeh, 2018; Islam et al., 2019; Hanittinan et al., 2020; Islam et al., 2020; Marques et al., 2020; Soni et al., 2022). Other techniques employed in recombinant plant protein purification include one-step cation-exchange chromatography, Protein G-/A-based affinity chromatography, diafiltration (antibody purification), polyelectrolyte precipitation (removal of plant proteins), and hydrophobic interaction chromatography (HIC) followed by hydrophobic charge induction chromatography (HCIC) (Fulton et al., 2015; Park et al., 2015; Shi et al., 2019; Miura et al., 2020; Lim et al., 2022; Grandits et al., 2023).

4 AI-based ML algorithms in recombinant protein production

Gene design and genetic engineering are key tools in molecular pharming; they enable expression of the protein of interest in the host system and the development of genetically modified organisms with desirable traits. Designing the gene and its expression cassette is the first step towards obtaining the desired protein in the plant system (Rozov and Deineko, 2019). Proper design plays a major role in the production of biologics and covers selection of the host system, codon optimization, the regulatory components associated with the foreign gene, host engineering, mode of expression, and purification of the biopharmaceutical (Webster et al., 2017; Peyret et al., 2019; Belcher et al., 2020; Sainsbury, 2020; Hassan et al., 2021; Vazquez-Vilar et al., 2023). AI-based ML algorithms are a proven choice for cost-cutting and efficient design of product manufacturing in different host systems. Several competent network models are built on Convolutional Neural Networks (CNNs), a DL architecture inspired by the connectivity patterns of the animal visual cortex that identifies, locates and differentiates objects in an image (Barré et al., 2017). Different AI-based ML and DL algorithms have been developed to increase recombinant biopharmaceutical production in hosts by detecting, analyzing and optimizing steps such as screening and candidate selection, vector construction, codon optimization, protein modelling and design, growth condition optimization, and protein solubilization and purification. A model CNN architecture is shown in Figure 2.

Figure 2
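
To make the idea of a convolutional filter concrete for sequence data, the following minimal sketch one-hot encodes a DNA sequence and slides a single filter along it, followed by a ReLU and a global maximum. It is an illustration only: the filter weights are set by hand to match a hypothetical 4-base motif, whereas a trained CNN would learn them from data, and the sequence and function names are invented for this example.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode DNA as a 4-row bit matrix (rows A, C, G, T; one column per base)."""
    return [[1 if base == row else 0 for base in seq] for row in ALPHABET]


def conv1d_relu(matrix, kernel):
    """Slide one filter along the sequence and apply ReLU to each score."""
    width = len(kernel[0])
    length = len(matrix[0])
    scores = []
    for start in range(length - width + 1):
        score = sum(
            matrix[row][start + offset] * kernel[row][offset]
            for row in range(len(ALPHABET))
            for offset in range(width)
        )
        scores.append(max(0, score))
    return scores


# Hand-set filter that responds to a hypothetical motif; not a learned weight matrix.
motif_filter = one_hot("TATA")
sequence = "GGCGTATAAGCC"

scores = conv1d_relu(one_hot(sequence), motif_filter)
best_start = max(range(len(scores)), key=scores.__getitem__)  # global max pooling
print(f"Strongest filter response {max(scores)} at position {best_start}")
```

In a full CNN, many such filters are learned jointly and their pooled responses feed later layers that perform the final prediction.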

4.1 AI in codon optimization

Introducing native genes into an alternate host system causes incompatibilities in codon usage bias, sequence repeats, GC content (%), negative cis-regulatory elements and the Shine-Dalgarno sequence (Tuan-Anh et al., 2017; Constant et al., 2023; Jain et al., 2023). Codon bias affects expression of the transgene in the host plant and can result in stalling at disfavored codons, truncation, misincorporation or frameshifts. Site-directed mutagenesis can resolve these problems by introducing silent mutations in the coding region of the transgene, helping the host species read the transgene codons without hindrance (Ma et al., 2003). Heterologous expression of recombinant proteins in different hosts requires optimization of coding sequences with synonymous codons, as host systems tend to remove heterologous proteins through proteolysis. Further, codon optimization yields recombinant protein with the correct structural and functional conformation at increased levels of expression in different host systems (Al-Hawash et al., 2017; Argentinian AntiCovid Consortium, 2020; Ding et al., 2022). The level of recombinant transgene expression depends on the percentage of optimized codons. When the expression of four variants of the bar gene with varying percentages of optimized codons was examined using experimental and in silico methods, genes with 50–70% optimized codons were expressed effectively in N. tabacum (Agarwal et al., 2019). Chicken β Gallinacin-3, a beta-defensin from chicken, has demonstrated broad-spectrum antibacterial action against plant infections. Using DNAWORKS3.0 and the GenScript Rare Codon Analysis Tool, chicken β Gallinacin-3 gene sequences were codon optimized and tested. The results demonstrated constitutive expression in Medicago sativa and improved antibacterial activity against E. coli, S. aureus, and Salmonella typhi (Jin et al., 2022).
Despite the species difference, a codon optimizer program improved translation efficiency in tobacco and lettuce by using the codon usage hierarchy of the psbA gene (Kwon et al., 2016). Adiponectin, an adipokine and cell signaling protein, has been produced as a secretory protein in Withania somnifera hairy root culture. The codon usage data, base composition and codon adaptation index (CAI) of W. somnifera were analyzed, and the human adiponectin gene sequence was optimized and expressed as a secretory product. Codon optimization increased the level of secreted protein (Dehdashti et al., 2020). The synthesis and expression of therapeutic proteins depend heavily on codon optimization, and effective methods are required to optimize codons efficiently for the generation of recombinant proteins in plants (Webster et al., 2017). Codon usage bias has been used to optimize nucleotide sequences for host-specific expression in many systems, including E. coli, Chinese Hamster Ovary (CHO) cells and HEK293 (Al-Hawash et al., 2017; Shayesteh et al., 2020; Lu et al., 2021). To date, no AI tool has been designed to optimize codons for increasing plant-based recombinant biologics production. The challenges posed by conventional methods include the vast number of possible codon combinations, unpredictable effects following transcription and translation, protein misfolding and loss of function (Constant et al., 2023).
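
The conventional, usage-based strategy discussed above can be sketched in a few lines: count codons in a set of host reference genes, then replace every codon of the transgene with the host's most frequent synonymous codon, which introduces only silent mutations. This is a minimal sketch and not the method of any tool cited here. The reference sequences and the input gene are made-up toy data, and real workflows also consider GC content, repeats and cis-regulatory elements; as noted above, replacing every codon is not necessarily optimal.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split a coding sequence into whole codons, ignoring any trailing bases."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(cds))


def preferred_codons(reference_cdss):
    """Pick the most frequent synonymous codon per amino acid in the reference set."""
    counts = Counter()
    for cds in reference_cdss:
        counts.update(split_codons(cds))
    preferred = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in preferred or counts[codon] > counts[preferred[aa]]:
            preferred[aa] = codon
    return preferred


def recode(cds, preferred):
    """Swap each codon for the host-preferred synonymous codon (silent changes only)."""
    return "".join(preferred[CODON_TO_AA[codon]] for codon in split_codons(cds))


# Hypothetical stand-in for highly expressed genes of the host plant.
HOST_REFERENCE = ["ATGGCTGCTAAGGAGCTTTAA", "ATGGCTAAGAAGGAGGAGCTTTGA"]
native = "ATGGCAGCGAAAGAACTGTAA"  # toy transgene

recoded = recode(native, preferred_codons(HOST_REFERENCE))
assert translate(recoded) == translate(native)  # protein sequence is unchanged
print(native, "->", recoded)
```

The final assertion checks the defining property of the approach: the nucleotide sequence changes while the encoded protein does not.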

Neural network (NN) models identify unexplored patterns in native DNA sequences from the training set, predict the most valid coding sequences using the test set and optimize the DNA sequence for translation. NN-based optimization has been found to be more efficient than conventional methods, resulting in significantly higher yields of recombinant biologics (Goulet et al., 2023). Many sequence-based ML algorithms use deep neural networks (DNNs) to extract features from input codon data and to predict and evaluate sequence data. Two major parameters play a crucial role in codon optimization: 1) the codon adaptation index (CAI) and 2) the tRNA adaptation index (tAI). CAI measures how closely the codons of a sequence match the codon usage frequencies in an organism’s coding DNA sequences (CDS), and tAI measures the availability of intracellular tRNAs for translation together with the efficiency of individual codon-anticodon pairing (Sabi et al., 2017; Tuan-Anh et al., 2017; Fu et al., 2020; Constant et al., 2023; Goulet et al., 2023). A sequence optimized by a trained Recurrent Neural Network (RNN) model was tested by transient transfection of the unoptimized and optimized sequences in CHO (ExpiCHO) cells. The titres of the model protein, the human programmed death ligand 1 (PD-L1) extracellular domain, were quantitated nine days after transfection. The RNN-optimized sequence was expressed at a higher level (179.5 ± 12.4 μg/mL) than the native sequence (104.5 ± 5.7 μg/mL). The RNN model was also used to optimize a mAb CDS for stable integration in CHO-K1-derived cells; the RNN-optimized CDS yielded 2030 μg/mL, whereas the unoptimized sequence yielded 960 μg/mL (Goulet et al., 2023). AI has been applied more widely in bacterial expression systems than in eukaryotic systems, and codon optimization there has been widely carried out with ML-based models. Tuan-Anh et al. (2017) used a neural network with CAI and GC content to optimize codons for expressing prochymosin, the chymosin precursor, in an E. coli system.
Codon optimization can be used not only to increase heterologous recombinant expression but also to increase protein solubility. MPEPE, a recently developed DNN model for protein solubility prediction, was built using convolution layers, pooling layers and long short-term memory (LSTM) layers. The input was built as an embedded matrix through the ‘one-hot encoding’ technique, using the integers ‘1’ and ‘0’, to include the synonymous codons of individual amino acids. Sites for point mutation were selected through evolutionary analysis so as not to interfere with protein function. The target nucleotide sequences for expression studies were used as inputs to MPEPE for virtual screening, and the recombinant proteins were expressed in E. coli BL21 (DE3) cells with an increased level of soluble protein expression (Ding et al., 2022). The Bidirectional LSTM Conditional Random Field (BiLSTM-CRF) model is a codon optimization model built for E. coli by Fu et al. (2020). The model recasts codon optimization as sequence annotation and is trained on an E. coli gene set through word-embedding vectors. The multivalent Plasmodium falciparum vaccine antigen FALVAC-1 and PTP4A3, a prognostic cancer biomarker, were optimized by BiLSTM-CRF and expressed in E. coli BL21 (DE3). The model raised the low-expression candidate to higher expression levels, which demonstrated its robustness, while the high-expression candidate PTP4A3 was expressed at similar levels, which demonstrated the stability of the algorithm. Jain et al. (2023) designed ICOR (Improving Codon Optimization with RNNs), a DL tool built on a BiLSTM architecture with the ‘one-hot encoding’ method and trained on a large non-redundant dataset of E. coli genomes; based on a correlation with real-time mRNA expression drawn from the work of dos Reis et al. (2003), the observed improvement in expression was about 236%. The multilayer network model could be trained for other host systems, including model plants (such as N. benthamiana or N. tabacum), as shown in Figure 3, with complete omics datasets through a transfer learning approach to increase yield. CO-BERTa, a deep contextual language model, was trained with GFP (Green Fluorescent Protein) and anti-HER2 VHH CDSs on an Enterobacterales dataset for functional protein measurement. The mCherry reporter protein, which shares 28.7% pairwise identity with GFP, and an anti-SARS-CoV2 VHH, which shares 73.7% pairwise identity with the anti-HER2 VHH, were chosen to test the model. These proteins differ in length but share similar structural features, a major one being the β-barrel. ACE (Activity-specific Cell Enrichment) measurement of CO-BERTa codon-optimized proteins in the SoluPro™ E. coli B strain showed higher expression levels than commercial algorithms (except Genewiz, p<0.05) (Constant et al., 2023). Further, analysis of the genomes and codon usage patterns of plant host systems through artificial neural networks (ANNs) could significantly increase the expression of recombinant biologics (Doyle et al., 2016).
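
Because CAI recurs throughout this section as a feature and evaluation metric, a minimal sketch of how it is commonly computed may be helpful: each codon receives a weight equal to its count divided by the count of the most used synonymous codon in a reference gene set, and the CAI of a sequence is the geometric mean of these weights. The reference sequences below are hypothetical toy data, the pseudocount of 0.5 for unseen codons is an assumption of this sketch, and tAI is not covered because it requires tRNA gene copy numbers that are not given here.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split a coding sequence into whole codons, ignoring any trailing bases."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cdss, pseudocount=0.5):
    """Weight each codon by its count relative to the most used synonymous codon."""
    counts = Counter()
    for cds in reference_cdss:
        counts.update(split_codons(cds))
    families = {}
    for codon, aa in CODON_TO_AA.items():
        families.setdefault(aa, []).append(codon)
    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp offer no synonymous choice
        adjusted = {codon: max(counts[codon], pseudocount) for codon in family}
        top = max(adjusted.values())
        for codon in family:
            weights[codon] = adjusted[codon] / top
    return weights


def codon_adaptation_index(cds, weights):
    """Geometric mean of the weights of the scored codons in a sequence."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons with synonymous alternatives")
    return math.exp(sum(logs) / len(logs))


# Hypothetical stand-in for a set of highly expressed host genes.
HOST_REFERENCE = ["ATGGCTGCTAAGGAGCTTTAA", "ATGGCTAAGAAGGAGGAGCTTTGA"]
weights = relative_adaptiveness(HOST_REFERENCE)

cai_native = codon_adaptation_index("ATGGCAGCGAAAGAACTGTAA", weights)
cai_recoded = codon_adaptation_index("ATGGCTGCTAAGGAGCTTTAA", weights)
print(f"CAI native: {cai_native:.3f}, CAI recoded: {cai_recoded:.3f}")
```

Both toy sequences encode the same peptide; the one built from the reference set's most frequent codons scores higher, which is the signal that CAI-based features pass to an optimization model.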

Figure 3

Quantum computers can be used to optimize codons for high expression of proteins. The Quantum Annealing (QA) algorithm uses quantum computers to perform high-dimensional combinatorial optimization of codons using a Binary Quadratic Model (BQM) built on the ‘one-hot encoding’ technique. mRNA codons of peptide fragments and full-length proteins of the SARS-CoV2 spike glycoprotein were optimized using the Quantum Approximate Optimization Algorithm (QAOA) (Fox et al., 2021).
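
The one-hot BQM formulation mentioned above can be illustrated on a toy problem. In the sketch below, each (position, codon) pair is a binary variable, a quadratic penalty enforces exactly one codon per position, linear terms reward frequently used codons, and a small pairwise term discourages adjacent codons that repeat a base across their junction. All of this is an assumption-laden illustration: the usage frequencies, penalty weights and junction term are invented here and are not taken from Fox et al. (2021), and the minimum is found by classical exhaustive search rather than by quantum annealing or QAOA.

```python
import math
from itertools import combinations, product

# Hypothetical synonymous codon choices with made-up host usage frequencies.
CHOICES = {
    "K": {"AAA": 0.40, "AAG": 0.60},
    "E": {"GAA": 0.30, "GAG": 0.70},
    "F": {"TTT": 0.55, "TTC": 0.45},
}


def build_bqm(peptide, penalty=10.0, junction_cost=0.5):
    """Return (variables, linear, quadratic, offset) for a toy codon-choice BQM."""
    variables = [(pos, codon) for pos, aa in enumerate(peptide) for codon in CHOICES[aa]]
    # penalty * (sum(x) - 1)**2 expands, for binary x, to
    # -penalty per variable, +2*penalty per same-position pair, +penalty constant.
    linear = {v: -math.log(CHOICES[peptide[v[0]]][v[1]]) - penalty for v in variables}
    quadratic = {}
    for a, b in combinations(variables, 2):
        if a[0] == b[0]:
            quadratic[(a, b)] = 2.0 * penalty  # two codons chosen at one position
        elif b[0] - a[0] == 1 and a[1][-1] == b[1][0]:
            quadratic[(a, b)] = junction_cost  # same base repeated across the junction
    offset = penalty * len(peptide)
    return variables, linear, quadratic, offset


def energy(state, variables, linear, quadratic, offset):
    """Evaluate the BQM energy of one binary assignment."""
    on = {v for v, bit in zip(variables, state) if bit}
    total = offset + sum(linear[v] for v in on)
    total += sum(w for (a, b), w in quadratic.items() if a in on and b in on)
    return total


def solve_exhaustively(peptide):
    """Classical brute force over all assignments; feasible only for tiny inputs."""
    variables, linear, quadratic, offset = build_bqm(peptide)
    best = min(
        product((0, 1), repeat=len(variables)),
        key=lambda state: energy(state, variables, linear, quadratic, offset),
    )
    return [codon for (pos, codon), bit in zip(variables, best) if bit]


chosen = solve_exhaustively("KEF")
print("Selected codons:", chosen)
```

The search space doubles with every added variable, which is why such formulations are handed to annealers or variational quantum algorithms for realistic sequence lengths.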

Currently, no ML-based algorithms are available for codon optimization of recombinant proteins for expression in plants. The algorithms available for other host systems could be adapted, remodelled and redesigned for plant-based expression hosts, since the genomes of many model plants are publicly available.

4.2 AI in protein modelling and design

The recombinant proteins expressed in different systems are influenced chiefly by factors including structure, solubility, catalytic activity, protein folding and stability. The vector and gene of interest are designed to overcome the challenges of recombinant protein expression. The components of protein modelling include host and expression vector selection, the promoter, the selectable marker and fusion tags. ML-based algorithms enhance expression and overcome challenges in the expression of recombinant biologics in multiple expression systems. These algorithms analyze and test sequences (either nucleotides, as CDS or RNA-seq, or amino acids) and report the fitness of protein variants (Wittmann et al., 2021). A few ML models use structure along with amino acid sequences for protein modelling. RNNs and other neural network models are more powerful than other ML models because they can learn directly from raw data without sequence alignment or heuristic scoring (Deep RNN for Protein Function Prediction from Sequence). While molecular dynamics simulations of an antibody on supercomputers take hours or even days, neural networks such as CNN models take only seconds to do the work on personal computers (Lai, 2022). Regulatory elements are key components of recombinant protein production, and synthetic promoters have been designed using ML models to increase transcription efficiency. Highly functional Synthetic Promoters with Enhanced Cell-State Specificity (SPECS) were identified from a library of 6107 promoters using multiple ML regression algorithms, from which a generalized linear model with elastic net regularization (GLMNET) was chosen as the most efficient model for predicting highly active promoters. The spatiotemporal activity of each promoter was analyzed by expression of a fluorescent protein in HEK-293T cells (Wu et al., 2019). In the work by Vo ngoc et al. (2020), the human Pol II core promoter was analyzed to create HARPE (high-throughput analysis of randomized promoter elements). The HARPE training dataset included 200,000 variants of promoter sequences, and models of the downstream core promoter region (DPR) were generated by a support vector regression (SVR) algorithm and tested in vitro and in HeLa cells. Protein design includes predicting the parts of a protein that are involved in its structural integrity and stability (Masson et al., 2022). These tasks include epitope prediction, vaccine design and remote homology detection, which use parts of the protein molecule to increase its activity (Mettu et al., 2016; Moss et al., 2019; Yang et al., 2021b; Koşaloğlu-Yalçın et al., 2022; Routray et al., 2022).

Using DeepLoc, a deep convolutional network, Kraus et al. (2017) showed improved performance over traditional approaches in the automated classification of protein subcellular localization in yeast cells. Organelle targeting and sub-cellular localization raise recombinant therapeutic protein expression in plants to higher levels. Localization of recombinant proteins in the cytosol and in different plant organelles, such as the nucleus, chloroplast, mitochondria and endoplasmic reticulum (ER), in plant tissues such as seeds and leaves increases the accumulation and stability of the expressed proteins (Vafaee and Alizadeh, 2018; Arcalis et al., 2019; Bidarigh fard et al., 2019; Islam et al., 2019; Shi et al., 2019; Hanittinan et al., 2020; Islam et al., 2020; Li et al., 2022b; Lim et al., 2022). Signal sequences are added to the N-terminus or C-terminus of biologics to increase yield, and a C-terminal ER retention signal is the most widely used strategy for accumulating higher amounts of protein in recombinant expression. Sahu et al. (2021) developed Plant-mSubP, a tool based on integrated ML approaches with SVM as the model, to predict the localization of proteins to single and dual organelle targets.

Enriched bococizumab yeast cell libraries, along with a similar library for antibody affinity, were analyzed using an ML model, which enabled the identification of rare variants with co-optimized low self-association and high affinity (Makowski et al., 2022). Similarly, mAbs can be screened and optimized for production in specific host systems, which could include plants (Feng et al., 2022a; Lai, 2022). Proteins such as toxins, which are difficult to produce in certain hosts, can be expressed more easily with the help of deep learning-based CNN algorithms (Pan et al., 2020). A wide range of ML algorithms used in various eukaryotic and prokaryotic systems for modelling different proteins is shown in Table 1.

Table 1

| Component | Name of the program | Type of ML algorithm | Architecture | Function/Parameter | Model system/training dataset | References |
|---|---|---|---|---|---|---|
| mRNA | APARENT (APA REgressionNeT) | CNN | One-hot encoded matrix system with two convolutional layers | mRNA isoform prediction and polyadenylation within +10 to +35 nt downstream of 6-base central sequence element (CSE); cleavage site prediction across polyA signal | HEK293 | Bogard et al. (2019) |
| | 6-mer Logistic Regression Baseline | Linear logistic regression | One-hot encoded matrix system with 6-mer counts | mRNA isoform prediction and polyadenylation; cleavage site prediction | | |
| mRNA, gene enhancers and protein | DEN (Deep Exploration Network) | Deep Convolutional Generative Adversarial Networks (DC-GANs) | One-hot encoded matrix; Latent Seed; Sequence Tensor | Polyadenylation signals conformed to mRNA isoforms and 3’ cleavage sites; differential splicing; maximum transcriptional activation of gene enhancers; functional variants of GFP (Green Fluorescent Protein) | HEK293; HeLa; MCF7; CHO | Linder et al. (2020) |
| | APARENT | CNN | | | | |
| | - | GP regression | | | | |
| | APA VAE (Variational Autoencoder) | Residual Neural Network (ResNet) | | | | |
| | KL-bounded DEN | CNN | | | | |
| Gene interaction and expression | scCapsNet | DNN | Capsule Neural Network | Discovery of gene interactions; closely related in function but presenting differential gene expression pattern in single cell types (based on transcriptome analysis) | scRNA-seq dataset including mouse retinal bipolar (mRBC) cells and human peripheral blood mononuclear cells (hPBMC) | Wang et al. (2020) |
| Transcription factor | Independent Component Analysis (ICA) | Unsupervised ML | - | Gene expression and transcriptional regulation in E. coli through transcriptome analysis | E. coli K12 RNA-seq expression profiles | Sastry et al. (2019) |
| Transcription factor binding | FactorNet | Convolutional RNN | One-hot encoded 4-row bit matrix, LSTM | Transcription Factor (TF) cell type specific binding site prediction (e.g., TF E2F1 binding to GM12878 and HeLa-S3) | DNase-seq, ChIP-seq and RNA-seq data of chromosomes X and 1-22 from ENCODE-DREAM challenge | Quang and Xie (2019) |
| Promoters | Hybrid biophysical-ML approach | Ridge regression model | - | Synthetic promoter designing; identification of -35 and -10 motifs and optimal spacer length | E. coli | LaFleur et al. (2022) |
| Synthetic promoter | DL model | Deep CNN | Transformer model with BiLSTM | Design regulatory sequences including orthologous promoters | RNA-seq data from S. cerevisiae and 10 other Ascomycota species | Vaishnav et al. (2022) |
| Protein | DeepRHD | DNN | CNN based bidirectional GRU (Gated Recurrent Units) | Remote homology prediction of protein sequences using physico chemical properties and evolutionary information | SCOP1.67 dataset | Routray et al. (2022) |
| Protein | ProtT5 | pLM (protein language models); Logistic Regression | Attention based deep dilated residual networks consisting of convolution layers (ResNet CNN) | Protein (transmembrane beta barrel proteins – OmpX and variants) structure prediction from sequences | High resolution protein 3D structure dataset from ProteinNet12 | Weissenow et al. (2022) |
| Protein | ML model | Linear regression models including glmnet, partial least squares, averaged neural network, SVM with radial basis function kernel, stochastic gradient boosting, boosted generalized linear model, random forest, cubist and naïve Bayes models | Caret package in R | Factors influencing recombinant protein stability including molecular weight, cysteine residues and N-linked glycosylation | CHO cells expressing human secretome | Masson et al. (2022) |
| Protein | ASPIRER | DL model | XGBoost and N-terminal sequence-based CNN | Prediction of non-classical secreted proteins (NCSPs) | Gram positive bacteria NCSPs dataset from UniProt | Wang et al. (2022) |
| Protein | eUniRep | DL NN | UniRep multiplicative LSTM | Protein, avGFP and TEM-1 β-lactamase, engineering (Low-N engineering) using small number of functional variants | E. coli DH5α | Biswas et al. (2021) |
| Protein | UniRep | SVM; LR; Random Forest (RF); ANN | RNN | Prediction of recombinant gene expression and protein solubility | B. subtilis | Martiny et al. (2021) |
| Protein | ECNet | RNN | BiLSTM, Transformer architecture with TAPE integration | Protein fitness prediction based on evolutionary context, engineered TEM-1 β-lactamase variants showing enhanced ampicillin resistance | E. coli DH5α; diverse large-scale deep mutational scanning (DMS) datasets and random mutagenesis datasets | Luo et al. (2021) |
| Protein | EPSOL | Keras based DL model | Multidimensional Embedding, multi-convolutional-pooling module and a Multi-layer Perceptron (MLP) | Protein solubility prediction | Heterologous expressed E. coli soluble and insoluble protein dataset compiled by Smialowski et al. (2012) | Wu and Yu (2021) |
| Protein | DEEPred | Multi-layered perceptrons (MLPs) | Feed-forward multitask DNN | Sequence/Gene Ontology (GO) based functional definition prediction of proteins | Pseudomonas aeruginosa strain reference genome and UniProtKB/Swiss-Prot dataset | Sureyya Rifaioglu et al. (2019) |
| Protein | ML models | GANs; Logistic regression; Decision Tree; SVM; Naïve Bayes; Cforest; XGboost; ANNs | Generator Neural Network and Discriminator Neural Network | Prediction of protein solubility | eSol database dataset | Han et al. (2019) |
| Protein | DeepSol | DL model | CNN, non-linear high-dimensional k-mer vector spaces, deep feed-forward neural network (FFNN) | Protein solubility prediction | Heterologous expressed E. coli soluble and insoluble protein dataset compiled by Smialowski et al. (2012) | Khurana et al. (2018) |
| Protein | ML | RNN | BiLSTM, One-hot encoded matrix | Identification and function prediction of protein homologs including iron sequestering proteins, cytochrome P450, serine and cysteine proteases and G-Protein coupled receptors, detection through fluorescence (GFP) | E. coli | Liu (2017) |
| Protein | SPIDER2 | Deep learning neural network | Stacked sparse autoencoder | Protein secondary structure, solvent accessible surface area, main chain torsion angle prediction | Non-redundant high resolution protein structures dataset | Yang et al. (2017) |
| Amyloidogenic proteins | AbsoluRATE | SVM | Sequence-based regression | Aggregation kinetics prediction of amyloidogenic proteins | CPAD 2.0 database dataset | Rawat et al. (2021) |
| Antibody | DeepAb | Deep residual convolutional network (Deep RCN) with Rosetta-based protocol | RNN, BiLSTM, LSTM | Antibody Fv structure prediction from sequence | Observed Antibody Space (OAS) database, SAbDab database | Ruffolo et al. (2022) |
| Antibody | DeepH3 | Deep residual network | One dimensional and two dimensional convolutions | Prediction of de novo CDR H3 loop structures | Rosetta and SAbDab dataset | Ruffolo et al. (2020) |
| mAbs | solPredict | ESM1b-based Multilayer perceptron (MLP2Layer) transfer learning model | Pretrained protein language model; ESM1b embedding | Rapid, large-scale high throughput screening of mAb sequences (IgG1, IgG2 and IgG4) and quantitative solubility prediction eliminating precipitation in Histidine pH 6.0 (H6) buffer system; eliminates the need for 3D modelling | HEK293/CHO | Feng et al. (2022a) |
| mAbs/IgG1 | DeepSCM | Scikit-learn; Keras v2.7.0 | CNN architecture | Molecular dynamics simulation to screen high concentration antibody viscosity prediction | SAbDab and AbYsis database dataset | Lai (2022) |
| Multiepitope vaccine | DeepVacPred | DNN-V | Multi-layer CNN and a 4-layer linear neural network | Designing vaccine subunit containing both T- and B-cell epitopes of Spike glycoprotein against SARS-CoV2 | E. coli K12 | Yang et al. (2021b) |
| T-cell Epitope | Antigen eXpression based Epitope Likelihood-Function (AXEL-F)/NetMHCpan 4.1 combination | - | Neural networks | Expression of source antigen; T cell epitope prediction and peptide presentation to MHC Class I molecule; SARS-CoV2 epitope prediction | IEDB HLA class I ligands dataset; RNA-Seq data of HeLa cells; SARS-CoV2 expression dataset from Finkel et al. (2021) | Koşaloğlu-Yalçın et al. (2022) |
| T-cell Epitope | - | Epitope likelihood | Aggregate z-score, structure-based processing likelihood | P. aeruginosa endotoxin domain III (PE-III) epitope prediction | P. aeruginosa | Moss et al. (2019) |
| T-cell Epitope | - | Epitope likelihood | Aggregate z-score | CD4+ T-cell epitope prediction in bacterial and viral antigens without genotype information through antigen processing constraint modelling | Sequence data from different studies in C57BL/6 mice, HLA-DR4-transgenic mice and humans | Mettu et al. (2016) |
| Protein localization | MULocDeep | Bayesian optimization & Attention visualization | LSTM | Protein localization in organelles such as nucleus, mitochondria, plastid and thylakoid and extracellular matrix | Mitochondrial proteome data of A. thaliana cell cultures, Solanum tuberosum tubers, Vicia faba roots | Jiang et al. (2021) |
| Protein localization | Plant-mSubP | SVM | OvR (One-vs.-Rest) | Single- and dual-organelle targeting/subcellular localization of proteins in plants | Plant protein sequence dataset from UniProt database | Sahu et al. (2021) |
| Cytokines and peptides | ProtConv | Transfer learning CNN | LSTM, ResNet and Transformer with TAPE embedding; LeNet-5 architecture | Function prediction of proinflammatory cytokines and anticancer peptides | IEDB and CancerPPD database dataset | Sara et al. (2021) |
| Peptide | FBGAN (Feedback GAN) | GANs | RNN and Feedback loop training architecture | Generation of synthetic AMPs and α-helical peptide coding genes; optimization of secondary structure | UniProt database dataset | Gupta and Zou (2018) |
| Peptide-MHC Class I binding | CapsNet-MHC | CNN | Capsule Neural Network | Prediction of interaction between allelic variants of MHC and peptides with rare sequence lengths | IEDB dataset | Kalemati et al. (2023) |
| Peptide-HLA binding | DeepSeqPanII | Pan-specific DNN with attention mechanism | LSTM | Prediction of Peptide-HLA Class II binding | IEDB datasets BD2013 and BD2016 | Liu et al. (2022b) |
| MHC Class II Antigen Presentation | NNAlign_MAC | ANN | NNAlign_MA ML framework | CD4 T cell epitope prediction; MHC class II antigen presentation prediction; prediction of protein-drug immunogenicity | Single allele and Multiple allele dataset & IEDB dataset | Barra et al. (2020) |
| Signal peptide | XGBoost | Regression model | - | Increasing the protein translocation rates to ER by optimizing synthetic signal peptide-protein (mAb/ScFv) complex formation | CHO-K1 cells | O’Neill et al. (2023) |
| Signal peptide | Sequence-to-sequence model | Attention-based neural network | Transformer model | Signal peptide prediction from amylase, lipase, protease and xylanase enzymes | B. subtilis | Wu et al. (2020) |
| Signal peptide | SignalP 5.0 | DL model | Non-linear PSSMs (position specific scoring matrix), BiLSTM and a conditional random field | Peptide identification (three classes including Sec/SPI, Sec/SPII, Tat/SPI) in prokaryotes | Reference proteomes of E. coli K12 and S. cerevisiae | Almagro Armenteros et al. (2019) |
| Toxic motifs | ToxDL | Deep CNN | Bidirectional GRU, one-hot encoded matrix | Toxicity assessment of genetically engineered organisms by highlighting toxic motifs and alteration of toxicity | Toxic/venom protein dataset from Animal Toxin Annotation Project in UniProt | Pan et al. (2020) |
| | | Domain2Vec | Skip-gram model | | | |
| NSAID | Ensemble Decision Tree (DT) | Extremely Random Tree (ET) | Multiple base trees with bagging strategy | Non-steroidal anti-inflammatory drug, Oxaprozin, solubility in supercritical CO2 fluid | Oxaprozin solubility dataset from Khoshmaram et al. (2021) | Alshehri et al. (2022) |
| | | Random Forest (RF) | | | | |
| | | Gradient Boosting | Sequence of base predictors | | | |

AI in protein modelling and design.
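Several entries in Table 1 take a one-hot encoded matrix or k-mer counts as input. The sketch below shows what these two encodings look like for a DNA sequence. It is an illustration under stated assumptions, not the preprocessing code of any listed tool: the function names are hypothetical, the row order A, C, G, T is a convention chosen here, and the example sequence is made up around the hexamer AATAAA, a well-known polyadenylation signal.

```python
from collections import Counter

ALPHABET = "ACGT"  # row order of the one-hot matrix (a convention assumed here)


def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA sequence as a 4-row bit matrix (rows A, C, G, T).

    Unknown bases such as N become an all-zero column.
    """
    seq = seq.upper()
    return [[1 if base == letter else 0 for base in seq] for letter in ALPHABET]


def kmer_counts(seq: str, k: int = 6) -> Counter:
    """Count overlapping k-mers, the feature type used by k-mer baselines."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


matrix = one_hot("ACGT")
counts = kmer_counts("AATAAAATAAA", k=6)
print(matrix)
print(counts["AATAAA"])
```

The one-hot matrix preserves position, which convolutional layers exploit, whereas k-mer counts discard position and suit linear baselines such as logistic regression.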

4.3 ML models in engineering strains for recombinant protein production

A large repertoire of omics data is obtained from the host system at different levels of replication (genome), transcription (transcriptome), translation (proteome), and regulation (metabolome). These data can be used to engineer host cells to improve recombinant protein yield (Ramzi et al., 2020; Samoudi et al., 2021). ML algorithms can help interpret genome-scale metabolic models (GEMs), which encompass hundreds of metabolic pathways and thousands of metabolic reactions. ML can serve as a stand-alone or a complementary approach for learning the regulatory levels of complex pathways in plants, such as transcriptional, translational and allosteric regulation. These ML algorithms have been shown to be more robust than statistical tools (Radivojević et al., 2020; Zhang et al., 2020; Strain et al., 2023).

A multilayer perceptron (MLP), an NN model, was used to analyze human RNA-seq data from the ARCHS4 database based on a secretory index (SI), and the findings were extrapolated to engineer CHO cells (Zaragoza, 2022). To predict yeast cell growth, Culley et al. (2020) proposed ML-based data integration techniques that combine gene expression profiles with computationally generated metabolic flux and rigorously assess and compare the resulting models. A total of 1,143 S. cerevisiae mutants were tested and 27 machine learning methods were analyzed.

ART (Automated Recommendation Tool) and the EVOLVE algorithm are ML-based Bayesian ensemble optimization tools that were used to increase the production of tryptophan in the yeast S. cerevisiae. These ML algorithms were used to design 30 different promoter combinations from the transcriptome dataset and to predict which engineered strains would show increased productivity. The engineered strain SP606 showed a higher synthesis rate of proxy GFP than other strains designed using ML and library preparation, and it also had an increased titre and productivity of tryptophan (Zhang et al., 2020). ART was also trained on a dataset of concentrations of the proteins/enzymes involved in a heterologous pathway for limonene production, and it then provided new E. coli strain design sets for enhanced limonene production (Radivojević et al., 2020).
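The recommend-then-build loop described above can be illustrated with a toy ensemble. The sketch below is not ART's or EVOLVE's actual method: it swaps in a bootstrap ensemble of nearest-neighbour predictors as a simple stand-in for a Bayesian ensemble, and ranks untested designs by predicted mean plus an uncertainty bonus. All names, the design encoding (promoter strengths as numbers) and the example titres are hypothetical.

```python
import random
import statistics


def knn_predict(train, x, k=2):
    """Average response of the k measured designs closest to x."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def recommend(train, candidates, n_models=50, kappa=1.0, seed=0):
    """Rank candidate designs by ensemble mean + kappa * ensemble spread.

    train: list of (design_tuple, measured_titre) pairs.
    Returns (score, mean, spread, design) tuples, best first.
    """
    rng = random.Random(seed)
    # Each ensemble member sees a bootstrap resample of the measured strains.
    boots = [[rng.choice(train) for _ in train] for _ in range(n_models)]
    scored = []
    for x in candidates:
        preds = [knn_predict(boot, x) for boot in boots]
        mean = statistics.fmean(preds)
        spread = statistics.pstdev(preds)  # disagreement between members
        scored.append((mean + kappa * spread, mean, spread, x))
    return sorted(scored, reverse=True)


# Made-up data: two promoter strengths per design, titre in arbitrary units.
measured = [((0, 0), 1.0), ((0, 1), 1.2), ((1, 0), 1.1),
            ((3, 3), 5.0), ((3, 2), 4.8), ((2, 3), 4.9)]
ranking = recommend(measured, [(0.1, 0.1), (2.9, 2.9)])
print(ranking[0][3])
```

The `kappa` weight trades exploitation (high predicted mean) against exploration (high disagreement between ensemble members); setting it to zero ranks purely on the predicted mean.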

Similarly, supervised learning algorithms have predicted pathway dynamics from multiomics data (proteome and metabolome data) in E. coli for enhancing limonene production (Costello and Martin, 2018). In contrast, Ramos et al. (2022) proposed an unsupervised ML approach, termed HybridFBA, that combined GEM and metabolic flux balance analysis (FBA) using principal component analysis (PCA) in CHO cells (Strain et al., 2023). Machine Learning Predictions Having Amplified Secretion (MaLPHAS) by Eden Bio Ltd is an ML algorithm that predicted the knockout of five genes. Of these, the Component of Oligomeric Golgi Complex (cog6) knockout strain doubled the secretion of recombinant protein in the host Komagataella phaffii (P. pastoris) compared with the bgs7 supersecretor strain (Markova et al., 2022).

DCell is a virtual eukaryotic cell composed of 2,526 subsystems embedded hierarchically as a VNN (visible neural network), a type of deep ANN. The model was built on the hierarchical architecture of the subsystems of S. cerevisiae. Having been trained on several million genotypes, DCell generates patterns of molecular activity during simulation based on the genotype-to-phenotype relationship (Ma et al., 2018). DCell can use Gene Ontology (GO) to identify gene deletions/knockouts that will result in a phenotype change (Ma et al., 2018; Kim et al., 2020).

ML algorithms and tools can be used to introduce genes into a pathway, or remove them, to increase the production of humanized recombinant biologics in plant systems. A knock-out approach that removes plant-specific glycans [β(1,2)-Xyl and α(1,3)-Fuc], or a knock-in strategy that expresses human [β(1,4)-Gal] and adds sialic acid residues in specific host plants, results in humanized protein expression. Such mechanisms could be explored and analyzed through ML tools such as ART (Sethi et al., 2021). The metabolic flux of host plant systems can also be studied to generate stable lines with metabolic pathways optimized for the desired post-translational modifications of recombinant biologics.

4.4 Automation and AI in plant growth monitoring and biomass production

Next to host selection and engineering, plant growth and maintenance is one of the most important aspects of plant molecular pharming for recombinant biologics production. Plants are efficient biofactories for the manufacture of recombinant proteins, and growth monitoring is vital for both laboratory-scale and commercial production. Several automation technologies, including affordable sensors built on Raspberry Pi, robotics and high-definition cameras, work on the basis of image acquisition (Jahnke et al., 2016; Jolles, 2021; Banerjee et al., 2022; Wan et al., 2022). Camera sensors have been deployed to analyze plant growth patterns and phenotypes such as plant morphology, height, canopy, temperature, leaf biomass, leaf area index, greenness, age and different stresses. Similarly, robot-assisted systems have been used to study seed count, shape, size and color, as well as plant growth parameters such as temperature, photoperiod and grow light color. Large training datasets of raw images captured by the camera sensors are analyzed through DNN modules and processed for color correction and segmentation (Jahnke et al., 2016; Ubbens and Stavness, 2017; Tovar et al., 2018; Zheng et al., 2019; Tausen et al., 2020; Bose and Hautop Lund, 2022). Efficient image analysis is carried out by CNN-based models that include U-Net, R-CNN and ResNet (Ubbens and Stavness, 2017; Lin et al., 2019; Zheng et al., 2019; Tausen et al., 2020; Bose and Hautop Lund, 2022). IoT-based sensors and programs are not limited to phenotyping the growth and morphology of plants; they can also detect plant nutrient deficiencies, diseases and soil parameters, thereby reducing labor-intensive maintenance and increasing sustainability (Dhivya et al., 2021; Monteiro et al., 2021; Bose and Hautop Lund, 2022). Plant monitoring and phenotyping using integrated automation and ML approaches is illustrated in Figure 4.
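The segmentation step that precedes measurements such as plant area and greenness can be illustrated with a classical colour index. The sketch below is a simple baseline and not the CNN approach (U-Net, R-CNN, ResNet) used by the cited tools: it thresholds the excess green index, 2g - r - b on normalised channels, to separate plant pixels from background. The function names, the threshold of 0.1 and the toy image are assumptions for illustration.

```python
def excess_green(r, g, b):
    """Excess green index on chromaticity-normalised channels: 2g - r - b."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: no colour information
    return (2 * g - r - b) / total


def plant_mask(image, threshold=0.1):
    """image: rows of (R, G, B) tuples. Returns rows of booleans (True = plant)."""
    return [[excess_green(*px) > threshold for px in row] for row in image]


def projected_area(mask):
    """Number of pixels classified as plant."""
    return sum(sum(row) for row in mask)


# Toy 2 x 2 image: two green leaf pixels, one grey soil pixel, one black pixel.
image = [[(30, 120, 40), (110, 100, 90)],
         [(20, 90, 25), (0, 0, 0)]]
mask = plant_mask(image)
print(mask)                  # [[True, False], [True, False]]
print(projected_area(mask))  # 2
```

A fixed threshold breaks down under changing illumination or overlapping plants, which is one reason learned segmentation models are preferred in the platforms discussed here.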

Figure 4

In wider, large-scale biologics production environments, plant monitoring requires a large number of sensors, and building the architecture for plant maintenance becomes very difficult. Hence, remote sensing with unmanned aerial vehicles (UAVs) flown at low altitudes is used to acquire high-resolution multispectral images of plants grown in agricultural fields and greenhouses. The UAV high-throughput phenotyping platform, working with support vector machine (SVM) and SVM-derived models, processes the spectral information of optical images to identify plant growth, biomass, stress and disease stages (Maimaitijiang et al., 2020; Fu et al., 2021; Yang et al., 2021a; Aslan et al., 2022; Jiang et al., 2022a; Bai et al., 2023a). Several plants used as hosts in the production of recombinant biopharmaceuticals, such as Glycine max (L.) Merr. (soybean), Triticum aestivum (wheat), Hordeum vulgare (barley), Oryza sativa (rice), Zea mays (maize), Arachis hypogaea L. (peanut), Arabidopsis thaliana (Arabidopsis), Brassica napus (rapeseed), Lycopersicon esculentum Mill. (tomato), Cucumis Linn. (cucumber), L. sativa Linn. (lettuce), Brassica oleracea Linn. (cabbage), Raphanus sativus Linn. (turnip), Apium graveolens Linn. (celery), Spinacia oleracea Linn. (spinach) and N. tabacum (tobacco), can be monitored using these sensors for high product yield (Minervini et al., 2015; Jahnke et al., 2016; Minervini et al., 2017; Ubbens and Stavness, 2017; Zheng et al., 2019; Maimaitijiang et al., 2020; Fu et al., 2021; Sangjan et al., 2021; Sarkar et al., 2021; Yang et al., 2021a; Banerjee et al., 2022; Bai et al., 2023a; Bai et al., 2023b; Sun et al., 2023). A detailed list of automation and AI-based tools used in plant monitoring is given in Table 2. These technologies are not limited to the plants mentioned and can be extended to all plant host systems used for the expression of recombinant biologics.
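Spectral information from multispectral UAV images is commonly condensed into vegetation indices before it is passed to a model such as an SVM. As a hedged example, the sketch below computes the normalised difference vegetation index (NDVI), a standard index chosen here for illustration; the text does not state which indices the cited platforms use. The band values are made-up reflectances and the function names are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalised difference vegetation index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def mean_ndvi(nir_band, red_band):
    """Average NDVI over two equally sized 2-D reflectance bands."""
    values = [
        ndvi(n, r)
        for nir_row, red_row in zip(nir_band, red_band)
        for n, r in zip(nir_row, red_row)
    ]
    return sum(values) / len(values)


# Made-up reflectances for a 1 x 2 plot: one vegetated pixel, one bare pixel.
nir = [[0.6, 0.2]]
red = [[0.2, 0.2]]
print(round(mean_ndvi(nir, red), 3))
```

Per-plot index values like this one can then serve as input features for classifiers or regressors that estimate biomass, stress or disease stage.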

Table 2

| Platform | Automation Technology | Imaging Device | Phenotype/Parameter | Plant Species | References |
|---|---|---|---|---|---|
| UAV remote sensing | Multirotor UAV with CNN architecture | XIMEA MQ022MG-CM Camera with CMOS sensor and 16 mm lens and Sony NEX-7 Camera | Disease severity at 25 m altitude | O. sativa (rice) | Bai et al. (2023a) |
| High throughput UAV remote sensing | DJI Phantom 4 Advanced quadcopter | Drone RGB camera | Accurate plant count, location and size determination to distinguish in paddy field at 7 m altitude | O. sativa (rice) | Bai et al. (2023b) |
| RiceNet | Deep Learning Network | | | | |
| Edge-computing based network monitoring | IoT monitoring with deep learning algorithm-based Edge Image Processing Architecture | Raspberry Pi Camera with 5 MP sensor | Plant growth; environment and water quality | - | Wan et al. (2022) |
| GrowBot | Robotic system with U-Net: CNN | OV5647 CMOS image sensor with Raspberry Pi4 | Plant growth based on nutrient deficiency and temperature stress | Ocimum basilicum (basil) | Bose and Hautop Lund (2022) |
| AscTec Navigator 3.4.5 | UAV with built-in GPS | AscTec Falcon 8 octocopter (Ascending technologies, Germany) Sony α6000 24.3 MP camera with 20 mm f/2.8 lens | Leaf Area Index at 20 m altitude; leaf/biomass growth; vegetation indices; chlorophyll index | A. hypogaea L. (peanut) | Sarkar et al. (2021) |
| WEKA (Waikato Environment for Knowledge Analysis) software v3.8.4 | ANN | | | | |
| WOFOST | UAV imaging integration | - | Leaf area index (LAI), biomass, yield | T. aestivum (winter wheat) | Yang et al. (2021a) |
| Hyperspectral Reflectance | MLP, SVM and RF with remote sensing | UniSpec-DC Spectral Analysis System (PP Systems International Inc., USA) | Biomass yield; plant growth and development stages | G. max (soybean) | Yoosefzadeh-Najafabadi et al. (2021) |
| Greenotyper | U-Net: CNNs | RPi3 Model B with RPi Camera module v2.1 | Plant area; greenness; overlapping growth patterns | Trifolium repens (white clover) | Tausen et al. (2020) |
| Keras | U-Net based CNN segmentation model | 2592 x 1944 x 3 resolution camera (5 MP) | Powdery mildew disease detection | Cucumis sativus (cucumber) | Lin et al. (2019) |
| CropDeep | RetNet with ResNet50 CNN | IoT cameras, Autonomous Spray robots, Autonomous Picking Robots, Mobicamera and Smartphone camera | Precision farming; plant identification, growth and location; different plant variety monitoring; fruit and vegetable health status | 25 plant varieties including L. sativa Linn. (lettuce), A. graveolens Linn. (celery), Cucumis Linn. (cucumber), B. oleracea Linn. (cabbage), S. oleracea Linn. (spinach), L. esculentum Mill. (tomato), R. sativus Linn. (turnip) | Zheng et al. (2019) |
| Alexnet | CNN-Long-Short Term Memories (LSTM) architecture | Canon EOS 650D | Plant growth pattern of different genotypes | A. thaliana | Taghavi Namin et al. (2018) |
| Persistent Homology based topological methods | DIRT (Digital Imaging of Root Traits); Gaussian kernel density estimator; Elliptical Fourier descriptors | - | Leaf shape, serrations and root architecture; discrimination between genotypes | Solanum pennellii (wild tomato) | Li et al. (2018) |
| PlantCV | U-Net based CNN | Raspberry Pi Camera | Plant convex hull, width and length | A. thaliana | Tovar et al. (2018) |
| | | Nikon COOLPIX L830 Camera | Seed size, shape, count and color | Chenopodium quinoa Willd. (quinoa) | |
| LeafNet | Caffe framework based Deep Learning CNN | LeafSnap, Flavia and Foliage dataset images using mobile cameras (iPhones mostly) | Species identification through leaf features like edges and venations | LeafSnap, Flavia and Foliage dataset | Barré et al. (2017) |
| Deep Plant Phenomics (DPP) | Deep CNN with PlantCV module | Canon PowerShot SD1000 7 MP camera, Model B with Raspberry Pi 5 MP camera module | Leaf size, shape and leaf count | A. thaliana; N. tabacum (tobacco) | Ubbens and Stavness (2017); Minervini et al. (2015) |
| phenoSeeder | KR 10 scara R600-Z300 robot (KUKA Roboter GmbH, Germany) | Oscar F-810C Camera (Allied-Vision Technologies, GmbH, Germany) | Seed projected area, length, width and color | B. napus (rapeseed), H. vulgare (barley) and A. thaliana | Jahnke et al. (2016) |
| | | Grasshopper GRAS-50S5M-C Camera (Point Grey, Canada) with 35 mm lens | Seed volume | | |
| UAV remote sensing; SAMPLINGTSPN | UAV and GPML (Gaussian Processes for Machine Learning) Toolbox | MikroKopter, Hexa XL with Multispectral Tetracam Camera | Nitrogen level prediction at 30 m altitude | Z. mays (maize) | Tokekar et al. (2016) |
| DIRT (Digital Imaging of Root Traits) | - | - | Root angles (top and bottom), stem diameter, width of root system | Z. mays (maize) | Das et al. (2015) |
| GARNICS | Robotic system with ML-based algorithms | Robot head with 4 x Point Grey Grasshopper, 3.45 μm pixels Camera and Schneider Kreuznach Xenoplan 1.4/17-0903 lenses; Canon PowerShot SD1000 7 MP camera, Model B with Raspberry Pi 5 MP camera module | Plant detection and localization; plant and leaf segmentation; leaf shade, appearance and difference detection; leaf counting; leaf growth tracking; classification based on mutant and treatment recognition and age regression | A. thaliana; N. tabacum (tobacco) | Minervini et al. (2015) |

Automation and AI Tools in plant monitoring.

4.5 ML approaches in cell suspension cultures and bioreactors

Plant cell suspension cultures offer a unique platform for the production of recombinant proteins due to their ability to perform post-translational modifications similar to those of mammalian cells (Gutierrez-valdes et al., 2020). Plant cell suspension cultures are usually prepared from callus tissue in shaker flasks or fermenters to form single cells and small aggregates. The plant cells are then grown in a liquid medium in a controlled environment, such as a bioreactor, where factors like temperature, pH and nutrient ratios are optimized for cell growth and protein production (Cardon et al., 2019). Several proteins have been produced in bioreactors using cell suspension cultures, including ORF8, an accessory protein of SARS-CoV2, in suspension-cultured tobacco BY-2 cells (Imamura et al., 2021); rrBChE, rice recombinant butyrylcholinesterase, in rice cell suspension culture (Macharoen et al., 2021); LBT-Syn protein in carrot cell suspension culture (Carreño-Campos et al., 2022); and taliglucerase (ELELYSO), a recombinant version of human glucocerebrosidase, in carrot cell cultures (Mor, 2015).

Large-scale production of plant-expressed recombinant proteins can be achieved by growing the transformed plant cells in bioreactors of different shapes. However, diverse problems must be addressed, such as the pH of the media, minerals, growth regulators, cell density, gaseous atmosphere, agitation system and sterilization conditions (Ruffoni et al., 2010).

AI techniques are increasingly being applied to bioreactors, which are essential tools in bioprocessing for the production of biological products such as recombinant proteins, vaccines and biofuels. ML models can identify the optimal operating conditions, such as temperature, pH, dissolved oxygen and nutrient concentrations, to maximize product yield and quality. When integrated with sensors, data acquisition systems and control algorithms, AI models can analyze data in real time and automatically adjust process parameters for optimal performance, reducing the need for manual intervention.

Optimizing plant tissue culture media is a complicated and time-consuming process that is influenced by genotype, mineral nutrients, plant growth regulators, vitamins and other factors. ML approaches such as multilayer perceptron neural network (MLPNN), k-nearest neighbors (KNN) and gene expression programming (GEP) have been used to develop prediction models for optimizing plant tissue culture media composition (Hosseini et al., 2022). In another work, three ANN models, CIPnet, CWnet and DCnet, were developed to predict the best media composition for callus induction percentage (CIP), callus weight (CW) and days to callus initiation (DC). The performance was satisfactory, with R² values of 0.95, 0.95 and 0.88 for CIPnet, CWnet and DCnet, respectively (Munasinghe et al., 2020). Foam formation in bioreactors is another major issue in the pharmaceutical industry and creates operational problems. To address it, a CNN-based model was developed for the real-time identification of foam formation (Austerjost et al., 2021). Cell proliferation can also be monitored through ML-based algorithms: an ML model was trained to monitor insect cell proliferation and viability percentage upon baculovirus infection in the bioreactor (Altenburg et al., 2023).
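For readers unfamiliar with the metric quoted for the callus models, the sketch below computes the coefficient of determination from observed and predicted values. It assumes the reported values follow the usual definition, one minus the ratio of residual to total sum of squares, which the text does not state explicitly; the function name and the callus weights are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; the metric is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical callus weights (g): measured versus model-predicted.
measured = [1.2, 1.8, 2.5, 3.1]
predicted = [1.3, 1.7, 2.4, 3.2]
print(round(r_squared(measured, predicted), 3))
```

A value of 1 means the predictions match the measurements exactly, while 0 means the model does no better than always predicting the mean of the measurements.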

An ANN-based ML algorithm was used to control micro-aerobic conditions and achieve a satisfactory product yield, using a metabolic flux-based control strategy (SUPERSYS_MCU). The control strategy used simulations of a genome-scale metabolic model to generate a surrogate model in the form of an ANN. This meta-model provided setpoints to the controller, allowing the inlet airflow to be adjusted to control the oxygen uptake rate (Zangirolami et al., 2021). The application of ANN models to predicting the performance of osmotic membrane bioreactors (OMBRs) has also been investigated, and the resulting models performed well in predicting water flux and simulating membrane fouling (Viet and Jang, 2021).
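The division of labour described above, in which a surrogate model supplies a setpoint and a controller adjusts airflow to track it, can be illustrated with a toy simulation. The sketch below is not SUPERSYS_MCU: it assumes a plain proportional-integral (PI) controller and a first-order response of oxygen uptake rate (OUR) to airflow, and it treats the setpoint as a constant that would, in the cited work, come from the ANN surrogate. All parameter values and names are hypothetical.

```python
def simulate_our_control(setpoint, steps=200, dt=1.0, kp=0.5, ki=0.1,
                         plant_gain=2.0, tau=10.0, max_airflow=10.0):
    """Track an OUR setpoint by adjusting inlet airflow with a PI controller.

    Assumed toy process: d(OUR)/dt = (plant_gain * airflow - OUR) / tau.
    Returns a list of (airflow, our) pairs, one per time step.
    """
    our = 0.0
    integral = 0.0
    history = []
    for _ in range(steps):
        error = setpoint - our
        integral += error * dt
        # PI law, clamped to the physically possible airflow range.
        airflow = min(max(kp * error + ki * integral, 0.0), max_airflow)
        # Explicit Euler step of the assumed first-order process.
        our += dt / tau * (plant_gain * airflow - our)
        history.append((airflow, our))
    return history


# In the cited strategy the setpoint would be provided by the surrogate model.
trace = simulate_our_control(setpoint=4.0)
final_airflow, final_our = trace[-1]
print(round(final_airflow, 3), round(final_our, 3))
```

With these assumed gains the loop settles at an airflow of about 2.0, the value at which the toy process holds OUR at the setpoint of 4.0.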

Deep learning techniques were explored in a hybrid semiparametric modelling context, including deep feed-forward neural networks of varying depth, the rectified linear unit (ReLU) activation function, dropout regularization of network weights, and stochastic training with the ADAM method (Mestre et al., 2022). The performance of ML algorithms was also analyzed for predicting n-caproate and n-caprylate productivities in bacteria using 16S rRNA amplicons in a bioreactor. Bioreactor performance was analyzed quantitatively from a dataset generated from different bioreactors, and the ML models were trained independently and tested with 16S rRNA amplicon sequencing data. The tests concluded that random forest was the best algorithm, producing more consistent results with a low error rate and more than 90% accuracy in the prediction of n-caproate and n-caprylate productivities (Liu et al., 2022a). For real-time prediction of liquid level, four ML algorithms, namely multiple linear regression (MLR), artificial neural network (ANN), random forest (RF) and support vector machine (SVM) with a radial basis kernel, were compared, and the ANN and RF models performed well (Yu et al., 2022).
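Two of the ingredients named above, ReLU activations and dropout, can be shown in a minimal forward pass of a feed-forward network. The sketch is a generic illustration, not the hybrid model of the cited study: it applies inverted dropout to hidden activations during training only (an assumption about where dropout acts), omits the mechanistic part of a hybrid model and the ADAM training step, and uses made-up weights.

```python
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, x):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def dropout(values, rate, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(layers, x, rate=0.2, training=False, rng=None):
    """Run x through (weights, bias) layers with ReLU on hidden layers only."""
    rng = rng or random.Random(0)
    for index, (weights, bias) in enumerate(layers):
        x = dense(weights, bias, x)
        if index < len(layers) - 1:  # the output layer stays linear
            x = relu(x)
            if training:
                x = dropout(x, rate, rng)
    return x


# Made-up network: 2 inputs -> 2 hidden units -> 1 output.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
          ([[1.0, 2.0]], [0.1])]
print(forward(layers, [2.0, 1.0]))
```

At inference time (`training=False`) dropout is switched off, so the output is deterministic; the rescaling by `1 / keep` during training keeps the expected activation the same in both modes.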

4.6 AI in downstream processing

The market demand for biopharmaceutical products increases every year, and there is growing pressure to reduce prices so that biological drugs are accessible globally. To meet this demand, upstream processes have been improved significantly; however, downstream productivity has not increased accordingly (Ötes et al., 2017). Downstream processing (DSP) is the most challenging phase of industrial therapeutic protein production and accounts for a large portion of total production costs. Growing demand and developments in the upstream processing of therapeutics have burdened downstream purification processes because of their high cost and insufficient processing capacity (Li et al., 2019). DSP of recombinant therapeutic proteins involves a series of operations such as filtration, followed by capture, purification and polishing steps, mainly done by chromatography (Gaughan, 2016). Chromatography is considered the workhorse of DSP because it can selectively enrich the target proteins while eliminating impurities, which it achieves by exploiting differences in molecular properties such as size, charge and hydrophobicity (Bernau et al., 2022). Developing product-specific chromatography-based purification techniques is time-consuming and expensive because target proteins make up a small portion of the total protein in the initial plant extract. To address this issue, Buyel and Fischer (2014) created a general downstream procedure for the purification of recombinant proteins with diverse features produced in plants. This was done by concentrating on the resin’s ability to bind tobacco host cell proteins (HCPs) under various conditions such as pH and conductivity.

Recent developments in ML- and DL-based programs can be utilized to overcome the challenges in downstream processing (Bernau et al., 2022). ML has been applied to chromatography systems for real-time process monitoring, process optimization, retention time prediction and peak monitoring. To predict chromatographic conditions (i.e., solvents and solvent ratio), three vectorization types (learned embedding, extended-connectivity fingerprints (ECFP) and ECFP encoder+FFNN), three machine learning approaches (FFNN, LSTM and CNN), DNN architectures and a set of hyperparameter values were investigated. The best results for predicting solvents and solvent ratio were achieved with an ECFP LSTM auto-encoder using FFNN as the supervised machine-learning method, with accuracies of 0.95 for the first task and 0.982 for the second (Vaškevičius et al., 2021). Several ML models have been developed to address challenges in downstream processing, such as XGboost for the prediction of column performance (Jiang et al., 2022b), PeakBot for chromatographic peak prediction (Bueschl et al., 2022), DeepRT for peptide retention time prediction (Ma et al., 2017) and an algorithm to predict the elution behavior of HCPs (Buyel et al., 2013).

5 Challenges and current limitations

Plant-based expression systems offer several advantages for protein production, but they also come with limitations and challenges. These include low productivity, post-translational modification, protein stability, biosafety concerns, high downstream processing costs, regulatory approval and slow translation to applications (Schillberg et al., 2019; Schillberg and Finnern, 2021; Sethi et al., 2021). Even though plant expression systems are cheaper and more scalable than conventional expression systems, expression yields and appropriate post-translational modifications along the plant secretory pathway remain a challenge for many proteins. For instance, fusion viral glycoproteins expressed in plants often give low yields and, in some cases, may not be properly processed (Margolin et al., 2020b). Compared with mammalian systems, plant-based expression systems introduce different glycosylation patterns, which could affect the immunogenicity and functionality of proteins. Although difficult, human-like glycosylation patterns in plants are being pursued by engineering host systems with CRISPR/Cas9-based technologies. Intellectual property (IP) and regulatory approval are among the main hurdles to the adoption of molecular farming, in contrast to commercial microbial and mammalian cell expression systems, which have a proven track record, particularly in biopharmaceutical manufacturing. As a result, the industry continues to view molecular farming as risky and relies on its tried-and-true systems in most circumstances (Schillberg and Finnern, 2021). The possible hazards posed by genetically modified (GM) plants or animals, including effects on biodiversity and ecological interactions and the possibility of unforeseen effects, must be carefully evaluated.
There is a risk that transgenes may unintentionally spread to other organisms through gene flow, such as cross-pollination or horizontal gene transfer. For molecular pharming processes and products to be safe, it is crucial to implement effective containment strategies, risk assessment and mitigation measures. Techniques such as chloroplast expression and transient expression in closed culture systems could circumvent the environmental risk of transgene transmission through pollen (Moon et al., 2019; Feng et al., 2022b).

AI-based tools have been developed and deployed for various microbial expression systems, such as E. coli, P. pastoris and S. cerevisiae, and for mammalian cell expression systems, including CHO, HEK293, HeLa and MCF7 (Linder et al., 2020; Van Brempt et al., 2020; Smiatek et al., 2021; Feng et al., 2022a; Li et al., 2022a; Packiam et al., 2022). Plant host systems, by contrast, remain an unexplored arena for AI incorporation. The creation and maintenance of AI-based training models is mainly hindered by the lack of abundant experimental datasets, which include but are not limited to genome, transcriptome and metabolome sequences; plant cell culture, plant growth and bioreactor conditions; protein extraction and optimization; purification strategies; and related parameters such as protein localization, structure, stability, catalytic activity and solubility. Such limited training datasets make ML approaches prone to overfitting (Feng et al., 2020; van Dijk et al., 2021). The use of automation and of the AI models discussed in Tables 1, 2 to predict the conditions and maintenance required for large-scale production in plants is yet to be established, as illustrated in Figure 4. Integration of data on the multiple parameters discussed in Table 1 is needed for optimal protein expression. Furthermore, generating a training dataset for optimizing plant cell culture conditions requires a large collection of data (van Dijk et al., 2021), and testing numerous experimental procedures in vitro under different conditions for the production of an individual recombinant protein in real time is laborious and time-consuming, and requires a well-equipped research facility and investment in growth optimization, plant maintenance and downstream processing (Schillberg et al., 2019; Hesami et al., 2020; Sarker, 2021; van Dijk et al., 2021; Packiam et al., 2022).
Even with the omics data available for model plants used in recombinant biologics production, expression training datasets are insufficient for AI-based host engineering and host selection, vector and gene design, protein modelling, and solubility and stability prediction because they are not yet integrated (van Dijk et al., 2021). A large amount of data for each parameter (more than 10,000 data points, if required) is needed for an effective training dataset (Barré et al., 2017; Hesami et al., 2020; LaFleur et al., 2022; Yang et al., 2023). Figure 5 highlights the need for globally available training datasets that could support a web of AI-based prediction and optimization tools to tackle these challenges and increase the production of highly active next-generation biologics. Several algorithms remain under-utilized or unutilized for increasing recombinant protein yield. An ML algorithm can predict signal peptides and increase ER translocation rates in CHO cells (O’Neill et al., 2023), yet it has not been used to explore recombinant biologics production in plants. CNN-based prediction models have been used effectively to increase protein expression in microbial systems (Zrimec et al., 2020), but so far no such tool has been adapted for plant-based expression systems.
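The overfitting risk that comes with small training datasets can be illustrated with a deliberately simple sketch. A 1-nearest-neighbour model memorises its training points, so it reproduces them perfectly but generalises poorly to held-out data. The data below are synthetic (a hypothetical normalised culture condition mapped to a yield with added Gaussian noise), and the names `sample`, `one_nn_predict` and `mse` are illustrative assumptions, not taken from any cited tool.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def sample(n):
    """Draw n synthetic (condition, yield) pairs: yield = 3 * condition + noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 3.0 * x + rng.gauss(0.0, 1.0)))
    return data


def one_nn_predict(train, x):
    """Return the yield of the closest training condition (pure memorisation)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def mse(train, evaluation):
    """Mean squared error of the 1-NN model built on `train`, scored on `evaluation`."""
    errors = [(one_nn_predict(train, x) - y) ** 2 for x, y in evaluation]
    return sum(errors) / len(errors)


train = sample(8)       # a very small training set
held_out = sample(200)  # unseen data from the same synthetic process

train_mse = mse(train, train)
test_mse = mse(train, held_out)
print(train_mse, test_mse)
```

The training error is exactly zero because every training point is its own nearest neighbour, whereas the held-out error is not. A gap of this kind between training and held-out performance is the usual warning sign of overfitting, and it is why larger, integrated datasets and held-out validation matter for the plant expression models discussed above.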

Figure 5

6 Conclusion and future directions

Plant molecular pharming offers efficient alternative host systems for the expression of recombinant biologics. Moreover, the system is robust and cost-effective compared with other hosts. This review discussed the concepts of AI in systems engineering for improved production of recombinant biologics. Several prediction and optimization parameters are known to increase yield in different expression hosts, but the integration of machine learning algorithms is new to the plant molecular pharming field. The relevant plant-based expression parameters include host engineering, growth and maintenance, protein model design, glycosylation, sialylation, epitope prediction, antibody identification and optimization, regulatory element prediction and optimization, and protein stability and activity. Neural network-based ML models, when integrated with systems engineering approaches, could be advantageous for manufacturing humanized forms of biologics at various stages of production, including seed selection, germination, plant growth parameter optimization, monitoring, recombinant protein modelling, expression, extraction, purification and downstream processing. The availability of GEMs and other omics data favors the design and optimization of protein production, yet more omics-based studies (genomics, proteomics, transcriptomics and metabolomics) are needed to fully utilize ML tools. Transcriptome and metabolome profiles of specific plant hosts need to be fed into neural networks as large training datasets, and the trained networks can then be used to test a desired function (such as a gene knock-out or knock-in). Similarly, parameters of protein production based solely on plant systems need to be encoded using language models and integrated as hierarchical architectures using neural networks.
ML models trained on datasets covering the discussed parameters for protein expression in plants could aid effective modelling of recombinant biologics and accurate prediction of the conditions for protein expression in different plant hosts, including but not limited to N. benthamiana, N. tabacum, L. sativa and O. sativa. Such ML-based techniques will reduce the time frame and reagent costs at all levels of plant-based biologics production while yielding functional and active products.

Statements

Author contributions

RS proposed the idea of application of AI in plant molecular pharming; SP designed the review manuscript. TG drafted systems biology; SP and TV drafted AI integration concepts and improvised systems biology concepts. RS and BS revised and corrected the manuscript. AS gave expert comments on the technical aspects. All authors contributed to the article and approved the submitted version.

Funding

The authors would like to acknowledge the funding support of University Grants Commission-UK-India Research Initiative (UGC-UKIERI), No.F 184-9/2018(IC), and Rashtriya Uchchatar Shiksha Abhiyan (RUSA) 2.0, No. BU/RUSA2.0/BCTRC/2020/BCTRC-CD06, Bharathiar University, India.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1

    Agarwal P., Gautam T., Singh A. K., Burma P. K. (2019). Evaluating the effect of codon optimization on expression of bar gene in transgenic tobacco plants. J. Plant Biochem. Biotechnol. 28, 189–202. doi: 10.1007/s13562-019-00506-2

  • 2

    Alam A., Jiang L., Kittleson G. A., Steadman K. D., Nandi S., Fuqua J. L., et al. (2018). Technoeconomic modeling of plant-based griffithsin manufacturing. Front. Bioeng. Biotechnol. 6. doi: 10.3389/fbioe.2018.00102

  • 3

    Al-Hawash A. B., Zhang X., Ma F. (2017). Strategies of codon optimization for high-level heterologous protein expression in microbial expression systems. Gene Rep. 9, 46–53. doi: 10.1016/j.genrep.2017.08.006

  • 4

    Ali S., Kim W. C. (2019). A fruitful decade using synthetic promoters in the improvement of transgenic plants. Front. Plant Sci. 10. doi: 10.3389/fpls.2019.01433

  • 5

    Allied Market Research. (2023). Plant-Based Biologics Market by Product Type (Leaf-based, Seed-Based, Fruit-based, Others), by Source (Carrot, Tobacco, Rice, Duckweed, Others), by Target Disease (Gaucher Disease, Fabry Disease, Others): Global Opportunity Analysis and Industry Forecast. Available at: https://www.alliedmarketresearch.com/plant-based-biologics-market-A74549#:~:text.

  • 6

    Almagro Armenteros J. J., Tsirigos K. D., Sønderby C. K., Petersen T. N., Winther O., Brunak S., et al. (2019). SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423. doi: 10.1038/s41587-019-0036-z

  • 7

    Alshehri S., Alqarni M., Namazi N. I., Naguib I. A., Venkatesan K., Mosaad Y. O., et al. (2022). Design of predictive model to optimize the solubility of Oxaprozin as nonsteroidal anti-inflammatory drug. Sci. Rep. 12, 1–10. doi: 10.1038/s41598-022-17350-5

  • 8

    Altenburg J. J., Klaverdijk M., Cabosart D., Desmecht L., Brunekreeft-Terlouw S. S., Both J., et al. (2023). Real-time online monitoring of insect cell proliferation and baculovirus infection using digital differential holographic microscopy and machine learning. Biotechnol. Prog. 39, e3318. doi: 10.1002/btpr.3318

  • 9

    Amack S. C., Antunes M. S. (2020). CaMV35S promoter – A plant biology and biotechnology workhorse in the era of synthetic biology. Curr. Plant Biol. 24, 100179. doi: 10.1016/j.cpb.2020.100179

  • 10

    Arcalis E., Ibl V., Hilscher J., Rademacher T., Avesani L., Morandini F., et al. (2019). Russell-like bodies in plant seeds share common features with prolamin bodies and occur upon recombinant protein production. Front. Plant Sci. 10. doi: 10.3389/fpls.2019.00777

  • 11

    Argentinian AntiCovid Consortium. (2020). Structural and functional comparison of SARS-CoV-2-spike receptor binding domain produced in Pichia pastoris and mammalian cells. Sci. Rep. 10, 21779. doi: 10.1038/s41598-020-78711-6

  • 12

    Aslan M. F., Durdu A., Sabanci K., Ropelewska E. (2022). A comprehensive survey of the recent studies with UAV for precision agriculture in open fields and greenhouses. Appl. Sci. 12, 1047. doi: 10.3390/app12031047

  • 13

    Austerjost J., Söldner R., Edlund C., Trygg J., Pollard D., Sjögren R. (2021). A machine vision approach for bioreactor foam sensing. SLAS Technol. 26 (4), 408–414. doi: 10.1177/24726303211008861

  • 14

    Bai X., Fang H., He Y., Zhang J., Tao M., Wu Q., et al. (2023a). Dynamic UAV phenotyping for rice disease resistance analysis based on multisource data. Plant Phenomics 5, 1–13. doi: 10.34133/plantphenomics.0019

  • 15

    Bai X., Liu P., Cao Z., Lu H., Xiong H., Yang A., et al. (2023b). Rice plant counting, locating, and sizing method based on high-throughput UAV RGB images. Plant Phenomics 5, 1–16. doi: 10.34133/plantphenomics.0020

  • 16

    Banerjee B. P., Spangenberg G., Kant S. (2022). CBM: An IoT enabled LiDAR sensor for in-field crop height and biomass measurements. Biosensors 12, 16. doi: 10.3390/bios12010016

  • 17

    Barra C., Ackaert C., Reynisson B., Schockaert J., Jessen L. E., Watson M., et al. (2020). Immunopeptidomic data integration to artificial neural networks enhances protein-drug immunogenicity prediction. Front. Immunol. 11. doi: 10.3389/fimmu.2020.01304

  • 18

    Barré P., Stöver B. C., Müller K. F., Steinhage V. (2017). LeafNet: A computer vision system for automatic plant species identification. Ecol. Inform. 40, 50–56. doi: 10.1016/j.ecoinf.2017.05.005

  • 19

    Belcher M. S., Vuu K. M., Zhou A., Mansoori N., Agosto Ramos A., Thompson M. G., et al. (2020). Design of orthogonal regulatory systems for modulating gene expression in plants. Nat. Chem. Biol. 16, 857–865. doi: 10.1038/s41589-020-0547-4

  • 20

    Bernau C. R., Knödler M., Emonts J., Jäpel R. C., Buyel J. F. (2022). The use of predictive models to develop chromatography-based purification processes. Front. Bioeng. Biotechnol. 10. doi: 10.3389/fbioe.2022.1009102

  • 21

    Bidarigh fard A., Dehghan Nayeri F., Habibi Anbuhi M. (2019). Transient expression of etanercept therapeutic protein in tobacco (Nicotiana tabacum L.). Int. J. Biol. Macromol. 130, 483–490. doi: 10.1016/j.ijbiomac.2019.02.153

  • 22

    Biswas S., Khimulya G., Alley E. C., Esvelt K. M., Church G. M. (2021). Low-N protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396. doi: 10.1038/s41592-021-01100-y

  • 23

    Bogard N., Linder J., Rosenberg A. B., Seelig G. (2019). A deep neural network for predicting and engineering alternative polyadenylation. Cell 178, 91–106.e23. doi: 10.1016/j.cell.2019.04.046

  • 24

    Bohlender L. L., Parsons J., Hoernstein S. N. W., Rempfer C., Ruiz-Molina N., Lorenz T., et al. (2020). Stable protein sialylation in Physcomitrella. Front. Plant Sci. 11. doi: 10.3389/fpls.2020.610032

  • 25

    Bolaños-Martínez O. C., Govea-Alonso D. O., Cervantes-Torres J., Hernández M., Fragoso G., Sciutto-Conde E., et al. (2020). Expression of immunogenic poliovirus Sabin type 1 VP proteins in transgenic tobacco. J. Biotechnol. 322, 10–20. doi: 10.1016/j.jbiotec.2020.07.007

  • 26

    Bose R., Hautop Lund H. (2022). Convolutional neural network for studying plant nutrient deficiencies. Proc. Int. Conf. Artif. Life Robot. 27, 25–29. doi: 10.5954/icarob.2022.is2-2

  • 27

    Bueschl C., Doppler M., Varga E., Seidl B., Flasch M., Warth B., et al. (2022). PeakBot: Machine-learning-based chromatographic peak picking. Bioinformatics 38, 3422–3428. doi: 10.1093/bioinformatics/btac344

  • 28

    Buyel J. F. (2019). Plant molecular farming – Integration and exploitation of side streams to achieve sustainable biomanufacturing. Front. Plant Sci. 9. doi: 10.3389/fpls.2018.01893

  • 29

    Buyel J. F., Fischer R. (2014). Generic chromatography-based purification strategies accelerate the development of downstream processes for biopharmaceutical proteins produced in plants. Biotechnol. J. 9, 566–577. doi: 10.1002/biot.201300548

  • 30

    Buyel J. F., Woo J. A., Cramer S. M., Fischer R. (2013). The use of quantitative structure-activity relationship models to develop optimized processes for the removal of tobacco host cell proteins during biopharmaceutical production. J. Chromatogr. A 1322, 18–28. doi: 10.1016/j.chroma.2013.10.076

  • 31

    Cardon F., Pallisse R., Bardor M., Caron A., Vanier J., Pierre J., et al. (2019). Brassica rapa hairy root based expression system leads to the production of highly homogenous and reproducible profiles of recombinant human alpha-L-iduronidase. Plant Biotechnol. J. 17 (2), 505–516. doi: 10.1111/pbi.12994

  • 32

    Carreño-Campos C., Arevalo-Villalobos J. I., Villarreal M. L., Ortiz-Caltempa A., Rosales-Mendoza S. (2022). Establishment of the carrot-made LTB-syn antigen cell line in shake flask and airlift bioreactor cultures. Planta Med. 88, 1060–1068. doi: 10.1055/a-1677-4135

  • 33

    Chen Q., Davis K. R. (2016). The potential of plants as a system for the development and production of human biologics [version 1; referees: 3 approved]. F1000Research 5, 18. doi: 10.12688/F1000RESEARCH.8010.1

  • 34

    Chen Y., Yang O., Sampat C., Bhalode P., Ramachandran R., Ierapetritou M. (2020). Digital twins in pharmaceutical and biopharmaceutical manufacturing. Processes 8, 133. doi: 10.3390/pr8091088

  • 35

    Chia S., Tay S. J., Song Z., Yang Y., Walsh I., Pang K. T. (2023). Enhancing pharmacokinetic and pharmacodynamic properties of recombinant therapeutic proteins by manipulation of sialic acid content. Biomed. Pharmacother. 163, 114757. doi: 10.1016/j.biopha.2023.114757

  • 36

    Constant D. A., Gutierrez J. M., Sastry A. V., Viazzo R., Smith N. R., Hossain J., et al. (2023). Deep learning-based codon optimization with large-scale synonymous variant datasets enables generalized tunable protein expression. bioRxiv, 2023.02.11.528149. doi: 10.1101/2023.02.11.528149

  • 37

    Costello Z., Martin H. G. (2018). A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data. NPJ Syst. Biol. Appl. 4, 1–14. doi: 10.1038/s41540-018-0054-3

  • 38

    Culley C., Vijayakumar S., Zampieri G., Angione C. (2020). A mechanism-aware and multiomic machine-learning pipeline characterizes yeast cell growth. Proc. Natl. Acad. Sci. 117, 18869–18879. doi: 10.1073/pnas.2002959117

  • 39

    Das A., Schneider H., Burridge J., Ascanio A. K. M., Wojciechowski T., Topp C. N., et al. (2015). Digital imaging of root traits (DIRT): A high-throughput computing and collaboration platform for field-based root phenomics. Plant Methods 11, 1–12. doi: 10.1186/s13007-015-0093-3

  • 40

    Dehdashti S. M., Acharjee S., Nomani A., Deka M. (2020). Production of pharmaceutical active recombinant globular adiponectin as a secretory protein in Withania somnifera hairy root culture. J. Biotechnol. 323, 302–312. doi: 10.1016/j.jbiotec.2020.07.012

  • 41

    Dhivya S., Priya S. H., Sathishkumar R. (2021). “Opportunities in Agriculture, Biomedicine, and Healthcare,” in Artificial Intelligence Theory, Models, and Applications. Eds. Kaliraj P., Devi T. (Boca Raton, FL, Oxon, OX: CRC Press), 1–21.

  • 42

    Ding Z., Guan F., Xu G., Wang Y., Yan Y., Zhang W., et al. (2022). MPEPE, a predictive approach to improve protein expression in E. coli based on deep learning. Comput. Struct. Biotechnol. J. 20, 1142–1153. doi: 10.1016/j.csbj.2022.02.030

  • 43

    dos Reis M., Wernisch L., Savva R. (2003). Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res. 31, 6976–6985. doi: 10.1093/nar/gkg897

  • 44

    Doyle F., Leonardi A., Endres L., Tenenbaum S. A., Dedon P. C., Begley T. J. (2016). Gene- and genome-based analysis of significant codon patterns in yeast, rat and mice genomes with the CUT Codon UTilization tool. Methods 107, 98–109. doi: 10.1016/j.ymeth.2016.05.010

  • 45

    Dubey K. K., Luke G. A., Knox C., Kumar P., Pletschke B. I., Singh P. K., et al. (2018). Vaccine and antibody production in plants: Developments and computational tools. Brief. Funct. Genomics 17, 295–307. doi: 10.1093/bfgp/ely020

  • 46

    Feng J., Jiang M., Shih J., Chai Q. (2022a). Antibody apparent solubility prediction from sequence by transfer learning. iScience 25, 105173. doi: 10.1016/j.isci.2022.105173

  • 47

    Feng Z., Li X., Fan B., Zhu C., Chen Z. (2022b). Maximizing the production of recombinant proteins in plants: from transcription to protein stability. Int. J. Mol. Sci. 23, 13516. doi: 10.3390/ijms232113516

  • 48

    Feng L., Zhang Z., Ma Y., Du Q., Williams P., Drewry J., et al. (2020). Alfalfa yield prediction using UAV-based hyperspectral imagery and ensemble learning. Remote Sens. 12, 2028. doi: 10.3390/rs12122028

  • 49

    Finkel Y., Mizrahi O., Nachshon A., Weingarten-Gabbay S., Morgenstern D., Yahalom-Ronen Y., et al. (2021). The coding capacity of SARS-CoV-2. Nature 589, 125–130. doi: 10.1038/s41586-020-2739-1

  • 50

    Fox D. M., Branson K. M., Walker R. C. (2021). mRNA codon optimization with quantum computers. PloS One 16, 1–16. doi: 10.1371/journal.pone.0259101

  • 51

    Fu H., Liang Y., Zhong X., Pan Z., Huang L., Zhang H., et al. (2020). Codon optimization with deep learning to enhance protein expression. Sci. Rep. 10, 17617. doi: 10.1038/s41598-020-74091-z

  • 52

    Fu Y., Yang G., Song X., Li Z., Xu X., Feng H., et al. (2021). Improved estimation of winter wheat aboveground biomass using multiscale textures extracted from UAV-based digital images and hyperspectral feature analysis. Remote Sens. 13, 1–22. doi: 10.3390/rs13040581

  • 53

    Fulton A., Lai H., Chen Q., Zhang C. (2015). Purification of monoclonal antibody against Ebola GP1 protein expressed in Nicotiana benthamiana. J. Chromatogr. A 1389, 128–132. doi: 10.1016/j.chroma.2015.02.013

  • 54

    Gaughan C. L. (2016). The present state of the art in expression, production and characterization of monoclonal antibodies. Mol. Divers. 20, 255–270. doi: 10.1007/s11030-015-9625-z

  • 55

    Gelvin S. B. (2003). Agrobacterium-mediated plant transformation: the biology behind the “gene-jockeying” tool. Microbiol. Mol. Biol. Rev. 67, 16–37. doi: 10.1128/MMBR.67.1.16-37.2003

  • 56

    Ghag S. B., Adki V. S., Ganapathi T. R., Bapat V. A. (2021). Plant platforms for efficient heterologous protein production. Biotechnol. Bioprocess Eng. 26, 546–567. doi: 10.1007/s12257-020-0374-1

  • 57

    Gomord V., Faye L. (2004). Posttranslational modification of therapeutic proteins in plants. Curr. Opin. Plant Biol. 7, 171–181. doi: 10.1016/j.pbi.2004.01.015

  • 58

    Goulet D. R., Yan Y., Agrawal P., Waight A. B., Mak A. N. S., Zhu Y. (2023). Codon optimization using a recurrent neural network. J. Comput. Biol. 30, 70–81. doi: 10.1089/cmb.2021.0458

  • 59

    Grandits M., Grünwald-Gruber C., Gastine S., Standing J. F., Reljic R., Teh A. Y.-H., et al. (2023). Improving the efficacy of plant-made anti-HIV monoclonal antibodies for clinical use. Front. Plant Sci. 14. doi: 10.3389/fpls.2023.1126470

  • 60

    Gupta A., Zou J. (2018). Feedback GAN (FBGAN) for DNA: a novel feedback-loop architecture for optimizing protein functions. arXiv Prepr. arXiv:1804.01694. doi: 10.48550/arXiv.1804.01694

  • 61

    Gutierrez-valdes N., Häkkinen S. T., Lemasson C., Guillet M., Ritala A., Cardon F. (2020). Hairy root cultures – A versatile tool with multiple applications. Front. Plant. Sci. 11, 1–11. doi: 10.3389/fpls.2020.00033

  • 62

    Hager K. J., Pérez Marc G., Gobeil P., Diaz R. S., Heizer G., Llapur C., et al. (2022). Efficacy and safety of a recombinant plant-based adjuvanted COVID-19 vaccine. N. Engl. J. Med. 386, 2084–2096. doi: 10.1056/nejmoa2201300

  • 63

    Han X., Wang X., Zhou K. (2019). Develop machine learning-based regression predictive models for engineering protein solubility. Bioinformatics 35, 4640–4646. doi: 10.1093/bioinformatics/btz294

  • 64

    Hanittinan O., Oo Y., Chaotham C., Rattanapisit K., Shanmugaraj B., Phoolcharoen W. (2020). Expression optimization, purification and in vitro characterization of human epidermal growth factor produced in Nicotiana benthamiana. Biotechnol. Rep. 28, e00524. doi: 10.1016/j.btre.2020.e00524

  • 65

    Hassan M. M., Zhang Y., Yuan G., De K., Chen J. G., Muchero W., et al. (2021). Construct design for CRISPR/Cas-based genome editing in plants. Trends Plant Sci. 26, 1133–1152. doi: 10.1016/j.tplants.2021.06.015

  • 66

    He W., Baysal C., Lobato Gómez M., Huang X., Alvarez D., Zhu C., et al. (2021). Contributions of the international plant science community to the fight against infectious diseases in humans – part 2: Affordable drugs in edible plants for endemic and re-emerging diseases. Plant Biotechnol. J. 19, 1921–1936. doi: 10.1111/pbi.13658

  • 67

    Heenatigala P. P. M., Sun Z., Yang J., Zhao X., Hou H. (2020). Expression of lamB vaccine antigen in Wolffia globosa (Duck weed) against fish vibriosis. Front. Immunol. 11. doi: 10.3389/fimmu.2020.01857

  • 68

    Hesami M., Naderi R., Tohidfar M., Yoosefzadeh-Najafabadi M. (2020). Development of support vector machine-based model and comparative analysis with artificial neural network for modeling the plant tissue culture procedures: Effect of plant growth regulators on somatic embryogenesis of chrysanthemum, as a case study. Plant Methods 16, 1–15. doi: 10.1186/s13007-020-00655-9

  • 69

    Holásková E., Galuszka P., Mičúchová A., Šebela M., Öz M. T., Frébort I. (2018). Molecular farming in barley: development of a novel production platform to produce human antimicrobial peptide LL-37. Biotechnol. J. 13, 1700628. doi: 10.1002/biot.201700628

  • 70

    Hosseini M. S., Arab M. M., Soltani M., Eftekhari M. (2022). Predictive modeling of Persian walnut (Juglans regia L.) in vitro proliferation media using machine learning approaches: a comparative study of ANN, KNN and GEP models. Plant Methods 18 (1), 1–24. doi: 10.1186/s13007-022-00871-5

  • 71

    Imamura T., Isozumi N., Higashimura Y., Ohki S., Mori M. (2021). Production of ORF8 protein from SARS-CoV-2 using an inducible virus-mediated expression system in suspension-cultured tobacco BY-2 cells. Plant Cell Rep. 40, 433–436. doi: 10.1007/s00299-020-02654-5

  • 72

    Islam M. R., Choi S., Muthamilselvan T., Shin K., Hwang I. (2020). In vivo removal of N-terminal fusion domains from recombinant target proteins produced in Nicotiana benthamiana. Front. Plant Sci. 11. doi: 10.3389/fpls.2020.00440

  • 73

    Islam M. R., Kwak J. W., Lee J.-s., Hong S. W., Khan M. R. I., Lee Y., et al. (2019). Cost-effective production of tag-less recombinant protein in Nicotiana benthamiana. Plant Biotechnol. J. 17, 1094–1105. doi: 10.1111/pbi.13040

  • 74

    Iyappan G., Shanmugaraj B. M., Inchakalody V., Ma J. K.-C., Ramalingam S. (2018). Potential of plant biologics to tackle the epidemic like situations - case studies involving viral and bacterial candidates. Int. J. Infect. Dis. 73, 363. doi: 10.1016/j.ijid.2018.04.4236

  • 75

    Izadi S., Kunnummel V., Steinkellner H., Werner S., Castilho A. (2023). Assessment of transient expression strategies to sialylate recombinant proteins in N. benthamiana. J. Biotechnol. 365, 48–53. doi: 10.1016/j.jbiotec.2023.02.004

  • 76

    Jahnke S., Roussel J., Hombach T., Kochs J., Fischbach A., Huber G., et al. (2016). phenoSeeder - A robot system for automated handling and phenotyping of individual seeds. Plant Physiol. 172, 1358–1370. doi: 10.1104/pp.16.01122

  • 77

    Jain R., Jain A., Mauro E., LeShane K., Densmore D. (2023). ICOR: improving codon optimization with recurrent neural networks. BMC Bioinf. 24, 132. doi: 10.1186/s12859-023-05246-8

  • 78

    Jansing J., Sack M., Augustine S. M., Fischer R., Bortesi L. (2019). CRISPR/Cas9-mediated knockout of six glycosyltransferase genes in Nicotiana benthamiana for the production of recombinant proteins lacking β-1,2-xylose and core α-1,3-fucose. Plant Biotechnol. J. 17, 350–361. doi: 10.1111/pbi.12981

  • 79

    Jiang J., Johansen K., Stanschewski C. S., Wellman G., Mousa M. A. A., Fiene G. M., et al. (2022a). Phenotyping a diversity panel of quinoa using UAV-retrieved leaf area index, SPAD-based chlorophyll and a random forest approach. Precis. Agric. 23, 961–983. doi: 10.1007/s11119-021-09870-3

  • 80

    Jiang Q., Seth S., Scharl T., Schroeder T., Jungbauer A., Dimartino S. (2022b). Prediction of the performance of pre-packed purification columns through machine learning. J. Sep. Sci. 45, 1445–1457. doi: 10.1002/jssc.202100864

  • 81

    Jiang Y., Wang D., Yao Y., Eubel H., Künzler P., Møller I. M., et al. (2021). MULocDeep: A deep-learning framework for protein subcellular and suborganellar localization prediction with residue-level interpretation. Comput. Struct. Biotechnol. J. 19, 4825–4839. doi: 10.1016/j.csbj.2021.08.027

  • 82

    Jin L., Wang Y., Liu X., Peng R., Lin S., Sun D., et al. (2022). Codon optimization of chicken β Gallinacin-3 gene results in constitutive expression and enhanced antimicrobial activity in transgenic Medicago sativa L. Gene 835, 146656. doi: 10.1016/j.gene.2022.146656

  • 83

    Jolles J. W. (2021). Broad-scale applications of the Raspberry Pi: A review and guide for biologists. Methods Ecol. Evol. 12, 1562–1579. doi: 10.1111/2041-210X.13652

  • 84

    Jugler C., Grill F. J., Eidenberger L., Karr T. L., Grys T. E., Steinkellner H., et al. (2022). Humanization and expression of IgG and IgM antibodies in plants as potential diagnostic reagents for Valley Fever. Front. Plant Sci. 13. doi: 10.3389/fpls.2022.925008

  • 85

    Jung J. W., Shin J. H., Lee W. K., Begum H., Min C. H., Jang M. H., et al. (2021). Inactivation of the β (1, 2)-xylosyltransferase and the α (1, 3)-fucosyltransferase gene in rice (Oryza sativa) by multiplex CRISPR/Cas9 strategy. Plant Cell Rep. 40, 1025–1035. doi: 10.1007/s00299-021-02667-8

  • 86

    Kalemati M., Darvishi S., Koohi S. (2023). CapsNet-MHC predicts peptide-MHC class I binding based on capsule neural networks. Commun. Biol. 6, 492. doi: 10.1038/s42003-023-04867-2

  • 87

    Khoshmaram A., Zabihi S., Pelalak R., Pishnamazi M., Marjani A., Shirazian S. (2021). Supercritical process for preparation of nanomedicine: oxaprozin case study. Chem. Eng. Technol. 44, 208–212. doi: 10.1002/ceat.202000411

  • 88

    Khurana S., Rawi R., Kunji K., Chuang G. Y., Bensmail H., Mall R. (2018). DeepSol: A deep learning framework for sequence-based protein solubility prediction. Bioinformatics 34, 2605–2613. doi: 10.1093/bioinformatics/bty166

  • 89

    Kim G. B., Kim W. J., Kim H. U., Lee S. Y. (2020). Machine learning applications in systems metabolic engineering. Curr. Opin. Biotechnol. 64, 1–9. doi: 10.1016/j.copbio.2019.08.010

  • 90

    Koşaloğlu-Yalçın Z., Lee J., Greenbaum J., Schoenberger S. P., Miller A., Kim Y. J., et al. (2022). Combined assessment of MHC binding and antigen abundance improves T cell epitope predictions. iScience 25, 103850. doi: 10.1016/j.isci.2022.103850

  • 91

    Kraus O. Z., Grys B. T., Ba J., Chong Y., Frey B. J., Boone C., et al. (2017). Automated analysis of high-content microscopy data with deep learning. Mol. Syst. Biol. 13, 1–15. doi: 10.15252/msb.20177551

  • 92

    Kumar A. U., Ling A. P. K. (2021). Gene introduction approaches in chloroplast transformation and its applications. J. Genet. Eng. Biotechnol. 19 (1), 1–10. doi: 10.1186/s43141-021-00255-7

  • 93

    Kwon K. C., Chan H. T., León I. R., Williams-Carrier R., Barkan A., Daniell H. (2016). Codon optimization to enhance expression yields insights into chloroplast translation. Plant Physiol. 172, 62–77. doi: 10.1104/pp.16.00981

  • 94

    LaFleur T. L., Hossain A., Salis H. M. (2022). Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria. Nat. Commun. 13, 5159. doi: 10.1038/s41467-022-32829-5

  • 95

    Lai P. K. (2022). DeepSCM: An efficient convolutional neural network surrogate model for the screening of therapeutic antibody viscosity. Comput. Struct. Biotechnol. J. 20, 2143–2152. doi: 10.1016/j.csbj.2022.04.035

  • 96

    Lecun Y., Bengio Y., Hinton G. (2015). Deep learning. Nature 521, 436–444. doi: 10.1038/nature14539

  • 97

    Li F., Chen Y., Qi Q., Wang Y., Yuan L., Huang M., et al. (2022a). Improving recombinant protein production by yeast through genome-scale modeling using proteome constraints. Nat. Commun. 13, 1–13. doi: 10.1038/s41467-022-30689-7

  • 98

    Li M., Frank M. H., Coneva V., Mio W., Chitwood D. H., Topp C. N. (2018). The persistent homology mathematical framework provides enhanced genotype-to-phenotype associations for plant morphology. Plant Physiol. 177, 1382–1395. doi: 10.1104/pp.18.00104

  • 99

    Li X., Li X., Fan B., Zhu C., Chen Z. (2022b). Specialized endoplasmic reticulum-derived vesicles in plants: Functional diversity, evolution, and biotechnological exploitation. J. Integr. Plant Biol. 64, 821–835. doi: 10.1111/jipb.13233

  • 100

    Li Y., Stern D., Lin L., Mills J., Ou S., Morrow M., et al. (2019). Emerging biomaterials for downstream manufacturing of therapeutic proteins. Acta Biomaterialia 95, 73–90. doi: 10.1016/j.actbio.2019.03.015

  • 101

    LimC. Y.KimD. S.KangY.LeeY. R.KimK.KimD. S.et al. (2022). Immune responses to plant-derived recombinant colorectal cancer glycoprotein epCAM-fcK fusion protein in mice. Biomol. Ther.30, 546552. doi: 10.4062/biomolther.2022.103

  • 102

    LimkulJ.IizukaS.SatoY.MisakiR.OhashiT.OhashiT.et al. (2016). The production of human glucocerebrosidase in glyco-engineered Nicotiana benthamiana plants. Plant Biotechnol. J.14, 16821694. doi: 10.1111/pbi.12529

  • 103

    LinK.GongL.HuangY.LiuC.PanJ. (2019). Deep learning-based segmentation and quantification of cucumber powdery mildew using convolutional neural network. Front. Plant Sci.10. doi: 10.3389/fpls.2019.00155

  • 104

    LinderJ.BogardN.RosenbergA. B.SeeligG. (2020). A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences. Cell Syst.11, 4962.e16. doi: 10.1016/j.cels.2020.05.007

  • 105

    LiuX. (2017). Deep recurrent neural network for protein function prediction from sequence. arXiv Prepr. doi: 10.48550/arXiv.1701.08318

  • 106

    LiuZ.JinJ.CuiY.XiongZ.NasiriA.ZhaoY.et al. (2022b). DeepSeqPanII: an interpretable recurrent neural network model with attention mechanism for peptide-HLA class II binding prediction. IEEE/ACM Trans. Comput. Biol. Bioinforma.19, 21882196. doi: 10.1109/TCBB.2021.3074927

  • 107

    LiuB.SträuberH.SaraivaJ.HarmsH.SilvaS. G.KasmanasJ. C. (2022a). Machine learning-assisted identification of bioindicators predicts medium-chain carboxylate production performance of an anaerobic mixed culture. Microbiome10, 121. doi: 10.1186/s40168-021-01219-2

  • 108

    Lobato GómezM.HuangX.AlvarezD.HeW.BaysalC.ZhuC.et al. (2021). Contributions of the international plant science community to the fight against human infectious diseases – part 1: epidemic and pandemic diseases. Plant Biotechnol. J.19, 19011920. doi: 10.1111/pbi.13657

  • 109

    LuC.LiuC.SunX.WanP.NiJ.WangL.et al. (2021). Bioinformatics analysis, codon optimization and expression of ovine pregnancy associated Glycoprotein-7 in HEK293 cells. Theriogenology172, 2735. doi: 10.1016/j.theriogenology.2021.05.027

  • 110

    LuoY.JiangG.YuT.LiuY.VoL.DingH.et al. (2021). ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat. Commun.12, 5743. doi: 10.1038/s41467-021-25976-8

  • 111

    MaJ. K. C.DrakeP. M. W.ChristouP. (2003). The production of recombinant pharmaceutical proteins in plants. Nat. Rev. Genet.4, 794805. doi: 10.1038/nrg1177

  • 112

    MaJ.YuM. K.FongS.OnoK.SageE.DemchakB.et al. (2018). Using deep learning to model the hierarchical structure and function of a cell. Nat. Methods15, 290298. doi: 10.1038/nmeth.4627

  • 113

    MaC.ZhuZ.YeJ.YangJ.PeiJ.XuS.et al. (2017). DeepRT: deep learning for peptide retention time prediction in proteomics. arXiv Prepr. doi: 10.48550/arXiv.1705.05368

  • 114

    MacharoenK.DuM.JungS.McDonaldK. A.NandiS. (2021). Production of recombinant butyrylcholinesterase from transgenic rice cell suspension cultures in a pilot-scale bioreactor. Biotechnol. Bioeng.118, 14311443. doi: 10.1002/bit.27638

  • 115

    MaimaitijiangM.SaganV.SidikeP.DaloyeA. M.ErkbolH.FritschiF. B. (2020). Crop monitoring using satellite/UAV data fusion and machine learning. Remote Sens.12, 1357. doi: 10.3390/RS12091357

  • 116

    MakowskiE. K.ChenH.LambertM.BennettE. M.EschmannN. S.ZhangY.et al. (2022). Reduction of therapeutic antibody self-association using yeast-display selections and machine learning. MAbs14, 2146629. doi: 10.1080/19420862.2022.2146629

  • 117

    MargolinE.OhY. J.VerbeekM.NaudeJ.PonndorfD.MeshcheriakovaY. A.et al. (2020b). Co-expression of human calreticulin significantly improves the production of HIV gp140 and other viral glycoproteins in plants. Plant Biotechnol. J.18, 21092117. doi: 10.1111/pbi.13369

  • 118

    MargolinE. A.StrasserR.ChapmanR.WilliamsonA.-L.RybickiE. P.MeyersA. E. (2020a). Engineering the plant secretory pathway for the production of next-generation pharmaceuticals. Trends Biotechnol.38, 10341044. doi: 10.1016/j.tibtech.2020.03.004

  • 119

    MarkovaE. A.ShawR. E.ReynoldsC. R. (2022). Prediction of strain engineerings that amplify recombinant protein secretion through the machine learning approach MaLPHAS. Eng. Biol.6, 8290. doi: 10.1049/enb2.12025

  • 120

    MarquesL. É. C.SilvaB. B.DutraR. F.FloreanE. O. P. T.MenassaR.GuedesM. I. F. (2020). Transient expression of dengue virus NS1 antigen in nicotiana benthamiana for use as a diagnostic antigen. Front. Plant Sci.10. doi: 10.3389/fpls.2019.01674

  • 121

    MartinyH. M.ArmenterosJ. J. A.JohansenA. R.SalomonJ.NielsenH. (2021). Deep protein representations enable recombinant protein expression prediction. Comput. Biol. Chem.95, 107596. doi: 10.1016/j.compbiolchem.2021.107596

  • 122

    MassonH. O.KuoC.-C.MalmM.LundqvistM.SievertssonÅ.BerlingA.et al. (2022). Deciphering the determinants of recombinant protein yield across the human secretome. bioRxiv2022, 12.12.520152. doi: 10.1101/2022.12.12.520152

  • 123

    McNultyM. J.GlebaY.TuséD.Hahn-LöbmannS.GiritchA.NandiS.et al. (2020). Techno-economic analysis of a plant-based platform for manufacturing antimicrobial proteins for food safety. Biotechnol. Prog.36, e2896. doi: 10.1002/btpr.2896

  • 124

    MestreM.RamosJ.CostaR. S.StriednerG.OliveiraR. (2022). A general deep hybrid model for bioreactor systems : Combining first principles with deep neural networks. Amsterdam: Elsevier. Vol. 165. doi: 10.1016/j.compchemeng.2022.107952

  • 125

    MettuR. R.CharlesT.LandryS. J. (2016). CD4+ T-cell epitope prediction using antigen processing constraints. J. Immunol. Methods432, 7281. doi: 10.1016/j.jim.2016.02.013

  • 126

    MinerviniM.FischbachA.ScharrH.TsaftarisS. A. (2015). Finely-grained annotated datasets for image-based plant phenotyping. Pattern Recognit. Lett.81, 8089. doi: 10.1016/j.patrec.2015.10.013

  • 127

    MinerviniM.GiuffridaM. V.PerataP.TsaftarisS. A. (2017). Phenotiki: an open software and hardware platform for affordable and easy image-based phenotyping of rosette-shaped plants. Plant J.90, 204216. doi: 10.1111/tpj.13472

  • 128

    MirzaeeM.OsmaniZ.FrébortováJ.FrébortI. (2022). Recent advances in molecular farming using monocot plants. Biotechnol. Adv.58, 107913. doi: 10.1016/j.bioteChadv.2022.107913

  • 129

    MiuraK.YoshidaH.NosakiS.KanekoM. K.KatoY. (2020). RAP tag and PMab-2 antibody: A tagging system for detecting and purifying proteins in plant cells. Front. Plant Sci.11. doi: 10.3389/fpls.2020.510444

  • 130

    MonteiroA.SantosS.GonçalvesP. (2021). Precision agriculture for crop and livestock farming—Brief review. Animals11, 118. doi: 10.3390/ani11082345

  • 131

    MoonK.-B.JeonJ.-H.ChoiH.ParkJ.-S.ParkS.-J.LeeH.-J.et al. (2022). Construction of SARS-CoV-2 virus-like particles in plant. Sci. Rep.12, 1005. doi: 10.1038/s41598-022-04883-y

  • 132

    MoonK.ParkJ.ParkY.SongI.LeeH.ChoH. S.et al. (2019). Development of systems for the production of plant-derived biopharmaceuticals. Plants9, 30. doi: 10.3390/plants9010030

  • 133

    MorT. S. (2015). Molecular pharming’s foot in the FDA’s door: Protalix’s trailblazing story. Biotechnol. Lett.37, 21472150. doi: 10.1007/s10529-015-1908-z

  • 134

    MossD. L.ParkH. W.MettuR. R.LandryS. J. (2019). Deimmunizing substitutions in Pseudomonas exotoxin domain III perturb antigen processing without eliminating T-cell epitopes. J. Biol. Chem.294, 46674681. doi: 10.1074/jbc.RA118.006704

  • 135

    MunasingheS. P.SomaratneS.WeerakoonS. R. (2020). Prediction of chemical composition for callus production in Gyrinops walla Gaetner through machine learning. Inf. Process. Agric.7, 511522. doi: 10.1016/j.inpa.2019.12.001

  • 136

    NavarreC.SmargiassoN.DuvivierL.NaderJ.FarJ.De PauwE.et al. (2017). N-Glycosylation of an IgG antibody secreted by Nicotiana tabacum BY-2 cells can be modulated through co-expression of human β-1,4-galactosyltransferase. Transgenic Res.26, 375384. doi: 10.1007/s11248-017-0013-6

  • 137

    O’NeillP.MistryR. K.BrownA. J.JamesD. C. (2023). Protein-specific signal peptides for mammalian vector engineering. bioRxiv, 532380. doi: 10.1101/2023.03.14.532380

  • 138

    ORF Genetics. (2023). Available at: https://www.orfgenetics.com/.

  • 139

    ÖtesO.FlatoH.WinderlJ.HubbuchJ.CapitoF. (2017). Feasibility of using continuous chromatography in downstream processing : Comparison of costs and product quality for a hybrid process vs. a conventional batch process. J. Biotechnol.259, 213220. doi: 10.1016/j.jbiotec.2017.07.001

  • 140

    PackiamK. A. R.OoiC. W.LiF.MeiS.TeyB. T.OngH. F.et al. (2022). PERISCOPE-Opt: Machine learning-based prediction of optimal fermentation conditions and yields of recombinant periplasmic protein expressed in Escherichia coli. Comput. Struct. Biotechnol. J.20, 29092920. doi: 10.1016/j.csbj.2022.06.006

  • 141

    PageM. T.ParryM. A. J.Carmo-SilvaE. (2019). A high-throughput transient expression system for rice. Plant Cell Environ.42, 20572064. doi: 10.1111/pce.13542

  • 142

    PanX.ZuallaertJ.WangX.ShenH. B.CamposE. P.MarushchakD. O.et al. (2020). ToxDL: Deep learning using primary structure and domain embeddings for assessing protein toxicity. Bioinformatics36, 51595168. doi: 10.1093/bioinformatics/btaa656

  • 143

    ParkS. R.LimC. Y.KimD. S.KoK. (2015). Optimization of ammonium sulfate concentration for purification of colorectal cancer vaccine candidate recombinant protein GA733-Fck isolated from plants. Front. Plant Sci.6. doi: 10.3389/fpls.2015.01040

  • 144

    PeyretH.BrownJ. K. M.LomonossoffG. P. (2019). Improving plant transient expression through the rational design of synthetic 5′ and 3′ untranslated regions. Plant Methods15, 113. doi: 10.1186/s13007-019-0494-9

  • 145

    QuangD.XieX. (2019). FactorNet: A deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods166, 4047. doi: 10.1016/j.ymeth.2019.03.020

  • 146

    QureshiA. I. (2016). “Chapter 11 - Treatment of Ebola Virus Disease: Therapeutic Agents,” in Ebola Virus Disease: From Origin to Outbreak. Ed. QureshiA. (London, San Diego, Cambridge, Oxford: Academic Press), 159166. doi: 10.1016/B978-0-12-804230-4.00011-X

  • 147

    RadivojevićT.CostelloZ.WorkmanK.Garcia MartinH. (2020). A machine learning Automated Recommendation Tool for synthetic biology. Nat. Commun.11, 114. doi: 10.1038/s41467-020-18008-4

  • 148

    RamosJ. R. C.OliveiraG. P.DumasP.OliveiraR. (2022). Genome-scale modeling of Chinese hamster ovary cells by hybrid semi-parametric flux balance analysis. Bioprocess Biosyst. Eng.45, 18891904. doi: 10.1007/s00449-022-02795-9

  • 149

    RamziA. B.BaharumS. N.BunawanH.ScruttonN. S. (2020). Streamlining natural products biomanufacturing with omics and machine learning driven microbial engineering. Front. Bioeng. Biotechnol.8. doi: 10.3389/fbioe.2020.608918

  • 150

    RattanapisitK.ShanmugarajB.ManopwisedjaroenS.PurwonoP. B.SiriwattananonK.KhorattanakulchaiN.et al. (2020). Rapid production of SARS-CoV-2 receptor binding domain (RBD) and spike specific monoclonal antibody CR3022 in Nicotiana benthamiana. Sci. Rep.10, 17698. doi: 10.1038/s41598-020-74904-1

  • 151

    RawatP.PrabakaranR.KumarS.GromihaM. M. (2021). AbsoluRATE: An in-silico method to predict the aggregation kinetics of native proteins. Biochim. Biophys. Acta - Proteins Proteomics1869, 140682. doi: 10.1016/j.bbapap.2021.140682

  • 152

    RoutrayM.VipsitaS.SundarayA.KulkarniS. (2022). DeepRHD: An efficient hybrid feature extraction technique for protein remote homology detection using deep learning strategies. Comput. Biol. Chem.100, 107749. doi: 10.1016/j.compbiolchem.2022.107749

  • 153

    RozovS. M.DeinekoE. V. (2019). Strategies for optimizing recombinant protein synthesis in plant cells: classical approaches and new directions. Mol. Biol.53, 157175. doi: 10.1134/S0026893319020146

  • 154

    RuffoloJ. A.GuerraC.MahajanS. P.SulamJ.GrayJ. J. (2020). Geometric potentials from deep learning improve prediction of CDR H3 loop structures. Bioinformatics36, I268I275. doi: 10.1093/BIOINFORMATICS/BTAA457

  • 155

    RuffoloJ. A.SulamJ.GrayJ. J. (2022). Antibody structure prediction using interpretable deep learning. Patterns3, 100406. doi: 10.1016/j.patter.2021.100406

  • 156

    RuffoniB.PistelliL.BertoliA.PistelliL. (2010). Plant cell cultures: Bioreactors for industrial production. Adv. Exp. Med. Biol.698, 203221. doi: 10.1007/978-1-4419-7347-4_15

  • 157

    RussellS. J. (2010). Artificial intelligence a modern approach (New Jersey: Pearson Education, Inc).

  • 158

    SabiR.DanielR. V.TullerT. (2017). StAIcalc: tRNA adaptation index calculator based on species-specific weights. Bioinformatics33, 589591. doi: 10.1093/bioinformatics/btw647

  • 159

    SahuS. S.LoaizaC. D.KaundalR. (2021). Plant-mSubP: A computational framework for the prediction of single- And multi-target protein subcellular localization using integrated machine-learning approaches. AoB Plants12, 110. doi: 10.1093/AOBPLA/PLZ068

  • 160

    SainsburyF. (2020). Innovation in plant-based transient protein expression for infectious disease prevention and preparedness. Curr. Opin. Biotechnol.61, 110115. doi: 10.1016/j.copbio.2019.11.002

  • 161

    SamoudiM.MassonH. O.KuoC. C.RobinsonC. M.LewisN. E. (2021). From omics to cellular mechanisms in mammalian cell factory development. Curr. Opin. Chem. Eng.32, 100688. doi: 10.1016/j.coche.2021.100688

  • 162

    SangjanW.CarterA. H.PumphreyM. O.JitkovV.SankaranS. (2021). Development of a raspberry pi-based sensor system for automated in-field monitoring to support crop breeding programs. Inventions6, 42. doi: 10.3390/inventions6020042

  • 163

    SaraS. T.HasanM. M.AhmadA.ShatabdaS. (2021). Convolutional neural networks with image representation of amino acid sequences for protein function prediction. Comput. Biol. Chem.92, 107494. doi: 10.1016/j.compbiolchem.2021.107494

  • 164

    SarkarS.CazenaveA. B.OakesJ.McCallD.ThomasonW.AbbottL.et al. (2021). Aerial high-throughput phenotyping of peanut leaf area index and lateral growth. Sci. Rep.11, 117. doi: 10.1038/s41598-021-00936-w

  • 165

    SarkerI. H. (2021). Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci.2, 121. doi: 10.1007/s42979-021-00592-x

  • 166

    SastryA. V.GaoY.SzubinR.HefnerY.XuS.KimD.et al. (2019). The Escherichia coli transcriptome mostly consists of independently regulated modules. Nat. Commun.10, 114. doi: 10.1038/s41467-019-13483-w

  • 167

    SchillbergS.FinnernR. (2021). Plant molecular farming for the production of valuable proteins - Critical evaluation of achievements and future challenges. J. Plant Physiol.258–259, 153359. doi: 10.1016/j.jplph.2020.153359

  • 168

    SchillbergS.RavenN.SpiegelH.RascheS.BuntruM. (2019). Critical analysis of the commercial potential of plants for the production of recombinant proteins. Front. Plant Sci.10. doi: 10.3389/fpls.2019.00720

  • 169

    SchjoldagerK. T.NarimatsuY.JoshiH. J.ClausenH. (2020). Global view of human protein glycosylation pathways and functions. Nat. Rev. Mol. Cell Biol.21, 729749. doi: 10.1038/s41580-020-00294-x

  • 170

    SethiL.KumariK.DeyN. (2021). Engineering of plants for efficient production of therapeutics. Mol. Biotechnol.63, 11251137. doi: 10.1007/s12033-021-00381-0

  • 171

    ShanmugarajB.RattanapisitK.ManopwisedjaroenS.ThitithanyanontA.PhoolcharoenW. (2020). Monoclonal Antibodies B38 and H4 Produced in Nicotiana benthamiana Neutralize SARS-CoV-2 in vitro. Front. Plant Sci.11. doi: 10.3389/fpls.2020.589995

  • 172

    ShayestehM.GhasemiF.TabandehF.YakhchaliB.ShakibaieM. (2020). Design, construction, and expression of recombinant human interferon beta gene in CHO-s cell line using EBV-based expression system. Res. Pharm. Sci.15, 144153. doi: 10.4103/1735-5362.283814

  • 173

    ShiX.CorderoT.GarriguesS.MarcosJ. F.DaròsJ. A.CocaM. (2019). Efficient production of antifungal proteins in plants using a new transient expression vector derived from tobacco mosaic virus. Plant Biotechnol. J.17, 10691080. doi: 10.1111/pbi.13038

  • 174

    SilvaJ. C. F.TeixeiraR. M.SilvaF. F.BrommonschenkelS. H.FontesE. P. B. (2019). Machine learning approaches and their current application in plant molecular biology: A systematic review. Plant Sci.284, 3747. doi: 10.1016/j.plantsci.2019.03.020

  • 175

    SinghA.GanapathysubramanianB.SinghA. K.SarkarS. (2016). Machine learning for high-throughput stress phenotyping in plants. Trends Plant Sci.21, 110124. doi: 10.1016/j.tplants.2015.10.015

  • 176

    SiriwattananonK.ManopwisedjaroenS.ShanmugarajB.RattanapisitK.PhumiamornS.SapsutthipasS.et al. (2021). Plant-produced receptor-binding domain of SARS-coV-2 elicits potent neutralizing responses in mice and non-human primates. Front. Plant Sci.12. doi: 10.3389/fpls.2021.682953

  • 177

    SmialowskiP.DooseG.TorklerP.KaufmannS.FrishmanD. (2012). PROSO II - A new method for protein solubility prediction. FEBS J.279, 21922200. doi: 10.1111/j.1742-4658.2012.08603.x

  • 178

    SmiatekJ.ClemensC.HerreraL. M.ArnoldS.KnappB.PresserB.et al. (2021). Generic and specific recurrent neural network models: Applications for large and small scale biopharmaceutical upstream processes. Biotechnol. Rep.31, e00640. doi: 10.1016/j.btre.2021.e00640

  • 179

    SoniA. P.LeeJ.ShinK.KoiwaH.HwangI. (2022). Production of recombinant active human TGFβ1 in nicotiana benthamiana. Front. Plant Sci.13. doi: 10.3389/fpls.2022.922694

  • 180

    StrainB.MorrisseyJ.AntonakoudisA.KontoravdiC. (2023). Genome-scale models as a vehicle for knowledge transfer from microbial to mammalian cell systems. Comput. Struct. Biotechnol. J.21, 15431549. doi: 10.1016/j.csbj.2023.02.011

  • 181

    StrasserR. (2022). Recent developments in deciphering the biological role of plant complex N-glycans. Front. Plant Sci.13. doi: 10.3389/fpls.2022.897549

  • 182

    StrasserR. (2023). Plant glycoengineering for designing next-generation vaccines and therapeutic proteins. Biotechnol. Adv.67, 108197. doi: 10.1016/j.bioteChadv.2023.108197

  • 183

    SunX.YangZ.SuP.WeiK.WangZ.YangC.et al. (2023). Non-destructive monitoring of maize LAI by fusing UAV spectral and textural features. Front. Plant Sci.14. doi: 10.3389/fpls.2023.1158837

  • 184

    Sureyya RifaiogluA.DoğanT.Jesus MartinM.Cetin-AtalayR.AtalayV. (2019). DEEPred: automated protein function prediction with multi-task feed-forward deep neural networks. Sci. Rep.9, 116. doi: 10.1038/s41598-019-43708-3

  • 185

    Taghavi NaminS.EsmaeilzadehM.NajafiM.BrownT. B.BorevitzJ. O. (2018). Deep phenotyping: Deep learning for temporal phenotype/genotype classification. Plant Methods14, 114. doi: 10.1186/s13007-018-0333-4

  • 186

    TausenM.ClausenM.MoeskjærS.ShihavuddinA. S. M.DahlA. B.JanssL.et al. (2020). Greenotyper: image-based plant phenotyping using distributed computing and deep learning. Front. Plant Sci.11. doi: 10.3389/fpls.2020.01181

  • 187

    TienN. Q. D.HuyN. X.KimM. Y. (2019). Improved expression of porcine epidemic diarrhea antigen by fusion with cholera toxin B subunit and chloroplast transformation in Nicotiana tabacum. Plant Cell. Tissue Organ Cult.137, 213223. doi: 10.1007/s11240-019-01562-1

  • 188

    TokekarP.Vander HookJ.MullaD.IslerV. (2016). Sensor planning for a symbiotic UAV and UGV system for precision agriculture. IEEE Trans. Robot.32, 14981511. doi: 10.1109/TRO.2016.2603528

  • 189

    TovarJ. C.HoyerJ. S.LinA.TielkingA.CallenS. T.Elizabeth CastilloS.et al. (2018). Raspberry Pi–powered imaging for plant phenotyping. Appl. Plant Sci.6, 112. doi: 10.1002/aps3.1031

  • 190

    Tuan-AnhT.LyL. T.VietN. Q.BaoP. T. (2017). Novel methods to optimize gene and statistic test for evaluation - an application for Escherichia coli. BMC Bioinf.18, 110. doi: 10.1186/s12859-017-1517-z

  • 191

    UbbensJ. R.StavnessI. (2017). Deep plant phenomics: A deep learning platform for complex plant phenotyping tasks. Front. Plant Sci.8. doi: 10.3389/fpls.2017.01190

  • 192

    VafaeeY.AlizadehH. (2018). Heterologous production of recombinant anti-HIV microbicide griffithsin in transgenic lettuce and tobacco lines. Plant Cell. Tissue Organ Cult.135, 8597. doi: 10.1007/s11240-018-1445-2

  • 193

    VaishnavE. D.de BoerC. G.MolinetJ.YassourM.FanL.AdiconisX.et al. (2022). The evolution, evolvability and engineering of gene regulatory DNA. Nature603, 455463. doi: 10.1038/s41586-022-04506-6

  • 194

    Van BremptM.ClauwaertJ.MeyF.StockM.MaertensJ.WaegemanW.et al. (2020). Predictive design of sigma factor-specific promoters. Nat. Commun.11, 113. doi: 10.1038/s41467-020-19446-w

  • 195

    van DijkA. D. J.KootstraG.KruijerW.de RidderD. (2021). Machine learning in plant science and plant breeding. iScience24, 101890. doi: 10.1016/j.isci.2020.101890

  • 196

    VaškevičiusM.Kapočiūtė-DzikienėJ.ŠlepikasL. (2021). Prediction of chromatography conditions for purification in organic synthesis using deep learning. Molecules26, 2474. doi: 10.3390/molecules26092474

  • 197

    Vazquez-VilarM.SelmaS.OrzaezD. (2023). The design of synthetic gene circuits in plants: new components, old challenges. J. Exp. Bot.74, 37913805. doi: 10.1093/jxb/erad167

  • 198

    VietN. D.JangA. (2021). Journal of Environmental Chemical Engineering Development of artificial intelligence-based models for the prediction of filtration performance and membrane fouling in an osmotic membrane bioreactor. J. Environ. Chem. Eng.9, 105337. doi: 10.1016/j.jece.2021.105337

  • 199

    Vo ngocL.HuangC. Y.CassidyC. J.MedranoC.KadonagaJ. T. (2020). Identification of the human DPR core promoter element using machine learning. Nature585, 459463. doi: 10.1038/s41586-020-2689-7

  • 200

    WanS.ZhaoK.LuZ.LiJ.LuT.WangH. (2022). A modularized ioT monitoring system with edge-computing for aquaponics. Sensors22, 9260. doi: 10.3390/s22239260

  • 201

    WangX.LiF.XuJ.RongJ.WebbG. I.GeZ.et al. (2022). ASPIRER: A new computational approach for identifying non-classical secreted proteins based on deep learning. Brief. Bioinform.23, 112. doi: 10.1093/bib/bbac031

  • 202

    WangL.NieR.YuZ.XinR.ZhengC.ZhangZ.et al. (2020). An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nat. Mach. Intell.2, 693703. doi: 10.1038/s42256-020-00244-4

  • 203

    WebsterG. R.TehA. Y. H.MaJ. K. C. (2017). Synthetic gene design—The rationale for codon optimization and implications for molecular pharming in plants. Biotechnol. Bioeng.114, 492502. doi: 10.1002/bit.26183

  • 204

    WeissenowK.HeinzingerM.RostB. (2022). Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure30, 11691177.e4. doi: 10.1016/j.str.2022.05.001

  • 205

    WittmannB. J.JohnstonK. E.WuZ.ArnoldF. H. (2021). Advances in machine learning for directed evolution. Curr. Opin. Struct. Biol.69, 1118. doi: 10.1016/j.sbi.2021.01.008

  • 206

    WuM. R.NissimL.StuppD.PeryE.Binder-NissimA.WeisingerK.et al. (2019). A high-throughput screening and computation platform for identifying synthetic promoters with enhanced cell-state specificity (SPECS). Nat. Commun.10, 110. doi: 10.1038/s41467-019-10912-8

  • 207

    WuZ.YangK. K.LiszkaM. J.LeeA.BatzillaA.WernickD.et al. (2020). Signal peptides generated by attention-based neural networks. ACS Synth. Biol.9, 21542161. doi: 10.1021/acssynbio.0c00219

  • 208

    WuX.YuL. (2021). EPSOL: sequence-based protein solubility prediction using multidimensional embedding. Bioinformatics37, 43144320. doi: 10.1093/bioinformatics/btab463

  • 209

    YangZ.BogdanP.NazarianS. (2021b). An in silico deep learning approach to multi-epitope vaccine design: a SARS-CoV-2 case study. Sci. Rep.11, 121. doi: 10.1038/s41598-021-81749-9

  • 210

    YangY.HeffernanR.PaliwalK.LyonsJ.DehzangiA.SharmaA.et al. (2017). Spider2: A package to predict secondary structure, accessible surface area, and main-chain torsional angles by deep neural networks. Methods Mol. Biol.1484, 5563. doi: 10.1007/978-1-4939-6406-2_6

  • 211

    YangH. S.RhoadsD. D.SepulvedaJ.ZangC.ChadburnA.WangF. (2023). Challenges and considerations of developing and implementing machine learning tools for clinical laboratory medicine practice. Arch. Pathol. Lab. Med.147, 826836. doi: 10.5858/arpa.2021-0635-RA

  • 212

    YangT.ZhangW.ZhouT.WuW.LiuT.SunC. (2021a). Plant phenomics & precision agriculture simulation of winter wheat growth by the assimilation of unmanned aerial vehicle imagery into the WOFOST model. PloS One16, 19. doi: 10.1371/journal.pone.0246874

  • 213

    Yoosefzadeh-NajafabadiM.EarlH. J.TulpanD.SulikJ.EskandariM. (2021). Application of machine learning algorithms in plant breeding: predicting yield from hyperspectral reflectance in soybean. Front. Plant Sci.11. doi: 10.3389/fpls.2020.624273

  • 214

    YuS. I.RheeC.ChoK. H.ShinS. G. (2022). Comparison of different machine learning algorithms to estimate liquid level for bioreactor management. Environ. Eng. Res.28, 220037220030. doi: 10.4491/eer.2022.037

  • 215

    ZangirolamiT. C.CampaniG.HortaA. C. L.GiordanoR. C. (2021). Machine learning applied for metabolic flux - based control of micro - aerated fermentations in bioreactors. Biotechnol. Bioeng.118, 20762091. doi: 10.1002/bit.27721

  • 216

    ZaragozaJ. M. C. (2022). Data-Driven Cell Engineering of Chinese Hamster Ovary Cells through Machine Learning. Denmark: Technical University of Denmark.

  • 217

    ZhangJ.PetersenS. D.RadivojevicT.RamirezA.Pérez-ManríquezA.AbeliukE.et al. (2020). Combining mechanistic and machine learning models for predictive engineering and optimization of tryptophan metabolism. Nat. Commun.11, 4880. doi: 10.1038/s41467-020-17910-1

  • 218

    ZhaoW.ZhouL. Y.KongJ.HuangZ. H.GaoY.ZhangZ. X.et al. (2023). Expression of recombinant human Apolipoprotein A-IMilano in Nicotiana tabacum. Bioresour. Bioprocess.10 (1), 114. doi: 10.1186/s40643-023-00623-w

  • 219

    ZhengY. Y.KongJ. L.JinX. B.WangX. Y.SuT. L.ZuoM. (2019). Cropdeep: The crop vision dataset for deep-learning-based classification and detection in precision agriculture. Sensors (Switzerland)19, 1058. doi: 10.3390/s19051058

  • 220

    ZrimecJ.BörlinC. S.BuricF.MuhammadA. S.ChenR.SiewersV.et al. (2020). Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat. Commun.11, 6141. doi: 10.1038/s41467-020-19921-4

Summary

Keywords

artificial intelligence, molecular pharming, synthetic biology, deep learning, machine learning

Citation

Parthiban S, Vijeesh T, Gayathri T, Shanmugaraj B, Sharma A and Sathishkumar R (2023) Artificial intelligence-driven systems engineering for next-generation plant-derived biopharmaceuticals. Front. Plant Sci. 14:1252166. doi: 10.3389/fpls.2023.1252166

Received

03 July 2023

Accepted

17 October 2023

Published

15 November 2023

Volume

14 - 2023

Edited by

Ahmad Bazli Ramzi, National University of Malaysia, Malaysia

Reviewed by

Diego Orzaez, Polytechnic University of Valencia, Spain; Tsan-Yu Chiu, Beijing Genomics Institute (BGI), China; Johannes Felix Buyel, University of Natural Resources and Life Sciences, Austria

*Correspondence: Ramalingam Sathishkumar; Ashutosh Sharma

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
