Application of machine learning technologies in biodiesel production process—A review

The search for renewable, affordable, sustainable, and ecologically benign fuels to substitute fossil-based diesel fuels has drawn increased attention to biodiesel production and utilization in recent times. Biodiesel, a form of liquid biofuel, has been found to alleviate environmental degradation, enhance engine performance, and reduce emissions of toxic gases in transportation and other internal combustion engines. However, biodiesel production processes have been dogged by various challenges and complexities which have limited their expected progression. The introduction of data-based technologies is one of the remedies aimed at mitigating the challenges associated with biodiesel synthesis. In this study, the application of machine learning (ML)-based technologies, including artificial neural network (ANN), response surface methodology (RSM), adaptive network-based fuzzy inference system (ANFIS), etc., as tools for the prediction, modeling, and optimization of the biodiesel production process was interrogated based on the outcomes of previous studies in the research domain. Specifically, we review the influence of input variables like alcohol:oil molar ratio, catalyst concentration, reaction temperature, residence time, and agitation speed on the biodiesel yield (output variable). The outcome of this investigation shows that the usage of ANN, RSM, ANFIS, and other machine learning technologies raised biodiesel yield to between 84% and 98%, while statistical verification shows that the Pearson correlation coefficient and coefficient of determination are close to 1. Going forward, more targeted and collaborative research is needed to expand the use of innovative technologies across the entire biodiesel value chain to enhance production efficiency, ensure economic feasibility, and promote sustainability.


Introduction
Demand for affordable and efficient energy to meet rising consumption has continued to constitute a potent hindrance to achieving universal social, economic, and industrial development over the past decades. There are legitimate concerns that the current energy supply scenario is grossly inadequate to sustainably meet the energy needs of the industrialized global community. Fossil-based (FB) fuels, mainly coal, oil, and gas, have been the mainstay of the global energy supply for over 150 years. The total world energy consumption rose from 109,858 TW-hours (TWh) in 2000 to 127,0232 TWh, 151,100 TWh, and 163,709 TWh in 2005, 2015, and 2021, respectively (Ritchie et al., 2022). Over the same period, the global population rose from 6.1 billion in 2000 to 6.5 billion, 7.4 billion, and 7.9 billion in 2005, 2015, and 2021, respectively (UN, 2022). Population growth is among the reasons for the sustained rise in global energy consumption (Figure 1). As of 2019, about 84.3% of global energy came from FB sources, with oil, coal, and gas contributing 33.1%, 27%, and 24.3%, respectively, while 11.3% was supplied by renewable energy sources (Ritchie et al., 2022), as shown in Figure 2.
The process of exploration, exploitation, refining, and utilization of oil to generate liquid fuel for internal combustion engines has triggered environmental and health challenges. Diesel fuel is arguably the most commonly used fuel derived from petroleum. Diesel engines, also known as compression ignition (CI) engines, are among the most common reciprocating engines for diverse applications and offer better operational performance and strength when compared with spark ignition (SI) engines. Diesel engines are used for on-road, off-road, and non-automotive applications. Notable on-road applications of CI engines include light vehicles, commercial buses, trucks, etc., while power generation, marine applications, etc. are examples of off-road applications. Generally, CI engines are used in transportation, industrial, construction, and agricultural equipment (Mejía et al., 2020). The fact that CI engines offer higher thermodynamic efficiency, better load-carrying capacity, and improved fuel economy when compared with SI engines makes them a preferred choice for light and heavy-duty vehicles for transportation, agricultural machinery, and industrial applications. Consequently, the demand for diesel fuels, which was about 27.1 million barrels per day (mmb/d), rose to 30 mmb/d and has been predicted to reach 31.6 mmb/d, 34.1 mmb/d, and 35.1 mmb/d by 2025, 2035, and 2040, respectively (Khan et al., 2019). In the same vein, there has been an increase in the use of diesel-fuel-powered engines, particularly in the power generation, construction, and agricultural sectors, which has increased the market share of diesel engines. Research and Markets (2019), a global research outfit, predicted the global market value of diesel engines to reach USD 266.3 billion in 2027 as against USD 291 billion recorded in 2018.
Despite the enormous contributions of CI engines to social, economic, and industrial developments, increased avenues of utilization, and projections in the market value of CI engines, the use of CI engines results in enormous emissions that deteriorate air quality, impact human health, and exacerbate global climate change. Heavy-duty vehicles and power-generating sets fuelled with diesel fuels emit carbon oxides, nitrogen oxides, particulate matter, sulfur oxides, and other pollutants.
The utilization of biodiesel and biodiesel blends as a low-cost and sustainable replacement for FB fuels in CI engines has been investigated extensively in recent years. The outcome of these studies revealed the multiple benefits derivable from running CI engines on blended and unblended biodiesel. In addition, several other techniques have been adopted to improve the performance of biodiesel fuel in CI engines. Khan et al. (2020) and Soudagar et al. (2019) investigated the effect of graphene oxide nanoparticles as additives to improve the performance and emission characteristics of CI engines. The outcome of their studies revealed that the brake thermal efficiency increased while the exhaust gas temperature, smoke, hydrocarbon, and carbon monoxide reduced by almost 35%. Other methodologies to improve the performance and emission characteristics of CI engines fuelled with biodiesel include blending with octanol and the use of combustion and emission strategies such as reactivity controlled compression ignition (Onuh et al., 2021; Wategave et al., 2021; Soudagar et al., 2022). ML technologies can also be applied to predict, model, and optimize the performance and emission characteristics of CI engines fuelled with biodiesel.
In separate studies, Verma and Sharma (2016), Sitepu et al. (2020), and Topare et al. (2022) examined the critical parameters that significantly influence biodiesel production by transesterification. They reported that the free fatty acid content of the feedstock, feedstock type, process temperature, type of alcohol, reaction time, catalyst type and concentration, the molar ratio of alcohol to oil, and agitation speed affect feedstock conversion efficiency and product yield. They, however, recommended the application of appropriate tools and technologies to optimize the effect of the identified parameters. Garg and Jain (2020), Selvaraj et al. (2019), and Ayoola et al. (2019) utilized machine learning (ML) technologies including response surface methodology (RSM) and artificial neural networks (ANN) to model, predict, and optimize process parameters for the transesterification of algal oil, waste cooking oil, and waste groundnut oil, respectively, to biodiesel. They affirmed the robustness, superiority, accuracy, and capability of RSM and ANN to predict and optimize production process parameters for improved biodiesel yield. The application of the genetic algorithm (GA) for the prediction and optimization of process parameters for biodiesel production was investigated by Betiku et al. (2015) and Srivastava et al. (2018) when they converted shea butter oil and microalgae oil into quality biodiesel. They recommended more studies on the use of GA for obtaining optimal process parameters for biodiesel production.
To appropriately fill the research gap and advance the trajectory of research in this field, the relevant question is whether sufficient research has been accomplished to properly interrogate and situate the application of ML technologies in the biodiesel production research space. The aim of the study, therefore, is to examine the avenues for the application of ML technologies for the prediction and optimization of process parameters for sustainable biodiesel production. This is to attain optimal process parameters with the capability to stimulate cost-effective, fast, environmentally benign, and sustainable pathways for biodiesel synthesis. The outcome of the current intervention will deepen scholarship by delivering updated information that will stimulate the application of ML and other similar models in biodiesel research. This study will also expose more avenues for the utilization of ML in biodiesel research to achieve accelerated biodiesel production. This will further contribute towards an improved application of biodiesel, environmental sustainability, and a green economy. The study is limited to the review of various ML technologies in predicting and optimizing process parameters for sustainable biodiesel generation. The mathematical and statistical backgrounds, as well as the fundamental information and actual details relating to the modelling, prediction, and optimization of ML models, are beyond the scope of this work but have been adequately presented elsewhere (Basile et al., 2017; Kubat, 2017; Mohri et al., 2018). Going forward, there is a need to expand the application of statistical approaches, mathematical and numerical solutions, modelling, simulation, and optimization models, and other relevant innovative technologies for the enhancement of biodiesel production in all its ramifications.
Multidisciplinary and collaborative efforts are needed for the intensification and advancement of cost-effective and sustainable biodiesel production pathways. The novelty of this study lies in investigating the effect of the process parameters on biodiesel yield and in bringing to the fore how ML technologies, mathematical techniques, and statistical tools can be applied to enhance biodiesel yield and adequately measure the influence of such methodologies on biodiesel yield.
Due to its renewability, safe handling, easy production techniques, and conformity with the existing transport infrastructure of petroleum diesel, biodiesel has gained wide acceptability among renewable fuel consumers. These factors have greatly influenced its production and consumption. To meet the global biodiesel demand, there has been an escalation in the volume of biodiesel production. Worldwide biodiesel production, which was 38 billion litres in 2016, rose to 45.6 billion litres, 48.3 billion litres, and 53 billion litres in 2018, 2019, and 2021, respectively. The annual production has been estimated to reach 60.7 billion litres at the end of 2022 (Figure 3) (Statista, 2022). The negative production growth rate witnessed in 2020 was a result of the global restrictions occasioned by the outbreak of the COVID-19 pandemic (Awogbemi and Kallon, 2022a). As shown in Figure 4, the global market size of biodiesel, which was put at USD 32.09 billion in 2021, has been predicted to climb to about USD 127.13 billion, USD 161.64 billion, and USD 189.7 billion by 2025, 2028, and 2030, respectively (Precedence Research, 2022a). The expected growth in market size is propelled by increasing global demand for renewable and clean fuel, especially biodiesel for power generation and for powering automotive, agricultural, and marine engines. The increased investment in biodiesel production and utilization is seen as a feasible strategy to reverse the worrisome trend of greenhouse gas emissions (Awogbemi and Kallon, 2022b).

Feedstock for biodiesel production
The choice of feedstock is a major deciding factor that affects the production cost, infrastructure, conversion technique, conversion efficiency, pump price, and sustainability of biodiesel production. For example, the feedstock accounts for about 75%-88% of the production cost and is a key consideration in fixing the pump price of biodiesel. The selection of feedstock also greatly influences the type of catalyst, process parameters, intensification techniques, and purification methods (Athar and Zaidi, 2020; Sitepu et al., 2020; Mohiddin et al., 2021; Awogbemi and Kallon, 2022c). The use of non-edible feedstock, particularly waste cooking oil, significantly reduces the cost of biodiesel production, provides an appropriate strategy for disposing of waste vegetable oil, and averts pollution of aquatic and terrestrial ecosystems. On the other hand, employing edible vegetable oils as feedstock for biodiesel production impacts food security, sparks the food vs. fuel debate, and leads to significantly increased food prices. Algal biomass possesses high volatile fatty acid content, requires no land, water, or fertilizer for cultivation, and is easily convertible to biodiesel (Yaashikaa et al., 2022). However, the conversion of algae, microalgae, and seaweeds to biodiesel requires advanced technologies and cannot be achieved at the household level, and large-scale production and commercialization remain challenging (Liu et al., 2022; Taft and Canchaya, 2022). The feedstocks for biodiesel production can be categorized based on generations and edibility. Table 1 shows the examples, benefits, and drawbacks of different generations of feedstock for biodiesel generation.

Techniques for biodiesel production
Diverse pathways have been utilized to convert feedstock to biodiesel, notably direct use, micro-emulsification, pyrolysis, transesterification, and supercritical methods. These techniques are broadly categorized as physical techniques and chemical techniques (Awogbemi and Kallon, 2022a).
The direct use or dilution of vegetable oils as a replacement for FB diesel fuel in CI engines is the simplest avenue of alternative fuel utilization and was first used by Rudolf Diesel, the renowned German inventor of the diesel engine, around the 1890s (Elghariani and Eshoul, 2021). Though this method is characterized by low production and capital costs, some of the properties of vegetable oils make their use as direct fuel challenging, unsuitable, and impracticable. Some vegetable oils are highly acidic and have high viscosity, high density, low volatility, and reactive unsaturated hydrocarbon chains. Therefore, when used in unmodified CI engines, vegetable oils result in incomplete combustion, engine deterioration, high carbon deposits, and congealing of the engine lubricating oil. Other shortcomings of the direct application of vegetable oils as a substitute fuel for unmodified CI engines include poor atomization, increased emission of pollutants, an accelerated rate of engine wear, high cost of engine maintenance, and poor engine performance (Dabi and Saha, 2019).
The micro-emulsification method is designed to remedy the drawback of the high viscosity of vegetable oils as CI engine fuel. During this process, appropriate solvents, surfactants, or cetane improvers are mixed with vegetable oils and animal fats to achieve a clear, single-phase, and thermodynamically stable fluid as fuel. These emulsified oils form isotropic fluids which are stable and well dispersed, with the required microstructures and droplet diameter. Notable solvents used for micro-emulsion include 1-butanol, 2-octanol, butanol, propanol, ethanol, hexanol, and methanol. Higher alcohols, sorbitan monooleate, octanol, and rhamnolipid have been used as surfactants, while alkyl nitrates, nitroalkanes, and nitrocarbonates are commonly applied cetane improvers during micro-emulsification of vegetable oils (Mishra and Goswami, 2018). Emulsified vegetable oils and animal fats exhibit enhanced cold flow properties, improved stability, acceptable viscosity, and shorter ignition delay in CI engines (Abbaszadeh et al., 2016). However, there are reported cases of incomplete combustion, accumulation of carbon residue in the combustion chamber, thickening of the lubricating oil, and occasional blockage of the injector needle when products of micro-emulsification of oils were used in CI engines (Sanli et al., 2022).
Pyrolysis combines thermal and chemical processes in which feedstocks are converted into fuels and other useful products through the application of heat, in the presence or absence of catalysts (Awogbemi et al., 2022; Nayab et al., 2022). Pyrolysis is one of the commonly used reforming techniques for the degradation of vegetable oils, animal fats, and other feedstocks by cracking their chemical bonds to generate renewable fuels with properties and structures comparable to petroleum diesel fuels. Biodiesel synthesized from the pyrolysis of vegetable oils demonstrates satisfactory physicochemical properties and improved engine performance. However, high production cost, complex and expensive infrastructural requirements, the low oxygen content of the fuels, and the production of short-chain molecules that are chemically similar to gasoline are some of the factors limiting the application of the process (Kesharvani and Dwivedi, 2021; Shaah et al., 2021; Zulqarnain et al., 2021).

Transesterification is arguably the most widely used thermochemical technique for the conversion of triglycerides to biodiesel. During the process, 1 mol of triglycerides in oils and fats stoichiometrically reacts with 3 mol of alcohol to form 3 mol of alkyl esters and 1 mol of glycerol, as shown in Figure 5. The entire transesterification reaction proceeds as three successive reversible steps, as depicted in Figure 6. The produced crude biodiesel is purified to meet the required ASTM D6751 and EN 14214 standards, while the glycerol is separated to prevent the formation of acetaldehyde, formaldehyde, and other hazardous gases during combustion (Zhang et al., 2022). The glycerol can be utilized as a fuel additive and as a raw material for the production of bioethanol, cosmetics, livestock feeds, and other chemical products (Chilakamarry et al., 2022). The reversible reaction takes place with or without catalysts but under appropriate reaction parameters of temperature, pressure, reaction time, alcohol/oil ratio, and mixing speed of the reactants. Though methanol and ethanol are the most popular alcohols used in transesterification reactions, the lower cost and other physical features of methanol make it the preferred choice. For example, when compared with ethanol, methanol disperses more readily in homogeneous catalysts and reacts more efficiently with triglycerides (Awogbemi et al., 2021a). However, since the boiling point of methanol is about 65°C, the reaction temperature for a transesterification reaction involving methanol is usually kept at or below about 60°C to avert evaporation of methanol. The reaction time is inversely related to the reaction temperature, and the reaction takes place at atmospheric pressure in most cases. The transesterification reaction is easily achievable under moderate process conditions and can be done domestically without the use of sophisticated infrastructure and technical manpower. However, the application of the transesterification process has been limited by poor mass transfer during the process, unpredictable product quality, generation of a high volume of wastewater, and involvement of multiple separation processes (Awogbemi et al., 2021a; Nayab et al., 2022).
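The 1:3 stoichiometry and the alcohol:oil molar ratio discussed above can be illustrated with a quick mass balance. The sketch below is not taken from any of the reviewed studies: it assumes the oil behaves as pure triolein (about 885.4 g/mol), and the function name and the 6:1 excess ratio are hypothetical choices made only for illustration.

```python
# Illustrative methanol requirement for transesterification.
# Assumption: the oil is approximated as pure triolein; real feedstocks
# are mixtures, so an average molar mass would have to be measured.
TRIOLEIN_G_PER_MOL = 885.4
METHANOL_G_PER_MOL = 32.04


def methanol_mass_g(oil_mass_g, alcohol_to_oil_molar_ratio=3.0,
                    oil_molar_mass=TRIOLEIN_G_PER_MOL):
    """Return grams of methanol for a given alcohol:oil molar ratio."""
    if oil_mass_g < 0 or alcohol_to_oil_molar_ratio <= 0:
        raise ValueError("mass must be non-negative and ratio positive")
    oil_mol = oil_mass_g / oil_molar_mass
    return oil_mol * alcohol_to_oil_molar_ratio * METHANOL_G_PER_MOL


if __name__ == "__main__":
    stoichiometric = methanol_mass_g(1000.0)       # 3:1, per kg of oil
    excess = methanol_mass_g(1000.0, 6.0)          # hypothetical 6:1 excess
    print(f"3:1 ratio: {stoichiometric:.1f} g methanol per kg oil")
    print(f"6:1 ratio: {excess:.1f} g methanol per kg oil")
```

Because the reaction is reversible, an excess of alcohol above the stoichiometric 3:1 ratio is one way to shift the equilibrium towards the ester products, which is why the molar ratio appears as a tunable process parameter.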

The supercritical technique is a non-catalytic biodiesel production process intended to replace the catalytic method of biodiesel production. During this process, compressed fluids that exhibit the properties of both gas and liquid are maintained and used above their critical temperature and pressure. Supercritical fluids such as methanol, ethanol, and acetone, maintained at 239.2°C and 8.09 MPa, 240.9°C and 6.14 MPa, and 235.1°C and 4.70 MPa, respectively, are used for biodiesel production (Awogbemi and Kallon, 2022a). At supercritical conditions, methanol exhibits increased density, solubility, and mass transfer characteristics, together with reduced polarity, which results in better dissolution of the triglyceride in methanol and subsequent formation of biodiesel and glycerol (Farobie and Matsumura, 2017). The method requires no catalyst, consumes less energy, exhibits a high reaction rate, and allows easier product separation and purification. Supercritical techniques allow esterification and transesterification reactions to proceed simultaneously, can be applied to a wide range of feedstocks, and result in high conversion efficiency. Some of the drawbacks of the supercritical biodiesel production process include high process temperature and pressure, use of a high volume of alcohol, expensive production infrastructure, and possible denaturing of the products (Qadeer et al., 2021; Baydir and Aras, 2022).
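As a small illustration of the "above critical temperature and pressure" condition, the sketch below encodes the three critical points quoted in the paragraph above and checks whether a given operating point exceeds both. The function name and the operating points used in the usage lines are hypothetical.

```python
# Critical points as quoted in the text: (temperature in °C, pressure in MPa).
CRITICAL_POINTS = {
    "methanol": (239.2, 8.09),
    "ethanol": (240.9, 6.14),
    "acetone": (235.1, 4.70),
}


def is_supercritical(fluid, temperature_c, pressure_mpa):
    """True when both temperature and pressure exceed the critical point."""
    critical_temperature, critical_pressure = CRITICAL_POINTS[fluid]
    return temperature_c > critical_temperature and pressure_mpa > critical_pressure


# Hypothetical operating points, chosen only to exercise the check.
print(is_supercritical("methanol", 300.0, 20.0))
print(is_supercritical("methanol", 60.0, 0.1))
```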

Catalysts for biodiesel production
To further advance conversion efficiency and enhance the rate of biodiesel production, catalysts are often used. Though biodiesel can be synthesized without a catalyst, and the cost of catalysts increases the total cost of biodiesel production, catalytic biodiesel production is advantageous and is mostly adopted. The use of catalysts reduces reaction time, improves conversion efficiency, and lowers the reaction activation energy. Catalysts for biodiesel production are divided into three broad categories, namely heterogeneous catalysts, homogeneous catalysts, and enzymatic catalysts. The choice of an appropriate catalyst depends on the type of feedstock, free fatty acid (FFA) content, acid value, water content of the feedstock, etc. (Velusamy et al., 2021; Mukhtar et al., 2022).

Frontiers in Energy Research (frontiersin.org)

Homogeneous catalysts are usually in liquid form and are in the same phase as the reacting materials during transesterification. This category of catalyst is used in converting triglycerides with high FFA content and acid value, and is sometimes miscible with both glycerol and crude biodiesel. This partial miscibility with glycerol and biodiesel creates challenges in separating and recovering the catalysts from the products (Tan S. X. et al., 2019). Homogeneous catalysts used for the transesterification process can be either acid or base catalysts. Notable examples of acidic homogeneous catalysts for transesterification reactions include H2SO4, H3PO4, HCl, C2HF3O2, and sulfonic acids. Examples of base homogeneous catalysts for transesterification reactions include NaOH, KOH, NaOCH3, etc. (Changmai et al., 2020). Because homogeneous catalysts are usually in the same liquid phase as glycerol and biodiesel, complete catalyst removal requires several washing stages, which consume a large volume of water and generate a large volume of wastewater during purification. Similarly, homogeneous catalysts are relatively expensive and reuse is almost impossible (Rizwanul Fattah et al., 2020; Mukhtar et al., 2022).
Conversely, heterogeneous catalysts usually occur in a non-liquid state and are in a different phase from the reacting materials. They possess active sites that interact with the reacting materials to ensure fast reaction and are subdivided into two types, namely acid heterogeneous catalysts and base heterogeneous catalysts (CaO, SrO, BaO, etc.). Generally, metal oxides, mixed metal oxides, sulfated metal oxides, zeolites, and some cation exchange resins have been successfully used as heterogeneous catalysts for biodiesel production (Gupta et al., 2020). Heterogeneous catalysts exhibit excellent catalytic activity under reasonable reaction conditions, high reusability, high recovery, easy separation, low cost of purification, and generation of a low volume of wastewater. However, heterogeneous catalysts are suitable only for feedstocks with low FFA content and are associated with a high rate of soap formation (Changmai et al., 2020; Mandari and Devarai, 2021). Heterogeneous base catalysts can be derived from wastes such as chicken eggshell, quail eggshell, crab shell, crop residues, and food wastes to lower the production cost, ensure resource recovery, and contribute to environmental sanitation and sustainability (Awogbemi and Kallon, 2022a; Awogbemi and Kallon, 2022b).
Greater research interest has been focused on the application of enzyme catalysts, or biocatalysts, for biodiesel production over the past few decades. Enzymatic catalytic biodiesel production involves using living and biological organisms to stimulate chemical reactions without the organisms being chemically affected in the process. The most common and most widely used enzyme for biodiesel production is lipase. In most cases, lipases are derived from microorganisms such as Burkholderia cepacia, Thermomyces lanuginosus, Candida antarctica, Aspergillus niger, Anoxybacillus gonensis, etc. during fermentation under specially monitored conditions (Toldrá-Reig et al., 2020; Altinok et al., 2023). Lipases are subdivided into intracellular lipases and extracellular lipases. Intracellular lipases involve utilizing the complete cells of microorganisms or yeasts and remain within the walls of the producing cells. Intracellular lipases are easy and economical to extract and require minimal purification but suffer from low conversion efficiency and relatively low biodiesel yield. On the other hand, extracellular lipases are derived from microorganism broth and are subsequently separated and purified before application. Unlike intracellular lipases, extracellular lipases have a high downstream processing cost and suffer from complex separation procedures (Toldrá-Reig et al., 2020; Chuengcharoenphanich et al., 2023). Table 2 compares the examples, advantages, and drawbacks of homogeneous, heterogeneous, and enzyme catalysts in biodiesel production, while Table 3 shows some examples of the application of various categories of catalysts for biodiesel generation.

Introduction to machine learning in biodiesel research
Machine learning (ML), also known as predictive analytics, is a domain of computer science dedicated to building systems that imitate the way humans learn and that progressively improve their accuracy. It is often referred to as a branch of artificial intelligence (AI) that empowers machines to learn from previous data and past experiences to discover recurring patterns and make informed predictions with the least possible intervention from humans (El Naqa and Murphy, 2015). It involves the utilization of algorithms and statistical models in training the computer system to evaluate supplied data and draw inferences for key decision-making, with a view to progressively improving accuracy. The development and training of ML algorithms can be achieved through supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning. ML is among the fastest growing applications of data science, with increasing practical applications in the healthcare industry, financial sector, retail sector, social media, travel industry, and research (Anand et al., 2022). As a result of the continued relevance and applications of ML technologies in our contemporary world, the market share has continued to increase steadily. The global ML service market that was worth USD 2.4 billion in 2019 became USD 15.47 billion in 2021 and has been predicted to rise to USD 58.26 billion, USD 157.49 billion, and USD 305.62 billion in 2025, 2028, and 2030, respectively (Figure 7) (Precedence Research, 2022a). The relevance of ML became more pronounced in the aftermath of the recent COVID-19 global pandemic, which restricted movements and person-to-person contact. The driving factor behind the increased investment and market value is the utilization of ML and other data science technologies in decision making in the retail, automotive, transportation, e-commerce, manufacturing, banking, diagnostic, therapeutic, cyber security, and education sectors.
Major benefits of the adoption of ML technologies include their reliability, creativity, accuracy, and multidimensional applications. The design and development of models and algorithms have resulted in reduced workload and time, thereby encouraging faster and more reliable decision making. The use of ML algorithms has led to accuracy and efficiency in service delivery without human intervention. Many multifaceted tasks in highly uncertain and hostile environmental conditions are easily handled by ML technologies. ML algorithms have the capacity to handle large quantities of data, identify trends, and utilize the information to make informed decisions with optimal accuracy (Wuest et al., 2016; Wazid et al., 2022). However, for improved accuracy and reliability, ML requires enormous unbiased data sets to train on, and sourcing these data can be a herculean task. Another drawback of the adoption of ML technologies is the time and computer infrastructure required for the algorithms to learn, understand, and develop enough capacity to correctly interpret the generated outcomes and render reliable decisions that suit the purpose. Though ML makes decisions after rigorous training, there is still high susceptibility to errors, especially when there are insufficient wide-ranging data sets to train from. Such inaccuracies can lead to other chains of errors which may take a long time to detect and rectify (Aliramezani et al., 2022). Due to the increased application of ML algorithms, diverse approaches including Artificial Neural Networks (ANN), Genetic Algorithms (GA), linear regression, Random Forests regression (RF), Support Vector Machines (SVM), etc. have been exploited to monitor, predict, optimize, control, and take decisions in critical sectors of biodiesel research. ANN, a domain of AI, is a computational network modelled on biological neural networks and works like the human nervous system (Kukreja et al., 2016; Techopedia, 2022).
Just like the human brain, ANN consists of neurons which are interconnected by unidirectional communication channels called synapses (Figures 8A,B). The input data are presented to the neurons as input signals and transferred through the network, where they are scaled by their associated synaptic weights and released as output data. Generally, the ANN has three layers, as shown in Figure 9. The input layer receives data in the form of signals from an external source and passes its output to intermediary nodes called the hidden layer. The hidden neurons are joined to the output neuron, which gives out the result of the trained data in the output layer (Aghbashlo et al., 2021). Simply put, the input layer receives the input signal, which is processed in the hidden layer, and the result is released to the output layer as the outcome of the entire network. Though there are many types of ANN models, such as the perceptron, feed forward neural network, multilayer perceptron, convolutional neural network, etc., they all abide by the same working principle. ANN is arguably the most widely used ML algorithm due to some of its basic advantages, which include fault detection, simple adaptability, accurate approximation of continuous and non-linear functions, capacity to perform multiple tasks simultaneously, huge data storage ability, and superb fault-tolerance properties. However, ANN is hardware dependent, its network structure cannot be predetermined but is arrived at only through experience and trial and error, and the duration of the network is unknown. Nevertheless, ANN can approximate any continuous function to any desired accuracy (Walczak, 2019).
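The input, hidden, and output layers described above can be sketched as a single forward pass. This is a minimal illustration, not the architecture used in any of the reviewed studies: the tanh hidden activation, the linear output neuron, the layer sizes, and all weight values are assumptions, and the weights are untrained placeholders.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: input layer -> tanh hidden layer -> linear output."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        # Each hidden neuron sums its weighted input signals plus a bias.
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        hidden.append(math.tanh(total))
    # The output neuron combines the hidden activations into one prediction.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical scaled inputs (e.g. molar ratio, catalyst dose, temperature)
# and untrained placeholder weights for two hidden neurons.
scaled_inputs = [0.5, 0.2, 0.8]
hidden_weights = [[0.4, -0.6, 0.1], [0.3, 0.8, -0.5]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.7]
output_bias = 0.05

prediction = forward(scaled_inputs, hidden_weights, hidden_biases,
                     output_weights, output_bias)
print(f"untrained network output: {prediction:.4f}")
```

In practice the weights would be fitted to experimental data by a training algorithm, and the number of hidden neurons would be chosen by trial and error, as noted above.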
In recent years, researchers have carried out meaningful investigations of the applications of ML technologies and other similar models for biodiesel research. Gupta et al. (2021) and Suvarna et al. (2022) have shed some light on the subject matter. However, only a handful of the aforementioned interventions have significantly touched on the application of ML for the modelling, prediction, and optimization of process parameters for biodiesel synthesis. Up till now, efforts to fill this obvious research gap have not yielded the required results, due partly to inadequate manpower, infrastructural deficits, and other challenges in the data science research space. This further justifies the current intervention.
The process for the application of ANN models usually involves data collection, data processing, and building the network architecture, among other steps. Aghbashlo et al. (2021) presented the major steps to be followed in developing a typical ANN model for biodiesel research (Figure 10). The collection and preparation of data is arguably the most important factor in determining the attainment of an effective modelling process. In order to ensure the efficiency and integrity of the model, enough data need to be collected and the data must be from trusted sources (Bali and Singla, 2022). The data collected must be analysed to enhance the training process and boost the integrity of the ANN model. The precision and dependability of the outcome need to be verified to ensure that it meets expectations. Various statistical parameters, including standard deviation, standard deviation of error, mean square error, root mean square error, and the square of the Pearson correlation coefficient, have been established to measure the correctness and reliability of the model result. Table 5 compiles some of the statistical parameters commonly used to verify the accuracy and reliability of ANN and other ML models.
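As a rough illustration of how the verification parameters named above are computed, the sketch below uses their standard textbook definitions. The function name `model_metrics` and the example yields are hypothetical and are not taken from any cited study.

```python
import numpy as np

def model_metrics(y_true, y_pred):
    """Common agreement measures between experimental and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)                # mean square error
    rmse = np.sqrt(mse)                       # root mean square error
    sd_error = np.std(errors, ddof=1)         # sample standard deviation of error
    r = np.corrcoef(y_true, y_pred)[0, 1]     # Pearson correlation coefficient
    return {"MSE": mse, "RMSE": rmse, "SD_error": sd_error, "R": r, "R2": r ** 2}

# Hypothetical experimental vs. predicted biodiesel yields (%), illustration only.
experimental = [90.0, 92.0, 95.0, 97.0]
predicted = [91.0, 92.0, 94.0, 97.0]
metrics = model_metrics(experimental, predicted)
print(metrics)
```

Values of R and R2 close to 1, together with small MSE and RMSE, indicate close agreement between the model and the experimental data.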

Process parameters affecting biodiesel production
After making the choice of raw materials (feedstock, catalyst, and alcohol), production technique, and reactor, the choice of process parameters is an important decision that must be made. The choice of production process parameters has a direct effect on the conversion efficiency, yield, and overall success of biodiesel production. Factors such as alcohol to oil ratio, catalyst concentration/dosage, reaction temperature, and reaction time play significant roles in the success or otherwise of biodiesel synthesis. Making the right decision among these competing factors to ensure optimal conversion efficiency and product yield is not a straightforward task. The use of trial and error experimental methods to determine the optimal production process parameters is cumbersome, complicated, expensive, and time-consuming, and often results in acute material wastage. The application of data-based ML technology has been adopted to make informed decisions that will guarantee value for money in biodiesel production. The input variables to the ANN model are alcohol to oil ratio, catalyst concentration/dosage, reaction temperature, reaction time, and stirring/agitation speed, while the output variable is conversion efficiency or biodiesel yield.

Alcohol to oil ratio
The alcohol to oil ratio is an important factor influencing the conversion efficiency or biodiesel yield. During the transesterification process, the oil and catalyst are dissolved and dispersed in the alcohol. Alcohol provides the medium for the triglycerides to react with the catalyst to form biodiesel and glycerol. As shown in Figure 4, 1 mol of glycerides reacts with 3 mol of alcohol to generate 3 mol of biodiesel and 1 mol of glycerol. Therefore, to increase the biodiesel yield, a higher alcohol to oil ratio is essential (Okolie et al., 2022). However, a very high alcohol to oil ratio can leave excess alcohol in the system, leading to higher production cost, a difficult separation and purification process, low conversion efficiency, and ultimately poor biodiesel yield during the transesterification reaction. Alcohol, especially methanol, possesses a polar hydroxyl group which stimulates the emulsification of glycerol and biodiesel during the transesterification reaction. The emulsification of the products facilitates the backward reaction, thereby reducing conversion efficiency and biodiesel yield (Gupta and Pal Singh, 2022). Generally, a methanol to oil ratio of 6:1 and an ethanol to oil ratio of 9:1 are recommended as the molar ratios for optimal biodiesel production (Musa, 2016; Abusweireh et al., 2022). However, depending on the choice of feedstock, catalyst type, and other process parameters, ANN and other statistical tools are employed to simulate the process and arrive at the alcohol to oil molar ratio that ensures optimal conversion efficiency and biodiesel yield.
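The molar ratios discussed above translate into alcohol masses as in the following sketch. The average molar mass of the oil (870 g/mol) is an assumed, illustrative value that varies with feedstock; the function names are hypothetical. The molar masses of methanol and ethanol are standard values.

```python
METHANOL_G_PER_MOL = 32.04   # standard molar mass of methanol
ETHANOL_G_PER_MOL = 46.07    # standard molar mass of ethanol

def alcohol_mass_g(oil_mass_g, molar_ratio, alcohol_g_per_mol, oil_g_per_mol=870.0):
    """Mass of alcohol (g) required for a given alcohol:oil molar ratio.

    oil_g_per_mol is an ASSUMED average triglyceride molar mass; it depends on
    the fatty acid profile of the feedstock and should be replaced with a
    measured value.
    """
    oil_mol = oil_mass_g / oil_g_per_mol
    return oil_mol * molar_ratio * alcohol_g_per_mol

def excess_over_stoichiometric(molar_ratio):
    """Multiple of the stoichiometric 3:1 alcohol:oil requirement."""
    return molar_ratio / 3.0

# 100 g of oil at the commonly recommended ratios.
methanol_needed = alcohol_mass_g(100.0, 6.0, METHANOL_G_PER_MOL)
ethanol_needed = alcohol_mass_g(100.0, 9.0, ETHANOL_G_PER_MOL)
print(round(methanol_needed, 1), round(ethanol_needed, 1))
print(excess_over_stoichiometric(6.0))  # 6:1 is twice the stoichiometric amount
```

Under these assumptions, a 6:1 methanol to oil ratio supplies twice the stoichiometric alcohol requirement, which is the excess that pushes the reversible reaction towards the products.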

Catalyst type and concentration/dosage
Though biodiesel can be generated in the absence of catalysts, the application of a catalyst in biodiesel production by transesterification helps to reduce the activation energy, minimize energy consumption, shorten reaction time, improve conversion efficiency, and ultimately increase biodiesel yield significantly (Abdelmigeed et al., 2021; Awogbemi and Kallon, 2022b). Various classes of catalysts, i.e., alkali, acid, enzyme, biobased, heterogeneous, homogeneous, etc., have been utilized for the transesterification of glycerides to biodiesel. The choice and concentration of catalysts greatly affect the quality and purity of the product, production cost, ease of separation and purification, conversion rate, conversion efficiency, and biodiesel yield. For example, biodiesel yields of between 92% and 98% are achievable even with low grade and high FA content feedstock, whereas lower biodiesel yields are recorded with the same feedstock when base catalysts are used under the same production conditions. Similarly, the application of heterogeneous catalysts results in higher conversion efficiency and biodiesel yield when compared with most homogeneous catalysts (Sitepu et al., 2020).
In most cases, a higher concentration/dosage of catalyst results in higher biodiesel yield. However, the disproportionate application of catalysts can be counterproductive as it results in emulsification and the creation of a highly viscous product mixture. This complicates product separation and purification, promotes the saponification reaction, and substantially reduces biodiesel yield (Nayab et al., 2022). Transesterification reactions with an inadequate quantity and concentration of catalyst proceed slowly, leave unreacted feedstock, and generate low quality biodiesel at low yield. On the other hand, an excess dosage of catalyst in the transesterification reaction leads to agglomeration, reduces mass and surface interaction among the reactants, and significantly impacts biodiesel yield (Mofijur et al., 2021; Xie and Li, 2023). While most researchers recommend a catalyst dosage of between 2 and 10 wt% to achieve between 75% and 100% biodiesel yield, the deployment of appropriate statistical tools to measure the reliability and validity of the ML models is needed to ensure cost effective and optimal biodiesel yield (Teo et al., 2022; Xie and Li, 2023).

Figure caption: ANN structure. Adapted from Javapoint (2021).

Reaction temperature
The temperature of a transesterification process plays a key role in the conversion of feedstock to biodiesel, biodiesel yield, and reaction kinetics. An increase in the reaction temperature reduces the viscosity and volatility of the reactants, encourages the miscibility of the reacting materials, and enhances the molecular interaction among the oil, alcohol, and catalyst. A high temperature of the reacting medium also contributes to the thinning of the oil and enhances the separation of the biodiesel and glycerol (Bashir et al., 2022). However, excessive reaction temperature can also be counterproductive to the transesterification reaction. Apart from high energy cost, elevating the reaction temperature above the permissible threshold may result in saponification, complete hydrolysis of the esters into the cognate acid and alcohol, and escalation of the backward reaction. The choice of alcohol should also be considered when determining the reaction temperature. Of the two alcohols commonly used in transesterification reactions, ethanol seems to tolerate higher temperatures than methanol. While a temperature of about 78°C is permissible with ethanol, 65°C is the recommended optimum reaction temperature for transesterification reactions involving methanol. Operating at a temperature below 65°C will lead to a slow reaction rate, poor conversion efficiency, and low biodiesel yield, while escalating the reaction temperature above 70°C will exacerbate methanol evaporation, lower the methanol to oil ratio, and hinder the biodiesel synthesis reaction (Mathew et al., 2021; Stevanato et al., 2023).
The decision on reaction temperature must be taken together with other process parameters, as the factors are interrelated and interdependent. In separate studies, Ahmad et al. (2019), Gimbun et al. (2013), and Al-Saadi et al. (2020) reported biodiesel yields of 98.6%, 96.9%, and 95.1% when they synthesized biodiesel by transesterification at reaction temperatures of 59°C, 65°C, and 70°C, respectively. They concluded that other factors such as type of feedstock, catalyst, grade of feedstock, methanol to oil ratio, and catalyst concentration influenced the reaction temperature. One sustainable remedy to this conundrum is the deployment of ML models, simulation techniques, and statistical tools to arrive at an optimal reaction temperature for improved biodiesel yield.

| Biodiesel research domain | Aim(s) | Model used | References |
|---|---|---|---|
| Biodiesel property | Estimation of the iodine value of biodiesel using fatty acid methyl ester profiles | LSSVM, ANFIS, MLPNN, and DT | Huang et al. (2022) |
| Biodiesel engine performance | Determination of the biodiesel blend ratio for optimal engine performance and reduced emission | ELM, LS-SVM and RBFNN | Wong et al. (2013) |
| Biodiesel engine performance | Prediction of the combustion characteristics of a diesel engine fuelled with biodiesel | ANN | Can et al. (2022) |
| Biodiesel characterization | Characterization of biodiesel using machine learning models | ML | Chen et al. (2023) |
| Biodiesel production | Optimization of biodiesel production process parameters | Huber regression, LASSO, SVR and ANN | Abdelbasset et al. (2022) |
| Biodiesel properties and combustion | Evaluation of properties of biodiesel feedstocks; prediction of the thermo-physical properties of biodiesel | ANN | Thangaraja et al. (2023) |
| Biodiesel production | Optimization of biodiesel production process | Linear regression, MLP, KNN | Gautam et al. (2022) |
| Biodiesel production | Modelling and optimization of biodiesel yield | ANN | Soltani et al. (2022) |
| Biodiesel production | Modelling of biodiesel production process | ANN, GA | Treve et al. (2022) |

Reaction time
Reaction time measures the entire duration of the transesterification reaction. It is a principal consideration in any chemical reaction as it is contingent on the type of catalyst, grade of feedstock, size of the reactor, quantity of the reactants, and reaction temperature (Bashir et al., 2022). Choosing an adequate reaction time ensures sufficient product formation and that the entire reaction is completed within the reaction duration. The duration of the reaction should allow adequate time for the diverse reacting materials to mix, interact, and react with each other for product formation. Biodiesel yield has been observed to increase with reaction time as more time is allowed for the reactions to take place. In separate studies, Elkelawy et al. (2020) and Narowska et al. (2019) reported a corresponding increase in biodiesel yield with increased reaction time. They attributed this trend to the availability of enough time for the reactions to be completed. With prolonged reaction time, however, they reported a decline in biodiesel yield due to the attainment of equilibrium and the onset of soap formation.

Mixing/agitation speed
The intensity of mixing of the reactants significantly affects the formation of the product in any chemical reaction. During transesterification, the reacting materials are mixed and agitated to ensure that they are well dispersed and homogeneously distributed and that the reaction is uniform across the reactor. In cases where the triglycerides are not completely soluble in alcohol, the reaction might occur only at the interface and not in all sections of the reactor. This often leads to a slow reaction, unreacted materials in some parts of the reactor, and poor yield. Mixing of the reacting materials ensures improved solubility of the oil in methanol, better reaction in all sections of the reactor, and enhanced product formation. Tabatabaei et al. (2019) examined the effect of mixing speeds of 200 rpm, 400 rpm, and 600 rpm on biodiesel yield, while Likozar and Levec (2014) studied mixing speeds of 100-600 rpm. They reported that increased mixing speed leads to better product yield and that a speed of about 400 rpm is optimal for maximum biodiesel yield. Higher mixing speeds favour soap formation and raise energy consumption.

Application of machine learning technologies in optimization of process parameters
The application of ML technologies such as ANN, RSM, and Adaptive Neuro-Fuzzy Inference System (ANFIS) for predicting and optimizing biodiesel production process parameters has gained traction in recent years. This is due to their capability to predict and optimize the process parameters accurately. In separate studies, Tan Y. H. et al. (2019), Fangfang et al. (2021), and Farobie et al. (2015) applied ANN to model and predict biodiesel yield using Jatropha oil, waste cooking oil, and canola oil, respectively, as feedstock. They reported that the predicted results agreed with the experimental results and that the application of the ANN model eliminated material wastage and saved time. The experience of other scholars in the application of ANN to the modelling, prediction, and optimization of the variables that affect the generation of biodiesel shows that the technique is accurate, can model linear and non-linear situations, and has the capacity to store information on the entire network. However, ANN has been criticized for its poor interpretability and explainability and for demanding large amounts of data for training, validation, and testing. Also, ANN is hardware dependent and requires a high degree of computational skill and power. These drawbacks have necessitated the use of other tools. Among several other modeling and optimization tools, RSM has gained prominence and wide acceptability. RSM is a mathematical and statistical technique for the optimization of a process whose response or output is affected by various factors or variables. The dependent variables are the responses or outputs, while the independent variables are the inputs or predictor variables (Anggoro et al., 2022). The utilization of the RSM technique for modeling, prediction, and optimization of biodiesel yield was investigated by Wahidin et al. (2018), Yesilyurt et al. (2019), and Anwar et al. (2018).
They exploited RSM to model and optimize the process parameters in the transesterification of Nannochloropsis sp. biomass, mustard seed oil, and stone fruit seed oil, respectively, to quality biodiesel. They reported R² = 0.96912 and adjusted R² = 0.95059, R² = 0.9818 and adjusted R² = 0.9649, and R² = 0.9781 and adjusted R² = 0.9386, respectively, to confirm the accuracy of the model and its agreement with the experimental data. Similar studies by Hashemzehi et al. (2022) and Singh Pali et al. (2021) on the application of RSM confirmed that the predicted biodiesel yield was accurate and validated by the experimental results. These studies confirmed that RSM is a viable technique for optimizing the process reaction parameters in the transesterification process for maximum biodiesel yield. The trajectory of research has moved to comparisons between ANN, RSM, and other similar techniques. This is to help researchers select better and more cost effective techniques for prediction, modelling, and optimization with a view to further advancing biodiesel production research. Samuel et al. (2020) investigated and compared the application of RSM and the Grey wolf optimizer (GWO) to model and optimize biodiesel yield during the transesterification of waste sunflower oil. The authors reported that although the two techniques were reliable in predicting waste sunflower biodiesel yield, GWO was found to be more accurate and user friendly. Similarly, Fayyazi et al. (2015) and Kolakoti et al. (2020) compared RSM and the genetic algorithm (GA) and reported that the GA was faster, more efficient, and displayed a 4.96% improvement over the RSM technique. The desirability of the adaptive neuro-fuzzy inference system (ANFIS) integrated with GA and RSM was investigated by Chizoo et al. (2022). The authors reported that ANFIS-GA demonstrated better predictive capability and was a faster and cheaper process than RSM. In the same vein, Jisieike et al. (2023) and Ishola et al. (2019), in their separate studies, confirmed that though ANN and RSM possess the capability to model and predict biodiesel yield, ANFIS is more robust, accurate, and faster than ANN and RSM.
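To make the RSM workflow and the reported R² and adjusted R² statistics more concrete, the following is a minimal sketch that fits a second-order polynomial, the model form commonly used in RSM, by ordinary least squares. The two coded factors, the 3 × 3 design, and the synthetic response are assumptions made purely for illustration and do not reproduce any cited experiment.

```python
import numpy as np

def quadratic_design(X):
    """Design matrix: intercept, linear, squared, and two-factor interaction terms."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)

def fit_rsm(X, y):
    """Least-squares fit of the quadratic model; returns coefficients, R2, adjusted R2."""
    A = quadratic_design(X)
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    n, p = A.shape                       # p counts the intercept as a parameter
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return coef, r2, adj_r2

# Synthetic data: two coded factors (e.g. temperature and molar ratio) at three levels.
rng = np.random.default_rng(1)
levels = [-1.0, 0.0, 1.0]
X = np.array([[a, b] for a in levels for b in levels])
y = 95 - 2 * X[:, 0] ** 2 - 3 * X[:, 1] ** 2 + 1.5 * X[:, 0] + rng.normal(scale=0.3, size=len(X))

coef, r2, adj_r2 = fit_rsm(X, y)
print(coef.round(2), round(r2, 4), round(adj_r2, 4))
```

The adjusted R² penalizes the number of fitted terms, which is why it is always at or below R² in the values reported above; the fitted surface can then be searched for the factor settings that maximize the predicted yield.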
The application of both ANN and RSM for predicting and optimizing the biodiesel production process has gained traction among researchers. This might be due to their accuracy, robustness, and ease of use. Several authors have combined the two techniques to model and predict biodiesel yield. In their respective works, Maran and Priya (2015), Sarve et al. (2015), Prakash Maran and Priya (2015), and Rajković et al. (2013) compared the capability and the predictive efficiency of ANN and RSM using muskmelon oil, sesame oil, neem oil, and sunflower oil, respectively. They unanimously concluded that the ANN model is more reliable and precise, outperforms RSM models, and is therefore widely used in modelling biodiesel production. Other techniques such as extreme learning machine (ELM), support vector machine (SVM), and artificial bee colony (ABC) have been applied to predict and model biodiesel production. For example, ELM was applied to model and optimize biodiesel generation from Ceiba pentandra oil by a microwave irradiation assisted transesterification process. The ELM technique was found effective and resulted in high biodiesel yield (Silitonga et al., 2020). Faizollahzadeh Ardabili et al. (2018) also reported successful utilization of SVM-RSM and ELM-RSM approaches for the optimization of the biodiesel production process. The hybrid methodologies performed creditably and showed high estimation capability in optimizing the production of quality biodiesel. The application of the ABC algorithm for the optimization of biodiesel production was reported by Rostami et al. (2016). The ABC algorithm technique demonstrated high accuracy in estimating the optimal process parameters for biodiesel production. The summary of the application of ML technology in the optimization of process parameters for biodiesel production is shown in Table 6.

(Khair et al., 2017; Ofoefule et al., 2019; Galvan et al., 2020).

| Parameters | Abbreviation | Description | Formula |
|---|---|---|---|
| Standard Deviation | SD | Quantitative measurement of the amount of variation or dispersion of a set of values | |

Implications, challenges, and future research direction
Demand for affordable, easily accessible, and environmentally benign energy sources will continue to increase in the foreseeable future. There will be more pressure on fuel refiners, energy producers, researchers, and all stakeholders to escalate energy production to meet global energy demand. Therefore, research geared towards making more renewable energy available will continue to be on the front burner. Biodiesel, a prominent renewable fuel, will continue to occupy a prime place in the renewable energy space. The current research is therefore relevant to contemporary times as it provides updated information in the research domain with a view to provoking the interest of researchers in the field. Bearing in mind the continuous application of innovative technologies in diverse fields, including R and D, the application of ML and similar technologies will simplify and create technology-based techniques for the expansion of biodiesel research. The deployment of modelling, simulation, numerical, mathematical, statistical, and optimization techniques in biodiesel production will contribute towards democratizing biodiesel production and energy sufficiency.
Researchers and biodiesel refiners must evolve fast, cost effective, and efficient methodologies that will not waste materials (Goswami et al., 2022). Efficient conversion of various natural oils, waste oils, animal fats, and other feedstocks into biodiesel will continue to attract the attention of researchers and biodiesel refiners alike. The application of ML and other novel technologies for the modelling, prediction, estimation, and optimization of biodiesel production will continue to be in demand. ML technologies will continue to be preferred to mathematical and numerical models in modelling biodiesel yield. The continuous use of ML technologies in advancing biodiesel research will manifest in the discovery and utilization of other novel and easy to use technologies. The current research efforts in the application of ML technologies for biodiesel research will also stimulate more research, particularly into cost effective production strategies and ecofriendly utilization pathways (Albuquerque et al., 2022).
Real life production of biodiesel is very challenging. There are several non-linear factors and unpredictable occurrences that can affect the process. The use of simulation, optimization, and modelling tools such as ANN, ML, numerical, mathematical, and multiphysics tools is also not straightforward and requires painstaking effort and concentration to cope with the high degree of uncertainty and complexity involved. Selection of raw materials, choice of production process and methods, choice of reaction conditions, optimization of input variables, improvement of procedures, and intensification and upscaling of the production and purification infrastructure are crucial to taking biodiesel production to the next phase. The economic, infrastructural, manpower, and environmental factors, policy formulation and implementation, biodiesel mandates and standards, and other factors connected to the biodiesel value chain need holistic interrogation and improvement. The infrastructural requirements of modelling, prediction, and optimization of the biodiesel generation process must be met and developed to meet the demands of the new age.
There is no doubt that the introduction of ML technologies and other prediction and optimization tools has been beneficial to biodiesel research. However, it is not yet a flawless and total solution that can solve all the complexities involved in the research domain. There are obvious inherent shortcomings that must be tackled head on. For instance, there are challenges in the interpretation of ANN, RSM, and other statistical results. Some of the results are incomplete, cannot stand alone, and are difficult to interpret. Some of these need to be fine-tuned, analysed, and calibrated according to the training data. ML technologies are designed to monitor biodiesel production, purification, characterization, and utilization processes in real life. More studies are required to develop and upgrade ML technologies and similar tools to be able to monitor and control biodiesel quality and maintain standards. The impact of biodiesel production and utilization on the environment, ecosystem, humanity, bioeconomy, and biodiversity needs further scrutiny.

Conclusion
Population explosion, the need for renewable energy, and environmental sustainability have continued to drive increased demand for renewable fuel. Biodiesel, a form of liquid biofuel, is a renewable, cost effective, biodegradable, and environmentally benign fuel capable of replacing fossil-based diesel fuel. Production of biodiesel involves several non-linear factors that must be considered to ensure the effective conversion of the diverse feedstocks to quality biodiesel. Factors such as choice of feedstock, state of feedstocks, choice of catalysts, method of conversion, reaction time, process temperature and pressure, stirring or agitation speed, alcohol: oil molar ratio, catalyst concentration, catalyst particle size, choice of reactor, etc., affect not only the conversion efficiency, but also biodiesel yield, biodiesel properties, and application.
In this study, we have examined the use of ML technologies in modeling, predicting, estimating, and optimizing biodiesel production. We surveyed the trend in global biodiesel production and market size and other issues relating to biodiesel production, including feedstock, techniques, and catalysts for biodiesel production. A brief introduction to ML and other technologies for predicting, estimating, and optimizing biodiesel production was also given. Specifically, we considered the five major factors affecting biodiesel production, namely methanol: oil ratio, reaction temperature, reaction time, catalyst concentration, and agitation speed. The use of technologies such as ANN, RSM, ANFIS, GWO, GA, SVM, and other ML technologies commonly used in the biodiesel research space was studied. Published works on the applications of ML technologies for the prediction, estimation, and optimization of biodiesel yield were interrogated and summarized. The implications, challenges, and future research trajectory of ML application in optimizing biodiesel yield were offered. The outcome of this study will help to stimulate further investigations on the application of novel and innovative technologies that can enhance biodiesel production. After a careful study, it is, therefore, safe to conclude as follows.
• The most influential factors commonly optimized to increase biodiesel yield are alcohol: oil molar ratio, reaction time, process temperature, catalyst concentration and dosage, and stirring or agitation speed.
• The application of ML technologies, statistical techniques, modelling, and optimization tools in biodiesel research has opened a new vista in biodiesel production research.
• Biodiesel is one of the most popular and easy-to-produce biofuels. Applying ML and other innovative technologies to its production will stimulate more interest in biodiesel synthesis and utilization.
• As a recommendation, the deployment of ML technologies for monitoring, controlling, predicting, simulating, and modelling biodiesel production processes should first be implemented at laboratory scale before escalating the methodology to large industrial scales. This way, challenges and problems are easier to mitigate and at lower cost.
There is a need for the development and implementation of simple, easy-to-use, and robust advanced technologies for monitoring and controlling the entire biodiesel production ecosystem. More training and capacity building are recommended to ensure better understanding and wider application of technologies in the entire biodiesel value chain, including raw materials selection, oil extraction, feedstock pre-treatment, reactor selection and configuration, biodiesel purification and quality assurance, biodiesel utilization, and emission reduction in biodiesel fuelled engines. Targeted policies and programmes aimed at encouraging the production and utilization of biodiesel for diverse applications towards decarbonisation of the environment should be enacted and consistently implemented. Government should democratise and encourage the use of biodiesel and other biofuels, particularly at the household level, to relieve the pressure on fossil-based diesel fuels and their attendant implications.

Author contributions
OA, conceptualization, methodology, literature survey, original draft writing, reviewing, editing, project administration, corresponding author. DVVK, reviewing, editing, supervision, project administration, funding acquisition. Both authors approve the version of the article submitted.