ORIGINAL RESEARCH article

Front. Energy Effic., 25 November 2024

Sec. Energy Efficiency Applications

Volume 2 - 2024 | https://doi.org/10.3389/fenef.2024.1502854

A novel simulation and supervised machine learning-based prediction framework to predict the total transportation and energy costs for single-family households

  • 1. Management and Marketing, Texas A&M University Central Texas, Killeen, United States

  • 2. Information Systems and Operations Management, Ball State University, Muncie, IN, United States

  • 3. Paseka School of Business, Minnesota State University Moorhead, Moorhead, MN, United States

Abstract

This paper focuses on predicting the total transportation and energy costs (TTEC) for single-family households. A system boundary consisting of grid-powered electricity (GE) and solar-powered electricity (SE) as energy inputs and transportation vehicles that include Gasoline Vehicles (GV) and Electric Vehicles (EV) as transportation methods for energy outputs is studied. A novel three-stage evaluation framework is proposed to predict the TTEC under varying single-family household parameters. In the first stage, an energy balance simulation model is proposed to estimate the TTEC for an individual household. In the second stage, the simulation model is run several times under varying parameters to develop synthetic data that is used as input for the third stage supervised machine learning (SML) models. In the third stage, numerous SML models are trained and tested to determine the best SML model that enables us to predict the TTEC with high accuracy. This best SML model can be used as a substitute for simulation model, thereby reducing the computation burden of running the simulation model for each new single-family household. A case study of single-family households in Central Texas in the US is used as an application of the framework. The results indicate that regression SML models are best in predicting the total costs with an adjusted R-squared of 99.13% and 98.88% on training and testing datasets, respectively. In addition, the parameter analysis of regression SML models suggests that the house size, number of GVs, number of EVs, EV and GV ownership costs, and solar implementation at households are the most important parameters to predict TTEC for single-family households. Counterintuitively, number of residents, GV and EV mileage, solar system size, battery capacity and peak solar hours are not significant parameters that contribute to TTEC prediction.

1 Introduction

Over the past few decades, gasoline vehicles (GVs) have dominated transportation for single-family households. While GVs are cost-effective, they produce significant greenhouse gas (GHG) emissions. In addition, the increasing transportation needs of the public, limitations in fuel supply, fluctuations in transportation fuel prices, and increasing public calls for sustainability initiatives have driven governments across the world to seek alternatives to GVs (Falahi et al., 2013; Jones et al., 2023; IEA, 2024). Consequently, electric vehicles (EVs) have emerged as an environmentally friendly option, with many governments offering tax incentives or subsidies to promote their adoption and make them more economically competitive with GVs. As EV technology advances and costs decrease, households are gradually integrating EVs into their transportation choices (Ajanovic, 2015). However, due to charging time challenges, EVs are often used for short-distance travel, leading to a mix of transportation methods at the household level, including all GV, a combination of GV and EV, or all EV, depending on preferences.

These varying transportation combinations also result in different energy requirements. For example, households with only GVs need both electricity for residential use and gasoline for transportation, while those with only EVs require electricity for both purposes. Therefore, selecting the optimal transportation mix is important for meeting energy needs efficiently.

Historically, single-family households have relied on grid-powered electricity (GE), which is typically generated from non-renewable sources, contributing to environmental harm. With the rise in EV usage, electricity demands have surged, increasing GHG emissions and straining the grid, sometimes causing blackouts during peak times. In response, governments worldwide have promoted solar-powered electricity (SE) as a renewable, eco-friendly supplement to GE, reducing pressure on the grid. Tax incentives and subsidies are now available for households that adopt solar energy. Thus, it is crucial for households to find the right balance between grid and solar power to meet their electricity demands effectively.

Given the wide variety of options available to single-family households for fulfilling their transportation and electricity needs, households typically make their transportation and electricity decisions based on total transportation and energy costs (TTEC). The TTEC for a typical single-family household depends on several parameters that include, but are not limited to, the number of residents, house size, number of GVs, number of EVs, GV and EV ownership costs, whether a solar system is implemented at the household, and solar system size. Consequently, it is important to understand which factors contribute most to the estimation of the TTEC. A comprehensive review of the literature raises three important research questions, as there is still ambiguity in the estimation of TTEC for different combinations for single-family households. These research questions are:

  • 1. Is there really any significant difference in TTEC costs when different combinations of electricity inputs and transportation methods as energy outputs are considered for a single-family household?

  • 2. Is there a methodology that can help to predict TTEC for single-family households with a variety of parameters?

  • 3. What are the important factors that contribute to the prediction or estimation of the total costs for single-family households?

To address these research questions, in this study, we examine a holistic system that considers GE and SE systems as electricity inputs and EVs and GVs as transportation methods for energy outputs. Our study aims to predict the TTEC for any given single-family household with a specific set of input parameters. We propose a novel three-stage prediction framework in which we first develop an energy balance simulation model to estimate TTEC for individual single-family households. Then, in the second stage, we run the model several times with varying parameters and develop synthetic data to train supervised machine learning (SML) models. In the final stage, different SML models are trained and tested to determine the best SML model that can be implemented in the real world for TTEC prediction. It is important to highlight that once the best SML model is determined, it can serve as a substitute for the simulation model: it predicts the TTEC with a degree of accuracy close to that of the simulation model, so the simulation no longer needs to be run for each household, thereby reducing computational effort and complexity.

2 Literature review

Over the past decade, several studies have been conducted to compare the total cost of ownership for EVs and GVs. Wu et al. (2015) create a probabilistic simulation model to evaluate and contrast electric vehicles (EVs) with fuel-powered vehicles. They find that, while EVs are approaching conventional vehicles in terms of total cost of ownership, their performance superiority depends on a variety of favorable factors. Mitropoulos et al. (2017) conduct a life cycle cost analysis to compare ownership costs across conventional, hybrid, and electric vehicles. Their findings show that hybrid vehicles outperform both conventional and electric vehicles over a wide range of life cycle distances. However, a trade-off exists: for shorter life cycle distances, EVs are more advantageous, while conventional vehicles perform better over longer distances. Similar to Wu et al. (2015), Danielis et al. (2018) design a probabilistic simulation model to compare the total cost of ownership between EVs and GVs, considering both stochastic and non-stochastic factors. Unlike Wu et al. (2015), who expressed uncertainty about cost-competitiveness, Danielis et al. (2018) suggest that EVs could become cost-competitive with GVs if fuel prices continue to rise and EV retail prices continue to decrease. Weldon et al. (2018) conduct an economic analysis of fuel vehicles and EVs using a decade of data, finding that EVs are already cost-competitive with fuel vehicles. However, their study notes that this competitiveness depends on multiple factors and recommends continuing government incentives to support EV adoption until the technology fully matures. Hassan et al. (2024) perform an economic analysis to calculate the cost of ownership per kilometer, finding that EVs have a lower per kilometer ownership cost compared to gasoline-powered cars, despite their higher upfront price and battery replacement costs.
The per kilometer cost for EVs decreases further when electricity rates are favorable and clean car discounts are offered. Liu et al. (2021) compare ownership costs between electric and fuel vehicles using an economic model, finding that EVs tend to be more expensive than fuel vehicles. However, their study notes that EV ownership costs become comparable to those of fuel vehicles when EVs are driven shorter distances. While there are numerous studies comparing the total cost of ownership of GV and EV, their cost competitiveness is still ambiguous and questionable.

In recent years, the total cost of EV ownership compared to GVs has been assessed by using solar-generated electricity as the main source of energy for EV charging. Accordingly, several researchers have examined EVs and solar power as integrated systems. Coffman et al. (2017) perform a life cycle assessment for a single-family household, considering both grid and solar power alongside fuel and electric vehicles. The study reveals that EVs typically have higher ownership costs than GVs. However, with subsidies for EVs and incentives for photovoltaic (PV) systems, EV ownership can become more cost-effective than GVs. Fachrizal and Munkhammar (2020) developed a quadratic programming approach for communities in high-latitude regions, where photovoltaic power production is lower and EV travel distances are greater. Their findings suggest that EV smart charging schemes can help reduce the PV load in these high latitude areas. Cieslik et al. (2021) conducted an energy balance analysis to evaluate a single-family household system integrating solar power generation with an EV. Through various scenarios, the study found that using photovoltaic (PV) systems or solar energy can be economically viable for household power needs, including EV charging, in certain scenarios compared to others. Göhler et al. (2021) assessed a multifamily household powered by both grid and solar energy, using EVs for transportation. They developed a simulation model, and the findings show that the energy self-sufficiency of a multifamily building powered by photovoltaic (PV) systems drops from 100% to 91% when EV charging is factored in. Boström et al. (2021) created a simulation model to analyze the synergy between solar energy and EVs as supply-demand dynamics for the entire nation of Spain. This conceptual study investigates a scenario in which all vehicles are electric, and energy is exclusively generated from photovoltaic systems, resulting in a completely self-sufficient energy system. 
After conducting a series of simulations, the study concluded that solar energy could theoretically meet all the energy requirements for both EVs and residential needs in Spain, provided that EVs are also utilized as energy storage units. Liang et al. (2022) employed a difference-in-differences (DID) model at the community level, focusing on a system powered by both grid and solar energy for EV charging. Their research concludes that the combined adoption of photovoltaic systems and EVs reduces system loads more effectively than the adoption of EVs alone. Furthermore, photovoltaic solar systems provide considerable economic advantages to consumers who use EVs. Salles-Mardones et al. (2022) performed an economic assessment on single family households in Viña del Mar, Chile by considering both grid- and solar-based electricity supply for EV charging. Numerous scenarios for solar-based electricity generation based on with and without battery storage are studied. The study concludes that smaller photovoltaic systems with battery storage capacities can cost-effectively meet the electricity demands of EVs due to lower capital expenditure of implementing solar power. Martin et al. (2022) examined single-family households that utilized solar power for electricity and employed EVs for transportation. The study used empirical analysis, focusing on performance metrics related to EV charging demand met by photovoltaic (PV) systems and CO2 emissions. The results showed that photovoltaic systems could fulfill between 15% and 90% of the energy requirements for EVs, depending on household charging behaviors and the availability of battery storage. Kassem et al. (2023) conducted energy and economic assessments for single-family households in Northern Cyprus, focusing on solar energy to meet both residential and electric vehicle charging requirements. 
The findings suggest that, due to the ample solar radiation in Northern Cyprus, solar energy is both technically viable and economically feasible for fulfilling these needs. Furthermore, the study indicates that using EVs in conjunction with solar energy offers greater economic advantages compared to fuel-powered vehicles.

Across these studies, two important insights emerge from the literature:

  • 1. When the total costs of ownership of EVs and GVs are compared, GVs appear to be less expensive than EVs, even though EVs have been closing the cost gap with GVs in recent years.

  • 2. Solar-powered EVs are less expensive than GVs, given that numerous subsidies are available for both solar power installation and EV purchases.

However, both notions are supported by the literature with the caveat that several parameters or factors must fall in favor of EVs for EVs to outperform GVs. Our review of the literature raises three important research questions that need further investigation:

  • 1. Is there really any significant difference in TTEC costs when different combinations of electricity inputs and transportation methods as energy outputs are considered for a single-family household?

  • 2. Is there a methodology that can help to predict TTEC for single-family households with a variety of parameters?

  • 3. What are the important factors that contribute to the prediction or estimation of the total costs for single-family households?


3 Materials and methods

This section presents the materials and methods used in the proposed study. We first define the scope of the system boundary used in this study and then present a novel three-stage prediction framework to predict the TTEC for single-family households and determine the parameters that contribute significantly towards predicting TTEC. The three-stage prediction framework consists of supervised machine learning (SML) models that are trained and tested using the TTEC estimates of the energy balance simulation model. While the energy balance simulation model is similar to those used by Wu et al. (2015) and Danielis et al. (2018), training and testing the SML models is the unique contribution of this paper to the literature. The benefit of training and testing SML models is that the best SML model can substitute for the simulation model, thereby reducing the computational burden and allowing TTEC to be predicted automatically with a high degree of accuracy.

3.1 Scope of the system boundary

This research focuses on predicting TTEC for any given single-family household with any specific set of input parameters. Figure 1 presents a generic system boundary considered in this study. It consists of a single-family household that uses grid-powered electricity (GE) and solar-powered electricity (SE) as energy inputs and gasoline vehicles (GVs) and electric vehicles (EVs) as transportation methods for energy output. The parameters considered in the study are limited to the Central Texas region in the US.

FIGURE 1

In Central Texas, the typical number of residents in a single-family household ranges between one and six. Therefore, the number of residents for different single-family households is modelled as a uniform distribution between one and six residents. The house size in Central Texas typically ranges between 1,000 and 3,500 square feet. Therefore, house sizes for different houses are modelled as a uniform distribution ranging between 1,000 and 3,500 square feet. The number of vehicles at each single-family household can be any number between one and four, which is typical of the Central Texas region. In addition, the vehicles can be any combination of GVs and/or EVs. For a typical household, several correlations exist between different parameters. The correlation between two different parameters is established by using Equations 1, 2 (Gonela et al., 2020). Equations 1, 2 take as inputs the desired correlation coefficient between the two parameters, the random variables for parameter 1 and parameter 2, and the minimum and maximum values of parameters 1 and 2.

For each household, a correlation between the number of residents and house size, as well as a correlation between the number of residents and the number of cars, is established using a specified correlation coefficient. Once the number of cars is determined, each car is assumed to be a GV or an EV with a probability of 0.50. The correlations of the EV and GV ownership costs with mileage and number of residents are also established using Equations 1, 2. It is to be noted that GVs with higher mileage are typically hybrid vehicles, which are also accommodated in the study.
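Because Equations 1, 2 are not reproduced here, the following is only a minimal sketch of one common way to draw two correlated, uniformly distributed parameters (a Gaussian copula). It is a stand-in for, not a reproduction of, the method of Gonela et al. (2020). The function names and the coefficient 0.7 are hypothetical; the ranges for residents and house size come from the text above.

```python
import math
import random

def correlated_uniform_pair(rho, lo1, hi1, lo2, hi2, rng):
    """Draw (x1, x2) with uniform marginals and dependence controlled by rho.

    Assumption: a Gaussian copula is used in place of Equations 1, 2.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # The standard normal CDF maps each draw to a Uniform(0, 1) value.
    u1 = 0.5 * (1.0 + math.erf(z1 / math.sqrt(2.0)))
    u2 = 0.5 * (1.0 + math.erf(z2 / math.sqrt(2.0)))
    return lo1 + u1 * (hi1 - lo1), lo2 + u2 * (hi2 - lo2)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)
# 0.7 is a hypothetical target; residents are left continuous here and
# would be rounded to integers in an actual household sample.
pairs = [correlated_uniform_pair(0.7, 1, 6, 1000, 3500, rng) for _ in range(5000)]
residents = [p[0] for p in pairs]
house_sizes = [p[1] for p in pairs]
sample_rho = pearson(residents, house_sizes)
```

With this construction the realized correlation between the uniform variables is slightly below the coefficient applied to the underlying normal draws, so the target would need a small adjustment if an exact value were required.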

On the energy supply side, the electricity needs of each household can be fulfilled by using a combination of GE and SE. For each household, the probability of having an SE system is assumed to be 0.40. A photovoltaic solar system with battery storage is considered in this study. Given such a structure, a novel three-stage prediction framework is proposed that aims to predict the TTEC for any single-family household with specific parameters and to determine the parameters that contribute most towards predicting TTEC. Supplementary Appendix A1, A2 provide the input parameters used in the proposed three-stage prediction framework (Aggarwal and Walker, 2024; Allen and Tynan, 2024; Betterton et al., 2024; Fields et al., 2024; Fitzpatrick and Jordan, 2024; Petroleum and Other Liquids, 2024; Raman et al., 2024; Residential Average Monthly kWh and Bills, 2024; Residential Clean Energy Credit, 2024; Roof pitch angle and slope factor chart, 2024; Solar Panel Cost, 2024; United States Environmental Protection Agency, 2022; US Monthly Total Vehicle Miles Traveled, 2024; Walker and McDevitt, 2024; Zargary, 2023; Petroleum and Other Products, 2024).

3.2 A novel three stage prediction framework

This paper focuses on predicting the TTEC for a single-family household with specific parameters by considering a system boundary that consists of GE and SE as energy inputs and GV and EV as transportation methods for energy output. A three-stage prediction framework is proposed that aims to determine TTEC for any given households as well as determine the important parameters that significantly contribute towards predicting TTEC. Figure 2 shows the three stages of the prediction framework. In the first stage, an energy balance simulation model is developed by considering various system related parameters such as number of residents, house size, number of vehicles, solar system implemented or not and many more to estimate TTEC for individual households. In the second stage, the energy balance simulation model developed in the first stage is run multiple times with varying parameters to estimate the TTEC for different households. These simulation runs help to develop synthetic data for third stage SML models. In the third stage, numerous SML models are trained by using synthetic data and the performance of different SML models on the testing dataset is analyzed to determine the best SML model that can help automate the TTEC prediction process. The best SML model also helps to determine the important parameters that contribute significantly towards predicting TTEC. Once the best SML model is determined, this best model can be used for predicting TTEC instead of simulation model, thereby avoiding the computational burden of running simulation model for each single-family household.

FIGURE 2

3.2.1 First stage: Energy balance simulation model

The first stage of the three-stage prediction framework involves developing an energy balance simulation model. This section presents the mathematical formulation of the proposed energy balance simulation model, which assesses TTEC for an individual single-family household by considering numerous system boundary-related constraints. Table 1 presents the notations of the simulation model. Equations 3–13 present the mathematical formulation of the simulation model.

TABLE 1

Notations | Description
Sets
Time horizon, indexed by
Parameters
The cost of battery per unit (kWh) of storage capacity
The capacity of the battery for storing solar power-based electricity
The seasonality index of distance travelled by a vehicle
The distance travelled by each EV in time period
The distance travelled by each GV in time period
The amount of electricity consumed per resident in time period
The seasonality index for residential electricity consumption
The ownership cost per mile travelled by EV.
The amount of electricity consumed by each EV per mile
The gasoline cost per unit
The unit cost of electricity obtained from the grid
The ownership cost per mile travelled by GV.
The mileage of each GV.
The single-family household base area in square feet.
The number of EVs at a single-family household
The number of GVs at a single-family household
The number of residents in a single-family household
The power output of each solar panel in watts
The proportion of rooftop area that can be used for installing solar power
The percentage of tax credit obtained on total solar cost
The residential electricity consumed per resident in time period
The roof pitch of a household expressed in slope
The estimated life of the solar system
The area of a single solar panel
The cost of installing solar panel per watt
The loss in solar power-based electricity due to environmental factors
Unrestricted Variables
The total energy and transportation cost of an individual single-family household
Positive Variables
The amount of solar power-based electricity produced in time period
The amount of solar power-based electricity stored in battery in time period
The amount of grid power-based electricity obtained in time period
The amount of solar power-based electricity used in time period
The electricity consumed by each EV in time period
The amount of gasoline consumed in time period
The peak number of solar hours in time period
The number of solar panels that can be installed on the rooftop of a single-family household
The amount of residential electricity consumed in time period
The rooftop area of a single-family household
The system size of the solar power

Notations of the Excel-based simulation model.

3.2.1.1 Total cost

Equation 3 represents the total annualized TTEC of a single-family household. It consists of the following costs: (1) the ownership cost of all GVs, (2) the cost of gasoline for GVs, (3) the ownership cost of EVs, (4) the cost of obtaining grid-powered electricity, and (5) the cost of generating solar-powered electricity. It is to be noted that the cost of solar-powered electricity consists of the solar system cost and the battery cost, less the tax credit obtained from the government. Since the total cost is annualized, the net cost of the solar system is spread over its estimated lifespan.
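The five cost components described above can be sketched as a single function. This is an illustrative reading of the verbal description of Equation 3, not the paper's exact formulation: the argument names are hypothetical, all numeric inputs in the usage line are made up, and the tax credit is assumed to apply to the combined solar system and battery cost.

```python
def annualized_ttec(gv_miles, gv_cost_per_mile, gallons, gas_price,
                    ev_miles, ev_cost_per_mile, grid_kwh, grid_price,
                    system_watts, cost_per_watt, battery_kwh,
                    battery_cost_per_kwh, tax_credit, life_years):
    """Annual total transportation and energy cost for one household."""
    gv_ownership = gv_miles * gv_cost_per_mile      # (1) GV ownership
    gasoline = gallons * gas_price                  # (2) gasoline
    ev_ownership = ev_miles * ev_cost_per_mile      # (3) EV ownership
    grid = grid_kwh * grid_price                    # (4) grid electricity
    # (5) solar: system plus battery, net of tax credit, spread over lifespan
    solar_gross = system_watts * cost_per_watt + battery_kwh * battery_cost_per_kwh
    solar_annual = solar_gross * (1.0 - tax_credit) / life_years
    return gv_ownership + gasoline + ev_ownership + grid + solar_annual

# Hypothetical inputs, for illustration only.
example_ttec = annualized_ttec(
    gv_miles=12000, gv_cost_per_mile=0.5, gallons=400, gas_price=3.0,
    ev_miles=10000, ev_cost_per_mile=0.4, grid_kwh=5000, grid_price=0.12,
    system_watts=6000, cost_per_watt=3.0, battery_kwh=10,
    battery_cost_per_kwh=500, tax_credit=0.30, life_years=25)
```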

3.2.1.2 Residential electricity demand

Equation 4 represents the amount of residential electricity consumed by single-family households which depends on the energy usage per resident, the number of residents, and the seasonality factor.

3.2.1.3 EV electricity demand

Equation 5 suggests that the amount of electricity consumed by EVs in a single-family household depends on the number of EVs, EV electricity usage, and distance travelled by EV.

3.2.1.4 Gasoline demand

Equation 6 indicates that the amount of gasoline consumed by GVs at a single-family household depends on the number of GVs, GV mileage, and distance travelled by GV.
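The three demand relations in Equations 4 to 6 can be sketched as follows, assuming each is a simple product (or quotient) of the factors named in the text. The function and argument names are hypothetical, and GV mileage is assumed to be expressed in miles per gallon.

```python
def residential_kwh(kwh_per_resident, residents, seasonality_index):
    """Residential electricity demand in one time period (Equation 4)."""
    return kwh_per_resident * residents * seasonality_index

def ev_kwh(n_ev, kwh_per_mile, miles_per_ev):
    """Electricity consumed by all EVs in one time period (Equation 5)."""
    return n_ev * kwh_per_mile * miles_per_ev

def gasoline_gallons(n_gv, miles_per_gallon, miles_per_gv):
    """Gasoline consumed by all GVs in one time period (Equation 6)."""
    return n_gv * miles_per_gv / miles_per_gallon

# Hypothetical inputs, for illustration only.
home_demand = residential_kwh(300.0, 2, 1.1)
ev_demand = ev_kwh(2, 0.3, 1000.0)
gas_demand = gasoline_gallons(1, 30.0, 900.0)
```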

3.2.1.5 Solar-powered electricity

Equation 7 determines the rooftop area of a single-family household. The rooftop area depends on the household base area and the roof pitch.

Equation 8 indicates that the number of solar panels that can be put on the rooftop depends on the rooftop area and proportion of rooftop area that can be used for solar panel installation.

Equation 9 suggests that the amount of solar power-based electricity produced depends on the number of peak solar hours, number of solar panels installed, power output of each solar panel, and loss in electricity due to environmental factors.

Equation 10 is an electricity balance equation that states the total solar power-generated electricity in the current period, combined with the electricity stored in the battery from the previous period, must equal the sum of the solar power-based electricity consumed in the current period and the electricity stored in the battery.

Equation 11 constrains the amount of solar power-based electricity stored in the battery so that it does not exceed the battery capacity.

Equation 12 is a measure that estimates the solar system size in watts. The solar system size depends on the power output of each solar panel and the number of solar panels installed on the rooftop.
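The rooftop and solar sizing steps (Equations 7 to 9 and 12) can be sketched as below. This is a hedged illustration only: the rooftop area is assumed to equal the base area multiplied by the standard roof slope factor sqrt(1 + pitch^2), with pitch given as rise over run; the panel count is assumed to be rounded down; and production is converted from watt-hours to kWh. All names and numeric inputs other than the 1781 square foot base area are hypothetical.

```python
import math

def rooftop_area(base_area_sqft, pitch):
    """Sloped roof area from base area and pitch (rise/run), Equation 7."""
    return base_area_sqft * math.sqrt(1.0 + pitch ** 2)

def panel_count(roof_area_sqft, usable_share, panel_area_sqft):
    """Whole panels that fit on the usable share of the roof, Equation 8."""
    return math.floor(roof_area_sqft * usable_share / panel_area_sqft)

def solar_kwh(peak_hours, panels, panel_watts, loss_share):
    """Solar electricity produced in one period, in kWh (Equation 9)."""
    return peak_hours * panels * panel_watts * (1.0 - loss_share) / 1000.0

def system_size_watts(panels, panel_watts):
    """Solar system size in watts (Equation 12)."""
    return panels * panel_watts

# Hypothetical inputs: 6/12 pitch, half the roof usable, 17.5 sq ft panels.
area = rooftop_area(1781.0, 0.5)
panels = panel_count(area, 0.5, 17.5)
size_w = system_size_watts(panels, 400)
produced = solar_kwh(5.0, panels, 400, 0.15)
```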

3.2.1.6 Adding grid power-based electricity to solar-powered electricity

Equation 13 ensures that the amount of solar power and grid power electricity supplied is equal to the residential and EV electricity demand.
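The battery balance, the capacity limit, and the grid top-up (Equations 10, 11, and 13) can be illustrated with a simple period-by-period dispatch. This is a sketch under stated assumptions rather than the paper's model: solar is used before grid power, the battery starts empty, and any surplus beyond the battery capacity is treated as unused. The function name and the sample series are hypothetical.

```python
def dispatch(solar_produced, demand, battery_capacity):
    """Return per-period grid purchases, solar use, and battery level."""
    stored_prev = 0.0  # assumption: battery starts empty
    grid, solar_used, stored = [], [], []
    for produced, needed in zip(solar_produced, demand):
        # Solar available now: new production plus last period's storage.
        available = produced + stored_prev
        used = min(available, needed)
        # Leftover solar is stored, capped at the battery capacity.
        stored_now = min(available - used, battery_capacity)
        # Grid power covers whatever demand solar could not meet.
        grid.append(needed - used)
        solar_used.append(used)
        stored.append(stored_now)
        stored_prev = stored_now
    return grid, solar_used, stored

# Hypothetical three-period example in kWh.
grid, solar_used, stored = dispatch([10.0, 0.0, 5.0], [4.0, 8.0, 6.0], 5.0)
```

In every period the solar used plus the grid purchase equals the total demand, which is the condition Equation 13 describes.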

3.2.2 Second stage: Simulation runs and synthetic data development

The second stage of the three-stage prediction framework involves developing synthetic data that can be used by the supervised machine learning models. To develop synthetic data, the energy balance simulation model (Equations 3–13) is run multiple times, with each run indexed separately, under varying parameters to estimate the TTEC for several single-family households. The parameters that were varied include: (1) number of residents, (2) house size, (3) number of EVs, (4) EV ownership cost, (5) EV efficiency, (6) number of GVs, (7) GV ownership cost, (8) GV efficiency, (9) whether solar is implemented, (10) solar system size, (11) battery storage capacity, and (12) peak solar hours. Consequently, for an SML model, the TTEC parameter of the synthetic data becomes the dependent or response variable, and the twelve parameters that are varied become the independent or predictor variables. Once the synthetic data is generated, data cleaning and initial exploratory data analysis (EDA) are performed to understand the distribution of each parameter.
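The second stage can be sketched as a loop that samples household parameters, runs the simulation, and records one row per run. The sketch below is deliberately simplified: it varies only five of the twelve parameters, ignores the correlations between parameters, and `simulate_ttec` is a placeholder with invented coefficients standing in for the energy balance simulation model. The ranges and probabilities are those stated in Section 3.1.

```python
import random

def sample_household(rng):
    """Draw one household's parameters from the ranges in Section 3.1."""
    n_vehicles = rng.randint(1, 4)
    # Each vehicle is an EV with probability 0.50, otherwise a GV.
    n_ev = sum(1 for _ in range(n_vehicles) if rng.random() < 0.50)
    return {
        "residents": rng.randint(1, 6),
        "house_sqft": rng.uniform(1000, 3500),
        "n_ev": n_ev,
        "n_gv": n_vehicles - n_ev,
        "has_solar": rng.random() < 0.40,
    }

def simulate_ttec(household):
    """Placeholder for the simulation model; coefficients are invented."""
    return (6000.0 * household["n_gv"] + 5000.0 * household["n_ev"]
            + 0.8 * household["house_sqft"]
            - 300.0 * household["has_solar"])

def build_synthetic_data(n_runs, seed=0):
    """Run the simulation n_runs times and collect predictor/response rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_runs):
        household = sample_household(rng)
        household["ttec"] = simulate_ttec(household)  # response variable
        rows.append(household)
    return rows

rows = build_synthetic_data(500)
```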

3.2.3 Supervised machine learning (SML) models

Once the synthetic data is cleaned and initial EDA is performed, numerous SML models are trained by splitting the data into training and testing data. The SML models that are trained and tested in this study are: (1) Linear Regression, (2) Ridge Regression, (3) Lasso Regression, (4) Decision Tree, (5) Bagging, (6) Random Forest, (7) Adaptive Boosting, and (8) Gradient Boosting. To select the best SML model, the performance metrics that are considered are: (1) Root Mean Square Error (RMSE), (2) Mean Absolute Percentage Error (MAPE), and (3) Adjusted R-squared.
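For reference, the three performance metrics can be computed as follows. These are the standard textbook definitions written in plain Python; the function names and the small example vectors are hypothetical and are not results from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

def adjusted_r2(y_true, y_pred, n_predictors):
    """R-squared penalized for the number of predictor variables."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Made-up values, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0, 500.0]
predicted = [110.0, 190.0, 310.0, 390.0, 510.0]
example_rmse = rmse(actual, predicted)
example_mape = mape(actual, predicted)
example_adj_r2 = adjusted_r2(actual, predicted, n_predictors=1)
```

Lower RMSE and MAPE and a higher adjusted R-squared on the testing dataset indicate a better model; the adjustment matters here because the models use twelve predictor variables.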

4 Results

This section presents the results of the study. Section 4.1 presents the results of the first stage energy balance simulation model including sensitivity analysis for model validation. Section 4.2 presents the initial Exploratory Data Analysis (EDA) of second stage synthetic data, which is developed by running the energy balance simulation model several times by varying several input parameters. Section 4.3 presents the results of the third stage SML models.

4.1 First stage: energy balance simulation model results

This section presents the results of the first stage energy balance simulation model for individual households. In addition, we perform sensitivity analysis on a few important parameters to validate the model, i.e., to confirm that the model provides insights as intended. These insights are cross-validated against the literature as well as through trials by multiple experts in this field of study. In this initial first stage study, we begin our analysis by examining a single-family household consisting of two residents, with a base area of 1781 square feet and two vehicles. The study focuses on comparing the TTEC performance of a single-family household by considering the system boundary configurations shown in Table 2. It is to be noted that, in configurations with SE, the household utilizes a solar system size that fulfills the household's entire electricity demand. For example, in SE + GV, we consider a solar system size that fulfills the residential electricity requirement, whereas in SE + EV, we consider a solar system size that fulfills both residential and EV charging electricity needs.

TABLE 2

System boundary configuration | Description
GE + GV | The single-family household uses purely grid-powered electricity, and both vehicles are gasoline vehicles
GE + EV | The single-family household uses purely grid-powered electricity, and both vehicles are electric vehicles
SE + GV | The single-family household uses purely solar-powered electricity, and both vehicles are gasoline vehicles
SE + EV | The single-family household uses purely solar-powered electricity, and both vehicles are electric vehicles
GE + GV + EV | The single-family household uses purely grid-powered electricity and has one gasoline vehicle and one electric vehicle
SE + GV + EV | The single-family household uses purely solar-powered electricity and has one gasoline vehicle and one electric vehicle

System boundary configuration.

4.1.1 Comparing TTEC for different system boundary configurations

This section presents the TTEC comparison for different system boundary configurations. In this study, the energy balance simulation model is run thirty times for each configuration to establish a confidence level for TTEC. Figure 3 shows the boxplots of the simulation runs, which depict the TTEC comparison of different system boundary configurations. It indicates that there is a significant difference in TTEC across system boundary configurations. Consistent with the results of Kassem et al. (2023) and Liang et al. (2022), the total cost is lowest when EVs are used in combination with SE. However, the total cost is highest when GVs are used in combination with SE, indicating that installing solar to meet only residential electricity needs can be an expensive value proposition. Consistent with the results of Coffman et al. (2017), we further observe that EVs are less expensive than GVs irrespective of the electricity source used, indicating that the total cost of ownership for purely EV households is lower than for purely GV households. This is because the government provides more subsidies for EVs than for GVs. In essence, the system boundary configurations for this specific single-family household can be ranked from least to most expensive as follows: (1) SE + EV, (2) SE + GV + EV, (3) GE + EV, (4) GE + GV + EV, (5) GE + GV, and (6) SE + GV. It is to be noted that this ranking is valid only for this specific single-family household and may vary for single-family households that have different parameters. Figure 4 shows the cost split for each configuration, and it can be observed that the ownership cost of vehicles contributes far more to the total cost than the other costs. Here, the ownership cost includes all vehicle costs except the gasoline cost for GVs and the electricity cost for EVs.

FIGURE 3

FIGURE 4

4.1.2 The impact of solar system size on different configurations

In this section, we perform a sensitivity analysis to validate the simulation model by varying the solar system size. Figures 5 and 6 present the TTEC performance and solar reliability analysis as the solar system size is varied. They indicate that the TTEC of the configurations first decreases and then increases as the solar system size is increased. The TTEC is lowest when the entire electricity requirement of the household is met by SE. Furthermore, integrating SE with EV outperforms all the other configurations over a wide range of solar system sizes. However, breakeven points are observed for SE + GV and SE + GV + EV: their TTEC is lower than that of certain configurations when the solar system is small and higher when the solar system is large. This indicates that larger solar systems can increase costs when used in combination with GVs. This result is consistent with Salles-Mardones et al. (2022), which suggests that smaller solar systems provide higher economic benefits than larger solar systems.
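A breakeven point of the kind described above is the solar system size at which two TTEC curves cross. As a minimal sketch, assuming each curve is sampled at the same set of sizes and treated as piecewise linear between samples, the crossings can be located as follows. The function name and the numbers in the usage example are hypothetical, not values from the study.

```python
def find_breakevens(sizes, cost_a, cost_b):
    """Return the solar system sizes at which two TTEC curves cross.

    The curves are treated as piecewise linear between the sampled sizes.
    """
    crossings = []
    for i in range(len(sizes) - 1):
        d0 = cost_a[i] - cost_b[i]
        d1 = cost_a[i + 1] - cost_b[i + 1]
        if d0 == 0:
            crossings.append(sizes[i])
        elif d0 * d1 < 0:
            # Linear interpolation of the zero of the cost difference.
            frac = d0 / (d0 - d1)
            crossings.append(sizes[i] + frac * (sizes[i + 1] - sizes[i]))
    if cost_a[-1] == cost_b[-1]:
        crossings.append(sizes[-1])
    return crossings


if __name__ == "__main__":
    # Illustrative numbers only: a solar configuration whose cost first falls,
    # then rises, compared with a grid configuration that is unaffected by size.
    sizes = [2, 4, 6, 8]
    solar_config = [19000, 18500, 20500, 22000]
    grid_config = [20000, 20000, 20000, 20000]
    print(find_breakevens(sizes, solar_config, grid_config))
```

Below the returned size the first configuration is cheaper. Above it the second configuration is cheaper, which matches the pattern reported for SE + GV and SE + GV + EV.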

FIGURE 5

FIGURE 6

4.1.3 The impact of GV and EV mileage on different configurations

In this section, we further validate the energy balance simulation model by performing a sensitivity analysis on GV and EV mileage. Figures 7 and 8 present the results when the GV and EV mileages are varied. They indicate that the TTEC decreases as mileage increases. In Figure 7, SE + GV + EV outperforms SE + EV at higher GV mileage, indicating that a single-family household with a solar system, an EV, and a hybrid vehicle (GVs with higher mileage are typically hybrids) is better off than one that integrates solar with EVs only, which is consistent with the results of Mitropoulos et al. (2017). In addition, it indicates that GV mileage significantly impacts the TTEC. Figure 8 indicates that solar integration with EV has a significantly lower TTEC than the other configurations. In addition, the TTEC is stable over a wide range of EV mileage, indicating that SE + EV offers a stable value proposition to the owner. In both figures, breakeven points are observed where the TTEC of certain configurations is higher at lower mileages and lower at higher mileages, which is consistent with the analysis performed by Mitropoulos et al. (2017).
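The inverse relationship between mileage and cost follows from basic arithmetic: the energy cost of driving is distance divided by efficiency, multiplied by the unit price. The sketch below illustrates only that arithmetic. It is not the paper's energy balance model, and the function names, distance, and price are hypothetical.

```python
def driving_energy_cost(miles, efficiency, unit_price):
    """Energy cost of driving: distance / (miles per unit of energy) * price per unit."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return miles / efficiency * unit_price


def sweep_efficiency(miles, unit_price, efficiencies):
    """Return (efficiency, cost) pairs for a range of efficiency values."""
    return [(e, driving_energy_cost(miles, e, unit_price)) for e in efficiencies]


if __name__ == "__main__":
    # Hypothetical inputs: 12,000 miles driven and $3.00 per gallon.
    for mpg, cost in sweep_efficiency(12000, 3.0, [20, 30, 40, 50]):
        print(f"{mpg} mpg -> ${cost:,.2f}")
```

Because cost varies with the reciprocal of efficiency, each additional unit of mileage saves less than the previous one. This is one reason a curve can flatten at high mileage.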

FIGURE 7

FIGURE 8

In summary, consistent with the literature, we observe that although the TTEC differs significantly across system boundary configurations, with solar integration with EV being best, numerous parameters such as subsidies, solar system size, and vehicle mileage make the results inconclusive. Consequently, we seek to understand the TTEC under varying conditions for single-family households with different parameters, and to determine the important parameters that contribute significantly to TTEC prediction.

4.2 Second stage: simulation runs and synthetic data analysis

In this second stage, the energy balance simulation model is run five hundred and fifty times to generate synthetic data that can be used to train the SML models in the third stage. We estimate the TTEC for each run by varying the following parameters in the energy balance simulation model: (1) number of residents, (2) house size, (3) number of EVs, (4) EV ownership cost, (5) EV efficiency, (6) number of GVs, (7) GV ownership cost, (8) GV efficiency, (9) solar implemented or not, (10) solar system size, (11) battery storage capacity, and (12) peak solar hours. We therefore obtain synthetic data consisting of five hundred and fifty records and thirteen parameters (TTEC is also a parameter in the synthetic data). Once the synthetic data is developed, we perform an initial Exploratory Data Analysis (EDA) to understand the distribution of the various parameters.
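The data-generation loop described above can be sketched as repeated sampling of household parameters followed by a call to the simulation model. In this sketch every sampling range, the uniform distributions, the solar adoption probability, and the function names are illustrative assumptions. `placeholder_ttec` is a stand-in cost function and is not the energy balance simulation model.

```python
import random


def sample_household(rng):
    """Draw one hypothetical household; every range below is an illustrative assumption."""
    solar = rng.random() < 0.4
    return {
        "residents": rng.randint(1, 6),
        "house_size_sqft": rng.uniform(1000, 3500),
        "num_evs": rng.randint(0, 2),
        "ev_ownership_cost": rng.uniform(0.40, 0.65),  # $ per mile
        "ev_efficiency": rng.uniform(2.5, 4.5),        # miles per kWh
        "num_gvs": rng.randint(0, 2),
        "gv_ownership_cost": rng.uniform(0.45, 0.70),  # $ per mile
        "gv_efficiency": rng.uniform(25, 55),          # miles per gallon
        "solar_implemented": int(solar),
        "solar_system_size": rng.uniform(3, 12) if solar else 0.0,
        "battery_capacity": rng.uniform(5, 20) if solar else 0.0,
        "peak_solar_hours": rng.uniform(3.5, 6.01),
    }


def placeholder_ttec(household):
    """Stand-in for the simulation model; the coefficients are invented for illustration."""
    vehicles = household["num_gvs"] + household["num_evs"]
    return 5.0 * household["house_size_sqft"] + 4000.0 * vehicles


def generate_synthetic_data(simulate_ttec, n_runs=550, seed=0):
    """Run the supplied simulation once per sampled household and collect the records."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_runs):
        household = sample_household(rng)
        household["ttec"] = simulate_ttec(household)
        records.append(household)
    return records


if __name__ == "__main__":
    data = generate_synthetic_data(placeholder_ttec)
    print(len(data), "records with", len(data[0]), "parameters each")
```

Passing the simulation as a function argument keeps the sampling logic independent of the model, so the placeholder can be replaced by a real simulation without changing the loop.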

Figure 9 presents the results of the initial EDA performed on the synthetic data. It illustrates the following:

  • The number of residents ranges from one to six, with a median of three residents per household.

  • The majority of households have one EV.

  • The majority of households have either one or two GVs.

  • The majority of households do not have a solar system implemented.

  • The house size seems to be normally distributed, with a range between 1,000 and 3,500 square feet.

  • GV and EV ownership costs seem to be uniformly distributed, with medians of $0.57 and $0.52 per mile, respectively.

  • GV efficiency seems to be normally distributed, with a mean of 40 miles per gallon. However, EV efficiency seems to be right skewed, with a median of 3.12 miles per kWh.

  • Solar system size and solar battery capacity are right skewed, as the majority of households do not have a solar system implemented.

  • Peak solar hours seem to be normally distributed with two modes, ranging between 3.50 and 6.01 h per day.

  • The TTEC seems to be normally distributed with multiple modes. The total cost ranges between $6,512 and $38,344. The median TTEC is $19,784 and the mean TTEC is $21,211, indicating a slight right skewness in the TTEC distribution.

FIGURE 9

4.3 Results of supervised machine learning (SML) models

In the synthetic data, twelve parameters (number of residents, house size, number of EVs, EV ownership cost, EV efficiency, number of GVs, GV ownership cost, GV efficiency, solar implemented or not, solar system capacity, battery storage capacity, and peak solar hours) serve as the independent variables, or predictors, for the SML models, and TTEC serves as the dependent, or response, variable. To build the SML models, we split the synthetic dataset 80–20: 80% of the data (440 of 550 records) is used for training the SML models and 20% (110 of 550 records) is used for testing them. As discussed earlier in Section 3.2.3, we train eight different SML models: (1) Linear Regression, (2) Ridge Regression, (3) Lasso Regression, (4) Decision Tree, (5) Bagging, (6) Random Forest, (7) Adaptive Boosting, and (8) Gradient Boosting. To select the best SML model, we use three performance metrics: (1) Root Mean Square Error (RMSE), (2) Mean Absolute Percentage Error (MAPE), and (3) Adjusted R-squared. Table 3 presents the training and testing performance of the different SML models. At first glance, all the SML models studied predict the TTEC for single-family households with an adjusted R-squared of more than 90% on the testing dataset. This indicates that any of the SML models is good enough for predicting TTEC. However, comparing the RMSE on the training and testing datasets indicates that Decision Tree, Bagging, Random Forest, Adaptive Boosting, and Gradient Boosting overfit, as the gap between their training and testing RMSE is large. These SML models can therefore be expected to predict with higher errors and lower accuracy on new datasets.

In terms of the best SML model, the regression models (Linear Regression, Ridge Regression, and Lasso Regression) appear to be the best, as the differences in RMSE, MAPE, and adjusted R-squared between the training and testing datasets are smallest. In addition, because their testing RMSE and MAPE are lowest and their testing adjusted R-squared is highest, the regression SML models will have lower prediction errors and higher accuracy on new datasets. Therefore, any of the three regression SML models can be used in the real world as a substitute for the simulation model to estimate the TTEC for single-family households. This reduces the computational burden and allows the regression models to be retrained as TTEC estimates for new households are made and actual total costs are realized.
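The 80–20 split and the three performance metrics named above can be written out using their standard textbook definitions. This is a minimal sketch with hypothetical function names. The study's own implementation is not shown in the text and may differ in detail, for example in how the records are shuffled.

```python
import math
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (training, testing) lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root mean square error, in the same units as the target."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def adjusted_r2(actual, predicted, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


if __name__ == "__main__":
    train, test = train_test_split(list(range(550)))
    print(len(train), len(test))
```

With twelve predictors, the adjustment factor is (n − 1)/(n − 13), which is close to one for both 440 training and 110 testing records. Adjusted R-squared therefore stays close to plain R-squared in this setting.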

TABLE 3

| SML model | Training RMSE | Training MAPE | Training adjusted R-squared (%) | Testing RMSE | Testing MAPE | Testing adjusted R-squared (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Linear Regressionᵃ | 594.78 | 2.40 | 99.13 | 665.61 | 2.86 | 98.88 |
| Ridge Regressionᵃ | 594.78 | 2.40 | 99.13 | 665.53 | 2.86 | 98.88 |
| Lasso Regressionᵃ | 594.78 | 2.40 | 99.13 | 665.61 | 2.86 | 98.88 |
| Decision Tree | 0 | 0 | 100 | 1,341.94 | 3.95 | 95.96 |
| Bagging | 395.35 | 1.31 | 99.62 | 1,342.41 | 3.67 | 95.46 |
| Random Forest | 342.91 | 1.18 | 99.71 | 1,351.22 | 3.49 | 95.40 |
| Adaptive Boosting | 1,048.16 | 4.26 | 97.29 | 1,719.29 | 5.77 | 93.37 |
| Gradient Boosting | 242.96 | 0.99 | 99.85 | 979.85 | 2.89 | 97.85 |

Performance of different SML models.

ᵃ Best SML models.
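The overfitting argument rests on the gap between training and testing RMSE. As a small sketch, the gaps can be computed directly from the RMSE columns of Table 3. The dictionary and function names are hypothetical, and no cut-off for what counts as overfitting is implied.

```python
# (training RMSE, testing RMSE) transcribed from Table 3.
TABLE_3_RMSE = {
    "Linear Regression": (594.78, 665.61),
    "Ridge Regression": (594.78, 665.53),
    "Lasso Regression": (594.78, 665.61),
    "Decision Tree": (0.0, 1341.94),
    "Bagging": (395.35, 1342.41),
    "Random Forest": (342.91, 1351.22),
    "Adaptive Boosting": (1048.16, 1719.29),
    "Gradient Boosting": (242.96, 979.85),
}


def rmse_gaps(table):
    """Return (model, testing RMSE minus training RMSE) pairs, smallest gap first."""
    gaps = [(name, round(test - train, 2)) for name, (train, test) in table.items()]
    return sorted(gaps, key=lambda pair: pair[1])


for name, gap in rmse_gaps(TABLE_3_RMSE):
    print(f"{name:<18} gap = {gap:8.2f}")
```

The three regression models have gaps of roughly 71, while every tree-based or ensemble model has a gap above 670. This is the pattern the text describes.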

Since the regression SML models are best, we perform sequential forward selection of parameters, which is shown in Figure 10. It indicates that, of the twelve parameters considered, six are the most important, as the adjusted R-squared changes only marginally when further parameters are added after the sixth. In order of selection, the most important parameters are: (1) house size, (2) number of GVs, (3) number of EVs, (4) GV ownership cost, (5) EV ownership cost, and (6) solar implemented. Figure 11 presents the EDA for the important parameters selected by the regression SML models. It clearly shows a strong correlation between these parameters and TTEC. In addition, moving from the most important to the least important parameter, the strength of the relationship between the TTEC and the parameter appears to decrease. For instance, house size impacts the TTEC much more than whether solar is implemented. Furthermore, a positive correlation is observed between TTEC and all of these parameters except solar implemented, indicating that TTEC increases as the parameter value increases; if solar is implemented, however, TTEC decreases. The results also yield a counterintuitive finding: the number of residents, GV and EV mileage, solar system size, battery capacity, and peak solar hours are not significant parameters and contribute only marginally to the TTEC prediction.
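Sequential forward selection, as used above, starts with no predictors and greedily adds whichever remaining predictor most improves the fit. The sketch below implements that idea for ordinary least squares, scored by adjusted R-squared on the fitting data. The function names, the scoring choice, and the toy data are assumptions, since the text does not give the exact procedure behind Figure 10.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjusted_r2_of_fit(rows, target, features):
    """Fit ordinary least squares on the chosen features and return adjusted R-squared."""
    design = [[1.0] + [row[f] for f in features] for row in rows]
    k = len(features) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    preds = [sum(b * v for b, v in zip(beta, d)) for d in design]
    n = len(target)
    mean_y = sum(target) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(target, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - len(features) - 1)


def forward_selection(rows, target, candidates):
    """Greedily add the feature that most improves adjusted R-squared at each step."""
    selected, history = [], []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda f: adjusted_r2_of_fit(rows, target, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, adjusted_r2_of_fit(rows, target, selected)))
    return history


if __name__ == "__main__":
    # Toy data with invented coefficients; "irrelevant" has no effect on the target.
    rng = random.Random(1)
    rows = [{"house_size": rng.uniform(1000, 3500), "num_gvs": rng.randint(0, 2),
             "irrelevant": rng.random()} for _ in range(200)]
    target = [8.0 * r["house_size"] + 4000.0 * r["num_gvs"] + rng.gauss(0, 300) for r in rows]
    for name, score in forward_selection(rows, target, ["irrelevant", "num_gvs", "house_size"]):
        print(name, round(score, 4))
```

Plotting the score in `history` against the number of selected features gives a curve of the kind described for Figure 10. The point at which the curve flattens marks the parameters that matter.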

FIGURE 10

FIGURE 11

5 Conclusion

This paper focuses on predicting the total transportation and energy costs (TTEC) for single-family households. A system boundary consisting of grid-powered electricity (GE) and solar-powered electricity (SE) as energy inputs and gasoline vehicles (GVs) and electric vehicles (EVs) as transportation methods for energy outputs is considered. A novel three-stage prediction framework is developed that aims to predict the TTEC for any given single-family household with a specific set of parameters and to determine the important parameters that contribute towards predicting TTEC. The first stage of the prediction framework involves developing an energy balance simulation model for an individual household. The second stage involves running the simulation model several times to develop synthetic data. In the third stage, several supervised machine learning (SML) models are trained and tested using the synthetic data to determine the best SML model as well as the important parameters that contribute significantly towards predicting TTEC. A case study of single-family households in the Central Texas region is used as an application of the prediction framework. The results of the first-stage energy balance simulation model indicate that the TTEC differs significantly across system boundary configurations for a single-family household. In fact, SE integration with EVs is found to be the best and SE integration with GVs the worst in terms of reducing costs. Currently, the subsidies provided to both solar systems and EVs favor solar and EV integration. However, the notion that solar and EV integration is best remains ambiguous and questionable, as other factors impact performance. For example, a household having both a GV and an EV along with a solar system seems to outperform integration of solar with EVs only when the GV is a hybrid, given its high mileage.

In the second stage, the simulation model is run five hundred and fifty times, and the initial EDA indicates that the total cost ranges between $6,537 and $38,344 with a mean of $21,211. In the third stage, eight different SML models are trained and tested: (1) Linear Regression, (2) Ridge Regression, (3) Lasso Regression, (4) Decision Tree, (5) Bagging, (6) Random Forest, (7) Adaptive Boosting, and (8) Gradient Boosting. The performance metrics used to evaluate the SML models are: (1) Root Mean Square Error (RMSE), (2) Mean Absolute Percentage Error (MAPE), and (3) Adjusted R-squared. The results of the third stage indicate that regression SML models are best in predicting the total costs, with an adjusted R-squared of 99.13% and 98.88% on the training and testing datasets, respectively. In addition, the parameter analysis of the regression SML models suggests that house size, number of GVs, number of EVs, EV and GV ownership costs, and implementation of solar at households are the most important parameters for predicting the TTEC of a single-family household. Counterintuitively, number of residents, GV and EV mileage, solar system size, battery capacity, and peak solar hours are not significant parameters and contribute only marginally to the TTEC prediction. In summary, since the best SML regression model is trained and tested with synthetic data from the energy balance simulation model, it can be used as a substitute for the simulation model, thereby avoiding the computational burden of running the simulation model for each new single-family household.

Even though the SML regression models predict with a high degree of accuracy, the study has several limitations. First, the sample size of the synthetic data can be increased. Second, the study does not consider variability in numerous other parameters, such as the prices of gasoline and electricity and the distances travelled by households. In addition, separate models can be developed for households with and without solar to study whether parameters such as solar system size, battery capacity, roof pitch, and peak solar hours impact the SML model performance. Future research will therefore expand the study to fill these gaps.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

VG: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Validation, Writing–original draft, Writing–review and editing. AO: Formal Analysis, Investigation, Methodology, Software, Validation, Writing–review and editing. RS: Conceptualization, Formal Analysis, Investigation, Methodology, Validation, Writing–original draft, Writing–review and editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fenef.2024.1502854/full#supplementary-material

Keywords

simulation, supervised machine learning, energy costs, transportation costs, solar-powered electricity generation, electric vehicles, gasoline vehicles

Citation

Gonela V, Srinivasan R and Osmani A (2024) A novel simulation and supervised machine learning-based prediction framework to predict the total transportation and energy costs for single-family households. Front. Energy Effic. 2:1502854. doi: 10.3389/fenef.2024.1502854

Received

27 September 2024

Accepted

11 November 2024

Published

25 November 2024

Volume

2 - 2024

Edited by

Zhiming Gao, Oak Ridge National Laboratory (DOE), United States

Reviewed by

Youssef Kassem, Near East University, Cyprus

Jinghui Yuan, Oak Ridge National Laboratory (DOE), United States

*Correspondence: Vinay Gonela,
