
ORIGINAL RESEARCH article

Front. Commun. Netw., 07 January 2026

Sec. IoT and Sensor Networks

Volume 6 - 2025 | https://doi.org/10.3389/frcmn.2025.1732098

Optimization of cloud resource demand forecasting and investment decisions in the context of digital transformation

Zhuolin Chen1, Yuanchun Tang2, Fei Wu1, Xujing Wang3, Minghui Xia3* and Ziren Wang3
  • 1Digitalization Department, State Grid Fujian Electric Power Company, Fuzhou, China
  • 2Economic and Technical Research Institute, State Grid Fujian Electric Power Company, Fuzhou, China
  • 3Department of Economics and Management, North China Electric Power University (Baoding), Baoding, China

With the rapid advancement of digital transformation, enterprises face escalating challenges in cloud resource allocation due to dynamic workloads and substantial capital investments. Existing forecasting models often overlook the impact of corporate digital maturity, leading to suboptimal investment decisions and resource inefficiencies. This study proposes an integrated framework combining an ARIMAX forecasting model with a multi-constraint optimization approach. We incorporate a quantified Digital Transformation Index (DTI) as an exogenous variable and develop a cost-minimization investment model under constraints including resource gaps, leasing ratios, alert thresholds, and budget limits. Simulation experiments using Alibaba Cloud cluster data demonstrate that the proposed model achieves a CPU load prediction error (MAPE) of less than 5%, with a statistically significant DTI coefficient (p < 0.01). The optimal investment strategy utilized 93.67% of a $2.22 million budget, achieving a leasing ratio below 45% while maintaining a 67% resource utilization safety threshold. We employed Mean Absolute Percentage Error (MAPE) for forecasting accuracy and Net Present Value (NPV) for cost evaluation, selected for their relevance to operational and financial performance in cloud resource management.

1 Introduction

With the rapid advancement of digital transformation, enterprises are actively adopting cloud technologies to transform their production, operations, and management processes. The exponential growth of business data has led to escalating pressure on cloud platforms, pushing resource allocation to the point of saturation. Simultaneously, cloud infrastructure construction requires substantial capital investment. Without scientific planning, unplanned expansion may result in compromised quality and financial waste. Therefore, systematic investigation of cloud resource load forecasting techniques and optimization of investment strategies are critical for enhancing operational efficiency and achieving sustainable development.

Existing studies have explored the impact of digital transformation. For instance, Pan and Hu (2025) demonstrated that digital transformation significantly enhances corporate productivity, particularly in state-owned enterprises and the central and eastern regions of China. Xing et al. (2025) revealed varying sensitivities to pricing and data service benefits among manufacturing participants in industrial internet platforms. Li (2025) emphasized that developing digital management systems is key to gaining competitive advantages in the era of IoT and telematics. These findings underscore the profound influence of digital transformation on modern enterprises’ operational frameworks.

In resource forecasting, scholars have explored a variety of prediction techniques and models, which fall mainly into two categories: time series analysis and machine learning. Time series methods, such as the ARIMA model and exponential smoothing, excel at capturing the linear trends and periodic characteristics of cloud resource utilization, making them particularly suitable for scenarios with abundant historical data and stable operational patterns. For example, Calheiros et al. achieved 91% accuracy in predicting web server loads using ARIMA, validating its effectiveness in cloud environments (Calheiros et al., 2015). Mi et al. applied quadratic exponential smoothing to predict the number of user requests, and then estimated virtual machine resource requirements (Mi et al., 2011). Machine learning approaches, such as SVM and LSTM, demonstrate significant advantages in handling large-scale, high-dimensional cloud resource data by automatically extracting complex nonlinear relationships, often yielding higher prediction accuracy. For instance, Gao designed a dynamic resource scheduling scheme based on the ant colony algorithm, optimizing load balancing and energy consumption management on cloud computing platforms (Gao, 2015). Additionally, Zhang et al. leveraged a deep belief network (DBN) for cloud resource demand prediction, improving accuracy through input-output relationship analysis (Zhang et al., 2017).

In the field of investment optimization, Zhang systematically categorized macro-asset allocation theories into five types: (1) return-risk balance, (2) return-only, (3) risk-only, (4) investor utility maximization, and (5) integration of economic cycles with subjective judgment (Zhang and Zhang, 2017). The study further analyzed the characteristics and limitations of each category. From the perspective of financial management and cost-effectiveness, Li compared four equipment allocation models—self-owned procurement, financial leasing, operating leasing, and hybrid leasing—providing actionable insights for enterprises in cost control, risk mitigation, and decision optimization (Li, 2022). Additionally, Liu et al. explored optimal choices among financing leases, operating leases, and outright purchases in fixed-asset allocation by constructing cash flow models (Liu et al., 2010).

Existing studies indicate that corporate digital maturity significantly influences cloud platform development and exhibits a strong correlation with cloud resource demand. However, current literature on cloud resource allocation has predominantly overlooked this linkage, resulting in a critical research gap in cloud infrastructure investment strategies. To bridge this gap, our study incorporates corporate digital maturity as a critical exogenous variable into a cloud resource demand forecasting model. Furthermore, by analyzing business growth trends and existing cloud resource configurations, we comprehensively investigate investment decision optimization during cloud platform upgrades, with a focus on comparing the economic viability of third-party cloud service leasing versus self-built data centers.

1.1 Contributions of this study are threefold:

• Theoretical: We introduce a multidimensional Digital Transformation Index (DTI) as an exogenous variable in cloud resource forecasting, bridging the gap between digital maturity and IT resource planning.

• Methodological: We develop an integrated ARIMAX-predictive optimization framework that combines forecasting with multi-constraint investment decision-making.

• Practical: The model provides enterprises with a scalable, budget-aware investment strategy for cloud platform expansion, validated through real-world simulation.

2 Related works

The existing body of research on cloud resource management can be broadly categorized into two interconnected streams: (1) predictive modeling of resource demand, and (2) optimization of investment and allocation strategies. A review of recent literature (2020–2025) reveals distinct evolutionary trends and prevailing research gaps in both domains.

2.1 Advancements in cloud resource forecasting

Recent studies in cloud resource forecasting demonstrate a clear paradigm shift from traditional statistical methods towards sophisticated deep learning, hybrid optimization, and privacy-preserving computational frameworks. For instance, Wang et al. proposed a BO-LSTM model that integrates Bayesian optimization with marketing variables to enhance the accuracy of point forecasts (Wang and Chen, 2025). Similarly, Malik et al. developed a hybrid FLNN model (FLGAPSONN) that combines Genetic Algorithm and Particle Swarm Optimization, enabling concurrent prediction of multiple resource metrics (e.g., CPU, memory) and demonstrating superior performance on Google cluster traces (Malik et al., 2022). In the realm of data privacy, Stefanidis et al. designed MulticloudFL, a federated learning framework that supports accurate distributed predictions without centralizing sensitive data (Stefanidis et al., 2023). Other innovations include the use of spiking neural networks (MASNN) by Karpagam et al. to capture temporal symmetries in resource usage, and the DimAug-TimesFM approach by Yang et al., which employs data augmentation to improve the robustness of long-horizon forecasts under conditions of data scarcity (Karpagam and Kanniappan, 2025; Yang et al., 2025).

While these studies represent significant progress in model architecture, optimization algorithms, and learning paradigms, they predominantly focus on technical and data-driven factors, largely overlooking the intrinsic impact of corporate digital maturity—a critical business driver that systematically influences IT resource consumption patterns. A summary of representative forecasting studies is provided in the upper part of Table 1. This oversight establishes a salient research gap, which our study aims to address by introducing a quantified Digital Transformation Index (DTI) as an exogenous predictive variable.


Table 1. Research summary of forecasting and investment models.

2.2 Evolution of investment and resource allocation models

Parallel developments in investment optimization and resource allocation reflect a trend toward multi-objective, predictive, and synergistic decision-making frameworks. Serban and Dedu introduced a Mean-Deviation-Entropy (MDE) model for portfolio optimization that simultaneously balances return, risk, and diversification, exemplifying the shift from single- to multi-criteria optimization (Serban and Dedu, 2025). Echoing this trend, Nalewaik (2025) emphasized the need in capital project planning to move beyond traditional cost-benefit analysis by integrating social value and resilience through Multi-Criteria Decision-Making (MCDM) methods. The integration of forecasting into dynamic allocation is another key trend. Su, for example, utilized GARCH models for financial forecasting and developed dynamic weight allocation algorithms to track the efficient frontier in real time, establishing a “predict-then-optimize” methodological paradigm (Su, 2020). Furthermore, the concept of synergistic resource configuration has gained traction. Liu, in the context of state-owned capital allocation, highlighted that optimal resource deployment requires the integration of different capital forms and the coordination between incremental and capital, a principle that profoundly informs the hybrid “lease-or-build” model in cloud resource strategy (Liu, 2023). Key contributions in this domain are summarized in the lower part of Table 1.

Despite these theoretical and methodological advances, a significant synthesis is lacking. Specifically, there remains no unified model that seamlessly integrates demand forecasting (e.g., as in Su’s approach), multi-objective trade-offs (e.g., following Serban and Nalewaik’s frameworks), and synergistic resource configuration (e.g., informed by Liu’s concept) to systematically address the core cloud investment dilemma of “leasing versus self-building”.

3 Methods

This study proposes an integrated framework for cloud resource demand forecasting and investment optimization, comprising four main steps:

• Quantify the enterprise’s digital maturity to compute a Digital Transformation Index (DTI).

• Forecast dynamic cloud resource demand using an ARIMAX model with the DTI as an exogenous variable.

• Diagnose real-time resource utilization and trigger optimization alerts against predictive thresholds.

• Formulate and solve a constrained optimization model to determine the cost-minimal investment decision.

The overall workflow is illustrated in Figure 1.

Flowchart outlining the model workflow: quantify the enterprise digital transformation index, forecast cloud resource demand, then monitor utilization and trigger scaling against thresholds. If the trigger criteria are met, a minimum-cost decision model considering resource gaps, alert thresholds, leased ratios, and budget limitations generates the optimal investment strategy.

Figure 1. Model operation flowchart.

3.1 Digital maturity quantification and exogenous variable design for ARIMAX modeling

Enterprise cloud resources exhibit a strong correlation with digital transformation progress within organizations. Critical resource fluctuations—such as in computing power, storage, and bandwidth—are closely tied to advancements in enterprise digital maturity (Zhong, 2018). Furthermore, digital transformation necessitates operational realignment of business processes and significantly reshapes cloud resource allocation strategies and usage patterns.

3.1.1 Digital maturity assessment framework

To assess enterprise digital maturity, this study introduces a four-dimensional evaluation framework, detailed in Tables 2, 3. The framework is structured around four core dimensions—technological application, data capability, business integration, and organizational adaptation—from which 12 quantifiable secondary indicators are derived. The Analytic Hierarchy Process (AHP) and expert scoring methods are applied to assign weights to these sub-indicators, enabling the calculation of a composite Digital Transformation Index (DTI). The DTI is calibrated against an industry benchmark of 100, with values above this threshold indicating superior digital maturity in enterprises.


Table 2. Enterprise digital maturity evaluation indicator system.


Table 3. Primary dimension weight allocation.

3.1.1.1 Indicator selection justification

Each secondary indicator in the DTI framework was selected based on its established linkage to cloud resource demand, as supported by prior IT and digital transformation literature. The justifications are organized by primary dimension:

• Technical Infrastructure (B1): Indicators C1-C3 directly reflect the scale, modernization level, and architectural paradigm of the IT environment. Higher investment in cloud computing (C1), server virtualization (C2), and cloud-native applications (C3) is intrinsically linked to increased and more dynamic consumption of computing, storage, and network resources.

• Data-Driven Capability (B2): Indicators C4-C6 measure the intensity of data utilization. Broader data middle platform coverage (C4), higher real-time data processing volumes (C5), and accelerated data storage growth (C6) are key drivers demanding robust, scalable storage, memory, and computing power.

• Business Digital Integration (B3): Indicators C7-C9 quantify the digitization of core business operations. An increasing online transaction ratio (C7), remote work penetration (C8), and deployment of intelligent systems (C9) generate sustained and variable loads on cloud platforms by translating business activity directly into IT workload.

• Organizational Adaptability (B4): Indicators C10-C12, while having a more indirect influence, capture the enterprise’s capacity for continuous digital innovation. A higher ratio of digitally skilled employees (C10) and agile teams (C11), coupled with mature digital decision-making systems (C12), fosters an environment where new digital initiatives are rapidly developed and deployed, thereby driving evolving and less predictable resource demands over time.

3.1.1.2 AHP-expert scoring method

The weights for both primary and secondary indicators were determined through an integrated AHP-Expert Scoring Method to ensure a rational and consensus-driven weighting scheme. The procedure was conducted as follows:

• Expert Panel Formation: A panel of 15 experts was assembled, consisting of 5 senior IT architects, 5 digital transformation strategists, and 5 enterprise cloud solution managers, each with over 10 years of relevant industry experience.

• Pairwise Comparisons: Each expert performed pairwise comparisons for indicators at the same hierarchical level (e.g., B1 vs. B2; C1 vs. C2 under B1) using Saaty’s standard 1–9 scale.

• Consistency Verification: The consistency ratio (CR) was computed for each expert’s judgment matrix. Matrices with a CR > 0.1 were considered inconsistent and were returned to the respective expert for reassessment, thereby ensuring the logical reliability of individual inputs.

• Weight Aggregation and Calculation: The validated individual judgment matrices were aggregated into a final group matrix using the geometric mean method. The final weights for each indicator, as shown in Tables 2, 3, were obtained by calculating the principal eigenvector of the aggregated matrix. All resulting weights exhibited a high level of consistency (CR < 0.1), confirming the reliability of the expert judgments and the overall weighting scheme.
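The aggregation and consistency steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two 3×3 judgment matrices are hypothetical, the function names are invented for this example, the principal eigenvector is approximated by power iteration, and the random index values are the commonly tabulated Saaty values.

```python
import math

# Commonly tabulated Saaty random index (RI) values, keyed by matrix size n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def aggregate_geometric(matrices):
    """Element-wise geometric mean of several experts' judgment matrices."""
    n = len(matrices[0])
    m = len(matrices)
    return [
        [math.prod(mat[i][j] for mat in matrices) ** (1.0 / m) for j in range(n)]
        for i in range(n)
    ]


def principal_eigen(matrix, iterations=200):
    """Power iteration; returns (weights summing to 1, principal eigenvalue)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [value / total for value in v]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lam


def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    _, lam = principal_eigen(matrix)
    return ((lam - n) / (n - 1)) / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons of three indicators by two experts.
expert_1 = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
expert_2 = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]

group = aggregate_geometric([expert_1, expert_2])
weights, _ = principal_eigen(group)
cr = consistency_ratio(group)
print([round(w, 3) for w in weights], round(cr, 4))
```

A matrix whose CR exceeds 0.1 would be sent back for reassessment before aggregation, as described in the consistency verification step.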

3.1.2 Exogenous variable generation for ARIMAX modeling

The composite score for digital transformation maturity is derived using Equation 1:

$$\mathrm{DTI}_t = \sum_{k=1}^{4} \omega_k \sum_{i \in B_k} \omega_{ki}\, x_{it} \tag{1}$$

where:

$\mathrm{DTI}_t$: Composite score of digital transformation maturity at time $t$.

$\omega_k$: Weight of the $k$-th primary dimension ($k = 1, 2, 3, 4$).

$\omega_{ki}$: Weight of the $i$-th secondary indicator under the $k$-th primary dimension.

$x_{it}$: Normalized score of the $i$-th indicator at time $t$, scaled to the interval [0, 100].

$B_k$: Set of secondary indicators belonging to the $k$-th primary dimension.

The raw score of each indicator is rescaled into the [0, 100] interval by Z-score normalization or a linear transformation, yielding the normalized score $x_{it}$.

3.1.2.1 Calculation example

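For instance, consider a hypothetical enterprise at a specific time $t$. Its normalized scores $x_{it}$ and the predetermined primary-dimension weights are as follows:

B1 (weight $\omega_1 = 0.35$): C1 = 70, C2 = 60, C3 = 50.

B2 (weight $\omega_2 = 0.30$): C4 = 40, C5 = 55, C6 = 65.

B3 (weight $\omega_3 = 0.25$): C7 = 80, C8 = 30, C9 = 20.

B4 (weight $\omega_4 = 0.10$): C10 = 50, C11 = 40, C12 = 60.

The DTI score is calculated using Equation 2, where the coefficients inside each bracket are the secondary-indicator weights $\omega_{ki}$:

$$\begin{aligned}
\mathrm{DTI}_t ={}& 0.35 \times (0.35 \times 70 + 0.25 \times 60 + 0.40 \times 50) \\
&+ 0.30 \times (0.30 \times 40 + 0.35 \times 55 + 0.35 \times 65) \\
&+ 0.25 \times (0.40 \times 80 + 0.30 \times 30 + 0.30 \times 20) \\
&+ 0.10 \times (0.50 \times 50 + 0.30 \times 40 + 0.20 \times 60) \\
={}& 53.675 \approx 54
\end{aligned} \tag{2}$$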
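The arithmetic of this worked example can be checked with a short Python sketch. The weights and scores are copied from the example; the dictionary layout and the function name `compute_dti` are illustrative choices, not part of the original study.

```python
# Each primary dimension maps to (primary weight, [(secondary weight, normalized score), ...]).
# All numbers are taken from the worked example in the text.
dimensions = {
    "B1": (0.35, [(0.35, 70), (0.25, 60), (0.40, 50)]),
    "B2": (0.30, [(0.30, 40), (0.35, 55), (0.35, 65)]),
    "B3": (0.25, [(0.40, 80), (0.30, 30), (0.30, 20)]),
    "B4": (0.10, [(0.50, 50), (0.30, 40), (0.20, 60)]),
}


def compute_dti(dims):
    """Weighted sum over dimensions of the weighted sum over their indicators."""
    return sum(
        w_k * sum(w_ki * score for w_ki, score in indicators)
        for w_k, indicators in dims.values()
    )


dti = compute_dti(dimensions)
print(round(dti, 3))  # 53.675, which rounds to the reported score of 54
```

The four dimension contributions are 20.825, 16.2, 11.75, and 4.9, so the composite score is 53.675 before rounding.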

3.1.2.2 Time-series dataset generation

The above process generates a synchronized time-series dataset, given by Equation 3:

$$\{\mathrm{DTI}_t\}, \quad t = 1, 2, \ldots, T \tag{3}$$

This dataset aligns temporally with cloud resource demand data to support subsequent analytical modeling.

3.2 Dynamic cloud resource demand forecasting with integrated digital maturity metrics

In the process of enterprise digital transformation, changes in cloud resource demand are jointly influenced by historical usage patterns and digital operational capabilities. This study employs an ARIMAX model, incorporating quantified enterprise digital transformation indicators as exogenous variables, to establish an integrated forecasting framework that combines technological development and business needs. Unlike the ARIMA model, which relies solely on historical data, the ARIMAX model integrates the DTI, enabling it to capture cloud resource demand fluctuations in complex scenarios (e.g., traffic peaks and resource auto-scaling) more accurately. This approach significantly enhances the capability of the model to analyze and predict dynamic cloud resource demands.

3.2.1 Basic principles of the model

The ARIMAX model extends the traditional ARIMA framework by incorporating exogenous variables. The general form of an ARIMAX (p, d, q) model is given by Equation 4:

$$\phi(B)(1 - B)^{d}\, y_t = \theta(B)\,\epsilon_t + \sum_{m=0}^{M} \beta_m x_{t-m} \tag{4}$$

where:

$y_t$ denotes cloud resource demand metrics, such as CPU utilization, memory usage, or network traffic.

$\phi(B)$ and $\theta(B)$ represent the autoregressive and moving average polynomials, respectively.

$B$ denotes the backward shift operator.

$d$ is the differencing order applied to achieve stationarity.

$\epsilon_t$ is a white noise sequence.

$x_{t-m}$ corresponds to the quantified enterprise digital transformation level, acting as the exogenous variable in this study.

$\beta_m$ denotes the coefficient for the exogenous variable lagged by $m$ periods, reflecting the direction and magnitude of the impact of digital transformation on cloud resource demand.
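To make the roles of the coefficients concrete, consider the special case p = 1, d = 0, q = 1 with a single contemporaneous exogenous term (M = 0). Assuming the sign convention in which the AR polynomial is 1 − φB and the MA polynomial is 1 + θB, and no intercept, such a model can be written as y_t = φ·y_{t−1} + β·x_t + ε_t + θ·ε_{t−1}. The Python sketch below computes one-step-ahead predictions for given parameters. The parameter values and series are hypothetical, and the function name is illustrative; in practice the parameters would be estimated, for example by maximum likelihood.

```python
def one_step_predictions(y, x, phi, theta, beta):
    """One-step-ahead predictions and residuals for an ARIMAX(1,0,1) model.

    Assumed form: y_t = phi*y_{t-1} + beta*x_t + eps_t + theta*eps_{t-1}.
    The first observation has no predecessor, so it gets no prediction and
    its residual is initialized to zero.
    """
    preds = [None]
    resid = [0.0]
    for t in range(1, len(y)):
        y_hat = phi * y[t - 1] + beta * x[t] + theta * resid[t - 1]
        preds.append(y_hat)
        resid.append(y[t] - y_hat)
    return preds, resid


# Hypothetical DTI values (x) and a noise-free demand series (y) generated
# from the same recursion, so every residual should be zero.
x = [50.0, 51.0, 52.0, 53.0, 54.0]
phi, theta, beta = 0.6, 0.3, 0.4
y = [40.0]
for t in range(1, len(x)):
    y.append(phi * y[-1] + beta * x[t])

preds, resid = one_step_predictions(y, x, phi, theta, beta)
print(preds[1:], resid[1:])
```

A positive β in this form means that a higher DTI at time t raises the predicted demand at time t, which is the interpretation given for the exogenous coefficient.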

3.2.2 Model construction process

3.2.2.1 Step 1 stationarity test

The ARIMA model requires the time series data to be stationary. Typically, the Augmented Dickey-Fuller (ADF) test is employed to assess the stationarity of the cloud resource demand series $\{y_t\}$. The test begins with the null hypothesis that the series is non-stationary (i.e., it contains a unit root). After computing the test statistic, it is compared against standard critical values. If the null hypothesis is rejected, the series is deemed stationary; otherwise, differencing is applied iteratively until stationarity is achieved. This process determines the optimal differencing order $d$.

3.2.2.2 Step 2 model order determination

For the stationary series, the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots were analyzed to tentatively identify the autoregressive order p and moving average order q. The ACF plot helps identify the influence of past values, while the PACF plot helps identify the direct effect of a specific lag. The following standard guidelines were followed:

• A tailing-off ACF and a sharp cutoff in the PACF after lag p suggest an AR(p) model.

• A sharp cutoff in the ACF after lag q and a tailing-off PACF suggest an MA(q) model.

• If both the ACF and PACF tail off, a mixed ARMA(p, q) model is indicated.

• The Akaike Information Criterion (AIC) was subsequently used to compare models with different (p, q) combinations, and the model with the lowest AIC was selected for its optimal balance of goodness-of-fit and parsimony.

The formulas for calculating ACF and PACF are given in Equations 5, 6:

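$$\mathrm{ACF}(k) = \rho_k = \frac{\mathrm{Cov}(y_t,\, y_{t-k})}{\mathrm{Var}(y_t)} \tag{5}$$

$$\mathrm{PACF}(k) = \frac{\mathrm{Cov}(z_t - \bar{z}_t,\; z_{t-k} - \bar{z}_{t-k})}{\sqrt{\mathrm{Var}(z_t - \bar{z}_t)\,\mathrm{Var}(z_{t-k} - \bar{z}_{t-k})}} \tag{6}$$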
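As an illustration of how these quantities are estimated from data, the Python sketch below computes the sample ACF directly from its definition and obtains the sample PACF from it with the Durbin-Levinson recursion, a standard way of evaluating partial autocorrelations. The short utilization series is hypothetical and the function names are illustrative.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def sample_pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(series, max_lag)
    pacf = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical CPU utilization readings (percent).
cpu = [52.0, 61.0, 58.0, 70.0, 66.0, 73.0, 69.0, 80.0, 77.0, 85.0]
acf = sample_acf(cpu, 3)
pacf = sample_pacf(cpu, 3)
print([round(v, 3) for v in acf])
print([round(v, 3) for v in pacf])
```

At lag 1 the two functions coincide, and at lag 2 the recursion reduces to (ρ₂ − ρ₁²)/(1 − ρ₁²), which gives a quick sanity check on any implementation.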

3.2.2.3 Step 3 model validation

After the initial model construction, the residual sequence is tested for white noise properties. This is assessed by examining the ACF and PACF plots of the residuals. If the majority of autocorrelation and partial autocorrelation coefficients lie within the confidence intervals, the model is considered to have effectively captured the information in the data. Otherwise, the parameters p and q are adjusted and the model is re-fitted. Upon successful validation of the ARIMA (p,d,q) model, the quantified enterprise digital transformation indicators are incorporated as exogenous variables, forming the ARIMAX (p,d,q) model.

3.2.2.4 Step 4 series forecasting

Once a valid ARIMA (p,d,q) model was established, the DTI time series was incorporated as an exogenous variable $x_t$, forming the final ARIMAX (p,d,q) model. The model parameters (AR, MA, and exogenous coefficients) were then estimated using the maximum likelihood method. Forecasted values $Q_{\mathrm{forecast}}^{i}$ are generated from the fitted autoregressive (AR), differencing (I), and moving average (MA) components together with the exogenous DTI term.

3.3 Real-time cloud resource utilization diagnosis and optimization triggering

3.3.1 Business-critical threshold specification

The safety threshold for resource utilization is determined using Equation 7, based on business criticality.

$$r_{\mathrm{alert}}^{i} = \frac{1}{1 + k_i} \tag{7}$$

where:

$r_{\mathrm{alert}}^{i}$: Safety threshold for the utilization of the $i$-th resource.

$k_i$: Safety buffer coefficient for the $i$-th resource ($0 < k_i \le 0.5$).

For critical services (e.g., real-time transaction processing, customer support), $k_i$ is typically set to 0.3–0.5 to ensure stability. For non-critical services (e.g., data archiving, historical analysis), $k_i$ is set to 0.1–0.2 to balance cost and risk. For example, setting $k_i = 0.4$ establishes a safety threshold $r_{\mathrm{alert}}^{i} \approx 0.71$, implying a 29% resource buffer.

Scaling actions are triggered when the ratio of forecast demand to provisioned capacity exceeds the threshold $r_{\mathrm{alert}}^{i}$.

3.3.2 Predictive threshold activation mechanism

Utilizing the demand forecasts $Q_{\mathrm{forecast}}^{i}$ generated by the ARIMAX model (Section 3.2), real-time resource utilization is evaluated against the safety threshold $r_{\mathrm{alert}}^{i}$. The optimization procedures are initiated when the condition specified in Equation 8 is met.

$$\frac{Q_{\mathrm{forecast}}^{i}}{Q_{\mathrm{current}}^{i}} > r_{\mathrm{alert}}^{i} \tag{8}$$

where:

$Q_{\mathrm{forecast}}^{i}$: Predicted demand for the $i$-th resource.

$Q_{\mathrm{current}}^{i}$: Total provisioned capacity of current devices for the $i$-th resource.

If the condition is met, meaning the predicted demand ratio exceeds the safety alert threshold, then resource $i$ is determined to face a risk of supply shortage in the future period, triggering an expansion alarm or an architecture optimization requirement for resource $i$. If the condition is not met, the water level of resource $i$ is safe, and immediate expansion is not required.
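The threshold rule and the trigger check can be sketched in Python as follows. The function names are illustrative, the capacity of 2,954 cores is borrowed from Container a in Section 4, and the forecast values are hypothetical.

```python
def alert_threshold(k):
    """Safety threshold r_alert = 1 / (1 + k) for a buffer coefficient 0 < k <= 0.5."""
    if not 0 < k <= 0.5:
        raise ValueError("k must lie in the interval (0, 0.5]")
    return 1.0 / (1.0 + k)


def needs_expansion(q_forecast, q_current, k):
    """True when forecast demand over provisioned capacity exceeds the threshold."""
    return q_forecast / q_current > alert_threshold(k)


print(round(alert_threshold(0.4), 2))  # 0.71
print(round(alert_threshold(0.5), 2))  # 0.67

# Hypothetical forecasts against a provisioned capacity of 2,954 cores.
print(needs_expansion(q_forecast=2400, q_current=2954, k=0.5))  # True
print(needs_expansion(q_forecast=1500, q_current=2954, k=0.5))  # False
```

A larger buffer coefficient lowers the threshold, so critical services raise the alarm earlier than non-critical ones for the same forecast.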

3.4 Predictive-driven investment optimization under operational constraints

3.4.1 Objective function

The objective function minimizes the total cost $C_{\mathrm{total}}$, integrating both leasing and purchasing costs, as shown in Equation 9:

$$\min C_{\mathrm{total}} = \sum_{i} \left( Q_{\mathrm{lease}}^{i} \times P_{\mathrm{lease}}^{i} + Q_{\mathrm{purchase}}^{i} \times P_{\mathrm{purchase}}^{i} \right) \tag{9}$$

where:

$Q_{\mathrm{lease}}^{i}$: Leased quantity of the $i$-th resource.

$P_{\mathrm{lease}}^{i}$: Unit leasing price of the $i$-th resource.

$Q_{\mathrm{purchase}}^{i}$: Purchased quantity of the $i$-th resource.

$P_{\mathrm{purchase}}^{i}$: Unit purchasing price of the $i$-th resource.

The objective is to minimize the total cost by optimizing the proportion of leased and purchased resources.

3.4.2 Unified constraint framework

To ensure rational and feasible resource allocation, the following constraints are defined:

Constraint 1: Resource Demand Forecast Constraint, as shown in Equation 10:

$$L_{\mathrm{lease}}^{i} + L_{\mathrm{purchase}}^{i} \ge X_i \tag{10}$$

where:

$L_{\mathrm{lease}}^{i}$: Load allocated to leased equipment for the $i$-th resource.

$L_{\mathrm{purchase}}^{i}$: Load allocated to purchased equipment for the $i$-th resource.

$X_i = Q_{\mathrm{forecast}}^{i} - Q_{\mathrm{current}}^{i}$: Resource gap for the $i$-th resource.

This constraint ensures that resource allocation meets the minimum demand requirement, preventing service disruptions due to underestimation.

Constraint 2: Resource Supply Constraint (Water Level Alert Line), as shown in Equation 11:

$$\frac{Q_{\mathrm{forecast}}^{i}}{Q_{\mathrm{current}}^{i} + L_{\mathrm{lease}}^{i} + L_{\mathrm{purchase}}^{i}} \le r_{\mathrm{alert}}^{i} \tag{11}$$

This constraint enforces the safety margin $r_{\mathrm{alert}}^{i}$ derived from business-criticality analysis (Section 3.3.1), ensuring buffer resources for demand volatility.

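Constraint 3: Natural Number Constraint, as shown in Equation 12:

$$Q_{\mathrm{lease}}^{i} \in \mathbb{N}, \quad Q_{\mathrm{purchase}}^{i} \in \mathbb{N} \tag{12}$$

This constraint ensures that leased and purchased quantities are non-negative integers, aligning with practical requirements.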

Constraint 4: Cloud Leasing Cap Constraint, as shown in Equation 13:

$$\frac{Q_{\mathrm{lease}}^{i}}{Q_{\mathrm{lease}}^{i} + Q_{\mathrm{purchase}}^{i}} \le r_i \tag{13}$$

where:

$r_i$: Maximum allowable ratio of leased resources to total resources.

This prevents over-reliance on leasing, which could inflate long-term costs.

Constraint 5: Budget Constraint, as shown in Equation 14:

$$C_{\mathrm{total}} \le C_{\mathrm{budget}} \tag{14}$$

where:

$C_{\mathrm{budget}}$: Total budget for cloud platform construction.

This constraint ensures that costs remain within the approved budget.
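For a single resource, the model above is small enough to solve by exhaustive search, which makes the interaction of the constraints easy to inspect. The Python sketch below is one possible illustration, not the solver used in the study. It adds one assumption that the formulation leaves open: every leased or purchased unit contributes the same fixed capacity (`unit_capacity`), so the allocated loads are the quantities multiplied by that capacity. All prices, capacities, and the budget are hypothetical.

```python
import math


def optimize_investment(q_forecast, q_current, unit_capacity, p_lease,
                        p_purchase, r_alert, max_lease_ratio, budget):
    """Exhaustive search over integer (lease, purchase) quantities.

    Assumption: L_lease = q_lease * unit_capacity and
    L_purchase = q_purchase * unit_capacity.
    Returns (cost, q_lease, q_purchase), or None if no plan is feasible.
    """
    gap = q_forecast - q_current
    # Added capacity needed to satisfy both the gap and the alert line.
    needed = max(gap, q_forecast / r_alert - q_current, 0.0)
    n_max = math.ceil(needed / unit_capacity)
    best = None
    # Constraint 3: quantities range over the natural numbers.
    for q_lease in range(n_max + 1):
        for q_purchase in range(n_max + 1):
            total = q_lease + q_purchase
            added = total * unit_capacity
            cost = q_lease * p_lease + q_purchase * p_purchase
            feasible = (
                added >= gap                                       # Constraint 1
                and q_forecast <= r_alert * (q_current + added)    # Constraint 2
                and (total == 0 or q_lease <= max_lease_ratio * total)  # Constraint 4
                and cost <= budget                                 # Constraint 5
            )
            if feasible and (best is None or cost < best[0]):
                best = (cost, q_lease, q_purchase)
    return best


# Hypothetical scenario: 64-core units, leasing cheaper per unit than purchasing.
plan = optimize_investment(
    q_forecast=3000, q_current=2954, unit_capacity=64,
    p_lease=900, p_purchase=1500,
    r_alert=1 / 1.5, max_lease_ratio=0.45, budget=50000,
)
print(plan)  # (30900, 11, 14)
```

In this scenario the alert line, not the raw gap, determines the 25 units required, and the leasing cap limits the cheaper option to 11 of them. For several resources or larger quantities, an integer programming solver would be the more practical choice.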

4 Simulation experiments

4.1 Experimental setup

To evaluate the effectiveness of the proposed model (Section 3), we conducted simulation experiments using the publicly available cluster dataset cluster_trace_v2018 from Alibaba Cloud. The detailed cluster parameters in this dataset are summarized in Table 4.


Table 4. Dataset parameter information.

The dataset encompasses operational data from 4,000 servers, including both online application containers and offline computational tasks. To simplify the experimental process, we used CPU utilization data from two randomly selected containers on one machine to simulate the daily average operational states of containers a and b in Company A’s current infrastructure, resulting in 1,922 data points (Xie and Dong, 2025).

Container a had 2,954 CPU cores, whereas Container b featured 14,378 CPU cores. The CPU usage trends for containers are shown in Figure 2 (Container a) and Figure 3 (Container b).

Line graph depicting CPU utilization percentage over time from January 1 to January 9, 2018. The utilization fluctuates between 20% and 100%, showing multiple peaks and troughs throughout the period.

Figure 2. Time series of simulated daily average CPU utilization for Container a.

Line graph showing CPU utilization percentage from January 1, 2018, to January 8, 2018. The utilization fluctuates between 30% and 80%, with several peaks and dips.

Figure 3. Time series of simulated daily average CPU utilization for Container b.

As shown in Figures 2, 3, both Containers a and b currently operate in a high-utilization mode. Since they are critical for the enterprise’s real-time business operations, the enterprise must allocate sufficient resource reserves. The safety buffer coefficient is therefore set to k = 0.5, resulting in a water level alert threshold of 1/(1 + 0.5) ≈ 0.67 for both containers. Because the utilization peaks in Figures 2, 3 exceed this threshold, there is a clear need for capacity expansion. To ensure the continuity of daily operations and mitigate risks from extreme events, a simulation-driven investment analysis is conducted to expand the capacity of Containers a and b.

4.2 Implementation framework

4.2.1 Data partitioning and environment

The dataset was partitioned such that the first 80% of the data was used as the training set, and the remaining 20% was allocated to the test set (Li et al., 2025; Liang et al., 2023). We conducted simulation experiments using Python 3.13. The hardware configuration details for this experiment are summarized in Table 5.


Table 5. Experimental environment configuration.
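The chronological 80/20 partition described above, together with the MAPE metric used to score forecasts, can be sketched in Python as follows. The helper names are illustrative, and the placeholder series only reproduces the 1,922-point length of the experiment; it is not the actual data.

```python
def chronological_split(series, train_fraction=0.8):
    """Keep time order: the first part trains the model, the rest tests it."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Placeholder series with the same length as the experiment's 1,922 data points.
series = list(range(1, 1923))
train, test = chronological_split(series)
print(len(train), len(test))  # 1537 385

# Hypothetical actual and predicted loads.
print(mape([100.0, 200.0], [110.0, 190.0]))
```

Splitting by position rather than at random keeps future observations out of the training set, which matters for any time-series forecast evaluation.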

4.2.2 Digital maturity quantification

The Digital Transformation Index developed in this study is designed to capture the dynamic evolutionary nature of corporate digitalization processes. Since the cluster dataset used in this experiment covers only the period up to early 2018, we extended the evaluation of digital transformation levels from 2018 to 2020 by incorporating enterprises’ historical development paths and external environmental analyses. This method ensures both logical consistency in the research design and realism in the decision-making context. The resulting assessments not only supply exogenous variables for the subsequent ARIMAX model but also offer a contextual foundation that more accurately reflects actual transformation phases for investment decision simulations.

Building on the methods in Section 3.1, Table 6 summarizes the quantitative results of digital transformation maturity and contextual insights for the enterprise.


Table 6. Quantitative results of enterprise digital maturity.

4.2.3 Cloud demand forecasting in digital transformation

4.2.3.1 Model order identification and validation

To determine the optimal parameter combination for the ARIMAX model, we first conducted stationarity tests and correlation analysis on the CPU utilization time series for Containers a and b. Figures 4, 5 present the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) for Container a, respectively, while Figures 6, 7 display the corresponding functions for Container b.

Residual ACF plot with 95% confidence intervals. The plot shows autocorrelations at different lags, with most points falling within the confidence band, indicating no significant autocorrelation in residuals.

Figure 4. ACF plot for Container a.

Partial autocorrelation function (PACF) plot with residual values plotted against lags from zero to twenty. Most values fall within the blue shaded 95% confidence interval, indicating no significant autocorrelation.

Figure 5. PACF plot for Container a.

Residual autocorrelation function (ACF) plot displaying residuals with 95% confidence intervals. The plot shows data points primarily within the shaded confidence region, indicating minor autocorrelation except at lag zero where it reaches one.

Figure 6. ACF plot for Container b.

Figure 7. PACF plot for Container b. (Plot: partial autocorrelation against lag, with a 95% confidence band.)

From the ACF and PACF plots, both containers’ sequences exhibit significant autocorrelation structures. Based on these characteristics and the Akaike Information Criterion (AIC), we identified the optimal ARIMAX model orders as:

Container a: p = 1, d = 0, q = 1.

Container b: p = 1, d = 0, q = 1.
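The correlation analysis behind this order selection can be sketched as follows. This is a minimal illustration of how a sample ACF and its approximate 95% white-noise band are computed, not the authors' code: the `cpu_load` readings are hypothetical, and the function names `sample_acf` and `white_noise_band` are introduced here for illustration only.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    denom = sum(d * d for d in dev)  # n times the sample variance
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

def white_noise_band(n, z=1.96):
    """Approximate 95% band for the ACF of white noise: +/- z / sqrt(n)."""
    return z / math.sqrt(n)

# Hypothetical CPU-load readings, for illustration only.
cpu_load = [52, 55, 61, 58, 54, 50, 49, 53, 59, 62, 60, 56, 51, 48, 50, 55]
acf = sample_acf(cpu_load, max_lag=4)
band = white_noise_band(len(cpu_load))
# Lags whose autocorrelation falls outside the band are candidates for AR/MA terms.
significant_lags = [k for k, r in enumerate(acf) if k > 0 and abs(r) > band]
```

In practice the candidate orders suggested by such plots would then be compared by AIC, as described above; the sketch covers only the ACF step.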

4.2.3.2 Model diagnostic tests

To verify the adequacy of the ARIMAX (1,0,1) model specification, we performed Ljung-Box white noise tests; the results are presented in Tables 7 and 8.

Table 7. Ljung-Box test results for the Container a ARIMAX (1,0,1) model.

Table 8. Ljung-Box test results for the Container b ARIMAX (1,0,1) model.

Additionally, the residual statistics for both containers are provided in Table 9.

Table 9. Residual statistics for ARIMAX (1,0,1) models.

All Ljung-Box test p-values exceed the 0.05 significance level, indicating that the residual sequences of both models are white noise, thus confirming model adequacy. Although Container a’s residuals show slightly elevated kurtosis (5.23), it remains within acceptable limits.
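For readers unfamiliar with the test, the Ljung-Box statistic can be computed directly from the residuals. The sketch below is illustrative only: the residual sequence is synthetic, the helper names are hypothetical, and the degrees-of-freedom adjustment (lags minus the number of fitted ARMA parameters) is a common convention that the paper does not state explicitly.

```python
def ljung_box_q(residuals, lags):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

# Upper 5% chi-square critical values by degrees of freedom (standard table values).
CHI2_CRIT_95 = {8: 15.507, 10: 18.307}

def looks_like_white_noise(residuals, lags, fitted_params=0):
    """True when Q stays below the 5% critical value (lags - fitted_params d.o.f.)."""
    return ljung_box_q(residuals, lags) <= CHI2_CRIT_95[lags - fitted_params]

# Synthetic, strongly autocorrelated "residuals": the test should reject these.
alternating = [1.0, -1.0] * 10
q_stat = ljung_box_q(alternating, lags=10)
passes = looks_like_white_noise(alternating, lags=10, fitted_params=2)
```

A p-value above 0.05, as reported for both containers, corresponds to Q falling below the critical value.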

4.2.3.3 Exogenous variable significance

In the forecasting model for Container a, the exogenous variable DTI exhibited a statistically significant coefficient of 943.1211 (p = 0.001), indicating a strong positive impact on cloud resource demand. A comparable trend was observed in the model for Container b.
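To make the role of the DTI coefficient concrete, the sketch below computes a one-step-ahead forecast for an ARIMAX (1,0,1) model. It assumes the regression-with-ARMA-errors form, which may differ from the exact specification in Section 3. Only the coefficient 943.1211 comes from the text; the values of `phi`, `theta`, `const`, and the DTI inputs are hypothetical.

```python
BETA_DTI = 943.1211  # DTI coefficient reported for Container a

def one_step_forecast(y_prev, x_prev, x_next, eps_prev, beta, phi, theta, const=0.0):
    """One-step forecast for y_t = const + beta * x_t + u_t,
    where u_t = phi * u_{t-1} + theta * eps_{t-1} + eps_t (assumed form)."""
    u_prev = y_prev - const - beta * x_prev   # last ARMA error term
    u_next = phi * u_prev + theta * eps_prev  # expected next error (E[eps_t] = 0)
    return const + beta * x_next + u_next

# Hypothetical inputs: phi, theta, const and the DTI values are placeholders.
common = dict(y_prev=2500.0, x_prev=0.60, eps_prev=12.0,
              beta=BETA_DTI, phi=0.5, theta=0.3, const=1900.0)
baseline = one_step_forecast(x_next=0.60, **common)
higher_dti = one_step_forecast(x_next=0.70, **common)
# Under this form, the forecast moves by beta times the change in DTI.
dti_effect = higher_dti - baseline
```

Under this assumed form, raising the DTI by 0.10 with everything else fixed shifts the forecast by about 94.3 CPU units, which is what a positive, significant coefficient means in practice.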

4.2.3.4 Forecasting implementation

We employed the ARIMAX model to train and forecast the load time series for Container a and Container b individually. The results of the training are illustrated in Figure 8 (Container a) and Figure 9 (Container b).

Figure 8. ARIMAX Model Forecast vs. Actual Observed CPU Load for Container a.

Figure 9. ARIMAX Model Forecast vs. Actual Observed CPU Load for Container b. (Line graph of CPU core usage from January 1 to January 8, 2018: training data in blue, true values in orange, ARIMAX predictions as a green dashed line.)

To guarantee resource adequacy, capacity was sized against the peak values of the predicted operational load. The five largest predicted values for Containers a and b, sorted in descending order, are presented in Table 10.

Table 10. Forecast model output values.
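Extracting the peak values from a forecast series is a one-line operation. The sketch below uses Python's standard `heapq.nlargest`; the forecast values are hypothetical and do not reproduce Table 10.

```python
import heapq

def top_peaks(forecast, k=5):
    """Return the k largest forecast values in descending order."""
    return heapq.nlargest(k, forecast)

# Hypothetical forecast of CPU load, for illustration only.
forecast = [2310.5, 2602.1, 2488.0, 2663.0, 2575.4, 2199.9, 2640.7]
peaks = top_peaks(forecast)
sizing_load = peaks[0]  # capacity is sized against the single largest value
```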

4.2.4 Forecasting performance comparison: ARIMAX vs. ARIMA

To validate the predictive accuracy enhancement achieved by the proposed ARIMAX model, a comparative analysis was conducted against the traditional ARIMA model. Utilizing the same training and testing datasets for both containers, we evaluated the forecast performance through error distribution analysis and statistical metrics.

4.2.4.1 Visual error distribution analysis

The forecast error distributions for both models are visually compared in Figures 10–13, providing insights into the prediction accuracy across different modeling approaches.

Figure 10. Forecast Error Distribution of the ARIMAX Model (Container a).

Figure 11. Forecast Error Distribution of the ARIMA Model (Container a). (Histogram: forecast error from -1,000 to 1,000 on the x-axis, frequency up to 60 on the y-axis; red dashed line at zero error, green dashed line at the mean error of 33.1801.)

Figure 12. Forecast Error Distribution of the ARIMAX Model (Container b). (Histogram: forecast error on the x-axis, frequency on the y-axis; red dashed line at zero error, green dashed line at the mean error of 3.3127.)

Figure 13. Forecast Error Distribution of the ARIMA Model (Container b). (Histogram: forecast error from -3,000 to 3,000 on the x-axis, frequency on the y-axis; red dashed line at zero error, green dashed line at the mean error of -5.4614.)

Container a Performance:

Figure 10 illustrates the error distribution of the ARIMAX model for Container a, showing errors concentrated around zero with minimal dispersion.

Figure 11 displays the corresponding error distribution for the ARIMA model, revealing more scattered errors with greater variance.

Container b Performance:

Figure 12 presents the ARIMAX model’s error distribution for Container b, demonstrating similar concentration around zero.

Figure 13 shows the ARIMA model’s error distribution for Container b, exhibiting comparable dispersion patterns to Container a.

4.2.4.2 Statistical significance

The consistent reduction in mean error across both containers (approximately 54%–55% improvement) provides strong evidence for the superior predictive capability of the ARIMAX framework. This enhancement can be attributed to the incorporation of the Digital Transformation Index as an exogenous variable, which captures the systematic influence of enterprise digital maturity on cloud resource demand patterns.

The concentrated error distribution around zero in the ARIMAX models (Figures 10, 12) indicates reduced bias and variance, confirming the model’s ability to more accurately track actual resource utilization trends compared to the traditional ARIMA approach (Figures 11, 13).
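The two summary quantities used in this comparison, MAPE and mean forecast error, can be computed as in the following sketch. The sign convention for the error (predicted minus actual) is an assumption, and all numbers are hypothetical rather than taken from the experiments.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mean_error(actual, predicted):
    """Mean forecast error (bias), assuming error = predicted - actual."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

def relative_reduction(baseline, improved):
    """Fractional reduction in absolute mean error from baseline to improved."""
    return (abs(baseline) - abs(improved)) / abs(baseline)

# Hypothetical observations and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
example_mape = mape(actual, predicted)
example_bias = mean_error(actual, predicted)
example_gain = relative_reduction(baseline=8.0, improved=6.0)
```

Note that a mean error near zero shows low bias only; positive and negative errors can cancel, so it is usually read alongside MAPE or the spread of the histogram.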

4.2.5 Resource expansion constraints

Given that both Containers a and b support real-time business-critical services (e.g., transaction processing), we set the safety buffer coefficient k = 0.5, consistent with the upper range recommended in Section 3.3.1 for high-criticality workloads. This sets the water level alert threshold at 67% of total capacity. The minimum expansion requirement is derived from the safety threshold condition using Equation 15:

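$$Q_{\text{current}}^{i} + L_{\text{lease}}^{i} + L_{\text{purchase}}^{i} \geq \frac{Q_{\text{forecast}}^{i}}{r_{\text{alert}}^{i}} \tag{15}$$

For Container a, the minimum scaling requirement is calculated by substituting the relevant parameters into Equation 16:

$$L_{\text{lease}}^{i} + L_{\text{purchase}}^{i} \geq \frac{2663}{0.67} - 2954 \approx 1021 \tag{16}$$

For Container b, the minimum scaling requirement is calculated by substituting the relevant parameters into Equation 17:

$$L_{\text{lease}}^{i} + L_{\text{purchase}}^{i} \geq \frac{9958}{0.67} - 14378 \approx 485 \tag{17}$$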
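The arithmetic behind Equations 16 and 17 can be checked with a short sketch. It reads the figures in those equations as a forecast peak of 2663 against a current capacity of 2954 for Container a, and 9958 against 14378 for Container b, with the 0.67 alert threshold; the function name `min_expansion` is introduced here for illustration.

```python
import math

def min_expansion(q_forecast, q_current, r_alert):
    """Smallest whole number of added units such that
    q_current + added >= q_forecast / r_alert (never negative)."""
    return max(0, math.ceil(q_forecast / r_alert - q_current))

container_a = min_expansion(q_forecast=2663, q_current=2954, r_alert=0.67)
container_b = min_expansion(q_forecast=9958, q_current=14378, r_alert=0.67)
```

Rounding up rather than to the nearest integer keeps the post-expansion utilization at or below the alert threshold.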

The expansion is subject to a total budget of $2.22 million and a constraint that leased resources constitute no more than 50% of the total added capacity. This leasing cap balances operational flexibility with long-term cost control. For the economic evaluation, purchased equipment is amortized over a 10-year service life, consistent with the typical useful life of enterprise server hardware, and future costs are discounted at an 8% rate, reflecting the standard cost of capital in the Chinese IT sector (Zhang et al., 2024). The market-quoted CPU leasing prices for Containers a and b are provided in Table 11.

Table 11. Leased CPU Expansion Pricing for Containers a and b.

When expanding Containers a and b via self-built servers, refer to the detailed specifications listed in Table 12 (Container a) and Table 13 (Container b).

Table 12. Container a server setup cost breakdown (per unit).

Table 13. Container b Server Setup Cost Breakdown (Per Unit).
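The economic comparison between leasing and self-building can be illustrated with a small net-present-value sketch. It uses the 8% discount rate and 10-year horizon stated above, and assumes that recurring costs are paid at the end of each year. All prices and quantities are hypothetical and are not the values in Tables 11–13; the function names are likewise illustrative.

```python
def annuity_factor(rate, years):
    """Present value of 1 paid at the end of each year for `years` years."""
    return (1 - (1 + rate) ** -years) / rate

def npv_lease(annual_fee_per_unit, units, rate=0.08, years=10):
    """Discounted cost of leasing `units` CPU units over the horizon."""
    return annual_fee_per_unit * units * annuity_factor(rate, years)

def npv_purchase(unit_price, servers, annual_opex_per_server, rate=0.08, years=10):
    """Upfront purchase cost plus discounted running costs of self-built servers."""
    return servers * (unit_price + annual_opex_per_server * annuity_factor(rate, years))

def lease_ratio(leased_units, purchased_units):
    """Share of added capacity that is leased."""
    total = leased_units + purchased_units
    return leased_units / total if total else 0.0

# Hypothetical prices and quantities, for illustration only.
factor = annuity_factor(0.08, 10)
lease_cost = npv_lease(annual_fee_per_unit=120.0, units=200)
build_cost = npv_purchase(unit_price=50_000.0, servers=4, annual_opex_per_server=3_000.0)
within_cap = lease_ratio(leased_units=200, purchased_units=400) <= 0.5
```

At 8% over 10 years the annuity factor is about 6.71, so each dollar of annual leasing or operating cost counts as roughly $6.71 in present-value terms.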

4.2.6 Genetic algorithm optimization implementation

This study employs a Genetic Algorithm (GA) to solve the mixed-integer nonlinear programming problem, based primarily on the following considerations (Huang, 2024; Xu et al., 2023): all problem variables are integers, which suits GA's handling of discrete variables; the constraints include nonlinear relationships, making traditional linear programming methods difficult to apply directly; a global search capability is required to avoid converging to local optima; and the problem scale is moderate, which suits population-based intelligent optimization methods.

4.2.6.1 Parameter configuration

To ensure algorithm convergence and solution quality, the genetic algorithm parameters were configured as shown in Table 14 after multiple experimental trials and debugging:

Table 14. Genetic algorithm parameter configuration.
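To show how such parameters enter the search, the following is a minimal integer-coded genetic algorithm with tournament selection, uniform crossover, random-reset mutation, and elitism. It is a generic sketch, not the authors' implementation: the operator choices, the default parameter values, and the toy objective are all assumptions, and the values in Table 14 are not reproduced here.

```python
import random

def genetic_minimize(cost, bounds, pop_size=40, generations=60,
                     crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimize `cost` over integer vectors within inclusive `bounds`."""
    rng = random.Random(seed)

    def random_individual():
        return tuple(rng.randint(lo, hi) for lo, hi in bounds)

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if cost(a) <= cost(b) else b

    population = [random_individual() for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        children = [best]  # elitism: the best-so-far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene comes from either parent.
                child = tuple(rng.choice(pair) for pair in zip(p1, p2))
            else:
                child = p1
            # Random-reset mutation keeps every gene inside its bounds.
            child = tuple(rng.randint(lo, hi) if rng.random() < mutation_rate else g
                          for g, (lo, hi) in zip(child, bounds))
            children.append(child)
        population = children
        best = min(population, key=cost)
    return best

# Toy objective with a known minimum at (7, 3); purely illustrative.
def toy_cost(x):
    return (x[0] - 7) ** 2 + (x[1] - 3) ** 2

toy_bounds = [(0, 20), (0, 20)]
solution = genetic_minimize(toy_cost, toy_bounds)
```

Because of elitism, the best cost never worsens from one generation to the next, but a GA gives no guarantee of reaching the global optimum, which is why repeated trials and parameter tuning are common.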

4.2.6.2 Constraint handling mechanism

For the nonlinear constraints in the problem, the algorithm employs a penalty function method. The degree of constraint violation is transformed into penalty terms added to the objective function, ensuring the search process consistently moves toward the feasible region. Specific constraints include seven inequality constraints related to production capacity, efficiency, proportional allocation, and budget.

Through the aforementioned parameter configuration and algorithm design, the genetic algorithm can effectively search for the global optimal solution within the complex feasible solution space, providing a reliable theoretical basis and numerical results for subsequent decision analysis.
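The penalty function idea described in this subsection can be sketched as follows. Each constraint is written in the form g(x) <= 0, and any positive value is charged against the objective. Only three of the seven constraints are shown, and the penalty weight and the candidate configurations are hypothetical. The requirement of 1021 units, the 50% leasing cap, and the $2.22 million budget are taken from the text.

```python
def constraint_violations(leased, purchased, npv_cost,
                          required_units=1021, max_lease_share=0.5, budget=2_220_000.0):
    """Return g(x) values; a positive entry means the constraint is violated."""
    total = leased + purchased
    return [
        required_units - total,            # capacity: total added units >= requirement
        leased - max_lease_share * total,  # leasing cap: leased share <= 50%
        npv_cost - budget,                 # budget: NPV cost <= ceiling
    ]

def penalized_cost(npv_cost, violations, penalty_weight=1e6):
    """Objective plus a weighted sum of the constraint violations."""
    return npv_cost + penalty_weight * sum(max(0.0, v) for v in violations)

# Hypothetical candidates: the first is feasible, the second breaks the leasing cap.
feasible = penalized_cost(1_000_000.0, constraint_violations(300, 800, 1_000_000.0))
infeasible = penalized_cost(900_000.0, constraint_violations(700, 400, 900_000.0))
```

In this sketch the infeasible candidate is cheaper in raw cost but ranks far worse after the penalty, which is what steers the search back toward the feasible region. In practice the violations are often normalized so that constraints measured in units and in dollars are penalized on a comparable scale.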

5 Results

5.1 Optimal resource configuration

The computational results are tabulated in Table 15.

Table 15. Cost-optimal expansion configuration.

5.2 Container-specific validation

The validation results for each container’s optimal configuration are detailed in Table 16, confirming compliance with all operational constraints.

Table 16. Container-specific configuration validation.

The validation confirms that both container configurations comply with all operational constraints. The hybrid model for Container a optimally balances cost-efficiency with operational flexibility, whereas the exclusive self-build strategy for Container b is justified by its stable, high-demand profile, favoring long-term cost savings.

5.3 Feasibility assessment

The final feasibility of the proposed investment strategy is summarized as follows:

• Total NPV cost: $2.07 million

• Budget constraint: $2.22 million

• Utilization rate: 93.67%

The genetic algorithm solution achieves a 93.67% budget utilization ($2.07M of the $2.22M ceiling), which demonstrates high cost-efficiency in practice. This near-optimal expenditure indicates that the model successfully identified a configuration that maximizes resource acquisition within the financial limit, while the slight underspend (6.33%) provides a valuable financial buffer for unforeseen contingencies or future scaling needs.

6 Discussion

This study developed an integrated framework for cloud resource forecasting and investment decision-making by incorporating corporate digital maturity, yielding the following core findings:

• Enhanced Forecasting Accuracy: The ARIMAX model, enriched with the DTI as an exogenous variable, demonstrated a significant improvement in predicting cloud resource demand. The comprehensive DTI evaluation system, built upon four primary dimensions and twelve secondary indicators, successfully quantifies digital maturity. Simulation results confirmed that the DTI-embedded model achieves a CPU load peak prediction error of less than 5%, with the DTI coefficient being statistically significant (p < 0.01). This robustly validates a strong correlation between an enterprise’s digital transformation level and its cloud resource consumption.

• Optimized Investment Strategy: The proposed investment model effectively addresses the core “lease-or-build” dilemma under budgetary constraints (Lu, 2024; Wang, 2025). By evaluating the trade-offs between self-built data centers and third-party cloud leasing, the model achieves dual objectives: optimal resource allocation and stringent cost control. The simulation experiment, conducted within a $2.22 million budget ceiling, yielded an optimal configuration of 10 self-built servers and 231 leased units for Container a, and 9 self-built servers for Container b, with a total cost of $2.07 million.

• Comprehensive Framework Integration: The study presents a holistic framework that seamlessly integrates the DTI, predictive analytics, and multi-constraint optimization. The DTI systematically links strategic digital initiatives with IT resource planning. The optimization model, incorporating constraints such as budget, leasing ratios, and safety thresholds, enables a synergistic resource configuration that aligns long-term strategic goals with dynamic operational needs. This integration provides a theoretically sound and operationally viable solution for cloud resource planning in the context of digital transformation.

6.1 Limitations and future work

While this study provides a novel framework, it is subject to several limitations. Firstly, the model was validated using a dataset from a single cloud provider (Alibaba Cloud). The generalizability of the findings to other cloud ecosystems (e.g., AWS, Azure) or enterprise-specific IT environments with unique workload patterns may be limited and warrants further investigation. Secondly, the DTI, though comprehensive, may not capture all facets of digital transformation equally across different industries. Future research should aim to test and calibrate this model with multi-source datasets and explore industry-specific DTI adaptations. Furthermore, the model assumes static pricing and technology, whereas incorporating real-time spot market prices and evolving hardware specifications could enhance its practical utility.

7 Conclusion

This study developed an integrated framework that bridges corporate digital maturity with cloud resource management. By introducing a quantified Digital Transformation Index (DTI) as a key exogenous variable into an ARIMAX forecasting model, we achieved high-precision prediction of cloud resource demand (MAPE < 5%). The subsequent optimization model, which solves for a cost-minimal investment strategy under multiple operational and financial constraints, allocated resources efficiently, utilizing 93.67% of a $2.22 million budget.

The primary theoretical contribution lies in establishing and validating the critical link between digital maturity and IT resource demand. Methodologically, the seamless “predict-then-optimize” framework demonstrates significant practical utility. It provides enterprises with a scalable, economically viable decision-support tool for navigating cloud investment choices during digital transformation. Future work will focus on enhancing the model’s dynamism by incorporating real-time market fluctuations and supply chain factors, thereby increasing its adaptability and robustness for real-world applications.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

ZC: Conceptualization, Project administration, Writing – review and editing. YT: Supervision, Writing – review and editing. FW: Methodology, Writing – review and editing. XW: Formal Analysis, Validation, Visualization, Writing – original draft. MX: Data curation, Formal Analysis, Writing – original draft. ZW: Investigation, Writing – review and editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This research was funded by Economic and Technology Research Institute of State Grid Fujian Electric Power Company, grant number 52130N24000U.

Conflict of interest

Authors ZC and FW were employed by Digitalization Department, State Grid Fujian Electric Power Company. Author YT was employed by Economic and Technical Research Institute, State Grid Fujian Electric Power Company.

The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors declared that this work received funding from Economic and Technology Research Institute of State Grid Fujian Electric Power Company. The funder had the following involvement in the study: Supervision, Writing-review and editing.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frcmn.2025.1732098/full#supplementary-material

References

Calheiros, R. N., Masoumi, E., Ranjan, R., and Buyya, R. (2015). Workload prediction using ARIMA model and its impact on cloud applications' QoS. IEEE Trans. Cloud Comput. 3 (04), 449–458. doi:10.1109/TCC.2014.2350475

Gao, C. (2015). Research and implementation of load balancing method in cloud stack platform. [Harbin (China)]: Harbin Institute of Technology. [M.S. thesis].

Huang, W. (2024). Water resource allocation optimization method based on GM-VMD water demand prediction. Water Resour. Technol. Superv. 10, 165–170.

Karpagam, T., and Kanniappan, J. (2025). Symmetry-aware multi-dimensional attention spiking neural network with optimization techniques for accurate workload and resource time series prediction in cloud computing systems. Symmetry 17 (3), 383. doi:10.3390/sym17030383

Li, Q. (2022). Research on equipment leasing mode selection in engineering management. Port. Eng. Technol. 59 (04), 97–100. doi:10.16403/j.cnki.ggjs20220422

Li, J. (2025). Industrial digitalization guiding comprehensive enterprise digital construction framework. China Mark. 12, 191–194. doi:10.13939/j.cnki.zgsc.2025.12.047

Li, S., Yu, K., and Chen, Y. (2025). Research on resource usage prediction in high-performance computing platforms based on ARIMA and LSTM. Comput. Sci. 52 (09), 1–11.

Liang, R., Xie, X., Zhai, Q., and Zhang, Q. (2023). Research on container cloud load prediction based on improved stacking ensemble model. Comput. Appl. Softw. 40 (12), 48–55+100.

Liu, W. (2023). Research on the optimization path of capital allocation for state-owned capital investment and operation companies. Bus. News (11), 151–154.

Liu, H., Jing, S., and Liu, T. (2010). Financial decision-making methods for leasing or purchasing fixed assets. Acc. Mon. 16, 15–16. doi:10.19641/j.cnki.42-1290/f.2010.16.007

Lu, Y. (2024). Investment strategies and asset allocation analysis in financial markets. Bus. Inf. 10, 83–86.

Malik, S., Tahir, M., Sardaraz, M., and Alourani, A. (2022). A resource utilization prediction model for cloud data centers using evolutionary algorithms and machine learning techniques. Appl. Sci. 12 (4), 2160. doi:10.3390/app12042160

Mi, H., Wang, H., Yin, G., Shi, D., Zhou, Y., and Yuan, L. (2011). A resource on-demand reconfiguration method for virtualized data centers. Softw. 22 (9), 2193–2205. doi:10.3724/sp.j.1001.2011.04056

Nalewaik, A. (2025). “A hybrid approach to benefits planning for capital projects,” in Proceedings of the 2025 IEEE European Technology and Engineering Management Summit (E-TEMS) (IEEE), 86–91. doi:10.1109/E-TEMS64751.2025.11239339

Pan, H., and Hu, G. (2025). Can enterprises generate new productive forces through digital transformation? An empirical study from the perspective of technological innovation. Technol. Econ. 44 (02), 31–42.

Serban, F., and Dedu, S. (2025). A scalarized entropy-based model for portfolio optimization: balancing return, risk and diversification. Mathematics 13 (20), 3311. doi:10.3390/math13203311

Stefanidis, V.-A., Verginadis, Y., and Mentzas, G. (2023). MulticloudFL: adaptive federated learning for improving forecasting accuracy in multi-cloud environments. Information 14 (12), 662. doi:10.3390/info14120662

Su, J. (2020). The implementation of asset allocation approaches: theory and evidence. Sustainability 12 (17), 7162. doi:10.3390/su12177162

Wang, Z. (2025). Research on factors influencing investment asset allocation efficiency of Chinese insurance companies. Mark. Wkly. 38 (02), 42–45.

Wang, Y., and Chen, T. (2025). BO-LSTM-Based cloud resource consumption prediction model. Comput. Eng. De. 46 (5), 1418–1423. doi:10.16208/j.issn1000-7024.2025.05.024

Xie, X., and Dong, Y. (2025). A container cloud resource prediction model based on secondary decomposition and broad learning system. Lab. Res. Explor. 44 (03), 94–100.

Xing, Q., Wu, P., and Deng, F. (2025). Dependency analysis between industrial internet platforms and manufacturing participants under digital transformation. Manag. Eng. 39 (05), 1–18.

Xu, Y., Hong, Y., He, L., Hong, F., Zhang, Y., Hou, F., et al. (2023). Equity and resource prediction of child health human resources in maternal and child health institutions in Guizhou province. Chin. Health Resour. 26 (05), 582–588. doi:10.13688/j.cnki.chr.2023.230147

Yang, X., Zheng, Q., Zhu, X., Luo, M., Hou, Z., Zhang, J., et al. (2025). DimAug-TimesFM: dimension augmentation for long-term cloud demand forecasting in few-shot scenarios. Appl. Sci. 15 (7), 3450. doi:10.3390/app15073450

Zhang, X., and Zhang, L. (2017). A review of theoretical research on asset allocation. Econ. Dyn. 2, 137–147.

Zhang, W., Duan, P., Yang, L. T., Xia, F., Li, Z., Lu, Q., et al. (2017). Resource requests prediction in the cloud computing environment with a deep belief network. Softw. Pract. Exper. 47 (03), 472–488. doi:10.1002/spe.2426

Zhang, J., Ouyang, S., Wu, H., Xin, X., and Huang, W. (2024). Optimal configuration of grid-side energy storage considering distribution network reliability and operational economy. Power Syst. Autom. Equip. 44 (07), 62–68+85. doi:10.16081/j.epae.202312044

Zhong, X. (2018). Research on task scheduling algorithms in cloud computing environments. [Jiangxi(China)]: Jiangxi University of Science and Technology. [M.S. thesis].

Keywords: ARIMAX model, cloud resource forecasting, digital transformation, decision support systems, optimization techniques

Citation: Chen Z, Tang Y, Wu F, Wang X, Xia M and Wang Z (2026) Optimization of cloud resource demand forecasting and investment decisions in the context of digital transformation. Front. Commun. Netw. 6:1732098. doi: 10.3389/frcmn.2025.1732098

Received: 25 October 2025; Accepted: 11 December 2025;
Published: 07 January 2026.

Edited by:

Lukman Adewale Ajao, Federal University of Technology Minna, Nigeria

Reviewed by:

Farhan Nisar, Qurtuba University of Science and Information Technology, Pakistan
Sai Bharath Sannareddy, Abbott, United States

Copyright © 2026 Chen, Tang, Wu, Wang, Xia and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Minghui Xia, MjU0OTYyMTkxNUBxcS5jb20=
