<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Energy Res.</journal-id>
<journal-title>Frontiers in Energy Research</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Energy Res.</abbrev-journal-title>
<issn pub-type="epub">2296-598X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">1008216</article-id>
<article-id pub-id-type="doi">10.3389/fenrg.2022.1008216</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Energy Research</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Adaptive forecasting of diverse electrical and heating loads in community integrated energy system based on deep transfer learning</article-title>
<alt-title alt-title-type="left-running-head">Wang et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fenrg.2022.1008216">10.3389/fenrg.2022.1008216</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Kangsheng</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yu</surname>
<given-names>Hao</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/993080/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Song</surname>
<given-names>Guanyu</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1117258/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Xu</surname>
<given-names>Jing</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Juan</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Peng</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/993077/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Key Laboratory of Smart Grid of Ministry of Education</institution>, <institution>Tianjin University</institution>, <addr-line>Tianjin</addr-line>, <country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>State Grid Tianjin Economic Research Institute</institution>, <addr-line>Tianjin</addr-line>, <country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1056529/overview">Chun Sing Lai</ext-link>, Brunel University London, United Kingdom</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1284394/overview">Dong Liang</ext-link>, Hebei University of Technology, China</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/1143503/overview">Haoran Zhang</ext-link>, The University of Tokyo, Japan</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Guanyu Song, <email>gysong@tju.edu.cn</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Smart Grids, a section of the journal Frontiers in Energy Research</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>20</day>
<month>09</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>10</volume>
<elocation-id>1008216</elocation-id>
<history>
<date date-type="received">
<day>31</day>
<month>07</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>08</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Wang, Yu, Song, Xu, Li and Li.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Wang, Yu, Song, Xu, Li and Li</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>The economic operation and scheduling of a community integrated energy system (CIES) depend on accurate day-ahead multi-energy load forecasting. Considering the high randomness, obvious seasonality, and strong correlations between the multiple energy demands of a CIES, this paper proposes an adaptive forecasting method for diverse loads of a CIES based on deep transfer learning. First, a one-dimensional convolutional neural network (1DCNN) is formulated to extract hour-level local features, and the long short-term memory network (LSTM) is constructed to extract day-level coarse-grained features. In particular, an attention mechanism module is introduced to focus on critical load features. Second, a hard-sharing mechanism is adopted to learn the mutual coupling relationship between diverse loads, where weather information is added to the shared layer as auxiliary input. Furthermore, considering the differences in the degree of uncertainty of multiple loads, dynamic weights are assigned to different tasks to facilitate their simultaneous optimization during training. Finally, a deep transfer learning strategy is constructed in the forecasting model to guarantee its adaptivity in various scenarios, where the maximum mean discrepancy (MMD) is used to measure the gradual deviation of the load properties and the external environment. Simulation experiments on two practical CIES cases show that, compared with four benchmark models, the electrical and heating load forecasting accuracy (measured by MAPE) improved by at least 4.99% and 18.22%, respectively.</p>
</abstract>
<kwd-group>
<kwd>community integrated energy system (CIES)</kwd>
<kwd>load forecasting</kwd>
<kwd>multi-task learning (MTL)</kwd>
<kwd>deep transfer learning</kwd>
<kwd>maximum mean discrepancy (MMD)</kwd>
<kwd>uncertainty</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">State Grid Tianjin Electric Power Company<named-content content-type="fundref-id">10.13039/501100015246</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Integrated energy system (IES) (<xref ref-type="bibr" rid="B2">Cheng et al., 2018</xref>) is recognised as a potential solution for reducing carbon emissions and improving energy utilisation efficiency (<xref ref-type="bibr" rid="B21">Quelhas et al., 2007</xref>). In contrast to conventional independent energy systems, IES is dedicated to the integration of various energy carriers such as electricity, gas, heat, and cooling, as well as different energy technologies such as distributed generation and energy storage (<xref ref-type="bibr" rid="B33">Yan et al., 2021</xref>). Community integrated energy system (CIES) involves the implementation of the IES concept near the demand side. The CIES facilitates the synergy of different energy carriers, obtains higher operational flexibility, and achieves better economic and environmental performance in the simultaneous supply of various energy forms (<xref ref-type="bibr" rid="B6">Gianfranco et al., 2020</xref>). Owing to these advantages, the CIES plays an important role in the development of the IES and has been put into practice in many countries.</p>
<p>Fluctuation of loads in a CIES is a critical factor that deteriorates operational performance and increases security risks, making load forecasting technologies indispensable in the planning and operation of modern CIES (<xref ref-type="bibr" rid="B32">Wang et al., 2021</xref>; <xref ref-type="bibr" rid="B46">Yu et al., 2022</xref>). Generally, load forecasting methods focus on different timescales. Short-term load forecasting (typically day-ahead forecasting) (<xref ref-type="bibr" rid="B3">Daniel et al., 2022</xref>) is most commonly used in the operation of CIES for the optimization of scheduling plans (<xref ref-type="bibr" rid="B12">Liu, 2020</xref>; <xref ref-type="bibr" rid="B20">Qin et al., 2020</xref>). It is also the basis for a CIES to determine future optimal strategies for demand response (<xref ref-type="bibr" rid="B15">Lyon et al., 2015</xref>; <xref ref-type="bibr" rid="B16">Ming et al., 2020</xref>), energy trading (<xref ref-type="bibr" rid="B5">Fu et al., 2021</xref>), and system maintenance (<xref ref-type="bibr" rid="B10">Kuster et al., 2017</xref>). As the granularity of these tasks becomes more refined, the requirement for accurate load forecasting is also promoted, motivating extensive studies on novel load forecasting theories and methods.</p>
<p>Load forecasting methods mainly fall into two categories: the statistical methods such as regression analysis (<xref ref-type="bibr" rid="B1">Bracale et al., 2020</xref>) and autoregressive integrated moving average (ARIMA) (<xref ref-type="bibr" rid="B13">L&#xf3;pez et al., 2019</xref>), and the machine learning methods such as artificial neural networks (Wang et al., 2018), support vector machine (SVM) (<xref ref-type="bibr" rid="B30">Wang et al., 2016</xref>), and extreme learning machine (ELM) (<xref ref-type="bibr" rid="B23">Sachin et al., 2018</xref>). Deep learning (<xref ref-type="bibr" rid="B11">Le et al., 2015</xref>) is a new type of machine learning method, which has gained popularity in load forecasting in recent years because of its superior learning ability, adaptability, and portability. For example, the electrical loads of 42 residential users (<xref ref-type="bibr" rid="B34">Yang et al., 2021</xref>) were forecasted, where it was demonstrated that deep learning achieves higher accuracy than the back propagation (BP) neural network and the extreme gradient boosting (XGBoost) method. A novel evolutionary-based deep convolutional neural network (CNN) model (<xref ref-type="bibr" rid="B7">Jalali et al., 2021</xref>) was proposed for intelligent load forecasting, which mainly solved the problem of finding the optimal hyperparameters of the CNN efficiently. A novel pooling-based deep recurrent neural network (RNN) (<xref ref-type="bibr" rid="B24">Shi et al., 2018</xref>) was proposed, which batches a group of customer load profiles into a pool of inputs, and addresses the overfitting problem by increasing data diversity and volume. A deep belief network (DBN) was improved from three aspects (<xref ref-type="bibr" rid="B9">Kong et al., 2020</xref>), including input data, model, and performance, to consider demand-side management (DSM) in electrical load forecasting. 
Variational mode decomposition (VMD) and stacking model were employed to forecast short-term electrical loads (<xref ref-type="bibr" rid="B41">Zhang et al., 2022</xref>). These studies have demonstrated the applicability and effectiveness of deep learning methods in the load forecasting of energy systems.</p>
<p>However, load forecasting in a CIES is quite different from these existing studies, which mainly focus on aggregated load forecasting at the system level (<xref ref-type="bibr" rid="B37">Yu and Li, 2021</xref>). Two new challenges need to be addressed. First, the variation and uncertainty in the diverse loads of a CIES are intensified. This is due to the smaller system scale of a CIES, as well as the coupling of different energy forms, which enhances the propagation of uncertainties (<xref ref-type="bibr" rid="B45">Li et al., 2022</xref>). The interchangeability between users' different energy consumptions, enabled by flexible energy conversion equipment, further complicates the characteristics of the load profiles.</p>
<p>Second, it is challenging to maintain the adaptivity of the forecasting model during the long-term operation of a CIES. The load diversity in a CIES is generally reduced because it typically serves a specific function, such as commercial, residential, industrial, or educational use. Under these conditions, the effects of long-term factors, such as changes in seasons, energy consumption habits, total loads, and system configurations, are magnified. For example, the characteristics of the load profile usually differ during the summer, winter, and seasonal transition periods. The gradual evolution of demand restricts the continued applicability of a single model in the load forecasting of a practical CIES. It is also difficult to train a unified model that is suitable for all scenarios because there is no guarantee that the training data over a long period share the same distribution.</p>
<p>A feasible solution to deal with the uncertainty in the load forecasting of a CIES is to utilize the correlations between multiple energy demands, and perform joint forecasting. For example, a multi-energy forecasting framework based on deep belief network was designed for the short-term load forecasting of integrated energy systems, in which the correlation among electrical, gas, and heating loads was considered (<xref ref-type="bibr" rid="B44">Zhou et al., 2020</xref>). A hybrid network based on CNN and gated recurrent unit (GRU) was proposed for the multi-energy load forecasting of the main campus of the University of Texas at Austin (<xref ref-type="bibr" rid="B29">Wang et al., 2020a</xref>). A CNN-Sequence to Sequence (Seq2Seq) model was developed to consider temperature, humidity, wind speed, and the coupling relationship of multiple energy carriers in hour-ahead load forecasting (<xref ref-type="bibr" rid="B39">Zhang et al., 2021</xref>). Long short-term memory (LSTM) and the coupling characteristic matrix of multiple types of loads were employed to extract the inherent features of loads and improve forecasting accuracy (<xref ref-type="bibr" rid="B31">Wang et al., 2020b</xref>). Multi-task learning (MTL) is also widely used as a basic framework for joint load forecasting, because it improves the cognition ability of different tasks by utilizing shared layers (<xref ref-type="bibr" rid="B42">Zhang and Yang, 2018</xref>). This framework was employed in similar studies for joint forecasting of electrical, heating, cooling, and gas loads (<xref ref-type="bibr" rid="B26">Tan et al., 2019</xref>; <xref ref-type="bibr" rid="B40">Zhang et al., 2020</xref>). Overall, for correlated load forecasting, MTL can learn the intrinsic relationships between different types of loads and usually achieves better performance than single-task approaches. 
However, differences in the degree of uncertainty of various loads may hinder the simultaneous optimization of multiple tasks, which remains a problem.</p>
<p>For the adaptivity of forecasting models, the transfer learning method can be considered a potential solution (<xref ref-type="bibr" rid="B18">Pinto et al., 2022</xref>). Existing studies on transfer learning in load forecasting primarily address the problem of insufficient training samples by learning from other similar scenarios. For example, in (<xref ref-type="bibr" rid="B14">Lu et al., 2022</xref>), transfer learning was utilized to solve the problem of insufficient historical load data samples when smart meters have just been deployed for a short time. The historical data of similar buildings were utilized to establish a regression model for the energy consumption forecasting of different schools (<xref ref-type="bibr" rid="B22">Ribeiro et al., 2018</xref>). Transfer learning was introduced into the short-term forecasting of the cooling and heating loads of buildings based on the knowledge learned from typical load models (<xref ref-type="bibr" rid="B19">Qian et al., 2020</xref>). Different transfer learning strategies were compared for different scenarios (building types or sample sizes) in short-term forecasting of building power consumption (<xref ref-type="bibr" rid="B4">Fan et al., 2020</xref>). In summary, transfer learning facilitates the sharing of common features in similar learning tasks, and can be expected to solve the problem of load data expiration in a CIES caused by gradual changes over time, such as seasonal transitions.</p>
<p>In this study, a multi-task deep transfer learning method with an online rolling mechanism is employed to address the challenges in the load forecasting of CIES, which enables the joint day-ahead forecasting of electrical and heating loads while dynamically adapting to the varying load properties. The main contributions of this study are summarised as follows:</p>
<p>1) A novel framework is established for day-ahead forecasting of electrical and heating loads in a CIES. CNN and LSTM are employed to extract the features of the loads at different time scales separately. Subsequently, an attention mechanism is designed to determine the key features and track them in the forecasting results. Day-ahead weather forecasting information is considered through a shared layer to further improve accuracy.</p>
<p>2) A novel loss function is applied to improve the training performance of the forecasting model. In this loss function, different weights are assigned to the learning tasks of the electrical and heating loads. These weights are dynamically adjusted in the training process based on the difference in the degree of uncertainty of different types of loads, which balances the convergence speed of multiple learning tasks and facilitates their simultaneous optimization in training.</p>
<p>3) A deep transfer learning strategy is constructed in the forecasting model to guarantee its adaptivity in various scenarios. The maximum mean discrepancy (MMD) is used to measure the gradual deviation of the load properties and the external environment. Then, different transfer learning strategies are adopted according to the range of the MMD, which enables the forecasting model to rapidly capture the new features of the CIES.</p>
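<p>As a concrete illustration of the MMD criterion mentioned in contribution 3, the following is a minimal, hypothetical Python sketch of the (biased) empirical squared MMD with a Gaussian kernel between two sets of daily load profiles. The kernel choice, bandwidth, and sample values are our illustrative assumptions, not details specified in this paper.</p>

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two sample sets, shape (n, m)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd(x, y, sigma=1.0):
    """Biased empirical estimate of the squared maximum mean discrepancy."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
summer = rng.normal(0.0, 1.0, size=(200, 24))  # e.g. normalized daily load profiles
winter = rng.normal(0.8, 1.0, size=(200, 24))  # seasonally shifted distribution
print(mmd(summer[:100], summer[100:], sigma=5.0))  # small: same distribution
print(mmd(summer, winter, sigma=5.0))              # larger: load properties have deviated
```

<p>A small MMD indicates that the recent data still follow the distribution the model was trained on, while a large MMD signals that a stronger transfer (or retraining) strategy is needed.</p>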
<p>The remainder of this paper is organized as follows. <xref ref-type="sec" rid="s2">Section 2</xref> describes the overall forecasting model, including its architecture and loss function. <xref ref-type="sec" rid="s3">Section 3</xref> details the transfer learning strategy, and summarises the entire application process. Case studies are presented in <xref ref-type="sec" rid="s4">Section 4</xref> to verify the effectiveness of the proposed method by conducting simulations using two typical cases. Finally, <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper.</p>
</sec>
<sec id="s2">
<title>2 Multi-task learning for diverse load forecasting</title>
<sec id="s2-1">
<title>2.1 Architecture of the proposed multi-task learning model</title>
<p>As shown in <xref ref-type="fig" rid="F1">Figure 1</xref>, the architecture of the proposed forecasting model can be divided into four levels. In Level-1, the multisource inputs are normalized to reduce the computational complexity and accelerate the model convergence. In Level-2, a combination of CNN, LSTM, and the attention module is employed for electrical and heating loads to extract the features at different time granularities. At this level, because weather data do not contain temporal characteristics, we directly extract weather data features through a fully connected (FC) layer. In Level-3, the features of the loads and weather data are fused together using a shared layer. Finally, in Level-4, a hard-sharing mechanism is realized using two separate FC layers with identical topologies for electrical and heating loads, through which the normalized forecasted values are simultaneously output. The forecasting results are then obtained after an inverse normalization process. Since the features have been sufficiently extracted, the output can be learned from the features of the shared layer by a simple mapping. Therefore, in this paper, the number of fully connected layers from the shared layer to the output layer is set to 1. The configurations of CNN, LSTM, and the attention mechanism are detailed in the following sections.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Architecture of the forecasting model.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g001.tif"/>
</fig>
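<p>The data flow through the four levels of <xref ref-type="fig" rid="F1">Figure 1</xref> can be sketched with plain NumPy as follows. The layer sizes, random weights, and min-max bounds are illustrative assumptions, and the Level-2 feature extractors (1DCNN, LSTM, and attention) are replaced by placeholder vectors; only the hard-sharing structure is shown.</p>

```python
import numpy as np

rng = np.random.default_rng(42)
relu = lambda a: np.maximum(a, 0.0)

# Level-2 outputs: placeholders for the CNN+LSTM+attention branches
# and the weather FC layer (dimensions are illustrative assumptions)
elec_feat = rng.normal(size=32)
heat_feat = rng.normal(size=32)
weather_feat = rng.normal(size=16)

# Level-3: shared layer fuses all features (hard parameter sharing)
fused = np.concatenate([elec_feat, heat_feat, weather_feat])  # 80-d
W_s = rng.normal(size=(64, 80)) * 0.1
shared = relu(W_s @ fused)

# Level-4: two FC heads with identical topology, one per task,
# each mapping the shared features to 24 normalized hourly values
W_e = rng.normal(size=(24, 64)) * 0.1
W_h = rng.normal(size=(24, 64)) * 0.1
elec_norm = W_e @ shared  # normalized day-ahead electrical forecast
heat_norm = W_h @ shared  # normalized day-ahead heating forecast

# Inverse normalization back to physical units (hypothetical kW bounds)
e_min, e_max = 120.0, 480.0
elec_pred = elec_norm * (e_max - e_min) + e_min
print(elec_pred.shape, heat_norm.shape)
```

<p>Because both heads read from the same shared layer, gradients from the electrical and heating tasks update the shared parameters jointly, which is what lets each task benefit from the other's features.</p>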
<sec id="s2-1-1">
<title>2.1.1 One-dimensional convolutional neural network</title>
<p>A CNN is used to extract the fine-grained features of the loads. In this study, the input load data to the CNN is represented by time-series data. Therefore, a one-dimensional convolutional neural network (1DCNN) is adopted in the proposed model, in which the convolution operations are performed in only one dimension. The shape of a single sample input to the convolutional layer is expressed as <inline-formula id="inf1">
<mml:math id="m1">
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">t</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">t</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">p</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="normal">d</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf2">
<mml:math id="m2">
<mml:mrow>
<mml:mi mathvariant="normal">t</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">t</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">p</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>represents a given number of days before the forecasting day, and <inline-formula id="inf3">
<mml:math id="m3">
<mml:mrow>
<mml:mi mathvariant="normal">d</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> are determined by the time granularity of forecasting. In this study, we set <inline-formula id="inf4">
<mml:math id="m4">
<mml:mrow>
<mml:mi mathvariant="normal">t</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">t</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">p</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> as 7 considering the similarity in load patterns for each week, and <inline-formula id="inf5">
<mml:math id="m5">
<mml:mrow>
<mml:mi mathvariant="normal">d</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> to 24 to capture the hourly variation features of loads within a day.</p>
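<p>For example, assuming an hourly load series, a single [time_steps, dimensions] = [7, 24] input sample for a given forecasting day can be assembled as follows; the sliding-window construction and the dummy values are our illustrative assumptions.</p>

```python
import numpy as np

hourly_load = np.arange(24 * 30, dtype=float)  # 30 days of hourly history (dummy values)

def make_sample(series, forecast_day, time_steps=7, dimensions=24):
    """Stack the `time_steps` days before `forecast_day` into a [7, 24] matrix."""
    start = (forecast_day - time_steps) * dimensions
    end = forecast_day * dimensions
    return series[start:end].reshape(time_steps, dimensions)

x = make_sample(hourly_load, forecast_day=10)
print(x.shape)             # (7, 24)
print(x[0, 0], x[-1, -1])  # first hour of day 3, last hour of day 9
```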
<p>The structure of the 1DCNN is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. The convolution kernel is convolved with the input data and then summed with the corresponding bias to obtain the result of this operation. All input data are traversed according to the given step information. This process is repeated for multiple convolution kernels to obtain the final matrix, that is, the features extracted by the convolution layer. The convolution calculation process is shown in <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>:<disp-formula id="e1">
<mml:math id="m6">
<mml:mrow>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">w</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
</mml:msubsup>
<mml:mo>&#x22c5;</mml:mo>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">x</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>p</mml:mi>
</mml:msubsup>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>where <inline-formula id="inf6">
<mml:math id="m7">
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the given step information; <inline-formula id="inf7">
<mml:math id="m8">
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">x</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>p</mml:mi>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="double-struck">R</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is the input vector at time step <inline-formula id="inf8">
<mml:math id="m9">
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> for the convolution operation with the <inline-formula id="inf9">
<mml:math id="m10">
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>th convolution kernel, <inline-formula id="inf10">
<mml:math id="m11">
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the dimension of each time step; <inline-formula id="inf11">
<mml:math id="m12">
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="bold-italic">w</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>i</mml:mi>
</mml:msubsup>
<mml:mo>&#x2208;</mml:mo>
<mml:msup>
<mml:mi mathvariant="double-struck">R</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is the <inline-formula id="inf12">
<mml:math id="m13">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>th weight parameter vector of the <inline-formula id="inf13">
<mml:math id="m14">
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>th convolution kernel; &#x201c;<inline-formula id="inf14">
<mml:math id="m15">
<mml:mrow>
<mml:mo>&#x22c5;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>&#x201d; denotes the dot product operation; <inline-formula id="inf15">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the corresponding bias parameter; and <inline-formula id="inf16">
<mml:math id="m17">
<mml:mrow>
<mml:msubsup>
<mml:mi>z</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>j</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the <inline-formula id="inf17">
<mml:math id="m18">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>th output result of the <inline-formula id="inf18">
<mml:math id="m19">
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>th convolution operation.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Structure of 1DCNN.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g002.tif"/>
</fig>
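<p>Under our reading of <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>, each kernel slides along the time axis, taking the dot product of its weight vectors with T consecutive input vectors and adding a bias. A minimal sketch (with illustrative input values) is:</p>

```python
import numpy as np

def conv1d(x, w, b):
    """Eq. 1: x is [time_steps, d], w is [T, d] (one kernel), b is a scalar bias.
    Output j sums w[i] . x[j + i] over the T-step window (stride 1, no padding)."""
    time_steps, d = x.shape
    T = w.shape[0]
    return np.array([
        sum(w[i] @ x[j + i] for i in range(T)) + b
        for j in range(time_steps - T + 1)
    ])

x = np.arange(12, dtype=float).reshape(4, 3)  # 4 time steps, d = 3
w = np.ones((2, 3))                           # one kernel with T = 2
z = conv1d(x, w, b=0.5)
print(z)  # -> [15.5 33.5 51.5], length 4 - 2 + 1 = 3
```

<p>In practice a convolutional layer repeats this for every kernel and stacks the resulting feature vectors.</p>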
<p>Because the 1DCNN is intended to extract hourly local features of loads within a day, we set <inline-formula id="inf19">
<mml:math id="m20">
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> as 1. A small number of convolutional layers should be used to avoid distorting the original features through repeated convolution operations. In addition, we choose the rectified linear unit (ReLU) as the activation function to mitigate the vanishing gradient problem (<xref ref-type="bibr" rid="B29">Wang et al., 2020a</xref>). The number of network layers and the number of convolution kernels are hyperparameters that need to be tuned in the 1DCNN.</p>
</sec>
<sec id="s2-1-2">
<title>2.1.2 Long short-term memory network</title>
<p>The LSTM takes the output of the CNN and is used to extract coarse-grained load features. In other words, it attempts to further learn how loads vary from day to day. The shape of a single-sample input to the LSTM is also <inline-formula id="inf20">
<mml:math id="m21">
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">t</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mo>_</mml:mo>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">t</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">p</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi mathvariant="normal">d</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
<mml:mi mathvariant="normal">s</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, which is the same as the output of the 1DCNN.</p>
<p>The input of the LSTM cell at the current moment includes the input of the current moment (<inline-formula id="inf21">
<mml:math id="m22">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>), hidden state of the previous moment (<inline-formula id="inf22">
<mml:math id="m23">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>), and cell state (<inline-formula id="inf23">
<mml:math id="m24">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>). <inline-formula id="inf24">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf25">
<mml:math id="m26">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are reserved for the input at the next moment. These input data are processed <italic>via</italic> three types of gates, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>: the forget gate, the input gate, and the output gate.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Structure of LSTM.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g003.tif"/>
</fig>
<p>The equation for the forget gate is expressed as:<disp-formula id="e2">
<mml:math id="m27">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>where <inline-formula id="inf26">
<mml:math id="m28">
<mml:mrow>
<mml:mi mathvariant="bold">&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the sigmoid function, whose output range is [0,1]; therefore, <inline-formula id="inf27">
<mml:math id="m29">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> represents the proportion of the cell state retained from time step <inline-formula id="inf28">
<mml:math id="m30">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn mathvariant="italic">1</mml:mn>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> <inline-formula id="inf29">
<mml:math id="m31">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf30">
<mml:math id="m32">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the weights and biases of the forget gate, respectively.</p>
<p>The equations of the input gate are expressed as <xref ref-type="disp-formula" rid="e3">Eqs. 3</xref>, <xref ref-type="disp-formula" rid="e4">4</xref>:<disp-formula id="e3">
<mml:math id="m33">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>
<disp-formula id="e4">
<mml:math id="m34">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">tanh</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi mathvariant="italic">tan</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi mathvariant="italic">tan</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf31">
<mml:math id="m35">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf32">
<mml:math id="m36">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the weights and biases of the input gate, and <inline-formula id="inf33">
<mml:math id="m37">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi mathvariant="italic">tan</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf34">
<mml:math id="m38">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi mathvariant="italic">tan</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the weights and biases of the tanh layer. <inline-formula id="inf35">
<mml:math id="m39">
<mml:mrow>
<mml:mi mathvariant="bold">tanh</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> denotes the activation function.</p>
<p>The update equation of the cell state is expressed as:<disp-formula id="e5">
<mml:math id="m40">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2609;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2609;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>where <inline-formula id="inf36">
<mml:math id="m41">
<mml:mrow>
<mml:mo>&#x2609;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> denotes the Hadamard product. <inline-formula id="inf37">
<mml:math id="m42">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2609;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> determines whether to retain the original cell state at time step <inline-formula id="inf38">
<mml:math id="m43">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn mathvariant="italic">1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, which represents the effect of the cell state at time step <inline-formula id="inf39">
<mml:math id="m44">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn mathvariant="italic">1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> on the cell state at time step <inline-formula id="inf40">
<mml:math id="m45">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. <inline-formula id="inf41">
<mml:math id="m46">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">i</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2609;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">a</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> determines whether to update the cell state at time step <inline-formula id="inf42">
<mml:math id="m47">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, which represents the effect of the load at time step <inline-formula id="inf43">
<mml:math id="m48">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> on the cell state at time step <inline-formula id="inf44">
<mml:math id="m49">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>The equations of the output gate are expressed as <xref ref-type="disp-formula" rid="e6">Eqs. 6</xref>, <xref ref-type="disp-formula" rid="e7">7</xref>:<disp-formula id="e6">
<mml:math id="m50">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi mathvariant="bold">&#x3c3;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">x</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(6)</label>
</disp-formula>
<disp-formula id="e7">
<mml:math id="m51">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">o</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
<mml:mo>&#x2609;</mml:mo>
<mml:mi mathvariant="bold">tanh</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">c</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(7)</label>
</disp-formula>where <inline-formula id="inf45">
<mml:math id="m52">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf46">
<mml:math id="m53">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">b</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the weights and biases of the output gate, respectively.</p>
<p>The hyperparameters of the LSTM include the number of network layers and the number of neurons in the hidden layer. ReLU is also used as the activation function.</p>
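Eqs. 2–7 can be checked numerically with a minimal NumPy sketch of a single LSTM cell step. This is illustrative only (the weight shapes and values are arbitrary, not the authors' trained parameters); each gate acts on the concatenation of the previous hidden state and the current input, exactly as in the equations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Eqs. 2-7.

    W, b hold the parameters {W_f, W_i, W_tan, W_o} and their biases,
    each acting on the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])            # forget gate, Eq. 2
    i_t = sigmoid(W["i"] @ z + b["i"])            # input gate, Eq. 3
    a_t = np.tanh(W["tan"] @ z + b["tan"])        # candidate state, Eq. 4
    c_t = c_prev * f_t + i_t * a_t                # cell-state update, Eq. 5
    o_t = sigmoid(W["o"] @ z + b["o"])            # output gate, Eq. 6
    h_t = o_t * np.tanh(c_t)                      # hidden state, Eq. 7
    return h_t, c_t

rng = np.random.default_rng(1)
n_h, n_x = 8, 4                                   # hidden size, input size (arbitrary)
W = {k: rng.standard_normal((n_h, n_h + n_x)) * 0.1 for k in ("f", "i", "tan", "o")}
b = {k: np.zeros(n_h) for k in ("f", "i", "tan", "o")}
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.standard_normal(n_x), h, c, W, b)
print(h.shape, c.shape)                           # (8,) (8,)
```

Because the output gate and tanh both lie in (−1, 1), the hidden state is bounded, which keeps the recurrence numerically stable across long sequences.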
</sec>
<sec id="s2-1-3">
<title>2.1.3 Attention mechanism module</title>
<p>The attention mechanism module is used to capture long-term temporal dependencies in the load sequence (<xref ref-type="bibr" rid="B38">Zang et al., 2021</xref>). The core idea of the attention mechanism is to allocate more attention to important information and less to the rest, thereby focusing on a specific region. In this study, the attention mechanism module is used to focus on key historical load features. The input of the attention mechanism module is the output vector <inline-formula id="inf47">
<mml:math id="m54">
<mml:mrow>
<mml:mi mathvariant="bold-italic">h</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> processed by the LSTM activation layer. The structure of the attention mechanism module is shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Structure of the attention mechanism.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g004.tif"/>
</fig>
<p>The specific implementation of the attention mechanism can be expressed as follows:<disp-formula id="e8">
<mml:math id="m55">
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>tanh</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
</mml:mrow>
<mml:mi mathvariant="bold-italic">W</mml:mi>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:msub>
<mml:mo>&#x22c5;</mml:mo>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>a</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
<label>(8)</label>
</disp-formula>
<disp-formula id="e9">
<mml:math id="m56">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(9)</label>
</disp-formula>
<disp-formula id="e10">
<mml:math id="m57">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>h</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(10)</label>
</disp-formula>where <inline-formula id="inf48">
<mml:math id="m58">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the <inline-formula id="inf49">
<mml:math id="m59">
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>th output feature vector at time step <inline-formula id="inf50">
<mml:math id="m60">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf51">
<mml:math id="m61">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">h</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn mathvariant="italic">1,2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>&#x2026;</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf52">
<mml:math id="m62">
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the total number of time steps, <inline-formula id="inf53">
<mml:math id="m63">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">W</mml:mi>
<mml:mi>a</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf54">
<mml:math id="m64">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>a</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are trainable weights and biases, <inline-formula id="inf55">
<mml:math id="m65">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the attention score of <inline-formula id="inf56">
<mml:math id="m66">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf57">
<mml:math id="m67">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b1;</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the corresponding weight of <inline-formula id="inf58">
<mml:math id="m68">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf59">
<mml:math id="m69">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>h</mml:mi>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the final output feature of the attention mechanism layer.</p>
<p>By introducing the attention mechanism, more prominent features achieve higher scores and thus carry greater weight in the output features. As a result, long-range interdependent load features can be captured more easily.</p>
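Eqs. 8–10 can be sketched in a few lines of NumPy. This is an illustrative simplification, not the authors' code: the trainable weight W_a is reduced to a single scoring vector and the bias b_a to a scalar, so that each hidden vector receives one scalar attention score:

```python
import numpy as np

def attention(H, W_a, b_a):
    """Additive attention over LSTM outputs, following Eqs. 8-10.

    H   : [T, d] hidden vectors h_i for time steps i = 1..T
    W_a : [d]    trainable scoring vector (simplified from the matrix W_a)
    b_a : scalar trainable bias
    Returns the attention-weighted feature and the weights alpha.
    """
    e = np.tanh(H @ W_a + b_a)                    # attention scores, Eq. 8
    alpha = np.exp(e) / np.exp(e).sum()           # softmax weights, Eq. 9
    h_hat = alpha @ H                             # weighted sum over time, Eq. 10
    return h_hat, alpha

rng = np.random.default_rng(2)
H = rng.standard_normal((24, 8))                  # T = 24 time steps, 8 features
h_hat, alpha = attention(H, rng.standard_normal(8) * 0.1, 0.0)
print(h_hat.shape, round(alpha.sum(), 6))         # (8,) 1.0
```

The softmax in Eq. 9 guarantees the weights are positive and sum to one, so the output is a convex combination of the hidden vectors, with prominent time steps contributing more.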
</sec>
</sec>
<sec id="s2-2">
<title>2.2 Loss function based on uncertainty for multi-task learning</title>
<p>Owing to the influence of different factors such as the external environment and temperature, the uncertainty of electrical and heating loads generally varies significantly. This makes it difficult for the multi-task learning model to define a unified loss function for training multiple tasks.</p>
<p>The simplest approach is to sum the loss functions of the different tasks. This approach has shortcomings, particularly when the degree of uncertainty differs significantly across tasks. For example, when the model converges, the electrical load may be more regular and forecast well, whereas the heating load is much more uncertain and forecast poorly. The reason is that loss functions with larger magnitudes dominate the overall loss function and hide the effects of those with smaller magnitudes. The solution is to replace the &#x201c;average summation&#x201d; of multiple loss functions with a &#x201c;weighted summation.&#x201d; Weighting can make the scales of the loss functions consistent; however, it introduces a new problem: the weight coefficients are hyperparameters that are difficult to determine.</p>
<p>A weight optimisation approach for MTL using uncertainty was proposed by <xref ref-type="bibr" rid="B8">Kendall et al. (2018)</xref>. In this study, we apply this loss function to dynamically adjust the weight coefficients during the training process, as follows:<disp-formula id="e11">
<mml:math id="m70">
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo>&#x2016;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">y</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mi>W</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x2016;</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo>&#x2016;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">y</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mi>W</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x2016;</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
<label>(11)</label>
</disp-formula>where <inline-formula id="inf60">
<mml:math id="m71">
<mml:mrow>
<mml:mi mathvariant="bold-italic">x</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the input data of the sample, <inline-formula id="inf61">
<mml:math id="m72">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="bold-italic">f</mml:mi>
<mml:mi>W</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> denotes the MTL model (<xref ref-type="sec" rid="s2-1">Section 2.1</xref>), <inline-formula id="inf62">
<mml:math id="m73">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">y</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>and <inline-formula id="inf63">
<mml:math id="m74">
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="bold-italic">y</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the sample labels, <inline-formula id="inf64">
<mml:math id="m75">
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is the weight of the network, <inline-formula id="inf65">
<mml:math id="m76">
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> (<inline-formula id="inf66">
<mml:math id="m77">
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>&#x3c3;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>&#x3c3;</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>) is a trainable variable.</p>
<p>The parameters <inline-formula id="inf67">
<mml:math id="m78">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf68">
<mml:math id="m79">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are used to measure the uncertainty of different tasks, and by dividing by <inline-formula id="inf69">
<mml:math id="m80">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, the effect of the uncertainty of different tasks can be eliminated to some extent. This formulation has the following advantages: 1) by dividing by <inline-formula id="inf70">
<mml:math id="m81">
<mml:mrow>
<mml:msup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, different weight coefficients are effectively assigned to different tasks, which helps the individual tasks converge simultaneously; 2) <inline-formula id="inf71">
<mml:math id="m82">
<mml:mrow>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is a regularisation term, which prevents <inline-formula id="inf72">
<mml:math id="m83">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf73">
<mml:math id="m84">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> from becoming infinitely large and ensures reliable convergence of the model; and 3) this loss function does not decrease the accuracy of tasks that already perform well, but mainly optimises the parameters of the poorly performing task.</p>
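The behaviour of Eq. 11 can be illustrated with a small numerical sketch. This is not the authors' implementation; as one common assumption, the log-variances are trained instead of the sigmas themselves, which keeps each sigma positive:

```python
import numpy as np

def uncertainty_loss(y1, y2, pred1, pred2, log_sigma1, log_sigma2):
    """Two-task loss of Eq. 11 (Kendall et al., 2018).

    Parameterising log(sigma) instead of sigma is an assumption made here
    for numerical convenience; it keeps sigma strictly positive.
    """
    s1, s2 = np.exp(log_sigma1), np.exp(log_sigma2)
    sq1 = np.sum((y1 - pred1) ** 2)               # ||y1 - f(x)||^2
    sq2 = np.sum((y2 - pred2) ** 2)               # ||y2 - f(x)||^2
    return sq1 / (2 * s1 ** 2) + sq2 / (2 * s2 ** 2) + np.log(s1 * s2)

# Giving the noisier task (task 2) a larger sigma down-weights its error
# term, yielding a smaller total loss than uniform weighting:
y = np.ones(10)
loss_balanced = uncertainty_loss(y, y, y * 0.9, y * 0.5, 0.0, 1.0)
loss_uniform = uncertainty_loss(y, y, y * 0.9, y * 0.5, 0.0, 0.0)
print(loss_balanced < loss_uniform)               # True
```

During training, the optimiser trades the down-weighting of a task's error against the log-regularisation term, so the sigmas settle at values reflecting each task's uncertainty rather than growing without bound.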
</sec>
<sec id="s2-3">
<title>2.3 The dropout layer and hard sharing mechanism</title>
<p>Owing to the small number of samples and the large number of trainable parameters in the LSTM, a dropout layer is added between the LSTM and the attention module to prevent overfitting. During each round of training, the dropout layer discards nodes with a certain probability. The discarded nodes differ each time; therefore, the structure of the model is slightly different in each training round (<xref ref-type="bibr" rid="B25">Srivastava et al., 2014</xref>). The dropout rate is a hyperparameter of the dropout layer.</p>
<p>Hard sharing is the most widely used sharing mechanism: it embeds the data representations of multiple tasks into the same space and extracts a task-specific representation for each task using a task-specific layer. Under the hard sharing mechanism, the input features are uniformly shared and the top-level parameters of each model are independent, which is achieved mainly by constructing a shared feature layer across the individual tasks. Because most features are shared, the overfitting probability of an MTL model with the hard sharing mechanism is much smaller (<xref ref-type="bibr" rid="B35">Ye et al., 2022</xref>). Hard sharing is easy to implement and suitable for strongly correlated tasks such as coordinated load forecasting in a CIES (<xref ref-type="bibr" rid="B31">Wang et al., 2020a</xref>). Because the features extracted from multi-source input data are concatenated at the shared layer, we directly use two separate fully connected layers with identical topology to quickly learn the mapping between features and outputs based on the shared layer.</p>
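A minimal sketch of the hard sharing mechanism follows; the layer sizes are arbitrary and the single shared matrix is only a stand-in for the full CNN-LSTM-attention stack, but the structure (one shared representation feeding two task-specific fully connected heads of identical topology) matches the description above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Shared feature layer (a stand-in for the CNN-LSTM-attention stack):
W_shared = rng.standard_normal((16, 8)) * 0.1

# Task-specific fully connected heads with identical topology:
W_elec, b_elec = rng.standard_normal((1, 16)) * 0.1, np.zeros(1)
W_heat, b_heat = rng.standard_normal((1, 16)) * 0.1, np.zeros(1)

def forward(x):
    """Hard sharing: one shared representation, two independent heads."""
    shared = np.maximum(W_shared @ x, 0.0)        # hard-shared representation
    elec = W_elec @ shared + b_elec               # electrical-load head
    heat = W_heat @ shared + b_heat               # heating-load head
    return elec, heat

elec, heat = forward(rng.standard_normal(8))
print(elec.shape, heat.shape)                     # (1,) (1,)
```

Gradients from both task losses flow into W_shared, while W_elec and W_heat are updated only by their own task, which is what limits overfitting relative to two fully separate models.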
</sec>
</sec>
<sec id="s3">
<title>3 Transfer learning strategy for adaptive load forecasting</title>
<sec id="s3-1">
<title>3.1 Methodology of transfer learning</title>
<p>For transfer learning, there are two basic concepts: the source domain <inline-formula id="inf74">
<mml:math id="m85">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and target domain <inline-formula id="inf75">
<mml:math id="m86">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The source domain contains abundant knowledge and annotated data and represents the object to be transferred from, whereas the target domain represents the object to which knowledge and annotations are eventually given. Tasks are also divided into source-domain tasks <inline-formula id="inf76">
<mml:math id="m87">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c8;</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and target-domain tasks <inline-formula id="inf77">
<mml:math id="m88">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c8;</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>. The transfer learning process involves transferring the knowledge of the source domain to the target domain, finding the forecasting function of the target domain, and completing the task of the target domain. Specifically, transfer learning can be divided into two categories given a labelled source domain, i.e., <inline-formula id="inf78">
<mml:math id="m89">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
<mml:mo>&#x2260;</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> or <inline-formula id="inf79">
<mml:math id="m90">
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c8;</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
<mml:mo>&#x2260;</mml:mo>
<mml:msub>
<mml:mi>&#x3c8;</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>A schematic of knowledge sharing for the load forecasting of a CIES is shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. In this paper, the centralised heating period of the CIES is considered as the source domain <inline-formula id="inf80">
<mml:math id="m91">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>S</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and used to initialise the forecasting model, because there is sufficient historical data, and the distribution of each type of data (load data and weather data) remains similar from day to day during this period. The transition season is considered as the target domain <inline-formula id="inf81">
<mml:math id="m92">
<mml:mrow>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>T</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, which has less historical data, and the distribution property for each type of data has changed from that of the centralised heating period.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Schematic of knowledge sharing in load forecasting of CIES.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g005.tif"/>
</fig>
<p>Traditional machine learning methods require sample data to be independent and identically distributed, which creates challenges for maintaining the precision of the forecasting model. At the same time, the relatively small amount of data in the transition season also limits the ability to obtain an efficient model. Fortunately, although the magnitude of user demand changes gradually with the seasons, the energy usage habits of the same user are generally unchanged. Therefore, transfer learning can be introduced to reduce the difference between the source and target domains and thus obtain an adaptive forecasting model. Here, the role of transfer learning is to extract shared knowledge from the centralised heating period. This knowledge is then combined with new data observed during the transition season to continuously adjust the previous model, and finally obtain the target-domain model quickly and effectively.</p>
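The adjustment step described above can be sketched as parameter-based transfer: initialise the target-domain model with the source-domain weights, then continue training on the smaller transition-season dataset. The sketch below uses a deliberately simple linear model and synthetic data as stand-ins for the full network and the CIES measurements; it is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(4)

def train(X, y, w_init, lr=0.01, epochs=200):
    """Plain gradient descent on mean squared error."""
    w = w_init.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Source domain: plentiful heating-period data.
X_src = rng.standard_normal((500, 5))
w_true_src = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_src = X_src @ w_true_src
w_source = train(X_src, y_src, np.zeros(5))

# Target domain: scarce transition-season data with a shifted relationship.
X_tgt = rng.standard_normal((30, 5))
y_tgt = X_tgt @ (w_true_src + 0.3)
w_scratch = train(X_tgt, y_tgt, np.zeros(5), epochs=20)     # cold start
w_transfer = train(X_tgt, y_tgt, w_source, epochs=20)       # warm start

def err(w):
    return np.mean((X_tgt @ w - y_tgt) ** 2)

print(err(w_transfer) < err(w_scratch))           # True
```

Because the source-domain weights already encode the stable part of the user's energy habits, the warm-started model needs far fewer updates on the scarce target data than training from scratch.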
</sec>
<sec id="s3-2">
<title>3.2 Strategies of transfer learning based on maximum mean discrepancy</title>
<p>MMD is used in transfer learning mainly to measure the discrepancy between the distributions of two different but related datasets, and is an effective method for measuring the correlation of data in the source and target domains. The MMD of two datasets <inline-formula id="inf82">
<mml:math id="m93">
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf83">
<mml:math id="m94">
<mml:mrow>
<mml:mi mathvariant="bold-italic">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> is defined as:<disp-formula id="e12">
<mml:math id="m95">
<mml:mrow>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">Y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi>m</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mi>n</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(12)</label>
</disp-formula>where <inline-formula id="inf84">
<mml:math id="m96">
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> denotes the number of samples in <inline-formula id="inf85">
<mml:math id="m97">
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf86">
<mml:math id="m98">
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> denotes the number of samples in <inline-formula id="inf87">
<mml:math id="m99">
<mml:mrow>
<mml:mi mathvariant="bold-italic">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf88">
<mml:math id="m100">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the kernel function.</p>
<p>Typically, the radial basis function (RBF) kernel is used as the kernel function:<disp-formula id="e13">
<mml:math id="m101">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>exp</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo>&#x2016;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x2016;</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msup>
<mml:mi>&#x3bb;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
<label>(13)</label>
</disp-formula>where <inline-formula id="inf89">
<mml:math id="m102">
<mml:mrow>
<mml:mi>&#x3bb;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> denotes the width parameter of the kernel function.</p>
<p>It can be observed that if <inline-formula id="inf90">
<mml:math id="m103">
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf91">
<mml:math id="m104">
<mml:mrow>
<mml:mi mathvariant="bold-italic">Y</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> are identically distributed, <inline-formula id="inf92">
<mml:math id="m105">
<mml:mrow>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">Y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is approximately zero. In other words, if <inline-formula id="inf93">
<mml:math id="m106">
<mml:mrow>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">D</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold-italic">X</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold-italic">Y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is sufficiently small, the two distributions can be considered identical. Then, MMD is used as a criterion to measure the difference in the distribution of electrical and heating loads as well as weather data when seasons change.</p>
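As a minimal illustration of Eqs 12, 13, the biased empirical MMD statistic with an RBF kernel can be sketched in NumPy as follows (the function names and the default width are illustrative assumptions, not from the paper):

```python
import numpy as np

def rbf_kernel(a, b, width=1.0):
    # Eq. 13: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * width^2))
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def mmd(X, Y, width=1.0):
    # Eq. 12: biased empirical MMD estimate between datasets X and Y
    m, n = len(X), len(Y)
    return (rbf_kernel(X, X, width).sum() / m ** 2
            + rbf_kernel(Y, Y, width).sum() / n ** 2
            - 2.0 * rbf_kernel(X, Y, width).sum() / (m * n))
```

For two samples drawn from the same distribution the statistic is close to zero, and it grows as the distributions drift apart, which is exactly the property used here to detect seasonal changes.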
<p>As shown in <xref ref-type="fig" rid="F6">Figure 6</xref>, the dynamic source and target domains are divided using a fixed-day sliding time window. For example, if the deviation of the model on day <inline-formula id="inf94">
<mml:math id="m107">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> does not meet the pre-set forecasting accuracy, the historical data of <inline-formula id="inf95">
<mml:math id="m108">
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> days before day <inline-formula id="inf96">
<mml:math id="m109">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> are considered as the target domain data, and the historical data from the previous (<inline-formula id="inf97">
<mml:math id="m110">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>) to <inline-formula id="inf98">
<mml:math id="m111">
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> days before day <inline-formula id="inf99">
<mml:math id="m112">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> are considered as the source domain. <inline-formula id="inf100">
<mml:math id="m113">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf101">
<mml:math id="m114">
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> are the pre-set numbers of days in the source and target domains, respectively. The value of M is much larger than that of N, so the source domain data serve as a stable reference that reflects the distribution of loads and weather in the preceding period. The value of N is generally small, so the target domain sensitively reflects gradual changes in recent loads and weather. Then, the MMD values for each type of data in the source and target domains on day <inline-formula id="inf102">
<mml:math id="m115">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> (<inline-formula id="inf103">
<mml:math id="m116">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula id="inf104">
<mml:math id="m117">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>; <inline-formula id="inf105">
<mml:math id="m118">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>) are separately calculated using <xref ref-type="disp-formula" rid="e12">Eq. 12</xref>.</p>
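The fixed-day sliding window division of Figure 6 can be sketched as follows (assuming one row of data per day, ordered oldest to newest; the function name and default values are illustrative):

```python
import numpy as np

def split_domains(daily_data, t, M=20, N=4):
    # Day t+1 is the day to be forecast, so rows 0..t-1 are observed.
    # Target domain: the N days before day t+1 (days t-N+1 .. t);
    # source domain: the (M+N) to N days before day t+1.
    target = daily_data[t - N:t]
    source = daily_data[t - M - N:t - N]
    return source, target

days = np.arange(1, 31)                  # 30 observed days; day 31 is next
source, target = split_domains(days, 30)
# target covers days 27-30, source covers days 7-26
```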
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption>
<p>Schematic of dynamic source and target domains.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g006.tif"/>
</fig>
<p>If <inline-formula id="inf106">
<mml:math id="m119">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf107">
<mml:math id="m120">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf108">
<mml:math id="m121">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, no adjustment is required to the forecasting model, where <inline-formula id="inf109">
<mml:math id="m122">
<mml:mrow>
<mml:mi>&#x3b1;</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is the threshold value on day <inline-formula id="inf110">
<mml:math id="m123">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> for each type of data. Satisfying all three conditions indicates that the distributions of the source and target domains are very close. In this case, the forecasting model does not need to be adjusted: any forecasting deviation is most likely caused by a weather anomaly on a particular day, and occasional poor performance does not indicate a substantial recent change in load or weather patterns.</p>
<p>A major advantage of using MMD is that once <inline-formula id="inf111">
<mml:math id="m124">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf112">
<mml:math id="m125">
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> are given, the thresholds for evaluating the differences in the data distribution over time can be easily obtained. For example, if the allowable average deviation of the electrical load is <inline-formula id="inf113">
<mml:math id="m126">
<mml:mrow>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, the final <inline-formula id="inf114">
<mml:math id="m127">
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> days of electrical load data are multiplied by random numbers uniformly distributed over <inline-formula id="inf115">
<mml:math id="m128">
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>&#x2212;</mml:mo>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>&#x2b;</mml:mo>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>, and the result is used as simulated data of the target domain. The MMD between the original and perturbed data (<inline-formula id="inf116">
<mml:math id="m129">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>) is calculated and used as the threshold for judging whether the distribution of the electrical load in the source and target domains has changed. In this paper, the allowable average deviation of the weather data <inline-formula id="inf117">
<mml:math id="m130">
<mml:mrow>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> is the same as the allowable average deviation of the heating load <inline-formula id="inf118">
<mml:math id="m131">
<mml:mrow>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
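The threshold construction described above can be sketched as follows (a NumPy sketch; `_mmd` re-implements Eq. 12 with the RBF kernel of Eq. 13, and the width and seed values are illustrative assumptions):

```python
import numpy as np

def _mmd(X, Y, w=1.0):
    # Biased empirical MMD of Eq. 12 with the RBF kernel of Eq. 13
    k = lambda a, b: np.exp(-np.sum((a[:, None] - b[None, :]) ** 2, -1)
                            / (2.0 * w * w))
    m, n = len(X), len(Y)
    return (k(X, X).sum() / m ** 2 + k(Y, Y).sum() / n ** 2
            - 2.0 * k(X, Y).sum() / (m * n))

def mmd_threshold(target_days, R, w=1.0, seed=0):
    # Multiply the latest N days by factors uniform on [1-R, 1+R] and
    # use the MMD between original and perturbed data as the threshold.
    rng = np.random.default_rng(seed)
    factors = rng.uniform(1.0 - R, 1.0 + R, size=target_days.shape)
    return _mmd(target_days, target_days * factors, w)
```

A larger allowable deviation R produces a larger perturbation and hence a larger threshold, so the distribution test becomes correspondingly more tolerant.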
<p>If <inline-formula id="inf119">
<mml:math id="m132">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> or <inline-formula id="inf120">
<mml:math id="m133">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> or <inline-formula id="inf121">
<mml:math id="m134">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, seasonal changes have caused the electrical and heating loads or user energy habits to vary, and the model parameters are transferred and fine-tuned at this point. Two scenarios are distinguished, as shown in <xref ref-type="fig" rid="F7">Figure 7</xref>.<list list-type="simple">
<list-item>
<p>a) If <inline-formula id="inf122">
<mml:math id="m135">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">E</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">E</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf123">
<mml:math id="m136">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, the newest data in the target domain are used as the new training set; with the other parameters of the model on day <inline-formula id="inf124">
<mml:math id="m137">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> held fixed, only the parameters of the fully connected layer between the shared layer and the electrical/heating load output layer (located in the blue/red frame of <xref ref-type="fig" rid="F7">Figure 7</xref>) are fine-tuned. The fine-tuned model is then used as the forecasting model on day <inline-formula id="inf125">
<mml:math id="m138">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
<list-item>
<p>b) If <inline-formula id="inf126">
<mml:math id="m139">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>D</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:msubsup>
<mml:mi>&#x3b1;</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">W</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>, the newest data in the target domain are used as the new training set; with the other parameters of the model on day <inline-formula id="inf127">
<mml:math id="m140">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> held fixed, only the parameters of all fully connected layers between the weather data and the output layer (located in the green frame of <xref ref-type="fig" rid="F7">Figure 7</xref>) are fine-tuned. The fine-tuned model is then used as the forecasting model on day <inline-formula id="inf128">
<mml:math id="m141">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</list-item>
</list>
</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption>
<p>Strategies of the transfer learning.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g007.tif"/>
</fig>
<p>It can be seen that the MMD helps to decide which parts of the network should be fine-tuned. For example, in Scenario a), the weather data do not change significantly, but the electrical or heating load does; this often occurs at the end of the heating period, when the heating demand decreases, the central heating equipment may be turned off, and the remaining heating demand is met by other energy conversion equipment. Because the weather features are roughly unchanged, the parameters corresponding to the weather features do not need to be adjusted. Fewer parameters therefore need to be fine-tuned, which helps the model quickly learn the dynamic changes in the target domain.</p>
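The decision logic of the two scenarios can be summarised in a short sketch (pure Python; group names such as `elec_head` are hypothetical labels for the framed regions of Figure 7, not identifiers from the paper):

```python
def groups_to_finetune(mmd_vals, thresholds):
    # mmd_vals / thresholds: dicts keyed by "E", "H", "W" holding the
    # MMD and alpha values for electrical load, heating load, weather.
    exceed = {k: mmd_vals[k] > thresholds[k] for k in ("E", "H", "W")}
    if not any(exceed.values()):
        return set()                       # distributions unchanged
    if exceed["W"]:
        # Scenario b): all dense layers between weather data and output
        return {"weather_dense"}
    # Scenario a): only the output head(s) of the drifted load type(s)
    return {head for key, head in (("E", "elec_head"), ("H", "heat_head"))
            if exceed[key]}
```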
</sec>
<sec id="s3-3">
<title>3.3 Overall framework of the proposed method</title>
<p>The entire online rolling forecasting process using the proposed model is shown in <xref ref-type="fig" rid="F8">Figure 8</xref>. The specific steps are as follows:</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption>
<p>Overall framework of the proposed method.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g008.tif"/>
</fig>
<p>
<statement content-type="step" id="Step_1">
<label>Step 1</label>
<p>: Train the initial model on day 1 offline based on historical data, and use the model to forecast electrical and heating loads on day 2. Set <inline-formula id="inf129">
<mml:math id="m142">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>;</p>
</statement>
</p>
<p>
<statement content-type="step" id="Step_2">
<label>Step 2</label>
<p>: Forecast the load on day <inline-formula id="inf130">
<mml:math id="m143">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> with the model on day <inline-formula id="inf131">
<mml:math id="m144">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>;</p>
</statement>
</p>
<p>
<statement content-type="step" id="Step_3">
<label>Step 3</label>
<p>: At the end of day <inline-formula id="inf132">
<mml:math id="m145">
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, electrical load deviation <inline-formula id="inf133">
<mml:math id="m146">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> and heating load deviation <inline-formula id="inf134">
<mml:math id="m147">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> are calculated using <xref ref-type="disp-formula" rid="e15">Eq. 15</xref>. If <inline-formula id="inf135">
<mml:math id="m148">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, and <inline-formula id="inf136">
<mml:math id="m149">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
<mml:mo>&#x2264;</mml:mo>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, the model is not adjusted and is used directly for the forecasting task on day <inline-formula id="inf137">
<mml:math id="m150">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf138">
<mml:math id="m151">
<mml:mrow>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf139">
<mml:math id="m152">
<mml:mrow>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> are pre-set electrical and heating load accuracy thresholds;</p>
</statement>
</p>
<p>
<statement content-type="step" id="Step_4">
<label>Step 4</label>
<p>: If <inline-formula id="inf140">
<mml:math id="m153">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> or <inline-formula id="inf141">
<mml:math id="m154">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msubsup>
<mml:mo>&#x3e;</mml:mo>
<mml:msup>
<mml:mi>R</mml:mi>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>, the parameters are fine-tuned according to the different strategies in <xref ref-type="sec" rid="s3-2">Section 3.2</xref>. After updating the parameters, the new model is used for the forecasting task on day <inline-formula id="inf142">
<mml:math id="m155">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>;</p>
</statement>
</p>
<p>
<statement content-type="step" id="Step_5">
<label>Step 5</label>
<p>: <inline-formula id="inf143">
<mml:math id="m156">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, repeat <xref ref-type="statement" rid="Step_2">Step 2</xref> to <xref ref-type="statement" rid="Step_4">Step 4</xref>, and continuously update the model online to complete the subsequent forecasting tasks.</p>
</statement>
</p>
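Steps 1–5 reduce to a simple rolling loop. The sketch below uses scalar daily loads and caller-supplied `predict`/`fine_tune` callables as stand-ins for the full model (all names are illustrative assumptions):

```python
def rolling_forecast(predict, fine_tune, actuals, R):
    # predict(t): forecast for day t made with the current model;
    # fine_tune(t): update the model after day t (Step 4);
    # R: pre-set accuracy threshold (here a single MAPE limit, in %).
    deviations = []
    for t, actual in enumerate(actuals):
        pred = predict(t)                        # Step 2
        mape = abs((actual - pred) / actual) * 100.0
        deviations.append(mape)                  # Step 3
        if mape > R:                             # Step 4
            fine_tune(t)
    return deviations                            # Step 5: roll onward
```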
</sec>
</sec>
<sec id="s4">
<title>4 Case studies and analysis</title>
<p>In this section, to demonstrate the effectiveness of the proposed method, we present simulation experiments based on real-world data of a CIES provided by the official website of the National Renewable Energy Laboratory (<xref ref-type="bibr" rid="B17">NREL Data Catalog, 2011</xref>) and a CIES in China. The results are compared with the following models and updating strategies:</p>
<p>Model-1 (no update): The model is initially trained in an offline batch manner and utilised permanently without updating.</p>
<p>Model-2 (daily update): The model is trained daily in a batch manner. The training set of the model keeps the number of training samples constant, continuously adding the newest observed data and eliminating the oldest data. The model adopts the structure described in <xref ref-type="sec" rid="s2">Section 2</xref>.</p>
<p>Model-3 (single-task model, online update): This model adopts the most widely used LSTM network, and its results can be used as a reference for evaluation. The model also adopts the transfer learning strategy described in <xref ref-type="sec" rid="s3">Section 3</xref>.</p>
<p>Model-4 (without considering the degree of uncertainty): Except for the loss function, the rest of the model is the same as in Model-5.</p>
<p>Model-5: The multi-tasking rolling adaptive forecasting method proposed in this study.</p>
<p>The hyperparameters are determined using the longitudinal comparison method (<xref ref-type="bibr" rid="B36">Yu et al., 2021</xref>). The initial model is obtained by conducting several trials on the training set to determine the optimal parameters. The longitudinal comparison method follows the idea of the control variable method: according to the importance of each hyperparameter, the hyperparameters of the different models are determined in the following priority order: number of network layers, number of filters in the 1DCNN, number of neurons in the LSTM layer, dropout rate, number of iterations, and batch size. The candidate sets for each hyperparameter are shown in <xref ref-type="sec" rid="s12">Supplementary Table SA1</xref>. For example, when determining the number of layers of the 1DCNN, the values of the other hyperparameters are temporarily set empirically. The number of layers that minimizes the RMSE on the training set is selected and remains fixed throughout the rest of the optimization search. The remaining hyperparameters are then determined in order of priority.</p>
<p>To unify the magnitudes and smooth the gradients between different batches and layers of data, we use 0&#x2013;1 normalization on the training set. To prevent the maximum/minimum values from changing when new testing-set data are added, the maximum/minimum values for each type of data are determined from the entire original dataset, which also limits the effect of anomalous data. <xref ref-type="disp-formula" rid="e14">Eq. 14</xref> is used to normalize the input data:<disp-formula id="e14">
<mml:math id="m157">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>max</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>min</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
<label>(14)</label>
</disp-formula>where <inline-formula id="inf144">
<mml:math id="m158">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the normalized data, <inline-formula id="inf145">
<mml:math id="m159">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the original data, <inline-formula id="inf146">
<mml:math id="m160">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi mathvariant="normal">m</mml:mi>
<mml:mi mathvariant="normal">e</mml:mi>
<mml:mi mathvariant="normal">a</mml:mi>
<mml:mi mathvariant="normal">n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, <inline-formula id="inf147">
<mml:math id="m161">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>max</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf148">
<mml:math id="m162">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>min</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the mean, maximum, and minimum values of all data in the dataset, respectively.</p>
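A one-line sketch of Eq. 14, with the statistics fixed from the full original dataset as described above (variable names are illustrative):

```python
import numpy as np

def normalize(x, x_mean, x_max, x_min):
    # Eq. 14; the statistics are computed once over the entire original
    # dataset, so newly arriving testing-set data cannot shift them.
    return (x - x_mean) / (x_max - x_min)

data = np.array([10.0, 20.0, 30.0, 40.0])
scaled = normalize(data, data.mean(), data.max(), data.min())
# mean 25 and range 30 give [-0.5, -1/6, 1/6, 0.5]
```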
<p>The evaluation criteria used in this study are the mean absolute percentage error (MAPE) and root mean square error (RMSE), which are calculated as follows:<disp-formula id="e15">
<mml:math id="m163">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>A</mml:mi>
<mml:mi>P</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>24</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>24</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mfrac>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
</mml:mrow>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>100</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:math>
<label>(15)</label>
</disp-formula>
<disp-formula id="e16">
<mml:math id="m164">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>S</mml:mi>
<mml:mi>E</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:msubsup>
<mml:mo>&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>24</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>24</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2212;</mml:mo>
<mml:mover accent="true">
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:math>
<label>(16)</label>
</disp-formula>where <inline-formula id="inf149">
<mml:math id="m165">
<mml:mrow>
<mml:mover accent="true">
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x5e;</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula> is the forecasting value for the <inline-formula id="inf150">
<mml:math id="m166">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>th hour and <inline-formula id="inf151">
<mml:math id="m167">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the actual value for the <inline-formula id="inf152">
<mml:math id="m168">
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>th hour.</p>
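<p>For a single forecast day, the two criteria in Eqs. 15, 16 can be sketched as follows (a minimal illustration with a synthetic 24-hour profile; <monospace>mape</monospace> and <monospace>rmse</monospace> are hypothetical helper names, not code from the study):</p>

```python
import math

def mape(actual, forecast):
    # Eq. 15: mean absolute percentage error over the 24 hours of a day
    n = len(actual)
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n * 100.0

def rmse(actual, forecast):
    # Eq. 16: root mean square error over the 24 hours of a day
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

# synthetic 24-hour load profile; the forecast is off by a constant 5%
actual = [100.0 + 4.0 * h for h in range(24)]
forecast = [a * 1.05 for a in actual]
print(round(mape(actual, forecast), 2))   # 5.0
print(round(rmse(actual, forecast), 2))
```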
<p>The simulation experiments for <xref ref-type="statement" rid="Case_1">Case 1</xref> and <xref ref-type="statement" rid="Case_2">Case 2</xref> are conducted under the TensorFlow 2.4.1 framework, with an Intel Core i7 CPU as the hardware platform and PyCharm 2020.3 as the integrated development environment.</p>
<p>
<statement content-type="case" id="Case_1">
<label>Case 1</label>
<p>A typical park from NREL</p>
<p>The typical park from NREL consists of electrical, thermal, and cooling systems, with energy conversion equipment, including boilers and chillers. The dataset is composed of the hourly average electrical load, heating load, temperature, and solar radiation, collected from January 2011 to December 2011.</p>
<p>Cosine similarity is used to measure the similarity of the load patterns between weekdays and weekends, and the results are shown in <xref ref-type="sec" rid="s12">Supplementary Figure SA1</xref>. The results indicate that there is a significant difference between the weekday and weekend load patterns for this park; therefore, separate forecasting models are constructed for weekdays and weekends. The data collected from 1 January 2011 to 10 February 2011 are used as the training set, and the remaining data are used as the testing set. The numbers of days in the source and target domains are 20 and 4, respectively. The accuracy requirements for the electrical and heating loads are set to 8% and 12%, respectively. The optimal hyperparameters of the different models in <xref ref-type="statement" rid="Case_1">Case 1</xref> are presented in <xref ref-type="sec" rid="s12">Supplementary Table SA2</xref>.</p>
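<p>The weekday/weekend screening can be sketched with cosine similarity over daily load profiles (a minimal illustration; the two 24-hour profiles below are synthetic examples, not the park's data):</p>

```python
import math

def cosine_similarity(a, b):
    # cosine of the angle between two daily load profiles (24 hourly values);
    # values near 1 indicate near-identical load patterns
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

hours = range(24)
# hypothetical profiles: weekday with an earlier peak, weekend flatter and later
weekday = [500 + 200 * math.sin(2 * math.pi * (h - 6) / 24) for h in hours]
weekend = [450 + 120 * math.sin(2 * math.pi * (h - 10) / 24) for h in hours]
print(round(cosine_similarity(weekday, weekday), 3))  # 1.0
print(round(cosine_similarity(weekday, weekend), 3))  # below 1: patterns differ
```

A low similarity between the average weekday and weekend profiles would justify building separate models, as done for this park.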
<p>The forecasting results of the different models during the heating period are shown in <xref ref-type="fig" rid="F9">Figure 9</xref>. It is clear from <xref ref-type="fig" rid="F9">Figure 9</xref> that Model-1 has the lowest forecasting accuracy. In the first few days, its accuracy is almost identical to that of the other models, but over time its forecasting performance drops dramatically. Because the training set of Model-2 is updated over time, its forecasting accuracy improves adaptively; however, it still performs poorly in transition seasons and cannot fully capture the dynamic load changes.</p>
<p>
<xref ref-type="fig" rid="F10">Figure 10</xref> shows the forecasting results of the different models in detail. It can be concluded that the overall performance of Model-3 is better than that of Model-1 and Model-2, but there are a few time periods with large forecasting deviations in which it is even inferior to Model-1. This is because the single-task model does not consider the mutual coupling relationship between the electrical and heating loads and is more prone to overfitting. <xref ref-type="fig" rid="F10">Figure 10</xref> (4) demonstrates that all models perform poorly when the daily fluctuation of the heating load in the transition season (from 14 February 2011 to 18 February 2011) is drastic. However, after 2&#xa0;days of fine-tuning the model parameters, the results of Model-5 are closest to the actual values, which indicates that Model-5 captures the load change characteristics most quickly and stably.</p>
<p>
<xref ref-type="fig" rid="F11">Figure 11</xref> shows the distribution of the RMSE of the different models. It demonstrates that the results of Model-1 deviate significantly from the actual values and cannot be used for day-ahead forecasting throughout the year. Model-2, with its constantly updated training set, has better forecasting performance during periods of smooth change, but cannot capture load dynamics quickly when seasonal changes are drastic. Model-3 is generally better than Model-1 and Model-2, but large deviations still occur in a few periods because it fails to consider the relationship between the electrical and heating loads at the same moment. This makes Model-3 prone to overfitting, insufficient generalisation ability, and poor stability. When the system re-enters the heating period from the transition period, Model-5 again learns the dynamic changes of the diverse loads fastest and most stably.</p>
<p>The specific statistics for the heating period are listed in <xref ref-type="table" rid="T1">Table 1</xref>. Combining <xref ref-type="fig" rid="F11">Figure 11</xref> with <xref ref-type="table" rid="T1">Table 1</xref>, it can be concluded that Model-5 has higher forecasting accuracy than Model-4. The performance of the two methods on the electrical load is almost the same, but the accuracy improvement of Model-5 on the heating load is more obvious. Because the Pearson correlation coefficient between the electrical and heating loads of the park is as high as 0.94, the degrees of homoscedastic uncertainty of the two loads are comparable, so the improvement obtained by considering uncertainty is limited.</p>
</statement>
</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption>
<p>Comparison of the load forecasting results in <xref ref-type="statement" rid="Case_1">Case 1</xref>.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g009.tif"/>
</fig>
<fig id="F10" position="float">
<label>FIGURE 10</label>
<caption>
<p>Comparison of the details in <xref ref-type="statement" rid="Case_1">Case 1</xref>.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g010.tif"/>
</fig>
<fig id="F11" position="float">
<label>FIGURE 11</label>
<caption>
<p>Comparison of distribution of RMSE for the different models in <xref ref-type="statement" rid="Case_1">Case 1</xref>.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g011.tif"/>
</fig>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Indicator results of the different models for the heating period in <xref ref-type="statement" rid="Case_1">Case 1</xref>. The exact meanings of Models 1&#x2013;5 are given at the beginning of <xref ref-type="sec" rid="s4">Section 4</xref>.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Indicators</th>
<th align="left">Model-1</th>
<th align="left">Model-2</th>
<th align="left">Model-3</th>
<th align="left">Model-4</th>
<th align="left">Model-5</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">MAPE (electrical)</td>
<td align="left">86.71</td>
<td align="left">32.41</td>
<td align="left">19.65</td>
<td align="left">14.65</td>
<td align="left">
<bold>14.62</bold>
</td>
</tr>
<tr>
<td align="left">RMSE (electrical)</td>
<td align="left">628.58</td>
<td align="left">252.10</td>
<td align="left">204.28</td>
<td align="left">158.85</td>
<td align="left">
<bold>158.50</bold>
</td>
</tr>
<tr>
<td align="left">RMSE (heating)</td>
<td align="left">330.17</td>
<td align="left">122.62</td>
<td align="left">83.92</td>
<td align="left">63.80</td>
<td align="left">
<bold>54.90</bold>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<statement content-type="case" id="Case_2">
<label>Case 2</label>
<p>A practical CIES in China</p>
<p>The studied CIES in China consists of electricity, thermal and cooling systems, with energy conversion equipment including CCHP units, electrical boilers, and ground source heat pumps (<xref ref-type="bibr" rid="B43">Zhao et al., 2022</xref>). The dataset is composed of hourly average electrical load, heating load, temperature, photovoltaic power, solar radiation, humidity, and wind speed, collected from October 2019 to June 2020.</p>
<p>Similarly, cosine similarity analysis shows that there is no significant difference between the weekday and weekend load patterns for this park; therefore, there is no need to model these cases separately. In fact, the park operates all year round because of its business type. The Pearson correlation coefficients between the diverse loads and the influencing factors are shown in <xref ref-type="sec" rid="s12">Supplementary Figure SA2</xref>. The influencing factors with correlation coefficients less than 0.4 (weak correlation) are excluded to avoid the influence of noise, and the final selected environmental input is the temperature data.</p>
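<p>The screening of influencing factors by the 0.4 moderate-correlation threshold can be sketched as follows (a hedged illustration; the series and factor names are synthetic, and <monospace>select_factors</monospace> is a hypothetical helper):</p>

```python
import math

def pearson(x, y):
    # sample Pearson correlation coefficient between two series
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_factors(load, factors, threshold=0.4):
    # keep only the influencing factors whose |r| with the load reaches the
    # moderate-correlation threshold (0.4, as used in Case 2)
    return [name for name, series in factors.items()
            if abs(pearson(load, series)) >= threshold]

# synthetic series: "temperature" is a linear transform of the load (r = 1.0);
# the alternating "wind_speed" series is uncorrelated by construction
load = [float(i % 24) for i in range(96)]
factors = {
    "temperature": [float(i % 24) * 0.8 + 1.0 for i in range(96)],
    "wind_speed": [1.0 if i % 2 == 0 else -1.0 for i in range(96)],
}
print(select_factors(load, factors))  # ['temperature']
```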
<p>Another difference compared with <xref ref-type="statement" rid="Case_1">Case 1</xref> is that the correlation coefficient of the electrical and heating loads for this park is 0.63 (moderate correlation), so there is a relatively obvious difference in uncertainty between the two. The data collected from 1 October 2019 to 12 February 2020 are used as the training set, and the remaining data are used as the testing set. The optimal hyperparameters of the different models in <xref ref-type="statement" rid="Case_2">Case 2</xref> are listed in <xref ref-type="sec" rid="s12">Supplementary Table SA3</xref>.</p>
<p>A comparison of the electrical and heating load accuracies for each algorithm is shown in <xref ref-type="fig" rid="F12">Figure 12</xref>. It is clear from <xref ref-type="fig" rid="F12">Figure 12</xref> that Model-1 and Model-2 still have the worst forecasting performance, and Model-3 still exhibits large deviations during certain periods, which is consistent with the previous conclusions of <xref ref-type="statement" rid="Case_1">Case 1</xref>. The daily curves of the electrical load are more regular and their uncertainties are small, whereas the fluctuation of the heating load is much higher.</p>
<p>In <xref ref-type="fig" rid="F12">Figure 12</xref> (1), comparing Model-4 and Model-5, it can be concluded that the forecasting performance of the models with and without considering load uncertainty differences is comparable, owing to the high regularity of the electrical load. The dynamic weight of the loss function corresponding to the electrical load in Model-5 is larger; therefore, the parameters corresponding to the electrical load are not easily adjusted.</p>
<p>
<xref ref-type="fig" rid="F12">Figure 12</xref> (2) demonstrates that, compared with Model-4, Model-5, which uses homoscedastic uncertainty to optimise the overall loss, achieves a significant improvement in forecasting performance, especially in the transition period. Although the RMSE of the heating load forecasted by Model-4 decreases rapidly after large deviations occur, the RMSE of the heating load forecasted by Model-5 remains at a low level. To minimise the comprehensive loss function, the weight of the heating forecasting task is smaller. This allows significant adjustment of the parameters corresponding to the heating load and effective learning of the new load characteristics caused by changes in the external environment, thereby improving forecasting accuracy.</p>
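<p>The dynamic weighting discussed above follows the homoscedastic-uncertainty form of Kendall et al. (2018); a minimal sketch is given below (the task losses and log-sigma values are illustrative, not fitted values from the study):</p>

```python
import math

def uncertainty_weighted_loss(task_losses, log_sigmas):
    # multi-task loss of Kendall et al. (2018):
    #   L = sum_i [ exp(-2 * log_sigma_i) / 2 * L_i + log_sigma_i ]
    # a task with a larger learned sigma (higher homoscedastic uncertainty)
    # receives a smaller effective weight, plus a regularising log term
    return sum(0.5 * math.exp(-2.0 * ls) * L + ls
               for L, ls in zip(task_losses, log_sigmas))

# illustrative values: the regular electrical load gets a small sigma (large
# weight); the volatile heating load gets a large sigma (small weight)
elec_loss, heat_loss = 0.8, 2.5
total = uncertainty_weighted_loss([elec_loss, heat_loss], [0.1, 0.9])
print(round(total, 4))  # 1.5341

# effective weights exp(-2 * log_sigma) / 2 for the two tasks
print(round(0.5 * math.exp(-0.2), 3), round(0.5 * math.exp(-1.8), 3))
```

With these illustrative values the electrical task carries the larger effective weight, so its parameters change little during fine-tuning, while the lightly weighted heating task leaves room for larger parameter adjustments.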
<p>The specific statistics for the heating period in <xref ref-type="statement" rid="Case_2">Case 2</xref> are listed in <xref ref-type="table" rid="T2">Table 2</xref>. Compared with the other four models, the MAPE and RMSE of the electrical and heating loads forecasted by Model-5 decrease by at least 4.99%, 5.61%, 18.22%, and 16.72%, respectively. <xref ref-type="fig" rid="F13">Figure 13</xref> shows the number of days that meet the different forecasting precisions in <xref ref-type="statement" rid="Case_2">Case 2</xref>. Based on a comparison of the results shown in <xref ref-type="table" rid="T2">Table 2</xref> and <xref ref-type="fig" rid="F13">Figure 13</xref>, it can be concluded that the forecasting performance of the proposed method (Model-5) is superior to that of the other methods in all aspects, both in terms of load type and evaluation criteria. This improvement is particularly evident for the heating load forecasting task.</p>
</statement>
</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Indicator results of the different models for the heating period in <xref ref-type="statement" rid="Case_2">Case 2</xref>.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Indicators</th>
<th align="left">Model-1</th>
<th align="left">Model-2</th>
<th align="left">Model-3</th>
<th align="left">Model-4</th>
<th align="left">Model-5</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">MAPE (electrical)</td>
<td align="left">15.57</td>
<td align="left">11.82</td>
<td align="left">7.75</td>
<td align="left">7.53</td>
<td align="left">
<bold>7.15</bold>
</td>
</tr>
<tr>
<td align="left">RMSE (electrical)</td>
<td align="left">258.52</td>
<td align="left">190.95</td>
<td align="left">127.43</td>
<td align="left">123.43</td>
<td align="left">
<bold>116.51</bold>
</td>
</tr>
<tr>
<td align="left">MAPE (heating)</td>
<td align="left">58.94</td>
<td align="left">48.90</td>
<td align="left">16.95</td>
<td align="left">16.00</td>
<td align="left">
<bold>13.08</bold>
</td>
</tr>
<tr>
<td align="left">RMSE (heating)</td>
<td align="left">823.06</td>
<td align="left">626.05</td>
<td align="left">292.67</td>
<td align="left">280.28</td>
<td align="left">
<bold>233.40</bold>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F12" position="float">
<label>FIGURE 12</label>
<caption>
<p>Comparison of the forecasting results in <xref ref-type="statement" rid="Case_2">Case 2</xref>.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g012.tif"/>
</fig>
<fig id="F13" position="float">
<label>FIGURE 13</label>
<caption>
<p>Distribution of the number of days meeting different forecasting precisions.</p>
</caption>
<graphic xlink:href="fenrg-10-1008216-g013.tif"/>
</fig>
</sec>
<sec id="s5">
<title>5 Conclusion</title>
<p>This paper proposes an adaptive forecasting method for the diverse electrical and heating loads of a CIES based on deep transfer learning. The proposed model uses multi-task learning to learn the interrelationships among diverse loads. A CNN and an LSTM are constructed to extract load features at different time scales, and an attention mechanism module is then introduced to focus on the important features. Furthermore, dynamic weights are assigned to the different tasks according to the differences in the degrees of uncertainty of the diverse loads to optimise the overall forecasting model. To enable the model to adapt, a deep transfer learning strategy is adopted, which allows the forecasting model to rapidly capture new CIES features. Two simulation experiments are conducted for different scenarios. The results show that the proposed method outperforms the four benchmark models in forecasting diverse CIES loads. The following conclusions are drawn.</p>
<p>First, transfer learning is an effective method for addressing seasonal changes in CIES loads. The model without updating does not produce consistently accurate forecasts. The model whose training set is continuously updated over time can reflect the dynamic changes in load, but its performance is also poor when the load changes drastically during seasonal transitions. Second, compared to the single-task learning model, the multi-task learning model performs better because MTL considers the relationships between diverse loads and shares their latent information, giving the model stronger generalisation ability. Finally, the MTL loss function applied in this study improves the forecasting accuracy of the task with larger uncertainty.</p>
<p>Limited by the availability of data, none of the cases in this study include gas loads. In future work, CIES containing electrical, gas, and heating loads can be investigated. In addition, this study does not consider the impact of demand-side management, which can be studied further.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The original contributions presented in the study are included in the article/<xref ref-type="sec" rid="s12">Supplementary Material</xref>, further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="s7">
<title>Author contributions</title>
<p>KW: data curation, writing&#x2014;original draft. HY: conceptualization and methodology. GS: formal analysis, writing&#x2014;review and editing. JX: project administration. JL: investigation and software. PL: supervision and validation.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>This study was supported by the National Natural Science Foundation of China (51907139, 52011530127).</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
<p>The authors declare that this study received funding from Science and Technology Project of Tianjin Electric Power Company (KJ21-1-36). The funder had the following involvement in the study: JX: project administration. JL: investigation and software.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="s12">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fenrg.2022.1008216/full%23supplementary-material">https://www.frontiersin.org/articles/10.3389/fenrg.2022.1008216/full&#x23;supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="Table1.docx" id="SM1" mimetype="application/docx" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image2.pdf" id="SM2" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table2.docx" id="SM3" mimetype="application/docx" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table3.docx" id="SM4" mimetype="application/docx" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Image1.pdf" id="SM5" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bracale</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Caramia</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>De</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Multivariate quantile regression for short-term probabil-istic load forecasting</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>35</volume>, <fpage>628</fpage>&#x2013;<lpage>638</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2019.2924224</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Planning multiple energy systems toward low-carbon society: A decentralized approach</article-title>. <source>IEEE Trans. Smart Grid</source> <volume>10</volume>, <fpage>4859</fpage>&#x2013;<lpage>4869</lpage>. <pub-id pub-id-type="doi">10.1109/TSG.2018.2870323</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Daniel</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Pedro</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Zita</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Regina</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Short time electricity consumption forecast in an industry facility</article-title>. <source>IEEE Trans. Ind. Appl.</source> <volume>58</volume>, <fpage>123</fpage>&#x2013;<lpage>130</lpage>. <pub-id pub-id-type="doi">10.1109/TIA.2021.3123103</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Statistical investigations of transfer learning-based methodology for short-term building energy predictions</article-title>. <source>Appl. Energy</source> <volume>262</volume>, <fpage>114499</fpage>. <pub-id pub-id-type="doi">10.1016/j.apenergy.2020.114499</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>N&#xfa;&#xf1;ez</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Schutter</surname>
<given-names>B. D.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A short-term preventive maintenance scheduling method for distribution networks with distributed generators and batteries</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>36</volume>, <fpage>2516</fpage>&#x2013;<lpage>2531</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2020.3037558</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gianfranco</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Shariq</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Andrea</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pierluigi</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Flexibility from distributed multienergy systems</article-title>. <source>Proc. IEEE</source> <volume>108</volume>, <fpage>1496</fpage>&#x2013;<lpage>1517</lpage>. <pub-id pub-id-type="doi">10.1109/JPROC.2020.2986378</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jalali</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ahmadian</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Khosravi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Miadreza</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Saeid</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Jo&#xe3;o</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A novel evolutionary-based deep convolutional neural network model for intelligent load forecasting</article-title>. <source>IEEE Trans. Ind. Inf.</source> <volume>17</volume>, <fpage>8243</fpage>&#x2013;<lpage>8253</lpage>. <pub-id pub-id-type="doi">10.1109/TII.2021.3065718</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kendall</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gal</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cipolla</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Multi-task learning using uncertainty to weigh losses for scene geometry and semantics</article-title>. <source>Proc. IEEE Conf. Comput. Vis. pattern Recognit.</source> <volume>2018</volume>, <fpage>7482</fpage>&#x2013;<lpage>7491</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2018.00781</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kong</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Improved deep belief network for short-term load forecasting considering demand-side management</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>35</volume>, <fpage>1531</fpage>&#x2013;<lpage>1538</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2019.2943972</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuster</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Rezgui</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Mourshed</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Electrical load forecasting models: A critical systematic review</article-title>. <source>Sustain. Cities Soc.</source> <volume>35</volume>, <fpage>257</fpage>&#x2013;<lpage>270</lpage>. <pub-id pub-id-type="doi">10.1016/j.scs.2017.08.009</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>LeCun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Deep learning</article-title>. <source>Nature</source> <volume>521</volume>, <fpage>436</fpage>&#x2013;<lpage>444</lpage>. <pub-id pub-id-type="doi">10.1038/nature14539</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ji</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Quantized event-driven simulation for integrated energy systems with hybrid continuous-discrete dynamics</article-title>. <source>Appl. Energy</source> <volume>307</volume>, <fpage>118268</fpage>. <pub-id pub-id-type="doi">10.1016/j.apenergy.2021.118268</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Energy stations and pipe network collaborative planning of integrated energy system based on load complementary characteristics</article-title>. <source>Sustain. Energy Grids Netw.</source> <volume>23</volume>, <fpage>100374</fpage>. <pub-id pub-id-type="doi">10.1016/j.segan.2020.100374</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>L&#xf3;pez</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Rider</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Parsimonious short-term load forecasting for optimal operation planning of electrical distribution systems</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>34</volume>, <fpage>1427</fpage>&#x2013;<lpage>1437</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2018.2872388</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>A short-term load forecasting model based on mixup and transfer learning</article-title>. <source>Electr. Power Syst. Res.</source> <volume>207</volume>, <fpage>107837</fpage>. <pub-id pub-id-type="doi">10.1016/j.epsr.2022.107837</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lyon</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Hedman</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Market implications and pricing of dynamic reserve policies for systems with renewables</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>30</volume>, <fpage>1593</fpage>&#x2013;<lpage>1602</lpage>. <pub-id pub-id-type="doi">10.1109/PESGM.2015.7285837</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ming</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Xia</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Adepoju</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Shakkottai</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Prediction and assessment of demand response potential with coupon incentives in highly renewable power systems</article-title>. <source>Prot. Control Mod. Power Syst.</source> <volume>5</volume>, <fpage>12</fpage>. <pub-id pub-id-type="doi">10.1186/s41601-020-00155-x</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="web">
<collab>National Renewable Energy Laboratory (NREL) Data Catalog</collab> (<year>2011</year>). <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://data.nrel.gov/submissions/40">https://data.nrel.gov/submissions/40</ext-link>
</comment>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pinto</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Roy</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Capozzoli</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Transfer learning for smart buildings: A critical review of algorithms, applications, and future perspectives</article-title>. <source>Adv. Appl. Energy</source> <volume>5</volume>, <fpage>100084</fpage>. <pub-id pub-id-type="doi">10.1016/j.adapen.2022.100084</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qian</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Potential analysis of the transfer learning model in short and medium-term forecasting of building HVAC energy consumption</article-title>. <source>Energy</source> <volume>193</volume>, <fpage>116724</fpage>. <pub-id pub-id-type="doi">10.1016/j.energy.2019.116724</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Jing</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Q.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Optimal operation of integrated energy systems subject to coupled demand constraints of electricity and natural gas</article-title>. <source>CSEE J. Power Energy Syst.</source> <volume>6</volume>, <fpage>444</fpage>&#x2013;<lpage>457</lpage>. <pub-id pub-id-type="doi">10.17775/CSEEJPES.2018.00640</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Quelhas</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gil</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Mccalley</surname>
<given-names>J. D.</given-names>
</name>
<name>
<surname>Ryan</surname>
<given-names>S. M.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>A multiperiod generalized network flow model of the US integrated energy system: Part I-model description</article-title>. <source>IEEE Trans. Power Syst.</source> <volume>22</volume>, <fpage>829</fpage>&#x2013;<lpage>836</lpage>. <pub-id pub-id-type="doi">10.1109/TPWRS.2007.894844</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ribeiro</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Grolinger</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Elyamany</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Higashino</surname>
<given-names>W. A.</given-names>
</name>
<name>
<surname>Capretz</surname>
<given-names>M. A. M.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Transfer learning with seasonal and trend adjustment for cross-building energy forecasting</article-title>. <source>Energy Build.</source> <volume>165</volume>, <fpage>352</fpage>&#x2013;<lpage>363</lpage>. <pub-id pub-id-type="doi">10.1016/j.enbuild.2018.01.034</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kumar</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pal</surname>
<given-names>S. K.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>R. P.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Intra ELM variants ensemble based model to predict energy performance in residential buildings</article-title>. <source>Sustain. Energy Grids Netw.</source> <volume>16</volume>, <fpage>177</fpage>&#x2013;<lpage>187</lpage>. <pub-id pub-id-type="doi">10.1016/j.segan.2018.07.001</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Deep learning for household load forecasting&#x2014;A novel pooling deep RNN</article-title>. <source>IEEE Trans. Smart Grid</source> <volume>9</volume>, <fpage>5271</fpage>&#x2013;<lpage>5280</lpage>. <pub-id pub-id-type="doi">10.1109/TSG.2017.2686012</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Srivastava</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Hinton</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Krizhevsky</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Sutskever</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Salakhutdinov</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Dropout: A simple way to prevent neural networks from overfitting</article-title>. <source>J. Mach. Learn. Res.</source> <volume>15</volume>, <fpage>1929</fpage>&#x2013;<lpage>1958</lpage>. </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tan</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>De</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Combined electricity-heat-cooling-gas load forecasting model for integrated energy system based on multi-task learning and least square support vector machine</article-title>. <source>J. Clean. Prod.</source> <volume>248</volume>, <fpage>119252</fpage>. <pub-id pub-id-type="doi">10.1016/j.jclepro.2019.119252</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Yuen</surname>
<given-names>R.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Novel dynamic forecasting model for building cooling loads combining an artificial neural network and an ensemble approach</article-title>. <source>Appl. Energy</source> <volume>228</volume>, <fpage>1740</fpage>&#x2013;<lpage>1753</lpage>. <pub-id pub-id-type="doi">10.1016/j.apenergy.2018.07.085</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Gu</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2020a</year>). <article-title>Multi-energy load forecasting for regional integrated energy systems considering temporal dynamic and coupling characteristics</article-title>. <source>Energy</source> <volume>195</volume>, <fpage>116964</fpage>. <pub-id pub-id-type="doi">10.1016/j.energy.2020.116964</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Szabados</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Olinda</surname>
<given-names>P.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Factors that impact the accuracy of clustering-based load forecasting</article-title>. <source>IEEE Trans. Ind. Appl.</source> <volume>52</volume>, <fpage>3625</fpage>&#x2013;<lpage>3630</lpage>. <pub-id pub-id-type="doi">10.1109/TIA.2016.2558563</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2020b</year>). <article-title>A multi-energy load prediction model based on deep multi-task learning and ensemble approach for regional integrated energy systems</article-title>. <source>Int. J. Electr. Power &#x26; Energy Syst.</source> <volume>126</volume>, <fpage>106583</fpage>. <pub-id pub-id-type="doi">10.1016/j.ijepes.2020.106583</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Hong</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Piette</surname>
<given-names>M. A.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Predicting city-scale daily electricity consumption using data-driven models</article-title>. <source>Adv. Appl. Energy</source> <volume>2</volume>, <fpage>100025</fpage>. <pub-id pub-id-type="doi">10.1016/j.adapen.2021.100025</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bie</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Urgun</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A reliability model for integrated energy system considering multi-energy correlation</article-title>. <source>J. Mod. Power Syst. Clean. Energy</source> <volume>9</volume>, <fpage>811</fpage>&#x2013;<lpage>825</lpage>. <pub-id pub-id-type="doi">10.35833/MPCE.2020.000301</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>A combined deep learning load forecasting model of single household resident user considering multi-time scale electricity consumption behavior</article-title>. <source>Appl. Energy</source> <volume>307</volume>, <fpage>118197</fpage>. <pub-id pub-id-type="doi">10.1016/j.apenergy.2021.118197</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ye</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>B.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>A power load prediction method of associated industry chain production resumption based on multi-task LSTM</article-title>. <source>Energy Rep.</source> <volume>8</volume>, <fpage>239</fpage>&#x2013;<lpage>249</lpage>. <pub-id pub-id-type="doi">10.1016/j.egyr.2022.01.110</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fujimoto</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Fujita</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hayashi</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Deep reservoir architecture for short-term residential load forecasting: An online learning scheme for edge computing</article-title>. <source>Appl. Energy</source> <volume>298</volume>, <fpage>117176</fpage>. <pub-id pub-id-type="doi">10.1016/j.apenergy.2021.117176</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wallin</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Improved triangle splitting based bi-objective optimization for community integrated energy systems with correlated uncertainties</article-title>. <source>Sustain. Energy Technol. Assess.</source> <volume>49</volume>, <fpage>101682</fpage>. <pub-id pub-id-type="doi">10.1016/j.seta.2021.101682</pub-id> </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Correlated load forecasting in active distribution networks using spatial-temporal synchronous graph convolutional networks</article-title>. <source>IET Energy Syst. Integr.</source> <volume>3</volume>, <fpage>355</fpage>&#x2013;<lpage>366</lpage>. <pub-id pub-id-type="doi">10.1049/esi2.12028</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Residential load forecasting based on LSTM fusing self-attention mechanism with pooling</article-title>. <source>Energy</source> <volume>229</volume>, <fpage>120682</fpage>. <pub-id pub-id-type="doi">10.1016/j.energy.2021.120682</pub-id> </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Bai</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Short-time multi-energy load forecasting method based on CNN-Seq2Seq model with attention mechanism</article-title>. <source>Mach. Learn. Appl.</source> <volume>5</volume>, <fpage>100064</fpage>. <pub-id pub-id-type="doi">10.1016/j.mlwa.2021.100064</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Electricity, heat, and gas load forecasting based on deep multitask learning in industrial-park integrated energy system</article-title>. <source>Entropy</source> <volume>22</volume>, <fpage>1355</fpage>. <pub-id pub-id-type="doi">10.3390/e22121355</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2022</year>). <article-title>Short-term load forecasting method with variational mode decomposition and stacking model fusion</article-title>. <source>Sustain. Energy Grids Netw.</source> <volume>30</volume>, <fpage>100622</fpage>. <pub-id pub-id-type="doi">10.1016/j.segan.2022.100622</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Q.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>An overview of multi-task learning</article-title>. <source>Natl. Sci. Rev.</source> <volume>5</volume>, <fpage>30</fpage>&#x2013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1093/nsr/nwx105</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xiong</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Bu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2022</year>). <article-title>Reliability evaluation of community integrated energy systems based on fault incidence matrix</article-title>. <source>Sustain. Cities Soc.</source> <volume>80</volume>, <fpage>103769</fpage>. <pub-id pub-id-type="doi">10.1016/j.scs.2022.103769</pub-id> </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Multi-energy net load forecasting for integrated local energy systems with heterogeneous prosumers</article-title>. <source>Int. J. Electr. Power &#x26; Energy Syst.</source> <volume>126</volume>, <fpage>106542</fpage>. <pub-id pub-id-type="doi">10.1016/j.ijepes.2020.106542</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>