Linking Design and Operation Phase Energy Performance Analysis Through Regression-Based Approaches

The reduction of energy usage and environmental impact of the built environment and construction industry is crucial for sustainability on a global scale. We are working towards an increased commitment to resource efficiency in the built environment and to the growth of innovative businesses following circular economy principles. The conceptualization of change is a relevant part of energy and sustainability transitions research, which is aimed at enabling radical shifts compatible with societal functions. In this framework, building performance has to be considered in a whole life cycle perspective because buildings are long-term assets. In a life cycle perspective, both operational and embodied energy and carbon emissions have to be considered for appropriate comparability and decision-making. The application of sustainability assessments of products and practices in the built environment is itself a critical and debatable issue. For this reason, the way energy consumption data are measured, processed, and reported has to be progressively standardized in order to enable transparency and consistency of methods at multiple scales (from single buildings up to building stock) and levels of analysis (from individual components up to systems), ideally complementing ongoing research initiatives that use open science principles in energy research. In this paper, we analyse the topic of linking design and operation phase energy performance analysis through regression-based approaches in buildings, highlighting the hierarchical nature of building energy modelling data. The goal of this research is to review the current state of the art in order to orient future efforts towards integrated data analysis workflows, from design to operation. In this sense, we show how data analysis techniques can be used to evaluate the impact of both technical and human factors.
Finally, we indicate how approximated physical interpretation of regression models can help in developing data-driven models that could enhance the possibility of learning from feedback and reconstructing building stock data at multiple levels.


INTRODUCTION
It is widely acknowledged that a lower environmental impact from the construction industry and built environment is crucial for sustainability and that this problem has to be tackled on a global scale (Berardi, 2017). Carbon emission reduction goals (i.e., decarbonisation) require pressing needs be met, such as increasing energy efficiency in end-uses, reducing demand, and providing a relevant quota of energy supply by renewable sources. Energy efficiency paradigms that consider the entire building life cycle performance (Berardi, 2018) are emerging both for new and existing buildings (e.g., Nearly Zero Energy Buildings, or NZEBs) (D'Agostino et al., 2016). At the EU level, we are working towards an increasing commitment towards resource efficiency in the built environment (Dodd et al., 2015) and the growth of innovative business opportunities following circular economy principles (McKinsey, 2014), because buildings are long-term assets. Built environment sustainability strategies can be inscribed within the more general (and rapidly growing) field of sustainability transitions research (Köhler et al., 2019). Sustainability transitions research focuses on the conceptualization of radical changes that have to be compatible with societal functions.
For the construction industry, a conceptualization is proposed by Thuesen et al. (2016), identifying three generic knowledge domains: project, product, and service. Further, methods such as life cycle assessment are fundamental for the development of innovative economic paradigms, such as Circular Economy, in the built environment, but the assessment of the sustainability of products and practices through life cycle assessment is itself a critical issue. In fact, there can be a large variability in the way this method is currently used in practice (Pomponi and Moncaster, 2018), creating difficulties in terms of transparent comparability of results and performance benchmarking. From a life cycle perspective, the environmental impact of buildings should account for both embodied (products and construction practices) and operational energy consumption. Both the measurement of embodied energy and carbon equivalent of buildings (De Wolf et al., 2017) and operational energy consumption (de Wilde, 2017;Imam et al., 2017) represent critical (and debatable) issues that can ultimately determine a relevant "performance gap." Finally, energy transition strategies have to address complementarities (Markard and Hoffmann, 2016) which are crucial for the co-evolution of built environment and energy infrastructures (Junker et al., 2018;Dominković et al., 2020). In this framework, empirically grounded and tested methods that can help in standardizing the way energy consumption data are measured, processed, and reported are particularly valuable, because they can provide reliable evidence, inform policies, and support decision-making processes adequately. In the next sections, some introductory examples in this sense will be given, indicating the motivation for the review work.

BACKGROUND AND MOTIVATION
The energy modelling research community at present is emphasizing the fundamental importance of open energy data and models (Pfenninger et al., 2017;Pfenninger et al., 2018), and we can envision an evolution towards systems of models (Bollinger et al., 2018) created to tackle fundamental problems in energy transitions, eventually exploiting soft-linking approaches (Deane et al., 2012;Dominković et al., 2020). The way energy consumption data are measured, processed, and reported can be standardized in order to enable transparency and consistency of methods at multiple scales and levels of analysis, as well as to complement initiatives applying open science principles in energy research (Openmod;Hilpert et al., 2018). Some research projects in this direction have been developed in recent years (CalTRACK; Jayaweera et al., 2013;Miller and Meggers, 2017;Firth et al., 2018).
The methods proposed in these projects are empirically grounded and could be useful in providing evidence regarding innovative practices, services, and technologies. As an example, providing robust and empirically tested methods is fundamental if we think about issues such as "re-bound" (Herring and Roy, 2007) and "pre-bound" effects related to energy efficiency practices (Sunikka-Blank and Galvin, 2012;Rosenow and Galvin, 2013), that are determining a substantial difference between simulated and measured performance.
Further, surrogate models are flexible and can be used to link design and operation phase performance analysis (Allard et al., 2018;Tronchin et al., 2018a) in an integrated workflow (Manfren and Nastasi, 2020). The choice of a specific surrogate modelling technique depends on multiple factors (Koulamas et al., 2018;Østergård et al., 2018). In this review work, we focus on regression-based approaches that can be used from design to operation phase energy analysis. The basic reasons for this choice are its conceptual simplicity, availability of technical standardization (ISO, 2013;ASHRAE, 2014), availability of technical guidelines and protocols for field applications (EVO, 2003;FEMP, 2008), and availability of open software (CalTRACK; Paulus, 2017). Other fundamental aspects, that will be discussed later in more detail, include the scalability of building stock analysis (Meng et al., 2020), the affinity with variable-base degree days methods (Kohler et al., 2016;Meng and Mourshed, 2017), and the possibility to perform analysis at a utility scale (Acquaviva et al., 2015) or a city scale (Qomi et al., 2016), including the assessment of the impact of users' behaviour (Oh et al., 2020). Finally, the possibility to analyse thermal, electrical, and fuel demands with these methods opens up new possibilities in terms of model soft-linking, for example, by using regression-based approaches to create energy demand scenarios (e.g., considering technological evolution, climate change, behavioural change, etc.) in multi-commodity systems models (Adhikari et al., 2012a;Manfren, 2012;Kraning et al., 2014;Dorfner, 2016;Nastasi, 2019), thereby supplementing open science oriented approaches in energy research (Hilpert et al., 2018).

LINKING DESIGN AND OPERATIONAL ENERGY PERFORMANCE ANALYSIS AT SCALE
Building performance can be studied by means of Key Performance Indicators (KPIs) (Ghaffarianhoseini et al., 2016;Kylili et al., 2016;Yoshino et al., 2017;Talele et al., 2018) that aggregate a larger set of data in a single representative quantity. In energy transition strategies for the built environment, the efficiency dimension plays a key role (Abu Bakar et al., 2015). From a techno-economic perspective (Fabbri et al., 2011;Aste et al., 2013;Corgnati et al., 2013;Tronchin et al., 2014) the analysis is generally concentrated on the trade-off between performance indicators, such as primary energy demand and investment and operation costs (Ferrara et al., 2018), following definitions from technical standardization (ISO, 2017). However, in order to enable an effective performance assessment, additional indicators have to be considered regarding Indoor Environmental Quality (IEQ) (Fabbri and Tronchin, 2015), on-site generation and self-consumption from Renewable Energy Sources (RES) (Kurnitski, 2013), load matching, and grid interaction (Voss et al., 2010;Frontini et al., 2012). Further, indicators are essential to address innovative solutions (Mancini and Nastasi, 2019) and operation strategies for buildings aimed at flexibility (Clauß et al., 2017) and relevant problems, such as soft-linking simulation models with energy planning tools (Noussan and Nastasi, 2018;Dominković et al., 2020). Indeed, the scalability of analysis methods and tools up to district (Adhikari et al., 2012b), city (Cipriano et al., 2017), and regional scales (Aste et al., 2014;Kuster et al., 2019) is fundamental to ensure the credibility of policies and practices.
For all these reasons, KPIs are essential to address building performance problems in a wider sense (de Wilde, 2018), especially when the goal is understanding performance uncertainty and variability. These issues can be addressed, from a computational perspective, by means of a parametric and probabilistic analysis, used as an exploratory tool and for optimization purposes (Tronchin et al., 2016;Østergård et al., 2020). The importance of accounting for multiple performance scenarios (Shiel et al., 2018), considering the impact of both technical and human factors (Yoshino et al., 2017), is becoming evident, and the Design Of Experiments technique is being used in many cases for building energy performance simulations (Jaffal et al., 2009;Kotireddy et al., 2018;Schlueter and Geyer, 2018;Tronchin et al., 2018a). With respect to human factors influencing performance, generally occupants' comfort preferences and behaviours (Menezes et al., 2012;Tagliabue et al., 2016;Gaetani et al., 2018) are overlooked, even though they can create a relevant gap between simulated and measured performance. This fact can clearly undermine the effectiveness of policies that have to propose techno-economically feasible solutions and, at the same time, consider human behaviour realistically (Herring and Roy, 2007;Rosenow and Galvin, 2013).
In conclusion, understanding the variability of building energy performance outcomes, both in design phase simulation and in actual operation, requires the definition of appropriate KPIs and of parametric/probabilistic analysis strategies, involving multiple input-output combinations. These analyses can be performed by means of reduced order models, following the argumentation reported in Introduction and Background and Motivation regarding the necessity to address multiple problems and to create methods that are computationally efficient; in the subsequent sections, we will discuss data and modelling strategies in this direction. In Hierarchical Structure of Building Energy Modelling Data, we analyse the hierarchical structure of building performance data, while in Regression Models in Design Phase Analysis and Regression Models in Operational Phase Analysis we propose examples of regression models for design phase and operational phase analysis.

Hierarchical Structure of Building Energy Modelling Data
Building energy modelling data can be organized using a hierarchical structure to improve the level of transparency in modelling and the possibility to achieve reliable results. Examples of hierarchical structures in building energy modelling data can be found in EU legislation regarding the definition of cost-optimal levels of performance (European Commission, 2012) and in the EU Building Stock Observatory web portal (Arcipowska et al., 2016). In the United States, technical standardization was tested with the definition of reference building models (Deru et al., 2011;Goel et al., 2014), considering also costs of different technological options (Thornton et al., 2011). Further, the use of hierarchical structures in datasets for building energy modelling can be found, for example, in studies about performance gaps (Imam et al., 2017), automation systems' efficiency (Aste et al., 2017), and occupancy modelling (Gaetani et al., 2016). Additionally, we can find examples of hierarchical data in multi-level calibration frameworks (Yang and Becerik-Gerber, 2015) and in other cases where macro-parameters (Calleja Rodríguez et al., 2013) are used to facilitate and guide the uncertainty and sensitivity analysis, or where they are used to support the definition of archetypes (Korolija et al., 2013). In turn, the definition of archetypes is essential for reference building analysis (statistically representative buildings) and for parametric studies, where an appropriate level of simplification is crucial to enable a correct analysis process. Examples in this sense can be found for the simulation of simplified building models (Pernigotto et al., 2014), for city scale modelling (Delmastro et al., 2016;Dogan et al., 2016;Dogan and Reinhart, 2017;Ghiassi and Mahdavi, 2017), and for building stock modelling (Ballarini and Corrado, 2017). An explicative example of the hierarchical structure of data in building energy modelling is reported in Table 1.
The table is by no means exhaustive, but aims to outline a way to organize building energy modelling data in a transparent and possibly reproducible way, using current technical standardization and validated simulation software as a reference.
Alternatively, the hierarchy of building energy modelling data can also be conceived for the "vertical integration" of information, from the user level up to energy infrastructures, with a subdivision in levels such as meter, building, building zones, individual rooms, individual spaces within the room, and user. Examples of research in this direction can be found in international research initiatives on "Energy Flexibility in Buildings" (IEA, 2018a) and "Occupant-Centric Building Design and Operation" (IEA, 2018b). Further research developments in this direction appear particularly promising for Internet of Things applications (Breiner et al., 2016;Reka and Dragicevic, 2018).
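The "vertical integration" described above can be sketched as a simple tree in which each level holds its own metered energy and aggregates the levels below it. This is only an illustrative sketch: the `Node` class, the level names, and all numerical values are hypothetical and are not taken from the reviewed literature.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One level of the hierarchy (e.g. meter, building, zone, room)."""
    name: str
    level: str
    energy_kwh: float = 0.0  # energy attributed directly to this node
    children: List["Node"] = field(default_factory=list)

    def total_energy(self) -> float:
        """Energy at this node plus the energy of every node below it."""
        return self.energy_kwh + sum(c.total_energy() for c in self.children)


# Hypothetical example: two rooms in one zone of one building behind one meter.
room_a = Node("Room A", "room", energy_kwh=120.0)
room_b = Node("Room B", "room", energy_kwh=80.0)
zone = Node("Zone 1", "zone", children=[room_a, room_b])
building = Node("Building X", "building", energy_kwh=50.0, children=[zone])
meter = Node("Main meter", "meter", children=[building])

print(meter.total_energy())
```

Keeping the aggregation rule inside the data structure makes it possible to query consumption consistently at any level, which is the property the hierarchical organization is meant to provide.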

Regression Models in Design Phase Analysis
In Hierarchical Structure of Building Energy Modelling Data, the multi-level nature of building data was illustrated using examples from the current state of the art. Following the argumentation reported at the beginning of Linking Design and Operational Energy Performance Analysis at Scale, we can understand how linking design and operation phase performance analysis using standardized and consistent methods is crucial to enable integrated data analysis workflows, from design to operation. In this section, we illustrate some examples of regression models used in design phase performance analysis, which make use of datasets that partially overlap with the ones reported in Table 1. First of all, there are many examples of applications of regression models for early stage design evaluation (Catalina et al., 2008;Hygh et al., 2012;Asadi et al., 2014;Al Gharably et al., 2016;Ipbüker et al., 2016), also using a Design of Experiments approach (Jaffal et al., 2009) to create multiple input combinations in a rational way. Further, we can find examples of application of energy signatures analysis from design to operation (Allard et al., 2018;Tronchin et al., 2018a), leading to a continuity in the data analysis workflow. Additionally, we can find examples of this type of model for techno-economic analysis (using Life Cycle Cost as a KPI) based on optimization techniques (Aparicio-Ruiz et al., 2019) and Data Envelopment Analysis (Kavousian and Rajagopal, 2013). Finally, we can find an example of regression models used for Energy Performance Contracting (EPC) to control operation costs using models trained on parametric simulations (Ligier et al., 2017), thereby creating a continuity between design and operational phase analysis. A summary of the topics and sub-topics emerging from the review of regression-based approaches for design phase analysis is reported in Table 2.
While many of the research works reported above focus on design phase analysis and are, therefore, based on models trained on simulation data, the applicability of these models has also been shown for the analysis of energy performance databases (Walter and Sohn, 2016), considering the actual energy consumption data collected from surveys.
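As a minimal sketch of how a Design of Experiments approach can feed a regression model in the design phase, the example below builds a two-level full factorial design and recovers the coefficients of a linear surrogate. The factor names and the `toy_simulation` function are hypothetical placeholders for a real building simulation; they are assumptions made for illustration only.

```python
from itertools import product
from statistics import mean

# Hypothetical design factors, each coded as low (-1) or high (+1).
FACTORS = ["wall_u_value", "window_u_value", "heating_setpoint"]


def full_factorial(n_factors):
    """All 2**n combinations of low (-1) and high (+1) levels."""
    return list(product((-1, 1), repeat=n_factors))


def toy_simulation(levels):
    """Placeholder for a simulation run (coefficients are invented)."""
    wall, window, setpoint = levels
    return 100.0 + 12.0 * wall + 5.0 * window + 8.0 * setpoint


def main_effect_coefficients(design, responses):
    """For an orthogonal two-level design, the least-squares coefficient of
    a coded factor is half the difference between the mean response at +1
    and the mean response at -1."""
    coeffs = []
    for j in range(len(design[0])):
        high = mean(r for run, r in zip(design, responses) if run[j] == 1)
        low = mean(r for run, r in zip(design, responses) if run[j] == -1)
        coeffs.append((high - low) / 2.0)
    return coeffs


design = full_factorial(len(FACTORS))
responses = [toy_simulation(run) for run in design]
intercept = mean(responses)
coeffs = main_effect_coefficients(design, responses)

for name, c in zip(FACTORS, coeffs):
    print(name, c)
```

With a real simulation engine in place of `toy_simulation`, the same loop would rank design factors by their influence on the chosen KPI; interaction terms and fractional designs are left out of this sketch.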

Regression Models in Operational Phase Analysis
As anticipated, regression modelling approaches have been chosen because of their standardization (ISO, 2013; ASHRAE, 2014) and the fact that they are adopted in Measurement and Verification (M&V) protocols (EVO, 2003;FEMP, 2008), where specific thresholds (expressed as statistical KPIs) are given for the acceptability of models (Fabrizio and Monetti, 2015). Therefore, they represent empirically grounded and tested methods that can be successfully used to provide and document evidence of results of energy efficiency measures (Mathieu et al., 2011;Jayaweera et al., 2013), providing results that are normalized with respect to weather and operational conditions (Price, 2010). They are based on energy interval data (dependent variable) and weather data (independent variables), together with other independent variables that may be derived from contextual information. External air temperature is the most important independent variable, used for weather normalization of energy consumption (Masuda and Claridge, 2014;Lin and Claridge, 2015;Westermann et al., 2020). Additionally, rather than using energy data directly, we can transform them to derive the average power, called energy signature (ISO, 2013), by dividing energy by the number of operating hours in the time interval considered. Scalability constitutes one of the essential pre-requisites for the applicability of these methods at scale and we consider in this section both temporal and spatial scalability. On the one hand, with respect to temporal scalability, regression-based approaches can be used with monthly, daily, and hourly energy interval data and weather data. Monthly data are the most easily accessible (e.g., utility bills or periodic meter readings) and they can be used for multiple purposes, such as targeting energy savings from single buildings up to utility scale and establishing priorities (Hallinan et al., 2011b) and for recommissioning and refurbishment interventions (Hallinan et al., 2011a).
Further, they can be used to estimate energy savings in industrial buildings (Server et al., 2011) and to perform the disaggregation of weather-dependent and production-dependent energy consumption in industrial facilities. Finally, they can be used as a basis to measure energy efficiency progress with Normalized Energy Intensity (Lammers et al., 2011), using a methodology compatible with energy management systems standardization (ISO, 2018). Regression with daily data extends these capabilities further (Masuda and Claridge, 2014;Lin and Claridge, 2015), giving the possibility to include an autoregressive term in the model formulation to improve goodness of fit (Masuda and Claridge, 2012b;Danov et al., 2013). Finally, hourly models represent an even more detailed formulation (Jalori and Reddy, 2015b;Abushakra and Paulus, 2016), which can be used effectively to understand dynamic patterns of energy demand to, for example, optimize operation interval and control strategies, considering issues such as dynamic energy tariffs and interactions with the grid and on-site generation.
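The energy signature and its regression against external air temperature can be sketched in a few lines. The example below is a minimal illustration with invented monthly data: it converts energy to average power, fits a straight line by ordinary least squares, and computes CV(RMSE) and NMBE, two statistical indicators commonly used in M&V practice. The formulas follow one common formulation with n − p degrees of freedom; the function names and all numbers are assumptions, and the acceptability thresholds should be taken from the relevant protocol.

```python
from math import sqrt


def energy_signature(energy_kwh, hours):
    """Average power (kW) per interval: energy divided by operating hours."""
    return [e / h for e, h in zip(energy_kwh, hours)]


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cv_rmse(y, y_hat, n_params=2):
    """Coefficient of variation of the RMSE, in percent."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 100.0 * sqrt(sse / (n - n_params)) / (sum(y) / n)


def nmbe(y, y_hat, n_params=2):
    """Normalized mean bias error, in percent."""
    n = len(y)
    bias = sum(yi - fi for yi, fi in zip(y, y_hat))
    return 100.0 * bias / ((n - n_params) * (sum(y) / n))


# Invented monthly heating data: mean outdoor temperature (degC),
# metered energy (kWh), and operating hours in each month.
temps = [0.0, 5.0, 10.0, 15.0]
energy = [15252.0, 10440.0, 7812.0, 3240.0]
hours = [744.0, 720.0, 744.0, 720.0]

power = energy_signature(energy, hours)
a, b = fit_line(temps, power)
fitted = [a + b * t for t in temps]

print(a, b, cv_rmse(power, fitted), nmbe(power, fitted))
```

The same functions apply unchanged to daily or hourly data; only the interval length used to compute the signature changes.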
On the other hand, in terms of spatial scalability, we can see how regression-based approaches can be used to model the performance of construction technologies, considering building fabric heat transfer (Bauwens and Roels, 2014;Erkoreka et al., 2016;Giraldo-Soto et al., 2018;Uriarte et al., 2019), or whole building energy behaviour (Masuda and Claridge, 2014;Lin and Claridge, 2015;Paulus et al., 2015). Going beyond single buildings, we can find examples of applications regarding building stock (Meng and Mourshed, 2017;Meng et al., 2020) and community and city scale systems (Qomi et al., 2016;Pasichnyi et al., 2019), considering also complex interactions with the urban environment and physical-statistical interpretation of models (Afshari et al., 2017). A summary of the topics and sub-topics emerging from the review of regression-based approaches for operation phase analysis is reported in Table 3.
As shown above, regression-based approaches are temporally and spatially scalable and can be employed for different types of end-uses and for different aggregations of users, making them suitable for analytics aimed at energy retrofit (Pistore et al., 2019) and decarbonisation, considering performance variability due to realistic operational patterns (Oh et al., 2020) or the impact of properties of construction components such as thermal inertia (Aste et al., 2015).
We already mentioned conceptual simplicity as one of the advantages of this type of model compared to other metamodelling techniques. Additionally, given the standard structure of the basic models, automated or partially automated model selection techniques (Paulus et al., 2015;Paulus, 2017) can be applied to compare the performance of multiple modelling options. Clearly, the presence of different operating conditions in time (e.g., different types of operational profiles) may determine the need to cluster operational conditions on a daily basis (Jalori and Reddy, 2015a;Miller et al., 2015;Richard et al., 2017). The need for integration of clustering and regression in data analysis workflow is also indicated in recent literature reviews regarding energy transition strategies for the built environment (Tronchin et al., 2018b).

TABLE 2 | Regression-based approaches for design phase performance assessment and techno-economic analysis.
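The idea of automated model selection mentioned above can be illustrated by comparing two candidate structures for a heating energy signature: a plain linear model and a change-point model in which demand rises only below a balance temperature. The sketch below selects the balance temperature by grid search and picks the candidate with the lower RMSE. It is not the algorithm of any cited tool; the data, the candidate grid, and the selection criterion are assumptions, and a practical implementation would also penalize the extra parameter of the change-point model.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse(y, y_hat):
    return sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y))


def fit_change_point(temps, power, candidates):
    """Grid search over the balance temperature tb; for each candidate,
    fit power = a + b * max(0, tb - T) and keep the lowest-RMSE fit."""
    best = None
    for tb in candidates:
        x = [max(0.0, tb - t) for t in temps]
        if max(x) == min(x):  # regressor has no variation, skip
            continue
        a, b = fit_line(x, power)
        err = rmse(power, [a + b * xi for xi in x])
        if best is None or err < best["rmse"]:
            best = {"rmse": err, "tb": tb, "base": a, "slope": b}
    return best


# Invented daily data: outdoor temperature (degC) and average power (kW).
temps = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0]
power = [26.0, 20.0, 14.0, 8.0, 2.0, 2.0, 2.0]

a_lin, b_lin = fit_line(temps, power)
linear_rmse = rmse(power, [a_lin + b_lin * t for t in temps])
cp = fit_change_point(temps, power, candidates=range(10, 23, 2))

selected = "change-point" if cp["rmse"] < linear_rmse else "linear"
print(selected, cp["tb"])
```

The same pattern extends to cooling and combined heating and cooling models, and it can be run separately on each cluster of operating conditions when daily profiles differ.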

FURTHER RESEARCH
In the previous sections we illustrated how a regression-based approach can be used to analyse performance from design to operation phase in the building life cycle using standardized methods that are conceptually simple, scalable (temporally and spatially), and easily interpretable. Future research efforts should be oriented, first of all, to the exploitation of the approximated physical interpretation of regression model coefficients (Masuda and Claridge, 2012a;Bauwens and Roels, 2014;Tronchin et al., 2016). Indeed, this interpretation depends on the formulation of an approximated physical model, which can be created, for example, according to current technical standardization (ISO, 2017). This, in turn, could enable a harmonized definition of quantities and methods at multiple levels. Other relevant issues to be considered are Monte Carlo simulation methods to test the robustness of models' estimates with respect to variable operational conditions (Cecconi et al., 2017) and Bayesian analysis as an extension of conventional regression paradigms (Li et al., 2016). In particular, Bayesian analysis can be used to reconstruct built environment data (Booth et al., 2013;Zhao et al., 2016;Lim and Zhai, 2017), considering the hierarchical data structure outlined in Hierarchical Structure of Building Energy Modelling Data. Further, regression-based approaches could become suitable for projections about energy consumption in future climate change scenarios (Jentsch et al., 2008;Jentsch et al., 2013;Bravo Dias et al., 2020) and to create load profiles when designing decentralized energy systems from buildings (Stadler et al., 2018) up to community scales (Adhikari et al., 2012a;Orehounig et al., 2014;Orehounig et al., 2015), also using clustering techniques to identify typical (recurrent) operational conditions.
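To make the notion of an approximated physical interpretation concrete, the following worked derivation relates the coefficients of a linear heating signature to a simplified steady-state heat balance. It is a sketch under stated assumptions (steady state, constant internal temperature, constant gains, heat delivered equal to heat demand); the symbols are introduced here for illustration and are not taken from the cited works.

```latex
% Assumed steady-state balance in the heating season:
%   \Phi_H    heating power (energy signature)
%   H         overall heat transfer coefficient (transmission + ventilation)
%   \theta_i  internal temperature, \theta_e external air temperature
%   \Phi_g    useful internal and solar gains
\begin{align}
  \Phi_H &= H\,(\theta_i - \theta_e) - \Phi_g,
          \qquad \theta_e < \theta_b \\
  \theta_b &= \theta_i - \frac{\Phi_g}{H}
          \quad \text{(balance-point temperature)} \\
  \Phi_H &= H\,(\theta_b - \theta_e) \\
  % Comparison with the fitted regression line:
  \Phi_H &\approx \beta_0 + \beta_1\,\theta_e
  \;\Longrightarrow\;
  H \approx -\beta_1, \qquad
  \theta_b \approx -\frac{\beta_0}{\beta_1}
\end{align}
```

Under these assumptions the slope of the signature estimates the overall heat transfer coefficient and the intercept fixes the balance-point temperature, which is why changes in the fitted coefficients over time can be read, approximately, as changes in envelope performance, gains, or set-points.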
In terms of technologies, regression-based approaches can complement the analysis of performance of technologies such as heat pumps and cooling machines (Busato et al., 2012;Busato et al., 2013), considering also exergy balance (Tronchin and Fabbri, 2008;Meggers et al., 2012), where temperature dependence is fundamental.
In conclusion, standardized and harmonized regression-based approaches can be used to complement recent advances in research regarding end-use energy demand based on epidemiology concepts (Hamilton et al., 2013;Hamilton et al., 2017), providing suitable evidence (Jack et al., 2018;Lomas et al., 2018) aimed at informing decision-making processes and future policies by means of robust and empirically grounded methods.

CONCLUSION
Understanding and conceptualizing innovation processes is crucial to respond to global sustainability issues for the built environment. Research has to be able to address the whole building life cycle, assessing the sustainability of products and practices transparently and consistently. The correct assessment of operational and embodied energy and carbon emissions depends critically on the way energy data are measured, processed, and reported. The methods to perform energy-related data analysis workflows during building life cycle have to be progressively standardized and harmonized in order to enable transparency and consistency at multiple scales, from single components up to city scale and building stocks, and levels of analysis, from individual components up to systems. In other words, the evolution of methods should be aimed at creating continuity between energy performance analysis across life cycle phases (e.g., by means of data-driven model based analysis and linked open data standards), using parametric simulations in design phases and progressively calibrating building models to measured data. In this way, parametric data generated in the design phase can be analysed to detect the most relevant factors influencing performance and potentially critical assumptions, while (multi-level) model calibration can help in deriving insights on the actual performance and show transparently any potential misalignment between simulation assumptions and measured values. On the one hand, the possibility to learn constantly from measured performance could help in providing robust evidence of the impact of innovative products and practices. On the other hand, the possibility to exploit an approximated physical interpretation of regression model structure could greatly enhance the interpretability and explainability of data-driven methods, learning from feedback to enhance the performance of both single technologies and systems.
The goal of this research was to map the ongoing research in these areas, with a focus on regression-based approaches that could be used to create scalable (temporally and spatially) integrated data analysis workflows from design to operation in buildings. In this sense, we showed how data analysis techniques could be used to evaluate the impact of both technical and human factors, with the aim of reconstructing building stock data at multiple levels. In turn, these data could be used for the development of next-generation products and services in the built environment, following a continuous improvement approach, which is already recommended (and standardized) in energy management practices at the state of the art. In particular, these methods will be crucial for the further development of innovative building design paradigms (e.g., NZEBs) and will be necessary for the development of innovative energy services (e.g., exploiting energy flexibility and considering user behaviour) and technologies (e.g., energy management and automation systems).
Finally, energy transition strategies have to address "complementarities" for the co-evolution of built environment and energy infrastructures, and the possibility to create an improved "soft-linking" among open energy modelling tools (and data) is particularly valuable, because such tools can provide reliable evidence, inform policies, and support decision-making processes.

TABLE 3
Topic | Sub-topic | References
Temporal | Monthly | (Hallinan et al., 2011a;Hallinan et al., 2011b;Lammers et al., 2011;Server et al., 2011)
Temporal | Daily | (Masuda and Claridge, 2012b;Danov et al., 2013;Masuda and Claridge, 2014;Paulus et al., 2015;Hitchin and Knight, 2016;Paulus, 2017)
Temporal | Hourly | (Jalori and Reddy, 2015b;Abushakra and Paulus, 2016)
Spatial | Building fabric heat transfer | (Bauwens and Roels, 2014;Erkoreka et al., 2016;Giraldo-Soto et al., 2018;Uriarte et al., 2019)
Spatial | Building energy behaviour | (Masuda and Claridge, 2014;Lin and Claridge, 2015;Paulus et al., 2015)
Spatial | Building stock energy behaviour | (Meng and Mourshed, 2017;Meng et al., 2020)
Spatial | Community and city scale analysis energy behaviour | (Qomi et al., 2016;Pasichnyi et al., 2019)

Frontiers in Energy Research | www.frontiersin.org November 2020 | Volume 8 | Article 557649