Progress Update and Challenges on V̇O2max Testing and Interpretation

The maximal oxygen uptake (V̇O2max) is the primary determinant of endurance performance in heterogeneous populations and has predictive value for clinical outcomes and all-cause mortality. Accurate and precise measurement of V̇O2max requires adherence to quality control procedures, including combustion testing, and the use of standardized incremental exercise protocols with a verification phase, preceded by adequate familiarization. The data averaging strategy employed to calculate V̇O2max from breath-by-breath data can change the V̇O2max value by 4–10%. The fewer the breaths or seconds included in the averaging block, the higher the calculated V̇O2max, an effect that is more prominent in untrained subjects. Shorter averaging blocks (fewer than 30 breaths or seconds) facilitate the identification of the plateau phenomenon without reducing the reliability of the measurements. When employing metabolic carts, averaging intervals of 15–20 breaths or seconds are preferable as a compromise between capturing the true V̇O2max and identifying the plateau. In training studies, clinical interventions and meta-analyses, reporting V̇O2max in absolute values, together with the protocol and the averaging strategy, is imperative to permit adequate comparisons. Newly developed correction equations can be used to normalize V̇O2max values to a common averaging strategy. A lack of improvement in V̇O2max with training does not mean that the training program has elicited no adaptations, since peak cardiac output and mitochondrial oxidative capacity may increase without changes in V̇O2max.

Relevance of V̇O2max Determination in Performance and Health
The maximal oxygen uptake (V̇O2max) is the highest flow of oxygen (O2) that can be used by the organism, representing the integrated capacity of the pulmonary, cardiovascular and muscle systems to take up, transport and utilize O2. Classically, the determination of V̇O2max has been considered the gold standard for the assessment of the functional limits of the cardiorespiratory system (Taylor et al., 1955; Mitchell et al., 1958; Di Prampero, 2003; Levine, 2008). V̇O2max is one of the main factors associated with endurance performance (Joyner and Coyle, 2008) in various exercise modalities (Magel and Faulkner, 1967; Holmér et al., 1974; Maughan and Leiper, 1983). When a wide range of performance levels is included, V̇O2max is the single best predictor of performance; for example, in trained non-elite athletes, V̇O2max explained 90.2% of the total variance in 16-km running time-trial performance (McLaughlin et al., 2010). Although V̇O2max does not predict performance in homogeneous groups of athletes (i.e., at the elite level), an exceptionally high V̇O2max is a prerequisite for competing at world-class level (Joyner and Coyle, 2008; Losnegard et al., 2013).
The terms cardiorespiratory fitness and V̇O2max are used interchangeably, particularly in epidemiological studies. The assessment of V̇O2max is gaining interest in clinical populations, since it is the strongest predictor of all-cause mortality when compared with other risk factors such as hypertension, smoking, obesity and diabetes (Myers et al., 2002; Lee et al., 2010). Furthermore, V̇O2max is one of the objective criteria used to select candidates for heart transplantation (Mehra et al., 2016) and predicts the success of thoracic surgery (Brunelli et al., 2014). Thus, in both sport science and clinical medicine, there is a need for accurate and precise determination of V̇O2max.
ASSESSMENT OF V̇O2max: EVOLUTION OF PROCEDURES AND THE PLATEAU PHENOMENON
Hill and Lupton (1923) are credited with performing the first assessments of V̇O2max. Years later, Taylor et al. (1955) proposed standardized procedures to determine V̇O2max, based on discontinuous protocols using 3-min constant-intensity exercise bouts carried out on successive days. At that time, V̇O2 measurements relied on Douglas bags and Tissot spirometers (Macfarlane, 2001), combined with Haldane's or Scholander's methods to determine O2 concentration, which became the gold-standard method for the assessment of V̇O2max. Taylor et al. (1955) defined the "plateau" in the V̇O2/intensity relationship as an increase in V̇O2 lower than 150 mL·min⁻¹ or ∼2 mL·kg⁻¹·min⁻¹ with increasing exercise intensity. In the following years, continuous exercise protocols that allow the assessment of additional cardiorespiratory fitness variables were developed (Balke and Ware, 1959; Bruce, 1971; Froelicher et al., 1974a,b). However, these continuous protocols often failed to fulfill Taylor's plateau criterion (Froelicher et al., 1974a,b; Pollock et al., 1976).
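Taylor's plateau definition is a simple threshold rule applied to the V̇O2 of successive stages. The sketch below is a minimal illustration, assuming one steady-state V̇O2 value per stage; the function name and the stage values are hypothetical, and only the 150 mL·min⁻¹ default comes from the text (pass 2.0 instead when the stage values are in mL·kg⁻¹·min⁻¹).

```python
def plateau_by_increment(stage_vo2, threshold=150.0):
    """Return True if the VO2 rise between the last two stages is below threshold.

    stage_vo2: steady-state VO2 of successive stages, in mL/min
    (or in mL/kg/min if a relative threshold such as 2.0 is used).
    Hypothetical helper illustrating Taylor et al. (1955)-style criteria.
    """
    if len(stage_vo2) < 2:
        raise ValueError("need at least two stages")
    increment = stage_vo2[-1] - stage_vo2[-2]
    return increment < threshold


# Hypothetical stage values (mL/min), not data from any cited study.
print(plateau_by_increment([3100, 3450, 3560]))  # rise of 110 mL/min -> True
print(plateau_by_increment([3100, 3450, 3700]))  # rise of 250 mL/min -> False
```

The same helper accepts other cut-offs through `threshold`, which is why the criterion's outcome depends so directly on the number chosen.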
The development of fast gas analyzers and flow sensors during the 1960s and 1970s, together with the progressive miniaturization of computers, led to the proposal of shorter continuous protocols that used the subject's "limit of tolerance" (volitional exhaustion) as the criterion for test termination (Whipp et al., 1981; Buchfuhrer et al., 1983). The duration of continuous protocols was judged to be a critical variable for achieving "true" maximal V̇O2 values (Buchfuhrer et al., 1983). These authors reported that durations between 8 and 17 min yielded the highest V̇O2max values in five low to moderately trained subjects during treadmill and cycle ergometry, and intuitively proposed 10 ± 2 min as the ideal duration. A critical factor determining test length is the rate at which intensity is increased over time (the ramp slope), usually expressed in W·min⁻¹ (or as a speed increment per minute). Research carried out over the last 40 years has confirmed that a similar V̇O2max can be attained despite wide differences in the magnitude of the increments and the duration of the tests (Zhang et al., 1991; Takaishi et al., 1992). Midgley et al. (2008) concluded that durations of 5 to 26 min could be optimal to achieve V̇O2max in healthy individuals, provided that shorter tests are preceded by a proper warm-up. Moreover, even repeated all-out sprint exercise has been shown to elicit true V̇O2max values, particularly when active recovery is applied during the rest periods (Dorado et al., 2004; Gelabert-Rebato et al., 2018).
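Because test duration is set by the ramp slope, a slope can be chosen from an estimate of the subject's peak power. The sketch below assumes a linear cycle-ergometer ramp and a peak-power estimate supplied by the tester (for example, from familiarization); the function names and numbers are hypothetical, and the 10-min default simply reflects the 10 ± 2 min proposal of Buchfuhrer et al. (1983).

```python
def ramp_slope_w_per_min(expected_peak_w, start_w, target_min=10.0):
    """Ramp slope (W/min) that should end the test close to target_min."""
    if target_min <= 0:
        raise ValueError("target duration must be positive")
    return (expected_peak_w - start_w) / target_min


def expected_duration_min(expected_peak_w, start_w, slope_w_per_min):
    """Test duration (min) implied by a given ramp slope."""
    return (expected_peak_w - start_w) / slope_w_per_min


# Hypothetical subject: expected peak ~300 W, ramp starting at 50 W.
print(ramp_slope_w_per_min(300, 50))       # 25.0 W/min for a ~10-min test
print(expected_duration_min(300, 50, 15))  # ~16.7 min with a 15 W/min ramp
```

If the peak-power estimate is wrong, the actual duration shifts proportionally, which is one reason familiarization matters.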
Indirect calorimetry based on gas exchange evolved rapidly from the Douglas bag to technologies capable of examining the time course of gas exchange during exercise more comprehensively, namely mixing chambers and breath-by-breath systems. However, the Douglas bag method is still considered the gold standard (Shephard, 2017). Automated mixing-chamber systems measure expired gas fractions from several breaths collected into a small chamber (Wilmore and Costill, 1974; Wilmore et al., 1976), while breath-by-breath technology analyzes the gas content of each breath by measuring instantaneous expiratory flow in conjunction with simultaneous tidal concentrations of exhaled gases (Beaver et al., 1973, 1981). Modern breath-by-breath metabolic carts are equipped with fast-responding O2 and CO2 sensors and permit precise and accurate assessment of end-tidal O2 and CO2 pressures (Losa-Reyna et al., 2015). These can be used in conjunction with pulse oximetry for an indirect evaluation of pulmonary gas exchange, thus enhancing the utility of ergometry in clinical populations (Macfarlane, 2001; Sun et al., 2002).
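For Douglas-bag or mixing-chamber data, V̇O2 is obtained from expired ventilation and mixed expired gas fractions. The sketch below uses the standard Haldane transformation (inert N2 is assumed to be neither consumed nor produced), which is not a procedure described in the text; it assumes ventilation already corrected to STPD, and the default inspired fractions (FIO2 = 0.2093, FICO2 = 0.0004) are assumed ambient values.

```python
def vo2_haldane(ve_stpd_l_min, fe_o2, fe_co2, fi_o2=0.2093, fi_co2=0.0004):
    """VO2 (L/min, STPD) from expired ventilation and mixed expired gas fractions.

    Haldane transformation: inspired ventilation is inferred from expired
    ventilation assuming N2 (plus other inert gases) is conserved.
    Default inspired fractions are ASSUMED ambient values.
    """
    fe_n2 = 1.0 - fe_o2 - fe_co2
    fi_n2 = 1.0 - fi_o2 - fi_co2
    vi_stpd = ve_stpd_l_min * fe_n2 / fi_n2  # inferred inspired ventilation
    return vi_stpd * fi_o2 - ve_stpd_l_min * fe_o2


# Hypothetical near-maximal sample: VE 100 L/min STPD, FEO2 0.17, FECO2 0.04.
print(round(vo2_haldane(100.0, 0.17, 0.04), 3))  # ~3.922 L/min
```

Breath-by-breath systems apply the same mass balance to each breath, which is why flow and gas signals must be precisely time-aligned.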
Nevertheless, several technical limitations may undermine the precision and accuracy of breath-by-breath analyses if not adequately addressed (Di Prampero and Lafortuna, 1989). Yet, with state-of-the-art, well-calibrated and fast-responding metabolic carts, it is possible to closely match the levels of accuracy and precision of the Douglas bag method (Medbo et al., 2012;Nieman et al., 2013;Perez-Suarez et al., 2018;Martin-Rincon et al., 2019).

CONFIRMING THE ATTAINMENT OF V̇O2max: FROM THE PLATEAU PHENOMENON TO THE NECESSITY OF A VERIFICATION PHASE
Despite considerable technical improvements in metabolic carts, ergometers and exercise protocols, there is still no universal agreement on the criteria for the attainment of a "true" V̇O2max. Until recent years, the most frequently cited criterion to confirm V̇O2max attainment has been the presence of a plateau or leveling off in V̇O2, even though several investigations have shown that the incidence of the plateau is population-dependent (Achten et al., 2002; Huggett et al., 2005) and that a plateau may not be manifested despite the attainment of a valid V̇O2max (Day et al., 2003; Rossiter et al., 2006). In addition to the 150 mL·min⁻¹ or 2 mL·kg⁻¹·min⁻¹ criterion proposed by Taylor et al. (1955), other cut-off values have been postulated, such as an increase in V̇O2 < 80 mL·min⁻¹ proposed by Astrand (1960) or a slope not different from zero with increasing exercise intensity (Myers et al., 1990). However, the efficacy of these criteria to detect the plateau depends on the specific exercise protocol (Duncan et al., 1997), the population studied (Doherty et al., 2003; Beltrami et al., 2014), the subjects' experience (Gordon et al., 2015) and the breath-by-breath averaging strategy (Astorino, 2009; Nolan et al., 2014; Martin-Rincon et al., 2019). For example, attaining a plateau may require exercising with marked recruitment of anaerobic metabolism for 1–3 min, entailing a high perception of effort and a strong central command to avoid task failure (Torres-Peralta et al., 2016a,b). Not surprisingly, subjects achieving a plateau present 4–5% higher pulmonary ventilation (an indication of greater central command activation), respiratory exchange ratio (RER) and blood lactate concentration than those not reaching a plateau (Edvardsen et al., 2014).
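The slope-based criterion of Myers et al. (1990) can be read as "the regression slope of the final V̇O2 values is not statistically different from zero". The sketch below is one possible implementation under stated assumptions: ordinary least squares on an already-averaged window of points, with a plateau declared when slope ± crit·SE includes zero. The window, the function names and the data are hypothetical; the default `crit = 1.96` is a large-sample normal approximation, and with few points a t quantile (e.g., 3.182 for 5 points, 3 degrees of freedom) is more appropriate.

```python
import math


def slope_and_se(t, y):
    """Ordinary least-squares slope of y on t and its standard error."""
    n = len(t)
    if n < 3:
        raise ValueError("need at least three points")
    mt = sum(t) / n
    my = sum(y) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    sxy = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = my - slope * mt
    rss = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def plateau_by_slope(t, y, crit=1.96):
    """True if the interval slope +/- crit*SE includes zero (hypothetical rule)."""
    slope, se = slope_and_se(t, y)
    return abs(slope) <= crit * se


# Hypothetical final 60 s of 15-s averaged VO2 (mL/min).
t = [0, 15, 30, 45, 60]
print(plateau_by_slope(t, [3500, 3520, 3490, 3510, 3500], crit=3.182))  # True
print(plateau_by_slope(t, [3300, 3400, 3500, 3600, 3700]))              # False
```

Because the standard error shrinks as breath noise is averaged away, the outcome of this test also depends on the averaging strategy, consistent with the text.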
Given the limited duration of the plateau phase, its identification is facilitated by shorter (15 or 30 s) rather than longer (e.g., 60 s) time-averaging intervals (Astorino, 2009; Nolan et al., 2014). This is further complicated by the influence that different exercise protocols and modes (Millet et al., 2009) may have on the highest attainable V̇O2 (Midgley et al., 2008). Consequently, when there is uncertainty as to whether a V̇O2max value is real, the highest V̇O2 recorded is termed V̇O2peak. Since many individuals reach a higher V̇O2 during a supramaximal test than during an incremental exercise test, it is necessary to include a verification test a few minutes after the end of the incremental test (Poole and Jones, 2017), even in clinical populations (Moreno-Cabanas et al., 2020).
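The logic of a verification phase can be sketched as a comparison between the peak V̇O2 of the incremental test and that of the supramaximal bout. This is a minimal sketch, assuming both peaks were computed with the same averaging strategy; `tolerance_frac` is a hypothetical margin for measurement noise, not a criterion given in the text, and the function name and values are illustrative.

```python
def combine_incremental_and_verification(incremental_peak, verification_peak,
                                         tolerance_frac=0.03):
    """Return (highest value recorded, whether the incremental peak is confirmed).

    The incremental peak is treated as confirmed unless the verification
    bout exceeds it by more than tolerance_frac (a HYPOTHETICAL margin).
    """
    reported = max(incremental_peak, verification_peak)
    exceeded = verification_peak > incremental_peak * (1.0 + tolerance_frac)
    return reported, not exceeded


# Hypothetical peaks in mL/min.
print(combine_incremental_and_verification(4000, 4050))  # (4050, True)
print(combine_incremental_and_verification(4000, 4200))  # (4200, False)
```

When the verification bout clearly exceeds the incremental value, the incremental result was not maximal, and the highest value recorded is the one to report.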

RECOMMENDED PROCEDURES FOR QUALITY CONTROL OF METABOLIC CARTS
Quality control procedures are mandatory to avoid technical errors. Most advanced metabolic carts are equipped with automated calibration routines that should be followed according to the manufacturer's recommendations, using high-grade O2 and CO2 calibration gases. Besides the standard automated calibration, the flow sensors should be checked regularly with calibration syringes or commercially available metabolic simulators. Maintenance and replacement of flowmeters, O2 cells, Nafion tubing and any other tubing and valves should follow the manufacturer's recommendations. In addition, combustion tests of pure fuels such as methanol, butane or propane can be used to perform an integral check of the precision and accuracy of metabolic carts. Combustion tests have the advantage of generating heat and some moisture, resembling substrate oxidation in living organisms (see, for example, Perez-Suarez et al., 2018). It is also advisable to perform regular checks at low and high exercise intensities as an additional biological control, for example, on laboratory members with a stable fitness level.
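A combustion test is useful because the expected gas exchange follows directly from stoichiometry. The sketch below computes the reference V̇O2, V̇CO2 and respiratory quotient that a cart should report, assuming complete combustion, a burned mass measured by weighing the burner before and after the test, and the ideal-gas molar volume at STPD (real gases, especially CO2, deviate slightly). The function and dictionary names are hypothetical.

```python
MOLAR_VOLUME_STPD_L = 22.414  # ideal-gas molar volume at 273.15 K, 101.325 kPa

# fuel: (molar mass g/mol, mol O2 consumed per mol, mol CO2 produced per mol)
FUELS = {
    "methanol": (32.04, 1.5, 1.0),  # CH3OH + 1.5 O2 -> CO2 + 2 H2O
    "propane": (44.10, 5.0, 3.0),   # C3H8 + 5 O2 -> 3 CO2 + 4 H2O
    "butane": (58.12, 6.5, 4.0),    # C4H10 + 6.5 O2 -> 4 CO2 + 5 H2O
}


def expected_gas_exchange(fuel, mass_burned_g, duration_min):
    """Expected VO2, VCO2 (L/min, STPD) and RQ, assuming complete combustion."""
    molar_mass, o2_per_mol, co2_per_mol = FUELS[fuel]
    mol_fuel = mass_burned_g / molar_mass
    vo2 = mol_fuel * o2_per_mol * MOLAR_VOLUME_STPD_L / duration_min
    vco2 = mol_fuel * co2_per_mol * MOLAR_VOLUME_STPD_L / duration_min
    return vo2, vco2, vco2 / vo2


# Hypothetical test: 1.0 g of methanol burned over 5 min.
vo2, vco2, rq = expected_gas_exchange("methanol", 1.0, 5.0)
print(round(vo2, 3), round(rq, 3))  # ~0.21 L/min, RQ ~0.667
```

The measured values can then be compared against these references; the acceptable discrepancy is a laboratory decision not specified in the text.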
Verification of the performance of ergometers and treadmills is also necessary. In addition to the recommendations made by manufacturers, cross-validation biological checks with other ergometers are recommended. For example, repeated cardiorespiratory measurements at target intensities in the same person can be used to verify the stability of physiological variables. Treadmill speed and inclination can be easily verified manually or with odometer wheels.
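Manual treadmill checks reduce to simple arithmetic. This sketch assumes a marker on the belt whose revolutions are counted over a timed interval, a belt length measured beforehand, and grade defined as rise over horizontal run; the helper names and numbers are hypothetical.

```python
def belt_speed_km_h(belt_length_m, revolutions, elapsed_s):
    """Belt speed from a marked belt: distance travelled / time, in km/h."""
    return belt_length_m * revolutions / elapsed_s * 3.6


def percent_grade(rise_m, horizontal_run_m):
    """Treadmill grade as rise over horizontal run, in percent."""
    return 100.0 * rise_m / horizontal_run_m


# Hypothetical check: 6.0-m belt, 50 revolutions in 108 s; 0.10 m rise over 1.0 m.
print(round(belt_speed_km_h(6.0, 50, 108.0), 2))  # ~10.0 km/h
print(percent_grade(0.10, 1.0))                   # ~10 %
```

The measured speed and grade can then be compared with the values displayed by the treadmill.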

IMPACT OF DATA PROCESSING STRATEGIES ON THE V̇O2 IMPUTED AS THE V̇O2max
Until recently, little attention had been paid to the influence of the averaging interval or strategy on the V̇O2 value imputed as V̇O2max and its reproducibility (Myers et al., 1990; Hill et al., 2003; Midgley et al., 2007; Astorino, 2009; Nolan et al., 2014; Smart et al., 2015; Dideriksen and Mikkelsen, 2017). Despite the inherently high breath-to-breath variability, there is no universal consensus on how to average indirect calorimetry data (Robergs et al., 2010). We have recently demonstrated that the V̇O2 value imputed as V̇O2max can vary by 4–10% depending on the averaging strategy and fitness level (Martin-Rincon et al., 2019; Figure 1). In subjects with a V̇O2max lower than ∼40 mL·kg⁻¹·min⁻¹ (13–40 mL·kg⁻¹·min⁻¹), the V̇O2max value is ∼10% higher when using short averaging blocks (i.e., 10 breaths or seconds) rather than long ones (i.e., 60 breaths or seconds). This effect was slightly smaller (∼6.5%) in fitter subjects (V̇O2max between 40 and 60 mL·kg⁻¹·min⁻¹). Similarly, from a short (i.e., 10 breaths or seconds) to an intermediate averaging block (i.e., 30 breaths or seconds), the decrease is ∼7.5% in low-fit subjects and ∼5% in those with higher fitness. Therefore, the shorter the averaging strategy, the larger the V̇O2 value considered as V̇O2max, with this effect being more marked in untrained subjects, regardless of whether breath- or time-based averaging is used (Martin-Rincon et al., 2019). Time- and breath-averaged V̇O2 values produce similar results for a given length of the averaging block.
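The two families of averaging strategies can be contrasted on a short toy series: a "rolling" average over a fixed number of breaths and a stationary average over consecutive, non-overlapping time blocks. This is a minimal sketch with hypothetical data and function names; it does not implement the correction equations mentioned in this article. Incomplete final time blocks are averaged as they are, which is an assumption.

```python
from collections import defaultdict


def max_rolling_breath_mean(vo2, n_breaths):
    """Highest n-breath rolling (moving) average of breath-by-breath VO2."""
    if n_breaths < 1 or n_breaths > len(vo2):
        raise ValueError("invalid block length")
    window_sum = sum(vo2[:n_breaths])
    best = window_sum
    for i in range(n_breaths, len(vo2)):
        window_sum += vo2[i] - vo2[i - n_breaths]  # slide the window by one breath
        best = max(best, window_sum)
    return best / n_breaths


def max_stationary_time_mean(times_s, vo2, block_s):
    """Highest mean over consecutive, non-overlapping blocks of block_s seconds."""
    bins = defaultdict(list)
    t0 = times_s[0]
    for t, v in zip(times_s, vo2):
        bins[int((t - t0) // block_s)].append(v)
    return max(sum(vals) / len(vals) for vals in bins.values())


# Hypothetical breath-by-breath VO2 (mL/min) and breath times (s).
vo2 = [3000, 3200, 3600, 3100, 3300]
times = [0, 5, 10, 15, 20]
print(max_rolling_breath_mean(vo2, 1))             # 3600.0
print(max_rolling_breath_mean(vo2, 3))             # ~3333.3
print(max_stationary_time_mean(times, vo2, 10))    # 3350.0
```

Even on five breaths, lengthening the block lowers the imputed maximum, which is the direction of the effect reported in the text.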

PHYSIOLOGICAL INTERPRETATION OF THE V̇O2max
The V̇O2max value obtained in an exercise test should be checked against normative data (Edvardsen et al., 2013; Loe et al., 2013; Kaminsky et al., 2017), taking into consideration the exercise intensity achieved at the end of the test, exercise mode, age, sex, health status, fitness level, familiarization and the averaging strategy used to obtain the reference values. Since there is a close linear association between exercise intensity and V̇O2, several linear equations are available to predict the V̇O2 associated with a submaximal load as well as V̇O2max (Storer et al., 1990; Hall et al., 2004). The predicted or normative value can be compared with the measured one, bearing in mind that deviations of up to 20% are possible (Malek et al., 2004). In the case of large differences between measured and predicted or reference values, assessment errors should be ruled out.
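The comparison against a predicted or normative value can be made explicit as a relative deviation. This sketch assumes the deviation is expressed relative to the predicted value (the text does not specify the denominator) and uses the 20% figure from Malek et al. (2004) as the flagging limit; the predicted value itself must come from whichever equation or normative table is appropriate, and the function names are hypothetical.

```python
def relative_deviation(measured, predicted):
    """Signed deviation of measured from predicted, as a fraction of predicted."""
    return (measured - predicted) / predicted


def needs_error_check(measured, predicted, max_abs_dev=0.20):
    """Flag results that depart from the predicted value by more than 20%."""
    return abs(relative_deviation(measured, predicted)) > max_abs_dev


# Hypothetical values in mL/min.
print(relative_deviation(3000, 4000), needs_error_check(3000, 4000))  # -0.25 True
print(needs_error_check(3600, 4000))                                  # False
```

A flagged result is not necessarily wrong, but it is the cue to rule out assessment errors before interpreting it physiologically.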
Since V̇O2max is a lumped parameter reflecting the flow of O2 from the atmosphere to the mitochondria, the O2 flow at any downstream step of the transport chain can never be higher than at the preceding step: V̇O2max cannot be higher than O2 delivery, and O2 delivery cannot be higher than the O2 flow in the lungs. For the correct physiological interpretation of V̇O2max, it is critical to identify the limiting step or factor, bearing in mind that the limiting factor may change with training (Zinner et al., 2016). In other words, it is essential to determine whether the limitation is mostly due to pulmonary ventilation, pulmonary gas exchange, cardiac output, muscle blood flow, arterial O2 content, muscle O2 diffusion capacity or the mitochondrial capacity to utilize O2. In general, in healthy people, including elite athletes, the main limiting factor for V̇O2max is O2 delivery (Saltin and Calbet, 2006). Nevertheless, couch potatoes may be limited by their capacity to use O2, which can be readily tested by performing an incremental exercise test in hyperoxia (Cardus et al., 1998). Ideally, arterial hemoglobin O2 saturation should be assessed indirectly by pulse oximetry during the exercise test. It is also important to account for the impact that blood hemoglobin concentration may have on V̇O2max, so it is advisable to measure this variable before the tests (Calbet et al., 2002).
FIGURE 2 | Representative V̇O2 data during an incremental exercise test to exhaustion in one demonstrative subject using different averaging blocks (breaths and seconds). For the breath-based strategy, averaging blocks refer to a "rolling" breath average, while time-based data were stationary time-averaged, both with the corresponding block length (e.g., 10 breaths or 10 s).
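The constraint that V̇O2max cannot exceed O2 delivery, and the reason hemoglobin concentration and arterial saturation matter, can be illustrated with the standard arterial O2 content equation, which is not given in the text. The sketch assumes a hemoglobin-binding constant of 1.34 mL O2·g⁻¹ (values of 1.34–1.39 are used in practice), a solubility of 0.003 mL O2·dL⁻¹·mmHg⁻¹, and SaO2 approximated, for instance, by pulse oximetry; the inputs and function names are hypothetical.

```python
HUFNER = 1.34          # mL O2 per g Hb (ASSUMED; 1.34-1.39 are used in practice)
O2_SOLUBILITY = 0.003  # mL O2 per dL blood per mmHg


def arterial_o2_content(hb_g_dl, sao2_frac, pao2_mmhg):
    """CaO2 in mL O2 per dL blood: hemoglobin-bound plus dissolved O2."""
    return HUFNER * hb_g_dl * sao2_frac + O2_SOLUBILITY * pao2_mmhg


def systemic_o2_delivery(cardiac_output_l_min, cao2_ml_dl):
    """O2 delivery in mL/min (x10 converts mL/dL to mL/L)."""
    return cardiac_output_l_min * cao2_ml_dl * 10.0


# Hypothetical athlete: Hb 15 g/dL, SaO2 0.97, PaO2 95 mmHg, peak Q 25 L/min.
cao2 = arterial_o2_content(15.0, 0.97, 95.0)
do2 = systemic_o2_delivery(25.0, cao2)
print(round(cao2, 3), round(do2, 1))  # ~19.782 mL/dL, ~4945.5 mL/min
```

A measured V̇O2max above the computed delivery would be physiologically impossible and would point to a measurement error in one of the variables.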
To facilitate the interpretation of results, V̇O2max should always be reported in absolute terms (i.e., mL·min⁻¹), together with the protocol used for its assessment and the averaging strategy. V̇O2max adjusted for body weight has different meanings depending on body composition, although the relative value (i.e., V̇O2max in mL·kg⁻¹·min⁻¹) predicts performance better than the absolute value in competitions lasting from 60–90 s to several hours (Joyner and Coyle, 2008). In training studies and clinical interventions, the absolute value must be reported so that a mechanistic explanation for the changes in V̇O2max, connected to the limiting factors, can be elaborated. A lack of improvement in V̇O2max with training does not mean that the training program has failed or that it has elicited no adaptations. In fact, in elite athletes, both peak cardiac output and mitochondrial oxidative capacity may increase without changes in V̇O2max, owing to an impairment of pulmonary gas exchange caused by the reduction in capillary blood mean transit time in the lung (Skattebo et al., 2020a). Moreover, peripheral adaptations may enhance V̇O2peak and O2 extraction during small-muscle-mass exercise without increasing the V̇O2max measured during whole-body exercise (Skattebo et al., 2020b). Meta-analyses on the effects of an intervention on V̇O2max should be based on absolute values, after taking into consideration the averaging strategies used in each study and the exercise modality, as previously discussed (Martin-Rincon et al., 2019).
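Why absolute values are needed becomes clear with a simple conversion: relative V̇O2max can change solely because body mass changes. The numbers below are hypothetical and only illustrate the arithmetic.

```python
def relative_vo2max(absolute_ml_min, body_mass_kg):
    """Convert absolute VO2max (mL/min) to relative units (mL/kg/min)."""
    return absolute_ml_min / body_mass_kg


# Hypothetical case: absolute VO2max unchanged, body mass falls by 3 kg.
before = relative_vo2max(4200, 70.0)
after = relative_vo2max(4200, 67.0)
print(before, round(after, 1))  # 60.0 vs ~62.7 mL/kg/min
```

Reported in relative units alone, this case would look like a ∼4.5% "improvement" in aerobic capacity with no change in O2 transport or utilization.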

FUTURE PERSPECTIVES
Although the V̇O2max value varies depending on the averaging strategy, the close relationship between values obtained with different averaging strategies allows the assumption that they hold a similar physiological meaning. However, the fact that V̇O2max increases log-linearly as the averaging interval (breath or time) is shortened (Martin-Rincon et al., 2019) implies that the real V̇O2max should be measured with a small number of breaths or a short time interval. Since reproducibility is not significantly lowered by reducing the averaging interval between 6 and 60 breaths (or 6 and 60 s) (Martin-Rincon et al., 2019), the most appropriate approach would be to use the shortest interval, since only one true V̇O2max value can exist. From a physiological perspective, computing V̇O2max with a 6-s averaging strategy is not equivalent to computing it with a 60-s strategy, since the latter may yield a value 4–10% lower, depending on fitness status. Given the relatively low trainability of V̇O2max (Montero and Lundby, 2017), scientists and clinicians may easily misjudge the real condition of patients or the effects elicited by an intervention if they do not account for the impact of the averaging strategy. Imputing an underestimated V̇O2max has important implications for integrative physiology and pathophysiology. This is the case when comparing V̇O2max with maximal mitochondrial respiration values obtained in vitro: the V̇O2max values used to represent whole-body O2 uptake will be smaller, and the excess mitochondrial respiratory capacity will be overestimated. Since V̇O2max is used to calculate cardiac output by the direct Fick method, for a given systemic arteriovenous O2 (a-vO2) difference, a V̇O2max value obtained with a longer averaging interval will result in a lower calculated cardiac output. Likewise, the calculated pulmonary O2 diffusing capacity will be greater the higher the imputed V̇O2max.
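The direct Fick calculation makes the propagation explicit: cardiac output equals V̇O2 divided by the a-vO2 difference, so any percentage underestimation of V̇O2max carries straight into cardiac output. In this sketch the 7% reduction is a hypothetical value within the 4–10% range reported in the text, and the other inputs are illustrative.

```python
def fick_cardiac_output(vo2_ml_min, a_v_o2_diff_ml_l):
    """Direct Fick: cardiac output (L/min) = VO2 / (CaO2 - CvO2)."""
    return vo2_ml_min / a_v_o2_diff_ml_l


# Hypothetical: a-vO2 difference 160 mL/L; VO2max from a short vs long block.
q_short = fick_cardiac_output(4000, 160)         # 25.0 L/min
q_long = fick_cardiac_output(4000 * 0.93, 160)   # VO2max 7% lower
print(q_short, round(q_long, 2))                 # 25.0 vs ~23.25 L/min
```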
Moreover, defining V̇O2max with a shorter averaging interval results in a value that can be maintained for a shorter time, affecting the procedures used to determine the velocity or intensity at V̇O2max (Billat and Koralsztein, 1996; Billat et al., 1999).
When V̇O2max is used to calculate the systemic a-vO2 difference by the indirect Fick method, attributing a higher V̇O2max value results in lower mixed venous O2 content and higher systemic O2 extraction. Thus, when comparing variables calculated from V̇O2max across studies, it is crucial to take into consideration the averaging strategy applied in each investigation. These facts may therefore help explain some of the variability in the literature when secondary outcome variables have been based on V̇O2max values computed with longer or shorter averaging strategies.
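With the indirect Fick approach the chain runs the other way: for a measured cardiac output and arterial O2 content, a higher imputed V̇O2max yields a larger a-vO2 difference, a lower mixed venous O2 content and a higher extraction. The sketch uses hypothetical values (arterial content in mL O2 per L of blood) and illustrative function names.

```python
def venous_o2_and_extraction(vo2_ml_min, cardiac_output_l_min, cao2_ml_l):
    """Indirect Fick: a-vO2 diff = VO2 / Q; CvO2 = CaO2 - a-vO2 diff."""
    a_v_diff = vo2_ml_min / cardiac_output_l_min  # mL O2 per L blood
    cvo2 = cao2_ml_l - a_v_diff
    extraction = a_v_diff / cao2_ml_l
    return cvo2, extraction


# Hypothetical: Q 25 L/min, CaO2 200 mL/L; VO2max imputed as 4000 vs 4300 mL/min.
print(venous_o2_and_extraction(4000, 25.0, 200.0))  # (40.0, 0.8)
print(venous_o2_and_extraction(4300, 25.0, 200.0))  # (~28.0, ~0.86)
```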
Based on the reproducibility of measurements, no particular sampling strategy seems superior for the assessment of V̇O2max. Nevertheless, given that V̇O2max can be sustained only for a limited time, a shorter averaging strategy offers a higher probability of capturing the true V̇O2max, while facilitating the identification of the plateau (Martin-Rincon et al., 2019). Thus, when using modern metabolic carts, averaging intervals of 15–20 breaths (or seconds) are preferable as a compromise between capturing a V̇O2 value close to the true V̇O2max and identifying the plateau. So far, the time resolution of cardiac output and blood gas assessments has not been sufficient to clarify which factors determine the plateau phenomenon.

AUTHOR CONTRIBUTIONS
Both authors contributed similarly and approved the final version of the manuscript.