Edited by: Johnny Padulo, University eCampus, Italy
Reviewed by: Nicola Luigi Bragazzi, University of Genoa, Italy; J.J. Duke, Ohio University, USA
Correspondence: Matthew S. Tenan
This article was submitted to Exercise Physiology, a section of the journal Frontiers in Physiology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Indirect calorimetry and oxygen consumption (VO2) are accepted tools in human physiology research. It has been shown that indirect calorimetry systems exhibit differential measurement error, where the error of a device is systematically different depending on the volume of gas flow. Moreover, systems commonly report multiple decimal places of precision, giving the clinician a false sense of device accuracy. The purpose of this manuscript is to demonstrate the use of a novel statistical tool which models the reliability of two specific indirect calorimetry systems, Douglas bag and Parvomedics 2400 TrueOne, as univariate normal distributions and implements the distribution overlapping coefficient to determine the likelihood that two VO2 measures are the same. The tool is available as a command line implementation for the R programming language and as a web-based graphical user interface (GUI). This tool is valuable for clinicians performing a single-subject analysis as well as researchers interested in determining if their observed differences exceed the error of the device.
Since the original description of gas exchange indirect calorimetry (Atwater and Benedict,
A number of valuable methods have been proposed to examine and understand the effect of measurement error in exercise sciences and these methods can be applied to indirect calorimetry. William Hopkins has developed an ecosystem of tools to understand how reliability alters the understanding of measurements and noise (Hopkins,
In this model, W is the observed value of the mis-measured variable, X is the true variable, which is measured subject to error, and U is the error, which is assumed to be independent of X. In the present case, X is the actual VO2 (the variable of interest) and W is the VO2 level actually reported by the device or system.
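Written out from the definitions in this paragraph, the model presumably takes the classical additive form below. This is a reconstruction from the surrounding text, not a quotation of the original equation:

```latex
W = X + U, \qquad U \perp X
```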
Error in indirect calorimetry is known not to be constant; it varies non-linearly, largely as a function of the total flow rate (Macfarlane and Wu,
In this model, the error term is not independent of X and may be a linear or non-linear function of the value of X. Inferential statistical methods for cases where differential measurement error is known are currently under development (Newton et al.,
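One way to write a model of this kind is sketched below. The zero-mean normal form and the symbol \sigma(X) are assumptions made here for illustration, chosen to match the univariate normal error distributions used later in the manuscript; they are not taken from the original equation:

```latex
W = X + U(X), \qquad U(X) \mid X \sim \mathcal{N}\!\left(0,\ \sigma^{2}(X)\right)
```

Here \sigma(X) stands for an error standard deviation that changes, linearly or non-linearly, with the true VO2.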
The goal of the present manuscript is to detail the use of a statistical package that models the test-retest reliability of indirect calorimetry as univariate normal distributions accounting for non-linear measurement error. This tool is designed to provide researchers and clinicians a way of determining if two indirect calorimetry measures are likely to be “the same.” The utility of this novel statistical package will be detailed using five hypothetical examples: (1) baseline VO2 1.5 L/min vs. post-intervention VO2 1.7 L/min using the Parvomedics 2400 TrueOne, (2) baseline VO2 3.3 L/min vs. post-intervention VO2 3.5 L/min using the Parvomedics 2400 TrueOne, (3) baseline VO2 1.5 L/min vs. post-intervention VO2 1.7 L/min using the Douglas bag, (4) baseline VO2 3.3 L/min vs. post-intervention VO2 3.5 L/min using the Douglas bag, and (5) baseline VO2 3.0 L/min with the Douglas bag vs. post-intervention VO2 3.3 L/min with the Parvomedics 2400 TrueOne. The proposed tool has both advantages and disadvantages compared to previously proposed methodologies, and these differences in both approach and use will be discussed.
Gas.Sim is written in the R programming and statistics language (R Core Team,
The error around each VO2 measurement is modeled as a univariate normal distribution. The parameters for the univariate normal distribution are defined by an analysis performed on the raw data contributed by Crouter et al. (
The two distributions are next overlapped and the overlapping coefficient is calculated (Inman and Bradley,
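For reference, the overlapping coefficient of two densities is the area they share. The notation below (f_a, f_b, \sigma_a, \sigma_b, \Phi) is introduced here for illustration and is not taken from the manuscript:

```latex
\mathrm{OVL} = \int_{-\infty}^{\infty} \min\{ f_a(x),\ f_b(x) \}\, dx,
\qquad f_a = \mathcal{N}(a, \sigma_a^{2}), \quad f_b = \mathcal{N}(b, \sigma_b^{2})
```

In the special case where both error distributions have the same standard deviation \sigma, the integral has a closed form, with \Phi the standard normal cumulative distribution function:

```latex
\mathrm{OVL} = 2\,\Phi\!\left( -\frac{|a - b|}{2\sigma} \right)
```

The coefficient equals 1 when the two distributions coincide and approaches 0 as they separate.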
Gas.Sim presently has one primary function which implements the described analysis for VO2 data: VO2_sim. This function is implemented in R and is available upon request from the corresponding author. The function takes 5 inputs:
The “a” and “b” arguments are the VO2 values being tested. The present iteration of VO2sim is valid for use with the ParvoMedics 2400 TrueOne system and the Douglas bag, which are specified as “parvo_2400” and “douglas_bag,” respectively. The system used to obtain each VO2 measure is specified in the “system_a” and “system_b” arguments. In cases where no system is specified, the algorithm defaults to the ParvoMedics 2400 TrueOne system. Depending on the needs of the user, the algorithm can report only the probability that the two measures are the same (plot=FALSE) or can return a plot of the two distributions with the overlap visually depicted (plot=TRUE); by default, the algorithm simply returns the probability that the two VO2 arguments are the same.
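The argument handling described above can be illustrated with a minimal Python sketch. It is not the Gas.Sim R code. The error standard deviations in `ERROR_SD` are hypothetical placeholders, not the values fitted to the validation data, and the `n` argument and the tuple returned when `plot=True` are conveniences of this sketch rather than features of the package.

```python
import math

# HYPOTHETICAL error models: SD of the measurement error (L/min) as a function
# of VO2. Placeholder numbers for illustration only, NOT Gas.Sim's fitted values.
ERROR_SD = {
    "parvo_2400": lambda vo2: 0.04 + 0.02 * vo2,
    "douglas_bag": lambda vo2: 0.05 + 0.02 * vo2,
}


def _pdf(x, mu, sd):
    """Density of a univariate normal distribution."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def vo2_sim(a, b, system_a="parvo_2400", system_b="parvo_2400", plot=False, n=20000):
    """Overlapping coefficient of the error distributions around a and b."""
    for name in (system_a, system_b):
        if name not in ERROR_SD:
            raise ValueError(f"unknown system: {name!r}")
    sd_a, sd_b = ERROR_SD[system_a](a), ERROR_SD[system_b](b)

    # Evaluate both densities on a grid wide enough to hold essentially all mass.
    lo = min(a - 8 * sd_a, b - 8 * sd_b)
    hi = max(a + 8 * sd_a, b + 8 * sd_b)
    step = (hi - lo) / n
    xs = [lo + i * step for i in range(n + 1)]
    dens_a = [_pdf(x, a, sd_a) for x in xs]
    dens_b = [_pdf(x, b, sd_b) for x in xs]

    # Trapezoid rule on the pointwise minimum gives the shared area.
    shared = [min(p, q) for p, q in zip(dens_a, dens_b)]
    prob = step * (sum(shared) - 0.5 * (shared[0] + shared[-1]))

    if plot:
        # The sketch returns the curves so the caller can draw them.
        return prob, xs, dens_a, dens_b
    return prob
```

Calling `vo2_sim(1.5, 1.7)` uses the default system for both measures, while `vo2_sim(3.0, 3.3, system_a="douglas_bag", system_b="parvo_2400")` mirrors a mixed-system comparison. Because the placeholder SDs are invented, the returned values will not reproduce the probabilities reported in the manuscript.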
It is pertinent to provide example data to illustrate the utility of the Gas.Sim package. For this purpose, we will examine the effects of theoretical training protocols for persons at a given constant workload. In these examples, repeated VO2 measurements will be made with the Douglas bag and with the Parvomedics 2400 TrueOne, plus one example in which the baseline data are collected with the Douglas bag but the follow-up test is performed with the Parvomedics 2400 TrueOne. For the lower-end VO2 test, the baseline VO2 level for both systems is 1.5 L/min. After 1 year of training, the patient/athlete has a VO2 of 1.7 L/min, measured with both systems. For the higher-end VO2 test, the baseline VO2 level is 3.3 L/min and the post-intervention measure is 3.5 L/min. The fifth example assumes the first test was performed with the Douglas bag (VO2: 3.0 L/min) and the follow-up test was performed with the Parvomedics 2400 TrueOne (VO2: 3.3 L/min). VO2sim will be used to determine the probability that the pre- and post-test VO2 values in each example arise from the same distribution (i.e., they are the same measurement with no “true” change).
The change in VO2 after training protocols is an example of how VO2sim can be used to determine if repeated VO2 measurements are within the differential measurement error based on the specific system used to obtain the measurement. In examples 1 and 2, the measurements were obtained with the Parvomedics 2400 TrueOne system. When the baseline VO2 is 1.5 L/min and post-intervention VO2 is 1.7 L/min, there is a 10.3% probability that they are the same measure (Figure
In examples 3 and 4, the measurements were obtained with the Douglas bag method. When the baseline VO2 is 1.5 L/min and post-intervention VO2 is 1.7 L/min, there is a 17.2% probability that they are the same measure (Figure
Example 5 demonstrates the use of VO2sim to compare VO2 measures when they are obtained from different systems. When the baseline VO2 of 3.0 L/min is obtained with the Douglas bag and the follow-up VO2 measurement of 3.3 L/min is obtained with the Parvomedics 2400 TrueOne system, there is a 23.1% probability that they are the same measure (Figure
This study presents a novel descriptive methodology and tool to examine measurement error in gas exchange indirect calorimetry. This method is not susceptible to issues of statistical power, nor is it directly designed for any type of hypothesis testing. VO2sim adds a layer of assurance that clinical interpretations are valid and can also serve didactic purposes in the classroom. To facilitate use by researchers and practitioners, this tool is available both as a statistical package within R and as a GUI.
In recognition that many researchers and practitioners may benefit from VO2sim but may not be comfortable with the command line interface used in R programming (a suitable introduction to R is “R in a Nutshell”; Adler,
In cases where a single-subject analysis is performed, VO2sim can be used as the primary analytic tool. In research studies with multiple subjects, hypothesis testing should be performed prior to analysis with VO2sim. If VO2 normalized to body mass (typically, mL/kg/min) is desirable, normalized VO2 can be used in the hypothesis testing while the non-normalized VO2 data are used in the VO2sim analysis. If the hypothesis testing indicates a statistically significant difference between time points, VO2sim can be “stacked,” or applied to each subject's data individually, and the mean of the subjects' probabilities of similarity can be calculated to render a “net probability of similarity” between time points. Manual calculation of the net probability of similarity with the VO2sim GUI can be time consuming, depending on the number of subjects, and is also susceptible to human input error. However, when the net probability of similarity is calculated in the command line in R, this can be calculated using a single line of code:
In this example, where the default ParvoMedics 2400 TrueOne system is used, only the vectors of pre- and post-data collection points need to be supplied (pre_vector and post_vector, respectively). It is anticipated that as statistical methods for models of differential measurement error become more available and accepted (Newton et al.,
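The “stacking” step, in which a similarity function is applied to each subject and the results are averaged, can be sketched in Python as follows. This illustrates the averaging logic only and is not the R one-liner from the package. The helper `toy_similarity` is a hypothetical stand-in for a real per-subject similarity function such as VO2sim, and its output has no physiological meaning.

```python
from statistics import mean


def net_probability_of_similarity(pre, post, similarity):
    """Apply `similarity` to each subject's (pre, post) pair and average."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have one value per subject")
    if not pre:
        raise ValueError("at least one subject is required")
    per_subject = [similarity(a, b) for a, b in zip(pre, post)]
    return mean(per_subject), per_subject


def toy_similarity(a, b):
    """HYPOTHETICAL placeholder: 1 for identical values, falling linearly to 0."""
    return max(0.0, 1.0 - abs(a - b))


# Illustrative pre/post VO2 values (L/min) for three made-up subjects.
pre_vector = [1.5, 3.3, 2.0]
post_vector = [1.7, 3.5, 2.0]
net, each = net_probability_of_similarity(pre_vector, post_vector, toy_similarity)
```

Returning the per-subject values alongside the mean keeps individual responders visible, which a single averaged number would otherwise hide.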
Since VO2sim needs to be applied on the raw data within a study, there are few published studies which can be directly evaluated. However, rough approximations of previous work can be performed based upon the reported mean values of VO2 and an assumed use of gold standard methodology (Douglas bag). The Gas.Sim tool may indicate the presence of a Type 1 statistical error, where there is an incorrect rejection of the null-hypothesis (i.e., “false positive”). In the present context, a Type 1 error may occur in small sample studies because VO2 measures within the error range happen to be obtained on one side of the distribution or in a larger sample study where there is statistical power to detect differences which exceed the accuracy of the device. This is especially likely in research when data is collected until findings are “significant” (Simmons et al.,
Variability and uncertainty are inherent in any testing methodology. Typically, devices with low measurement error can corroborate the findings of devices with higher measurement error. VO2sim is able to provide a context for the level of confidence in the VO2 metric apart from any corroborating data. For example, Lorenzo et al. (
It is important to consider the Gas.Sim package and VO2sim function in relation to other methods proposed to understand differences in repeated VO2 measurements for singular subjects. The methods proposed by both Hopkins (
Probably the most meaningful differences between the methodologies of Hopkins (
The Gas.Sim package has the benefit of returning a probability of similarity between two indirect calorimetry measures. This can be interpreted in a straightforward way: “there is a 30% probability that the two measures are the same.” A reasonable default threshold to state that the measures are “truly different” is 10% similarity. However, users of the Gas.Sim package are encouraged to consider what level of similarity is acceptable given their particular context.
The Gas.Sim package is presently limited to providing estimates for only two systems: ParvoMedics 2400 TrueOne and Douglas bag. As raw day-to-day validation data become available, new systems will be added to Gas.Sim's capabilities. The GUI is only available for VO2sim; however, the Gas.Sim package available for the R interface has functions capable of examining minute ventilation (VE) and carbon dioxide production (VCO2). The current iteration of Gas.Sim is only valid for examination of day-to-day variability. This variability takes into account both the human-level variability and the system-level variability. Using VO2sim to determine the probability of VO2 differences within a testing session will likely result in an overly conservative estimate. As raw data that isolate the system-level variability become available, they will be added to the software package to estimate within-trial VO2 differences.
The Gas.Sim package relies heavily on the raw validation data provided by outside investigators (Crouter et al.,
Simulation of reliability data in gas exchange indirect calorimetry provides a method by which measurement error can be quantified and assessed. Both a command line and GUI implementation of the VO2sim function are presently available and described within the manuscript. Future iterations of the Gas.Sim package will include a greater number of indirect calorimetry devices as the raw validation data is made available. The described statistical tool provides an additional layer of security to understand and quantify the validity of clinical and research outcomes in exercise testing.
MT developed the methodology in the present manuscript and wrote the manuscript and underlying code for both the software package and web application.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author would like to acknowledge the expert statistical advice and review performed by Vernon Lawhern Ph.D. and the subject matter expertise and review performed by Andrew Tweedell M.A. This manuscript would not be possible without Scott E. Crouter Ph.D. contributing the raw data from his previous validation studies.