Skin Temperature Measurement Using Contact Thermometry: A Systematic Review of Setup Variables and Their Effects on Measured Values

Background: Skin temperature (Tskin) is commonly measured using Tskin sensors affixed directly to the skin surface, although the influence of setup variables on the measured outcome requires clarification. Objectives: The two distinct objectives of this systematic review were (1) to examine measurements from contact Tskin sensors considering equilibrium temperature and temperature disturbance, sensor attachments, pressure, environmental temperature, and sensor type, and (2) to characterise the contact Tskin sensors used, conditions of use, and subsequent reporting in studies investigating sports, exercise, and other physical activity. Data sources and study selection: For the measurement comparison objective, Ovid Medline and Scopus were used (1960 to July 2016) and studies comparing contact Tskin sensor measurements in vivo or using appropriate physical models were included. For the survey of use, Ovid Medline was used (2011 to July 2016) and studies using contact temperature sensors for the measurement of human Tskin in vivo during sport, exercise, and other physical activity were included. Study appraisal and synthesis methods: For measurement comparisons, assessments of risk of bias were made according to an adapted version of the Cochrane Collaboration's risk of bias tool. Comparisons of temperature measurements were expressed, where possible, as mean difference and 95% limits of agreement (LoA). Meta-analyses were not performed due to the lack of a common reference condition. For the survey of use, extracted information was summarised in text and tabular form. Results: For measurement comparisons, 21 studies were included. Results from these studies indicated minor (<0.5°C) to practically meaningful (>0.5°C) measurement bias within the subgroups of attachment type, applied pressure, environmental conditions, and sensor type. The 95% LoA were often within 1.0°C for in vivo studies and 0.5°C for physical models. For the survey of use, 172 studies were included. 
Details about Tskin sensor setup were often poorly reported and, from those reporting setup information, it was evident that setups widely varied in terms of type of sensors, attachments, and locations used. Conclusions: Setup variables and conditions of use can influence the measured temperature from contact Tskin sensors and thus key setup variables need to be appropriately considered and consistently reported.


Supplementary
For the first objective on measurement comparisons, the original searches were performed with all dates considered, but the inclusion criteria were modified so that only studies published from 1960 to present were included in the formal analysis/synthesis. The 1960 threshold was chosen due to commercialisation and advances in sensor technology. Further, the searches were found to be insufficiently effective in identifying 'old' (here, pre-1960) articles. This observation was made by comparing the search results with pre-1960 articles identified via other means. We are confident that the search was effective in identifying the post-1960 articles because manual searching of the reference lists of included studies, and of citations of included studies, located only one post-1960 study that was not returned in the formal search.
For data analysis and presentation, the data presented in forest plots (planned) were also summarised as groupings within a particular study (post-hoc decision). After producing the forest plots, we found the resultant number of comparisons displayed to be distracting from the data itself and, therefore, that figures summarising that information were better suited in the main text. The complete forest plots were retained (Supplementary Material) for reference and examination of specific comparisons of interest.

Supplementary Material Appendix 2. Extracted information
For measurement comparisons (objective 1), the following general study information was sought: if the study was performed in vivo and, if so, participant information (animal type, age, number of participants); if the study was performed using a physical model and, if so, the model composition and details; the number and type of contact sensors used for direct temperature comparisons; the number and type of attachments used for direct temperature comparisons; the sites used; whether single-site or pooled values were compared; and the number of replicates/sample size. For specific experimental comparisons, the following information was sought: temperature data from contact temperature sensors involved in a comparison of interest; surface type; notable other details about the specific comparison; environmental conditions (air temperature, relative humidity, air velocity); sensor and attachment type of the compared measurements; sensor calibration information; whether temperature was single-site or pooled values (and method of pooling if applicable); form in which data were presented; sample size.
For the survey of use (objective 2), the following information was sought: participant numbers and sex; activities performed during which Tskin was measured; Tskin sensor type and manufacturer/supplier and model information; calibration information; any details about sensor accuracy/uncertainty/precision etc.; sensor attachment type and if the sensor was covered by the attachment; body sites at which Tskin was measured; if mean Tskin, mean body temperature, or other variables were calculated using Tskin data; and how the data was presented (e.g., absolute values, change scores; if any data was presented for individual sites).

Supplementary Material Appendix 3. Descriptions of objective 1 outcome subgroups
Several key concepts underpinning Tskin measurement were identified by the investigators and the outcome subgroups were defined according to these concepts:
1. Temperature disturbance of the surface underlying a surface sensor: temperature measurement comparisons that indicate how the surface itself is influenced by the placement of a surface sensor on top.
2. Thermal equilibrium of the surface sensor with the underlying temperature: temperature measurement comparisons that indicate if the surface sensors agree with measurements made 'more directly' of the surface below.
3. Influence of the attachment on surface sensors: comparisons of measurements taken from the same surface sensor, but with different attachments under otherwise the same measurement conditions.
4. Influence of the pressure applied by surface sensors: comparisons of measurements taken from the same surface sensor, but with different applied pressure under otherwise the same measurement conditions.
5. Influence of the environmental conditions on surface sensors: equivalent comparisons of measurements, but with different environmental conditions under otherwise the same measurement conditions.
6. Influence of the type of surface sensor: comparisons of temperature measurements from different surface sensors used under otherwise the same measurement conditions.

A) Mean difference and estimating standard deviation of the difference
Here, the mean of individual differences and the difference between separate comparator group means could be used interchangeably as the point estimate of measurement bias within a comparison and, therefore, we used 'mean difference' to encompass both.
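The interchangeability noted above can be illustrated with a minimal sketch, assuming paired measurements with the same number of observations in each comparator group. The sensor names and temperature values are hypothetical and are not taken from any included study.

```python
import math
from statistics import mean

# Hypothetical paired skin temperature readings (degrees C) from two sensors
# measured under the same conditions; values are illustrative only.
sensor_a = [32.1, 33.4, 31.8, 34.0, 32.7]
sensor_b = [31.9, 33.0, 31.9, 33.5, 32.4]

# Mean of the individual (paired) differences.
mean_of_differences = mean([a - b for a, b in zip(sensor_a, sensor_b)])

# Difference between the two comparator group means.
difference_of_means = mean(sensor_a) - mean(sensor_b)

# With complete pairs the two point estimates agree (up to rounding error).
same_estimate = math.isclose(mean_of_differences, difference_of_means, abs_tol=1e-9)
print(same_estimate)
```

The two quantities coincide only as point estimates; the spread of the differences still depends on how strongly the paired measurements are correlated.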
Where limits of agreement (LoA) or the standard deviation of the mean difference ($s_{\text{diff}}$) were not available, $s_{\text{diff}}$ was estimated from confidence intervals (CI) of the mean difference or from the standard deviations of the comparator group means separately ($s_1$ and $s_2$).
• For cases in which the CI of the mean difference was available, $s_{\text{diff}}$ was estimated using:
$$SE = \frac{\text{upper CI limit} - \text{lower CI limit}}{2 \cdot t_{n-1}}$$
and
$$s_{\text{diff}} = SE \cdot \sqrt{n}$$
where SE is the standard error, $t_{n-1}$ is the corresponding critical value from the t-distribution, and n is the sample size.
• For cases in which only $s_1$ and $s_2$ were available, $s_{\text{diff}}$ was estimated using (Williamson et al., 2002):
$$s_{\text{diff}} = \sqrt{s_1^2 + s_2^2 - 2 r s_1 s_2}$$
where $s_1$ and $s_2$ are the standard deviations of comparators 1 and 2, respectively, and r is the correlation between the two comparators. In all cases it was a reasonable assumption that comparators had underlying measurement pairings (i.e., were not independent groups). When r was not available, it was estimated using data from a similar comparison in another study (such cases are noted below).
• For studies reporting LoA directly (using a multiplier of 2 or 1.96), we back-calculated $s_{\text{diff}}$ and recalculated LoA using $t_{n-1}$. The purpose of the recalculation was for consistency in the calculation of LoA throughout the review.
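The three estimation routes described in this section can be sketched as follows. This is an illustrative sketch rather than the review's own code: the function names and input values are hypothetical, the critical value of the t-distribution is passed in by the caller instead of being computed, and the LoA are assumed to take the form mean difference ± t × standard deviation of the difference, which is what recalculating LoA with the t critical value implies.

```python
import math

def sd_diff_from_ci(lower, upper, n, t_crit):
    """SD of the difference from the CI of the mean difference.

    SE = (upper - lower) / (2 * t_crit); SD = SE * sqrt(n).
    """
    se = (upper - lower) / (2.0 * t_crit)
    return se * math.sqrt(n)

def sd_diff_from_sds(s1, s2, r):
    """SD of the difference from the two comparator SDs and their correlation r."""
    return math.sqrt(s1 ** 2 + s2 ** 2 - 2.0 * r * s1 * s2)

def sd_diff_from_loa(lower, upper, multiplier=1.96):
    """Back-calculate the SD of the difference from reported LoA (multiplier 2 or 1.96)."""
    return (upper - lower) / (2.0 * multiplier)

def limits_of_agreement(mean_diff, sd_diff, t_crit):
    """Assumed LoA form: mean difference +/- t_crit * SD of the difference."""
    half_width = t_crit * sd_diff
    return mean_diff - half_width, mean_diff + half_width

# Two-sided 95% critical value of the t-distribution with 9 degrees of
# freedom (n = 10), taken from standard tables.
T_CRIT_DF9 = 2.262

# Hypothetical inputs, for illustration only.
sd_a = sd_diff_from_ci(lower=-0.10, upper=0.50, n=10, t_crit=T_CRIT_DF9)
sd_b = sd_diff_from_sds(s1=0.5, s2=0.6, r=0.8)
sd_c = sd_diff_from_loa(lower=-0.58, upper=0.98, multiplier=1.96)
loa_low, loa_high = limits_of_agreement(mean_diff=0.2, sd_diff=sd_c, t_crit=T_CRIT_DF9)
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies; in practice it would be looked up for n − 1 degrees of freedom for each comparison.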

B) Notes from included studies and estimates of mean difference and limits of agreement
Temperature disturbance of the surface underlying a surface sensor
• Mahanty and Roemer (1979a), mean and standard deviation of the difference of the comparisons at each heat flux.
Thermal equilibrium of the surface sensor with the underlying temperature
• Lee et al. (1994). Human: used those with complete data from the 3 participants ('AR25, SS15, SS10, AR25, SS15, SS10'); the repetitions were pooled. Physical model: used the reported differences as replicates.
• Mahanty and Roemer (1979a), mean and standard deviation of the difference of the comparisons at each heat flux.
Influence of the attachment on the temperature measured by surface sensors
• Dollberg et al. (1994), data for individuals extracted from figure.
• Buono and Ulrich (1998), only the standard deviations of the group means were given, so the standard deviation of the difference was estimated using a correlation coefficient calculated from the 3-layer attachment data from elsewhere (Tyler, 2011).
• Deng and Liu (2008), the uncovered condition was not directly comparable so was not included. Three thermocouples were used adjacent to each other with all three covered by the attachments. We calculated a within-participant mean of the three thermocouples for each attachment, giving a point estimate (with no appropriate standard deviation for calculation of LoA).
Influence of the environmental conditions on surface sensors
• Buono and Ulrich (1998), only the standard deviations of the group means were given, so the standard deviation of the difference was estimated using a correlation coefficient calculated from the attachment data from elsewhere (Tyler, 2011).
• Bach et al. (2015), information available in the paper for the grand means for rest, exercise, and recovery; the authors had made the data available (Bach, 2014) so, in addition, we calculated the mean difference and standard deviation of the difference for the additional timings that correspond to Figure 1 in the original article.
• Yakovlev and Utekhin (1965), data given for maximal divergence (n=10), from which the mean difference and standard deviation of the difference were calculated.

Sequence generation
Method used to allocate sensors/variables being compared. Here, 'sequence' encompasses testing order (temporal allocation) and, where possible, placement of the sensors (spatial allocation).
Low: allocation was achieved using randomised or pseudo-randomised (for balancing) methods and stated as such, or the allocation inherently represents low risk of bias.
Unclear: allocation sequence/methods not reported or not clear, therefore unable to determine order.
High: allocation was not random or balanced (e.g., likely that one sensor type always tested before the other; left and right sides of the body used but the same sensors are always on the same side).

Selection bias Allocation concealment
Knowledge about the forthcoming allocations. Allocation concealment is considered not relevant here and, therefore, will not be assessed.

Performance bias (systematic differences in exposure to factors other than the interventions of interest)
Blinding of participants and personnel
Steps taken to mitigate effects associated with knowledge of the sensor setup. For the studies likely to be identified, blinding is unlikely to have been undertaken, although it is possible in principle. The actual implications of blinding on temperature measurements are not clear and, therefore, a lack of blinding is considered to be 'unclear risk of bias'.
Low: blinding was performed adequately.
Unclear: blinding was partial, not performed, or not reported.
High: information that indicates participants or personnel were likely influenced by knowledge of the sensor type.

Detection bias Blinding of outcome assessment
Steps taken to mitigate effects associated with knowledge of the outcome assessors. Blinding of outcome assessment (i.e., temperature measurement) is considered not relevant here and, therefore, will not be assessed.

Attrition bias (systematic differences between groups in withdrawals from a study)
Incomplete outcome data
Exclusion of data from particular participants/models or sensors due to withdrawal or unavailability of data from particular participants/models or sensors.
Low: statement about complete data (e.g., participant completion, no missing measurements) or statement of incomplete data with suitable method reported for dealing with missing data.
Unclear: insufficient information to assess the likelihood of bias due to incomplete data.
High: information that indicates that the results were likely influenced by incomplete data.

Selective reporting
Reporting of data/results selected according to particular outcomes (e.g., those that are significantly different or demonstrate a particular feature/finding) rather than being based on a priori decisions. Includes data that was recorded and not made available. Also includes comparisons made but not reported that could bias the interpretation according to the review objective.
Low: the specific time points/periods reported and ways of pooling data are clearly intuitive from the methods reported, or a statement is made that the analyses were predefined and completed without changes, or all recorded data is available.
Unclear: insufficient information to assess the likelihood of selective analysis/presentation.
High: information that indicates data analysis and presentation is based on decisions made following data collection, or information that indicates particular data has been omitted without being justified.

Other bias 5. Consistency of other test conditions
Any experimental conditions, beyond the manipulated variables of interest, which may influence one measurement differently from the corresponding comparison measurement (e.g., environmental conditions, measurement timing).
Low: key conditions reported (e.g. environment, activities, timings) and tests performed in a suitably controlled fashion.
Unclear: insufficient information to assess whether the test conditions were suitably consistent.
High: information that indicates the test conditions could bias results (e.g., corresponding tests performed under different environments, measurements not taken at equivalent times).

Other bias 6. Calibration of sensors
Baseline comparability of the sensors under the same conditions.
Low: all sensors appropriately calibrated/corrected/checked by the investigators or on behalf of the investigators, or manufacturer provided a certified calibration prior to purchase.
Unclear: insufficient information to determine whether all the sensors were calibrated/corrected/checked.
High: one or more of the sensors were used according to manufacturer specifications without being certified or individually checked.

Involvement of partners that may have competing interests in the study outcome.
Low: contains equivalents of both 'conflict of interest' and 'acknowledgements' sections/statements and appears to be free of support (funding or provision of equipment) from partners with potentially competing interests.
Unclear: insufficient information to determine study support; does not contain equivalents of both 'conflict of interest' and 'acknowledgements' sections/statements.
High: the study received support (funding or provision of equipment) from partners with potentially competing interests.

The probes were fixed to the arm of a microbalance (note that the authors use the mass in g and refer to the mass itself as 'pressure').
• Nickel wire (diameter 70 μm) wound in a zigzag grid "in a single plane". "Between the wire loops of the sensor is a space (0.5 mm) through which sweat freely passes and evaporates." Sensor height "equal to the diameter of the nickel wire"; area 15 mm x 40 mm.
• Type NR; "point" sensor (diameter 0.

Figure 3. Thermal equilibrium of the surface sensor with the underlying temperature (A and B). The data of Psikuta et al. (2014), each with 0.5 and 1.2 m/s air velocity. Data is also available at 21 and 31°C for 0.5 m/s but is not included and only point estimates are given here, each for practicality and clarity of presentation. Note that the groupings by sensor (three top panels) and groupings by attachment (three bottom panels) represent the equivalent data, shown separately to highlight both sensor type (top) and attachment (bottom).
aluminium; attach, attachment; env, environment; ex, exercise; L, layer; LoA, limits of agreement; PRT, platinum resistance thermometer; uncov, uncovered