Development of a Novel Vital-Signs-Based Infection Screening Composite-Type Camera With Truncus Motion Removal Algorithm to Detect COVID-19 Within 10 Seconds and Its Clinical Validation

Background: To conduct a rapid preliminary COVID-19 screening prior to polymerase chain reaction (PCR) test under clinical settings, including patient’s body moving conditions in a non-contact manner, we developed a mobile and vital-signs-based infection screening composite-type camera (VISC-Camera) with truncus motion removal algorithm (TMRA) to screen for possibly infected patients. Methods: The VISC-Camera incorporates a stereo depth camera for respiratory rate (RR) determination, a red–green–blue (RGB) camera for heart rate (HR) estimation, and a thermal camera for body temperature (BT) measurement. In addition to the body motion removal algorithm based on the region of interest (ROI) tracking for RR, HR, and BT determination, we adopted TMRA for RR estimation. TMRA is a reduction algorithm of RR count error induced by truncus non-respiratory front-back motion measured using depth-camera-determined neck movement. The VISC-Camera is designed for mobile use and is compact (22 cm × 14 cm × 4 cm), light (800 g), and can be used in continuous operation for over 100 patients with a single battery charge. The VISC-Camera discriminates infected patients from healthy people using a logistic regression algorithm using RR, HR, and BT as explanatory variables. Results are available within 10 s, including imaging and processing time. Clinical testing was conducted on 154 PCR positive COVID-19 inpatients (aged 18–81 years; M/F = 87/67) within the initial 48 h of hospitalization at the First Central Hospital of Mongolia and 147 healthy volunteers (aged 18–85 years, M/F = 70/77). All patients were on treatment with antivirals and had body temperatures <37.5°C. RR measured by visual counting, pulsimeter-determined HR, and BT determined by thermometer were used for references. Result: 10-fold cross-validation revealed 91% sensitivity and 90% specificity with an area under receiver operating characteristic curve of 0.97. The VISC-Camera-determined HR, RR, and BT correlated significantly with those measured using references (RR: r = 0.93, p < 0.001; HR: r = 0.97, p < 0.001; BT: r = 0.72, p < 0.001). Conclusion: Under clinical settings with body motion, the VISC-Camera with TMRA appears promising for the preliminary screening of potential COVID-19 infection for afebrile patients with the possibility of misdiagnosis as asymptomatic.


INTRODUCTION
As of March 2022, the number of people who have been infected with COVID-19 worldwide was 434 million, resulting in approximately 5.9 million deaths (WHO, 2022). Although nucleic acid amplification tests, such as the reverse transcriptase-polymerase chain reaction (RT-PCR), are recommended to identify active infection as a gold standard, the tests require specialty resources and time-intensive tasks (WHO, 2020a). RT-PCR test takes approximately 2 h, including specimen collection and handling time.
As a preliminary daily screening prior to the RT-PCT test, an infrared thermometer-based body temperature (BT) screening has been widely used to conduct rapid infection screening at entrances of mass gathering places such as schools, offices, and hospitals (WHO, 2020b). However, recent cohort studies have reported a high rate (70-75%) of afebrile cases in patients with COVID-19 infection, which will be undetected in BT screening and can spread the virus (Goyal et al., 2020;Richardson et al., 2020).  induced vital signs alterations in addition to BT, such as higher heart rate (HR, HR>100 bpm) than that of normal controls and higher respiratory rate (RR, RR>24 bpm) (Rechtman et al., 2020). Moreover, a cohort study in the United Kingdom revealed that patients with COVID-19 experienced significant increases in RR and slight increases in HR; however, they had no significant increases in BT (Pimentel et al., 2020).
We have developed vital signs (RR, HR, and BT) based infection screening system which can detect afebrile patients (Matsui et al., 2010;Dagdanpurev et al., 2019). However, the proposed system had a limitation of accuracy in orthostatic RR measurement because of the patient's non-respiratory motion, such as truncus front-back motion, for maintaining upright posture ( Figure 1). Using Doppler radars, we have previously developed non-contact screening systems for sepsis and pneumonia Otake et al., 2021). However, Doppler radars have limitations in noise reduction induced by body movements. Therefore, in this paper, we adopted image sensors used in our previous studies instead of Doppler radars, such as a depth camera (Takamoto et al., 2020) and a red-green-blue (RGB) camera (Unursaikhan et al., 2021).
In order to conduct an accurate orthostatic RR assessment, we propose a truncus motion removal algorithm (TMRA). A truncus motion is measured using depth-camera-determined neck movement. Adopting TMRA in addition to the ordinary region of interest (ROI) tracking algorithms, we developed a mobile and vital-signs-based infection screening composite-type camera (VISC-Camera). VISC-Camera detects COVID-19 possibly infected patients within 10 s using a stereo depth camera for RR determination, an RGB camera for HR estimation, and a thermal camera for BT measurement ( Figure 2). These three vital signs are indices used to determine the systemic inflammatory response syndrome (SIRS) score (Kaukonen et al., 2015).
To conduct a rapid preliminary COVID-19 screening, we developed a portable vital-signs based COVID-19 screening system using multiple image sensors (VISC-Camera) with body (trunk) motion removal algorithm (TMRA). The clinical study in a Mongolian hospital revealed that VISC-camera enabled accurate afebrile COVID-19 patients screening due to patients' respiratory rate (RR) increase induced by inflammatory responses in relation to FIGURE 1 | The vital-signs-based infection screening composite-type camera (VISC-Camera) measurement setup in the standing position. In addition to respiratory anterior thorax (Z AT ) region of interest (ROI) tracking algorithms, the VISC-Camera incorporates a non-respiratory truncus motion (red dotted line) removal algorithm (TMRA) using Z truncus movements determined as neck motions.

The VISC-Camera System Structure
The VISC-Camera with a seven-inch touchscreen display incorporates three kinds of image sensors, i.e., a stereo depth camera with a Red-Green-Blue (RGB) camera (424 × 240 pixels, 30 [frames/s], Intel Real sense D435i), and a thermal camera [80 × 60 pixels, 9 (frames/s), FLIR Lepton 2.5] ( Figure 2B). A circular polarizer lens is installed in front of the RGB camera to reduce the reflection of glossy skin. The VISC-Camera has a built-in singleboard computer (Raspberry Pi 4B) for data processing. Images of a stereo depth camera with an RGB camera and a thermal camera are transferred to the single board computer through USB 3.0 and serial peripheral interface (SPI), respectively. A built-in Li-ion battery enables continuous operation for over 100 measurements by a single battery charge. The housing of the device is printed by a 3D printer with overall dimensions of 22 cm (L) × 14 cm (W) × 4 cm (H). The total weight of the VISC-Camera is 800 g.

The VISC-Camera Data Processing Algorithms
The software of the VISC-Camera is stand-alone and was developed in Python programming language (Python Software Foundation). The overview of the software block diagram is shown in Figure 3 (Supplementary Video). We used the Open computer vision (OpenCV) library for image processing and the multiprocessing module from Python to execute the image analyses simultaneously (Bradski, 2000). The VISC-Camera executes parallel processes in a quad-core processor, i.e., image capturing and RR, HR, and BT determinations. As shown in The Truncus Motion Removal Algorithm for Respiratory Rate Estimation, the VISC-Camera extracts RR from depth images while measuring the distance from the VISC-camera to the examinee. RGB images enable HR estimation through facial landmarks detection with a facial tracking algorithm. BT is derived from RGB-determined ROI using the thermal images. Logistic regression analysis (LRA) was adopted from the Python Scikit-Learn machine-learning library (Pedregosa et al., 2011) to classify COVID-19 infection. The linear equation determined by LRA is expressed as follows: where log p 1−p is the predicted logit score, β 0 is a constant, and β 1 /β 3 are regression coefficients corresponding to the LRA explanatory variables of RR, HR, and BT ( Figure 3D).
In order to keep the computational load as minimum as possible, we adopted the following procedures to determine regions of interest. ROI BVP (HR), ROI thermal (BT), ROI AT (RR), and ROI neck (for truncus motion determination) are the regions of interest corresponding to face (wide), face (narrow), anterior thorax, and neck, respectively, as shown in Figure 3. Using a stereo depth camera, the facial area was determined using the bounding box algorithm via OpenCV library and the average human facial dimension (Zhuang et al., 2010). For extracted facial area, an RGB camera was adopted to determine facial landmarks A to D [ Figure 3 (B; Chart 2)]. ROI BVP was FIGURE 2 | The VISC-Camera structure and vital signs-related signals were derived in the COVID-19 isolation unit. (A) VISC-Camera screening for a patient with COVID-19 within 10 s (B) The three cameras of VISC-Camera: stereo depth camera for respiratory rate (RR) monitoring, RGB camera for heart rate (HR) determination, thermal camera for body temperature (BT) measurement. (C) Vital signs-related signals of the patient comprising respiration signal, heartbeat signal, and facial region of interest (ROI, orange rectangle) for BT determination.
Frontiers in Physiology | www.frontiersin.org June 2022 | Volume 13 | Article 905931 determined using A to C, and ROI neck (3 × 3 cm) was placed below point D corresponding to the jaw. ROI thermal (5 × 7 cm) was placed 3 cm above B. Using typical human facial dimensions, ROI AT [20 (H) cm ×18(W) cm] was fixed 12 cm below D (Bellemare et al., 2003).

Heart Rate Estimation
The VISC-Camera determines HR by sensing facial skin tone changes induced by an arterial pulsation called blood volume pulse (BVP) using an RGB image-based remote photoplethysmography (rPPG) method ( Figure 3B). To extract the BVP signal, the VISC-Camera detects the human face using the Haar cascade classifier from the OpenCV library (Viola and Jones, 2004). The median flow object tracking algorithm was adopted from the OpenCV library to track facial ROI for HR measurement without being affected by body motion, including headshake (Kalal et al., 2010). To efficiently acquire the BVP signal, we selected ROI BVP based on the human facial arterial anatomy using facial landmark points A-C [ Figure 3 (B; Chart 2)] (von Arx et al., 2018). The landmark points are determined by a neural network-based facial landmark detection algorithm from the DLIB library (King, 2009). The BVP signal is extracted from the spatial average of the RGB green color signal of the ROI via multiresolution analysis of the maximal overlap discrete wavelet transform (MODWTMRA) with order four Symlet wavelet filter, decomposition level 4, and bandpass filter (0.7-2.5 Hz). To avoid aperiodic waves, we adopted the autocorrelation function (ACF) on the extracted BVP. Finally, HR is estimated using the root multiple signal classification (MUSIC) algorithm with elements of one at a sampling rate of 30 Hz (Barabell, 1983).

The Truncus Motion Removal Algorithm for Respiratory Rate Estimation
The VISC-Camera determines RR by monitoring the examinee's volume of the anterior thorax [V AT (t)] using a stereo depth camera. The truncus motion removal algorithm (TMRA) enables accurate RR measurement from the respiratory anterior thoracic motion by extracting nonrespiratory front-back truncus movements determined by neck motions (Figure 1). Neck as ROI was adopted to estimate non-respiratory front-back motion because there are no respiratory muscles on the neck. V AT (t) is determined as follows: ROI xy 1, if x, y ∈ROI AT 0, otherwise where Z AT,xy (t) are the distances from VISC-Camera to an anterior thorax point (x, y), Z truncus is the truncus distance determined as the average distance of the ROI neck from VISC-Camera, 424 and 240 are the maximum pixel numbers of the stereo depth camera corresponding to horizontal and vertical directions, respectively. Δt is a sampling interval (33 ms). S pixel (t) is the area of a pixel at a distance of Z truncus (t) as shown below, 86°and 57°are the stereo depth camera's horizontal and vertical field of view angles, respectively.
The facial landmark point D, which was coordinated in the HR estimation process, determines the location of the ROI neck and the ROI AT to reduce computational load [ Figure 3 (A; Chart 2)]. In addition, a bandpass filter (0.1-0.6 Hz) excludes higher frequency artifacts induced by the examiner's handshake. To estimate RR from a short-time signal, we used ACF to find the periodicity of the respiratory signal and root MUSIC algorithm (with elements of one at a sampling rate of 30 Hz) for RR determination.

Body Temperature Estimation
The VISC-Camera estimates BT from the ROI thermal using a thermal camera ( Figure 3C). The ROI thermal is located 3 cm above the facial landmark point B [ Figure 3 (C; Chart 2)]. We conducted linear regression analysis to estimate BT from the facial temperature (T face ). The linear equation is expressed as follows: where T face is an average temperature within the ROI thermal . We excluded pixels indicating temperatures below 34°C or above 42°C in the ROI thermal . A and B are regression coefficients.

Image Quality Assessment
Taking stable images from a proper position is essential in measuring reliable signals using image-based remote methods. Therefore, the VISC-Camera screening is according to the following procedures. First, with 3D visual assistance on the screen, the examiner can orientate the device to the examinee. Next, to begin measurement, the VISC-Camera assesses the examinee's posture with respect to angles, i.e., roll and yaw, using facial landmark points A-C. After measurement starts, the VISC-Camera detects sudden motions using a second derivative-based blur detection method while capturing images (Pech-Pacheco et al., 2000). We used the Laplacian operator as a second derivative operator with the following 3 × 3 kernel: The pre-defined motion detection threshold value of the variance of the absolute value was Laplacian Variance < 100.

Clinical Testing of VISC-Camera at the First Central Hospital of Mongolia for COVID-19 Patients
We conducted clinical testing of the VISC-Camera in 154 patients who tested positive for COVID-19 according to RT-PCR (aged 18-81 years; 87 males, 67 females) within the initial 48 h of hospitalization at the First Central Hospital of Mongolia. A control set comprised 147 healthy volunteers (aged 18-85 years; 70 males, 77 females). All patients were afebrile (BT < 37.5°C) following administration of antiviral agents (umifenovir 200 mg/day or favipiravir 200 mg/day). Febrile patients in the intensive-care unit were not included. All healthy volunteers were examined using COVID-19 symptoms-based questionnaires and showed no symptoms for 1 week before and after VISC-Camera screening. We did not use criteria in body temperature, which excludes an examinee with and without COVID-19. A summary of the demographic of the participants is shown in Table 1. Figure 1 shows the VISC-Camera measurement setup in the standing position. We did not give any instructions on posture to examinees. Standing, sitting, and supine examinees are included in our study. There are possible effects of posture on RR and HR; however, the RR of COVID-19 patients drastically increased compared to healthy volunteers. The clinical tests were conducted indoors with moderate illumination levels (600-1,100 lux). For reference, a respiratory belt, visual respiratory counting, a fingertip pulsimeter, and a thermometer were used. We used a respiratory belt and visual respiratory counting as references of RR for healthy volunteers and COVID-19 patients, respectively. Because person-to-person contact was strictly limited in the isolation unit of COVID-19. Prior to a clinical study, we tested the concordance between respiratory belt-derived RR and that determined by visual respiratory counting. In order to evaluate false positive cases induced by another infectious disease with systemic inflammation, we recruited 33 pediatric pneumonia inpatients (aged 1-14 years; 18 males, 15 females) measured by the VISC-Camera at the Central Hospital of Songinokhairkhan District in Mongolia. This study was approved by the Tokyo Metropolitan University ethics committee (approval number H20-038). All participants gave written informed consent. We obtained the photograph licensing of Figure 1 from the patient.

Statistical Analysis
Pearson's correlation coefficient and Bland-Altman plots were used to analyze the correlation between the measurement of the VISC-Camera and the references. The LRA classification model results were used to calculate the sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). A t-test was conducted to statistically assess the vital signs' mean rate of the two groups. Ten-fold crossvalidation was performed, and the receiver operating characteristic curve was calculated to evaluate the accuracy of LRA models.

RESULTS
The heartbeat signal measured by the RGB camera showed similar periodic fluctuation to that determined by fingertip photoplethysmography ( Figure 4A). Respiration signal determined by stereo depth camera indicated cyclic oscillation same as that derived by the respiratory belt ( Figure 4B). Respiratory signals determined as V AT (t) with (right) and without (left) TMRA are shown in Figure 5. TMRA drastically improved the cyclic feature of the respiratory signal ( Figure 5A). TMRA greatly increased the correlation coefficient between RR determined by the reference and VISC-camera-derived RR from 0.51 ( Figure 5B; left; without TMRA) to 0.93 ( Figure 5B; right; with TMRA).
BT, defined as the same as the axillary temperature, is determined using thermal camera-monitoring of the temperature (T face ) of the facial region of interest (ROI; Figure 2C) using the following equation: BT 0.78 × T face + 8.99 The level of agreements between the VISC-Camera and the references were assessed using the Pearson correlation coefficient (n = 301) and the Bland-Altman plot. Correlation scatter plots for HR, RR, and BT are shown in Figures 6A,B,C, respectively. VISC-Camera-determined HR, RR, and BT significantly correlated with those measured using references (HR: r = 0.97, p < 0.001; RR: r = 0.93, p < 0.001; BT: r = 0.72, p < 0.001). The root mean squared errors of HR, RR, and BT were 2.7, 1.6, and 0.2, respectively. Figures 6D,E,F show the Bland-Altman plots for HR, RR, and BT determined by the VISC-Camera and the references. The 95% limits of agreement of HR, RR, and BT measurements ranged from −4.7 to 4.2 bpm (σ = 2.3), -3.1 to 3.1 bpm (σ = 1.6), and -0.39 to 0.40°C (σ = 0.2), respectively.
The 3D plot of RR, HR, and BT determined by the VISC-Camera is shown in Figure 7A. Although all patients with COVID-19 were afebrile, the 3D plot shows two groups: the patient and the healthy volunteer groups. This is because the RRs of infected patients were significantly higher than those of healthy volunteers (patients with COVID-19: mean RR = 22.1 bpm (σ = 2.7); healthy volunteers: mean RR = 15.1 bpm (σ = 2.4); two-tailed p < 0.05). To separate patients from healthy volunteers more accurately, the following logit regression analysis (LRA) equation was determined by adopting RR, HR, and BT as explanatory variables: Logit Score log p 1−p −9.192+0.505×RR−0.006×HR+0.0101×BT Logit Score≥0 0Suspected as COVID−19 Logit Score<0 0Healthy where p is the probability, p 1−p is the corresponding odds. In the screening of afebrile patients with COVID-19, a 10-fold crossvalidation revealed 91% sensitivity, 90% specificity, 91% positive predictive value (PPV), and 90% negative predictive value (NPV) ( Figure 7C). From the receiver operating  Figure 7B). Also, to evaluate false positive cases induced by another infectious disease with systemic inflammation, we tested our LRA model by adding 33 recruited pediatric pneumonia patients to 154 patients with COVID-19 and 147 healthy volunteers. The result revealed 91% sensitivity, 74% specificity, 75% PPV, and 90% NPV. Even the false positive rate increased; however, the most important parameters in quarantine screening, i.e., sensitivity and NPV, did not change.

DISCUSSIONS
The VISC-Camera can be used as an Internet of Things device for an infectious disease surveillance platform, as proposed in our previous study . It can also be used routinely in hospitals as a daily vital signs recording tool because it determines RR, HR, and BT within 10 s and correlates significantly with reference values. Generally, healthcare professionals take more than 3 min to measure and record vital signs. Therefore, in addition to screening, the proposed system may reduce the  In addition, sequential organ failure assessment (SOFA) has long been used to predict ICU mortality. However, in recent years, and in particular, following the publication of the Sepsis-3 guidelines (Singer et al., 2016), quick SOFA (qSOFA) has been more commonly used in clinical practice than SIRS or SOFA scores, which are better suited to evaluation during the early stages of sepsis screening. The qSOFA approach is also used when a sudden change in patient status is suspected. Although qSOFA uses only three parameters (RR, blood pressure, and consciousness), the traditional RR measurement is relatively laborious for medical staff. Therefore, a device to measure RR quickly and easily could be a helpful tool in an emergency clinical setting.
Although limited to supine position, Dong et al. recently reported non-contact COVID-19 screening using continuouswave radar-based non-contact sleep monitoring equipment via XGBoost and LRA model (Dong et al., 2022). However, the VISC-Camera enables non-contact COVID-19 screening in standing and sitting postures, as well as in supine positions. In addition, our study focuses on not only technical backgrounds but also clinical findings, such as the success of non-febrile COVID-19 patients screening by non-contact vital signs monitoring.
In our study, the patients were treated with antiviral medication in advance; thus, their conditions differed from the initial characteristics. Despite their lack of fever, we found a drastic increase in RR in patients with COVID-19 infection. Therefore, the ability of the proposed system to accurately measure RR during preliminary screening may contribute to the reduction of false-negative test results for COVID-19 infection.
Our study has some limitations. Critically ill patients with COVID-19 were excluded, and all participants were recruited from one country. Therefore, further data are needed that encompass examinees with various degrees of COVID-19 severity, diverse ethnicities from various countries, and a wide range of ages to increase the general-purpose versatility of the VISC-Camera. In this study, the VISC-Camera was used in air-conditioned areas with moderate illumination levels (600-1,100 lux). Therefore, the VISC-Camera should also be tested in other, less optimal environments. The VISC-Camera detects COVID-19induced inflammatory responses rather than the SARS-CoV-2 virus itself. Therefore, there are possibilities to misdiagnose the other infectious diseases as COVID-19, such as pneumonia or influenza. The VISC-Camera was developed for preliminary screening and not to define the diagnosis.
The portable VISC-Camera described here, which conducts infection screening within 10 s in a no-contact manner, appears promising for the preliminary screening of potential COVID-19 infection in afebrile patients.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Tokyo Metropolitan University ethics committee (the approval number H20-038). The patients/ participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
BU and TM designed the research. BU, KH, and TM wrote the manuscript. GA and KH supervised the medical aspects of screening, and GA performed clinical testing. GS, OP, and LC contributed to the image-processing, signal-processing methods, and 3D designing. All authors reviewed the manuscript.

ACKNOWLEDGMENTS
We express our sincere thanks to all the participants from the First Central Hospital of Mongolia. The authors sincerely thank Khishigjargal Batsukh, the vice director of clinical affairs of the First Central Hospital of Mongolia, for permission to carry out the clinical testing. We sincerely thank Saeko Nozawa for her contributions to manuscript preparation.