Test-retest reliability of high-resolution surface electromyographic activities of facial muscles during facial expressions in healthy adults: A prospective observational study

Objectives Surface electromyography (sEMG) is a standard method in psycho-physiological research to evaluate emotional expressions and in clinical settings to analyze facial muscle function. High-resolution sEMG shows the best results in discriminating between different facial expressions. Nevertheless, the test-retest reliability of high-resolution facial sEMG has not yet been analyzed in detail, although good reliability is a necessary prerequisite for its repeated clinical application. Methods Thirty-six healthy adult participants (53% female, 18–67 years) were included. Electromyograms were recorded from both sides of the face using an arrangement of electrodes oriented by the underlying topography of the facial muscles (Fridlund scheme) and simultaneously by a geometric and symmetrical arrangement on the face (Kuramoto scheme). In one session, participants performed three trials of a standard set of different facial expression tasks. On one day, two sessions were performed. The two sessions were repeated two weeks later. Intraclass correlation coefficient (ICC) and coefficient of variation statistics were used to analyze the intra-session, intra-day, and between-day reliability. Results Fridlund scheme, mean ICCs per electrode position: Intra-session: excellent (0.935–0.994), intra-day: moderate to good (0.674–0.881), between-day: poor to moderate (0.095–0.730). Mean ICCs per facial expression: Intra-session: excellent (0.933–0.991), intra-day: moderate to good (0.674–0.903), between-day: poor to moderate (0.385–0.679). Kuramoto scheme, mean ICCs per electrode position: Intra-session: excellent (0.957–0.970), intra-day: good (0.751–0.908), between-day: moderate (0.643–0.742). Mean ICCs per facial expression: Intra-session: excellent (0.927–0.991), intra-day: good to excellent (0.762–0.973), between-day: poor to good (0.235–0.868). The intra-session reliability of both schemes was equal. 
Compared to the Fridlund scheme, the ICCs for intra-day and between-day reliability were always better for the Kuramoto scheme. Conclusion For repeated facial sEMG measurements of facial expressions, we recommend the Kuramoto scheme.


Introduction
Facial electromyography (EMG) is a standard tool in clinical studies and psychological experiments to assess facial muscles during specific facial expressions and to analyze their association with specific emotions (Hubert and de Jong-Meyer, 1991; Guntinas-Lichius et al., 2020; Hofling et al., 2020). In psychological settings, recordings are usually performed on the skin surface over the facial muscles via multi-channel surface EMG (sEMG) (Tassinary et al., 1989; Barrett et al., 2019). Multi-channel sEMG is needed because the facial muscles form a complex, interdependent, and interwoven system that is connected to the skin (Cattaneo and Pavesi, 2014). Hence, specific facial movements lead to a complex sEMG activation of several or even almost all facial muscles (Schumann et al., 2010, 2021; Cui et al., 2020). Currently, two different facial sEMG recording schemes are established. The most popular is the scheme developed by Fridlund and Cacioppo, who recommended always recording the sEMG from 10 specific facial muscles and one masticatory muscle (Fridlund and Cacioppo, 1986). As an alternative, Kuramoto et al. (2019) recommended covering the complete face with 21 sEMG electrodes in an EEG-like arrangement. Recently, we showed that a geometric and symmetrical sEMG recording from the entire face, as recommended by Kuramoto et al. (2019), seems to allow a more specific distinction of facial muscle activity patterns during various facial expression tasks than the more frequently applied scheme by Fridlund and Cacioppo (Mueller et al., 2022).
In a typical psychological or clinical experiment, participants or patients are examined several times, for instance with varying stimuli or on different days before and after an intervention. Hence, it has to be ensured that the respective sEMG scheme can be applied reliably, so that variability in the EMG recordings does not merely reflect variability in electrode application. Any fixed sEMG scheme is influenced by the inter-electrode distances and by crosstalk, both of which affect the sEMG recordings. Surprisingly, although very important, the test-retest reliability of facial high-resolution sEMG has not yet been analyzed in detail (Hess et al., 2017). When only a few or single facial muscles are recorded in psycho-physiological research, the test-retest reliability can be low (Hess et al., 2017).
Therefore, we studied the test-retest reliability of both the sEMG electrode scheme of Fridlund and Cacioppo and that of Kuramoto et al. (2019) in the same healthy probands as in our previous study (Mueller et al., 2022). Both schemes were applied simultaneously during specific facial expressions. One sEMG session included three trials of these standardized facial expressions. The session was then repeated on the same day. The entire procedure (i.e., two sessions with three trials each) was repeated 14 days later. Hence, it was possible to calculate the intra-session, the intra-day, and the between-day reliability of high-resolution facial sEMG.

Healthy participants
The study included the same 36 healthy adult volunteers as our recent publication (Mueller et al., 2022). Nineteen women and 17 men were included (age range: 18-67 years). Exclusion criteria were neurological disease and a history of facial surgery or facial trauma. The ethics committee of the Jena University Hospital approved the study (No. -1539). All participants gave written informed consent to participate in the study.

Standardization of repeated facial exercises
The participants were informed about the sequence of the examination, and the instructions for the facial expressions, which were presented by video, were explained. Details of the video tutorial are presented elsewhere (Volk et al., 2019). Briefly, the participants sat in a relaxed upright position in front of a computer screen and followed a self-explanatory video tutorial. A human instructor explained and demonstrated the following eleven facial expressions: face at rest (no movement), wrinkling of the forehead, closing the eyes normally (gentle eye closure), closing the eyes forcefully (forceful eye closure), nose wrinkling, smiling with closed mouth, smiling with open mouth, lip puckering (pursing lips), blowing out the cheeks (cheek blowing), snarling, and depressing the lower lip. The participants performed each expression three times (three trials) before the next expression was explained. One session contained all expressions. On day t1, two sessions were performed with a time lag of about 20 min. The raw data of t1 have already been presented in our previous publication (Mueller et al., 2022). The same two sessions, with the same time lag, were repeated on day t2, 14 days later (Figure 1).
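The repeated-measures design (two days, two sessions per day, three trials per session) determines which values are compared at each reliability level. The following Python sketch is a hypothetical illustration only: the nested [day][session][trial] layout, the function name `reliability_matrices`, and the choice to average trials per session (intra-day) and all trials per day (between-day) are assumptions, since the text does not state how trials were aggregated for the intra-day and between-day comparisons.

```python
from statistics import mean


def reliability_matrices(amplitudes):
    """Group repeated sEMG amplitudes for the three reliability levels.

    amplitudes: one entry per participant, each nested as
    [day][session][trial] (2 days x 2 sessions x 3 trials) for a single
    channel and facial expression. The aggregation below is hypothetical.
    Every returned matrix has participants as rows, repetitions as columns.
    """
    # Intra-session: the three trials of one session are the repetitions.
    # Keys follow the S1-S4 labels (S1, S2 on day t1; S3, S4 on day t2).
    intra_session = {
        f"S{2 * d + s + 1}": [p[d][s] for p in amplitudes]
        for d in range(2)
        for s in range(2)
    }
    # Intra-day: session 1 vs. session 2 of the same day, trials averaged.
    intra_day = {
        f"t{d + 1}": [[mean(p[d][0]), mean(p[d][1])] for p in amplitudes]
        for d in range(2)
    }
    # Between-day: day t1 vs. day t2, all six trials of each day averaged.
    between_day = [
        [mean(p[0][0] + p[0][1]), mean(p[1][0] + p[1][1])] for p in amplitudes
    ]
    return intra_session, intra_day, between_day
```

Each matrix can then be passed to whatever ICC routine is used; keeping participants as rows matches the usual subjects-by-repetitions layout of ICC software.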

Facial surface electromyography (sEMG) registration
The sEMG protocol was published recently (Mueller et al., 2022). Briefly, a multi-channel EMG system (gain: 100; frequency range: 10-1,861 Hz; sampling rate: 4,096/s; resolution: 5.96 nV/bit; DeMeTec, Langgöns, Germany) was used for the sEMG recordings with monopolarly connected reusable surface electrodes (Ag-AgCl discs, diameter 4 mm, DESS052606, GVB-geliMED, Bad Segeberg, Germany). Electromyograms were recorded from both sides of the face. To account for artifacts, signals were centered and band-pass filtered between 10 and 500 Hz. A 50 Hz notch filter was applied to suppress power-line interference. Two electrode arrangements were applied simultaneously: the schemes developed by Fridlund and Cacioppo (1986) and by Kuramoto et al. (2019) (Figure 1). In the following, the two schemes are labeled "Fridlund" and "Kuramoto". In total, 58 electrodes were placed on the face (including one ground electrode and two connected reference electrodes). For the Fridlund scheme, bipolar channels were calculated from the monopolarly recorded signals by subtracting the signals of the respective electrode pairs. Data for the Kuramoto scheme were analyzed monopolarly. sEMG amplitudes were quantified as mean RMS values during the steady-state contraction phase of every facial expression and sEMG channel. Electrodes were not removed between the two sessions on day t1 or on day t2, i.e., all electrodes remained in place for both sessions of one day.
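A minimal Python sketch of the preprocessing chain described above, assuming NumPy and SciPy are available. Only the pass band (10-500 Hz), the 50 Hz notch, the sampling rate, the bipolar subtraction, and the RMS quantification come from the text. The filter family and order (4th-order Butterworth), zero-phase filtering, the notch quality factor Q = 30, and the function names are illustrative assumptions, not the settings of the DeMeTec system or the study software. The RMS here is a single value over the steady-state window; the text does not say whether RMS was computed in sub-windows before averaging.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 4096.0  # sampling rate stated in the text (samples per second)


def preprocess(raw, fs=FS):
    """Center, band-pass (10-500 Hz) and 50 Hz notch-filter raw sEMG.

    raw: array of shape (n_channels, n_samples) of monopolar signals.
    Filter family, order and notch Q are assumptions, not study settings.
    """
    x = raw - raw.mean(axis=-1, keepdims=True)  # centering (remove offset)
    sos = butter(4, [10.0, 500.0], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x, axis=-1)  # zero-phase band-pass
    b, a = iirnotch(50.0, 30.0, fs=fs)  # 50 Hz power-line notch
    return filtfilt(b, a, x, axis=-1)


def bipolar(x, pairs):
    """Fridlund-style bipolar channels: difference of two monopolar leads."""
    return np.stack([x[i] - x[j] for i, j in pairs])


def rms_amplitude(x, start, stop):
    """RMS per channel over a steady-state window [start, stop) in samples."""
    seg = x[..., start:stop]
    return np.sqrt(np.mean(seg ** 2, axis=-1))
```

Because centering and these filters are linear, forming the bipolar differences before or after preprocessing gives the same result up to floating-point error.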
To ensure the use of identical electrode positions at t1 and t2, rigid laminated foils were used at both time points (Figure 2). Punched holes in the foils were used to mark the electrode positions on the face of the participants.

Statistics
All statistical analyses were performed using IBM SPSS Statistics 25 (Chicago, IL, USA). Intraclass correlation coefficient (ICC) statistics, expressed with lower and upper borders (i.e., minimal and maximal values), were used to analyze the retest reliability of the normalized EMG amplitudes (a) between the three trials in each session (intra-session reliability), (b) between the two sessions on one day (intra-day reliability), and (c) between the two days of measurement (between-day reliability). The higher the ICC, the higher the reliability. ICC values less than 0.5 indicate poor reliability, values between 0.5 and 0.75 moderate reliability, values between 0.75 and 0.9 good reliability, and values greater than 0.90 excellent reliability (Aniss and Sachdev, 1996). To allow a comparison with other data sets, the dimensionless coefficient of variation (CV) was additionally calculated (Koo and Li, 2016). For each individual and each setting, the CV was calculated as the ratio of the standard deviation to the mean, expressed as a percentage (CV = [standard deviation/mean] × 100). The results are presented as means across the study sample. The lower the CV, the lower the variability of the repeated measurements.
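The text does not state which ICC model and type were selected in SPSS. As one plausible option for test-retest designs, the following Python sketch computes the two-way, absolute-agreement, single-measurement ICC (often written ICC(2,1) or ICC(A,1)) from two-way ANOVA mean squares and maps the result onto the interpretation bands quoted above. The model choice, the function names, and the assignment of values lying exactly at 0.75 and 0.9 are assumptions.

```python
from statistics import mean


def icc_a1(data):
    """Two-way, absolute-agreement, single-measurement ICC.

    data: n participants x k repetitions (list of equal-length rows),
    e.g. the three trials of one session for every participant.
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    # Sums of squares of a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)  # between-participant mean square
    msc = ss_cols / (k - 1)  # between-repetition mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def classify_icc(icc):
    """Map an ICC onto the bands quoted in the text (boundaries assumed)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"
```

The absolute-agreement form penalizes systematic shifts between repetitions (the `msc` term), which is why it can be lower than a consistency ICC when, for example, all amplitudes drift upward between days.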

Intraclass correlation coefficient statistics
All tables show the minimum ICC, the mean ICC (values Fisher z-transformed, averaged, and back-transformed), and the maximum ICC.
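The averaging of ICCs via Fisher's z can be sketched in a few lines of Python. This is an illustrative reconstruction of the described steps, not the SPSS procedure used in the study; the function name is hypothetical.

```python
import math


def fisher_z_mean(coefficients):
    """Average ICCs on the Fisher-z scale and back-transform the result.

    z = atanh(r) for each coefficient, arithmetic mean of the z values,
    then tanh() back to the coefficient scale. Values of exactly +/-1
    raise ValueError because atanh is undefined there.
    """
    z_values = [math.atanh(r) for r in coefficients]
    return math.tanh(sum(z_values) / len(z_values))
```

For example, averaging 0.9 and 0.1 this way gives about 0.656 rather than the arithmetic 0.5, because coefficients close to 1 are stretched on the z scale.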
Reliability of the sEMG recordings with the Fridlund scheme
Table 1 shows the results for the intra-session, intra-day, and between-day reliability for all electrode positions. The mean intra-session ICCs were excellent for all electrode positions on both sides of the face (0.935-0.994). The lowest ICC values for the intra-session reliability were occasionally poor, but mainly moderate to good (0.117-0.891). The mean ICCs for the intra-day reliability were all moderate to good (0.674-0.881). The lowest ICC values for the intra-day reliability were always poor (0.011-0.488). The mean between-day ICCs were poor to moderate (0.095-0.730). With one exception, the lowest ICCs for the between-day reliability were poor (−0.152 to 0.298). Overall, no systematic differences between the muscles were apparent for the Fridlund scheme. When poor values occurred, they were frequently found on both sides.
Table 2 shows the results for all facial expressions when using the Fridlund scheme; two examples of the EMG recordings for all expressions are shown in Figures 3A, B. The mean intra-session ICCs were excellent for all facial expressions on both sides of the face (0.933-0.991). The lowest ICCs for the intra-session reliability were rarely poor and mostly good, sometimes excellent (0.117-0.920). The mean ICCs for the intra-day reliability were all moderate to good (0.674-0.903). The lowest ICC values for the intra-day reliability were mostly poor (0.011-0.577). The mean between-day ICCs were poor to moderate (0.385-0.679). The lowest ICCs for the between-day reliability were always poor (−0.152 to 0.220). Overall, no differences between the exercises were seen for the Fridlund scheme, although depressing the lower lip tended to mark the lower border (compare with Table 2).

Reliability of the sEMG recordings with the Kuramoto scheme
Table 1 shows all results from the perspective of the electrode positions. The mean intra-session ICCs were excellent for all electrode positions when using the Kuramoto scheme (0.957-0.970). The lowest ICCs for the intra-session reliability reached moderate to good values (0.504-0.871). The mean ICCs for the intra-day reliability were all good (0.751-0.908). The lowest ICCs for the intra-day reliability were poor to moderate, with two good values (0.291-0.763). The mean between-day ICCs were all moderate (0.643-0.742). The lowest ICCs for the between-day reliability were always poor (0.071-0.495). Overall, no clear differences between the electrode positions were seen for the Kuramoto scheme; the centrally positioned electrodes did not show different ICCs from the lateral ones. The intra-session reliability of both schemes was equal. Compared to the Fridlund scheme, the mean ICCs and especially the lower-border ICCs for the intra-day and between-day reliability were larger for the Kuramoto scheme.
Table 2 shows the results for all facial expressions when using the Kuramoto scheme. Two examples of the EMG recordings for all expressions are shown in Figures 3C, D. The mean intra-session ICCs were excellent for all facial expressions (0.927-0.991). The lowest ICCs for the intra-session reliability reached moderate to excellent values (0.504-0.971). The mean ICCs for the intra-day reliability were good to excellent (0.762-0.973). The lowest ICCs for the intra-day reliability reached poor [...] Again, the intra-session reliability of both schemes was equal. Compared to the Fridlund scheme, the mean ICCs and especially the lower-border ICCs for the intra-day and between-day reliability were always better for the Kuramoto scheme when focusing on the facial expressions.

Figure 3 (caption, partially recovered): [...] (D) E9/E10 in the chin area. R, face at rest (no movement); wFH, wrinkle forehead; gCE, closing the eyes normally (gentle eye closure); fCE, closing the eyes forcefully (forceful eye closure); wN, nose wrinkling; cMS, smiling with closed mouth; oMS, smiling with open mouth; pL, lip puckering (pursing lips); bCH, blowing out the cheeks (cheek blowing); s, snarling; dLL, depressing lower lip; S1, session 1 at t1; S2, session 2 at t1; S3, session 1 at t2; S4, session 2 at t2.

Coefficient of variation statistics
The CVs are presented as means and standard deviations of the study group. The results are summarized in Tables 3 and 4.
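The CV definition from the Statistics section (standard deviation divided by the mean, times 100, computed per individual and then summarized as mean and standard deviation across the sample) can be sketched as follows. The use of the sample standard deviation (n - 1 denominator) and the function names are assumptions; the text does not specify them.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: SD / mean x 100."""
    return stdev(values) / mean(values) * 100.0


def group_cv(per_participant):
    """Mean and SD of the individual CVs across the study sample.

    per_participant: one list of repeated amplitudes per participant
    (e.g. the three trials of one session). Uses the sample SD (n - 1).
    """
    cvs = [cv_percent(values) for values in per_participant]
    return mean(cvs), stdev(cvs)
```

Because each CV is normalized by the participant's own mean amplitude, individuals with large and small absolute sEMG amplitudes contribute on a common, dimensionless scale.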
Overall, the results of the CV analyses confirmed the results of the ICC statistics.
Regarding the recordings from the different facial muscles, the CVs for the intra-session reliability were good for both schemes, but better for the Kuramoto scheme (range: minimal mean CV of 4.78 to maximal mean CV of 9.16 across all measurements) than for the Fridlund scheme (3.32-14.09). The intra-day reliability was mainly moderate for the Fridlund scheme (13.99-36.18), whereas the Kuramoto scheme (7.28-20.21) sometimes showed good estimates. The between-day reliability was moderate or below moderate for the Fridlund scheme (19.08-36.34), and moderate for the Kuramoto scheme ([...]). The central electrodes showed worse results for the intra-session reliability than the other electrodes (overall, 4.14-20.06).
Regarding the EMG recordings during the different facial movement tasks, the CV values were overall worse than for the recordings from the different facial muscles. The CVs for the intra-session reliability were good to moderate for both schemes (Fridlund: 6.71-21.03; Kuramoto: 5.21-18.16). The intra-day [...]

Discussion
Facial sEMG is an important standard instrument for evaluating emotional expressions and a diagnostic tool for analyzing facial muscle function. Typically, participants or patients are examined several times, for instance with varying stimuli or on different days before and after an intervention (Hess et al., 2017). To detect differences in EMG activity related to the experiment or to clinical changes, a high test-retest reliability of facial sEMG must be guaranteed, i.e., it must be ruled out that different EMG findings at different recording instances are merely the result of the variability of the EMG recordings themselves, as this would disqualify the method from clinical application.
Traditionally, the placement of the sEMG electrodes on the facial skin is oriented to the subdermal position of the facial muscles and the direction of their muscle fibers. The electrode scheme of Fridlund and Cacioppo (1986) is the most frequently used. Pairs of electrodes are placed at a constant inter-electrode distance of 1 cm over 10 facial muscles and the masseter muscle on each side of the face. The original publication contains detailed anatomical descriptions of the electrode positioning. While the distance between the electrodes of one pair is exactly defined as 1 cm, the anatomical description allows some variability. Therefore, it was very important to work with laminated foils with exactly defined electrode positions in the present study. At the same time, this is a limitation, as it might be necessary to use such foils in any psychophysical experiment with repeated measurements.
In general, it is surprising that test-retest reliability data are sparse for any sEMG scheme that follows the muscle topography, the most frequent approach. Demeco et al. (2021) analyzed only four electrodes per side, using wireless sEMG recording. Three trials of each participant were analyzed in one session with 4 min of rest between sessions. The exercises were frowning, closing the eyes forcefully, showing the teeth, and pursing the lips. The test was repeated after 10 days. The intra-subject reliability (intra-session reliability was not analyzed) was very good for all analyzed movements, at 0.94 (CI 0.90, 0.98), i.e., lower than the intra-session reliability of the present study. Only Hess et al. (2017) used a test-retest methodology similar to that of the present study. They recorded sEMG from the corrugator supercilii muscle (while frowning), the orbicularis oculi muscle (wrinkling around the eyes), the levator labii superioris muscle (lifting the upper lip in disgust), and the zygomaticus major muscle (lifting the corners of the mouth while smiling) twice, with a time interval of 15 or 24 months. The two studies cannot be directly compared, because Hess et al. (2017) used images as stimulus material for non-voluntary reactions, whereas we used video instructions for voluntary facial expressions. We assume that our instructions led to more reproducible facial expressions, as emotional reactions are expected to show a higher variability (Aniss and Sachdev, 1996; Rymarczyk et al., 2011). This might explain why, in the study by Hess et al. (2017), the retest reliability was high only for the zygomaticus major muscle (ICC = 0.93), whereas the ICCs were all <0.7 for all other settings.
A critical factor across sessions is the reproducible positioning of the electrodes when using the classical Fridlund scheme. Any repeated electrode application also influences the inter-electrode distances and therefore the variability of electrode crosstalk. The EMG signal is highly sensitive to changes in electrode position relative to the facial muscles (Frank et al., 2021). Therefore, it appears plausible that the test-retest reliability was better for the Kuramoto scheme. Recently, we have shown that the Kuramoto scheme performs better than the Fridlund scheme in differentiating distinct facial expressions (Mueller et al., 2022). This was surprising, since any monopolar montage by nature contains more crosstalk than a bipolar one (Mohr et al., 2018). The present study is the first to examine the test-retest reliability of the Kuramoto scheme. Presumably, this monopolar geometric and symmetrical electrode arrangement is more robust against slight but unavoidable changes in position. We conclude that the Kuramoto scheme is also the more suitable scheme when facial sEMG is used for psychophysical experiments with repeated sessions in the same participants. We did not perform a direct comparison of the Kuramoto scheme with high-density sEMG (HD sEMG) settings applying >90 electrodes with an inter-electrode distance of ≤5 mm (Drost et al., 2006; Cui et al., 2020). At least the results for the intra-session and intra-day reliability seem good enough to recommend the Kuramoto scheme for standard psychophysical experiments with short retest intervals. HD sEMG is very time-consuming per experiment and participant. It remains open whether HD sEMG can deliver a better retest reliability, especially a better between-day reliability.
The present study has limitations. The video self-tutorial used to demonstrate the facial movement tasks seems to us the most reliable instruction technique (Volk et al., 2019). Nevertheless, it does not rule out variable performance by the participants, as no feedback is implemented. It has been proposed to evaluate only strikingly different but well-defined facial expressions to minimize intra-individual variability (O'Dwyer et al., 1981; Demeco et al., 2021; Jung and Im, 2022). The disadvantage is that the probands produce artificial facial expressions when asked to perform the expressions demonstrated in the self-tutorial video. Furthermore, the preformed foils used to mark the electrode positions still allow some residual variability in electrode positioning. In the future, we plan to use screen-printed adhesive electrode arrays (Inzelberg et al., 2018). These adhesive arrays appear to allow very reliable electrode placement and therefore reliable EMG recording (Gat et al., 2022). This seems important to us for establishing an easy-to-use setting for psychophysical experiments, especially when they are performed with large samples or by non-EMG experts.

Conclusion
High-resolution sEMG recordings in healthy probands showed an excellent intra-session test-retest reliability for both the Fridlund and the Kuramoto scheme, with respect to both the electrode positions and the different facial expressions. The intra-day and between-day test-retest reliability was consistently better for the Kuramoto scheme. When using the Kuramoto scheme, a good to excellent mean intra-day and between-day reliability seems to be achievable.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of the Jena University Hospital. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions
OG-L, GFV, and CA: conceptualization and supervision. OG-L and CA: first draft preparation. VT, NM, A-MK, and MH: data acquisition. VT, NM, and CA: data analysis. All authors contributed to the article and approved the final version.

Funding
VT and NM received a doctoral scholarship from the Interdisziplinäres Zentrum für Klinische Forschung (IZKF) of the Jena University Hospital. OG-L acknowledges support by the Deutsche Forschungsgemeinschaft (DFG), grant no. GU-463/12-1.