Effects of Sleep Deprivation on Surgeons Dexterity

Sleep deprivation is common in modern society and its prevalence is increasing. Chronic and acute sleep deprivation have been linked to diabetes and heart disease as well as to depression and increased impulsive behavior. Surgeons are often exposed to long hours on call and to only a few hours of sleep in the preceding days. Nevertheless, few studies have focused on the effects of sleep deprivation on surgeons, and more specifically on surgical dexterity; those that have often rely on virtual surgical simulators. A better understanding of the consequences of sleep loss on the key surgical skill of dexterity can shed light on the possible risks associated with a sleepy surgeon. In this paper, the authors aim to provide a comprehensive review of the relationship between sleep deprivation and surgical dexterity.


INTRODUCTION
Sleep deprivation is defined as "Abnormal sleep that can be described in measures of deficient sleep quantity, structure and/or sleep quality" (1). A wide body of literature supports the view that daily sleep duration plays an essential role in maintaining general healthy functioning. Moreover, the prevalence of sleep deprivation (SD) is increasing in society (2), creating a growing challenge in managing the daily performance deficits caused by sleep loss. The detrimental effects associated with SD are of particular importance in specific populations, where impaired performance can have dangerous or even deadly consequences (3). Like the widely studied military population, surgeons are traditionally exposed to considerable levels of acute total and chronic SD (3)(4)(5), raising significant safety concerns about sleepy surgeons (6,7).
To reduce the chance of a work-related accident, it is necessary to prevent excessive sleepiness during work hours. It has been shown, in fact, that restricting sleep to 6 h per day for an entire week leads to the same neurobehavioral performance observed in subjects deprived of sleep for a whole night (8). This impairment is also comparable to that recorded in someone considered alcohol-intoxicated under legal limits (9).
Given the relatively broad spectrum of physiological sleep duration in humans, there is no consensus on a specific recommended sleep duration. Individuals do not show the common SD-related deficits when sleeping at least 8 h per day (8). Accordingly, a similar amount of sleep is recommended for adults by the National Sleep Foundation guidelines (10).
In the medical context, interns working more than 80 h per week have a significantly higher number of attentional deficits during overnight shifts (11). Limits on work hours vary across countries. The European Working Time Directive regulates maximum work hours in Europe for all workers: the maximum number of consecutive hours allowed is 13, and average weekly duty hours are limited to 48 (12). In the United States of America, the Accreditation Council for Graduate Medical Education (ACGME) regulates the duty hours of physicians in training: the current limit is 30 consecutive hours on duty and 80 h of work per week (13). In New Zealand, physicians in training should not work more than 16 consecutive hours, and the weekly limit is set at 72 h (14). Taking a broader perspective, it is worth mentioning that a marked reduction in average work hours was observed in industrialized countries over the last century (15).
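The limits quoted above can be collected in a small lookup structure to make the differences between jurisdictions explicit. The following is a minimal sketch in Python: the class name, the function name, and the example week are hypothetical, and the weekly limit is applied to a single week for simplicity, whereas the regulations may define it as an average over a longer reference period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DutyHourLimit:
    max_consecutive_hours: int
    max_weekly_hours: int


# Values as quoted in the text (refs. 12-14); check the current regulations
# before relying on them for anything other than illustration.
LIMITS = {
    "Europe (Working Time Directive)": DutyHourLimit(13, 48),
    "USA (ACGME)": DutyHourLimit(30, 80),
    "New Zealand": DutyHourLimit(16, 72),
}


def violations(shift_lengths_h, limit):
    """Return human-readable violations for one week of shift lengths (hours)."""
    problems = []
    longest = max(shift_lengths_h, default=0)
    total = sum(shift_lengths_h)
    if longest > limit.max_consecutive_hours:
        problems.append(
            f"longest shift {longest} h exceeds {limit.max_consecutive_hours} h"
        )
    if total > limit.max_weekly_hours:
        problems.append(
            f"weekly total {total} h exceeds {limit.max_weekly_hours} h"
        )
    return problems


# Hypothetical week: one 24 h call plus four 12 h shifts (72 h in total).
week = [24, 12, 12, 12, 12]
for region, limit in LIMITS.items():
    print(region, "->", violations(week, limit) or "within limits")
```

Under these assumptions the same hypothetical week is acceptable under the ACGME limits, breaches the consecutive-hours limit in New Zealand, and breaches both limits under the European directive.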
The number of serious medical errors in intensive care units was also found to be higher when a work schedule including long night shifts (24 h or more) was adopted (4). Interns are at higher risk of percutaneous injuries after a night shift or during night time (16). The risk of being involved in a motor vehicle accident in the next month was found to rise by 9.1% (95% CI, 3.4-14.7%) for each 24-h shift scheduled during that month (17). Alertness, measured objectively using the Psychomotor Vigilance Task (PVT), a standardized task based on the measurement of reaction times (18,19), was significantly worse in interns (albeit from an internal medicine department rather than a surgical one) after on-call shifts than after regular shifts (20). Heavy call rotations (calls occurring every fourth or fifth night) induced neurobehavioral deficits comparable to those associated with a Blood Alcohol Concentration (BAC) of 0.04 to 0.05%; interns had 40% more attentional lapses, and their subjective ability to evaluate their own level of impairment appeared to be limited and non-uniform across tasks (21). Activities performed on the day after an on-call night could also be affected by the residual effects of SD: in this situation, a 24% reduction in correct adenoma detection during colonoscopies performed by experienced surgeons was found (22). During on-call night shifts interns slept on average 6.93 h (6.84-7.03 h; 95% CI), significantly less than the average 7.18 h (7.06-7.30 h; 95% CI) recorded in residents who did not undergo shifts. Moreover, in 17.5% of the nights residents did not sleep at all. Residents were shown to present an amplified negative emotional response to disruptive events as well as reduced positive effects of goal-enhancing ones (23).
A nationwide survey study conducted among 2,737 American residents of various specialties showed that extended-duration shifts, and their frequency, raised the risk of self-reported medical errors, patient fatalities, and attentional failures (3). In the same study, it was reported that the chance of attentional failures in medical settings (e.g., falling asleep during a patient examination or surgical procedure) was significantly higher for residents whose working schedule included long shifts. It is important to note that self-reported medical errors are known to significantly worsen physicians' quality of life and to increase feelings of burnout (24). These consequences were associated with decreased empathy toward patients and an increased probability of self-perceived medical errors, hence building a negative spiral toward, again, more perceived errors and higher distress.
Abbreviations: ACGME, Accreditation council for graduate medical education; BAC, Blood alcohol concentration; CANTAB, Cambridge neuropsychological test automated battery; EEG, Electroencephalography or Electroencephalogram; EKG, Electrocardiogram; FLS, Fundamentals of laparoscopic skills; ICSAD, Imperial college surgical assessment device; GRS, Global rating system; KSS, Karolinska sleepiness scale; NREM, Non-REM; PVT, Psychomotor vigilance task; SI, Sleep Inertia; SD, Sleep deprivation; SSS, Stanford sleepiness scale; TMT, Trail making test; VEST, Virtual endoscopic surgery trainer; VR, Virtual reality; WCST, Wisconsin card sorting test.
Although the detrimental effects associated with SD have been widely investigated, only a few studies have explored the effects of SD on surgical performance, and a review of their conclusions yields little or inconclusive evidence (25)(26)(27)(28)(29)(30).
This review aims to list and analyse the most relevant papers on SD and its effects on surgeons' dexterity and abilities, pointing out differences in findings and in investigative paradigms. Moreover, we discuss methodological issues in the systematic review of the relevant studies, as well as practical suggestions for surgeons and health organizations.
The first part of the paper reviews the current literature and the state of the art on the effect of SD on surgical dexterity. We then discuss the effects of psychostimulants on sleep-deprived surgeons. Finally, we review current safe countermeasures that could be adopted to counteract the detrimental effects of SD in the workplace.
A comprehensive search of PubMed up to December 2018 was performed. Only papers written in English were included. The search query was: Sleep AND deprivation OR restriction OR fatigue AND surgeon *, which produced 1,802 results.
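Because the query is written without parentheses, the records it retrieves depend on how the Boolean operators are grouped. The sketch below illustrates this on a toy record, assuming that operators are applied strictly left to right (which is how PubMed is documented to process unparenthesized queries) and that the wildcard term is written as surgeon* without a space. The matching functions and the example record are hypothetical and do not reproduce PubMed's actual indexing.

```python
def matches(term, words):
    """True if the term is among the words; a trailing '*' is a prefix wildcard."""
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words


def left_to_right(tokens, words):
    """Evaluate 'term OP term OP term ...' strictly left to right."""
    result = matches(tokens[0], words)
    for op, term in zip(tokens[1::2], tokens[2::2]):
        hit = matches(term, words)
        result = (result and hit) if op == "AND" else (result or hit)
    return result


def nested(words):
    """Evaluate: sleep AND (deprivation OR restriction OR fatigue) AND surgeon*"""
    middle = any(matches(t, words) for t in ("deprivation", "restriction", "fatigue"))
    return matches("sleep", words) and middle and matches("surgeon*", words)


query = "sleep AND deprivation OR restriction OR fatigue AND surgeon*".split()

# Hypothetical title about muscular fatigue that never mentions sleep.
record = set("muscular fatigue in the operating surgeon".split())

print(left_to_right(query, record))  # True: retrieved by the flat query
print(nested(record))                # False: rejected by the nested query
```

Under these assumptions the flat query also retrieves records on fatigue in surgeons that do not mention sleep, which is consistent with the need for an explicit exclusion criterion for papers dealing purely with physical or neuromuscular fatigue.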
The included study types were: previous systematic reviews; experimental studies comparing direct measures of surgical performance obtained using previously validated surgical rating scales or assessment methods (e.g., laparoscopic simulators); randomized controlled trials; and non-randomized studies comparing the performance of groups or subjects. Studies whose experimental manipulations exposed subjects to acute sleep deprivation or chronic sleep restriction were included in the review.
Papers lacking direct measures of sleep, whether subjective or objective, or dealing purely with surgeons' physical or neuromuscular fatigue were excluded. Retrospective analyses of surgical outcome after sleep deprivation or restriction were also excluded. Finally, communications to national or international meetings were excluded from the review.
After applying the inclusion and exclusion criteria, we analyzed the references cited in the included studies to identify further relevant papers meeting the previously defined criteria. Starting from 1,802 results, a total of 16 studies were included in the review.

SURGEONS SLEEP DEPRIVATION AND ITS EFFECTS ON DEXTERITY
In recent years, great effort has been devoted to developing objective measurements of technical ability in surgeons. Surgeons' expertise involves a broad spectrum of skills, ranging from technical abilities to leadership. Dexterity, defined as "the skill of performing tasks, especially with the hands" (31), is considered a necessary requirement for an accomplished surgeon. However, there is no single way to objectively assess dexterity. Commonly adopted approaches are: direct or indirect observation (e.g., using video recordings) with subsequent expert evaluation of the procedure, possibly against a priori scoring criteria (31); use of animal models with autoptic evaluation of the product; and motion analysis of surgical instruments (32) or of the surgeon's hands (33). Several interfering factors, both modifiable and unmodifiable, can influence dexterity: (i) low ambient and skin temperature contribute to lower dexterity (34,35); (ii) younger age is correlated with better dexterity scores as well as with the ability to produce higher grip forces (36); (iii) wrongly sized gloves impair dexterity (37), whereas appropriate double gloving (with optimized thickness and glove arrangement) alters neither dexterity nor tactile sensitivity (38)(39)(40).
One of the first studies on the link between surgeons' dexterity and sleep was conducted by Goldman et al. (41). Surgical performance was analyzed via simultaneous video recordings and electrocardiogram (EKG) signals of the operating surgeon. Afterward, the videotapes were analyzed to evaluate surgical performance according to "previously described criteria which included adequacy of exposure, inappropriate motions and inefficiencies due to indecisive or inflexible activity." In summary, 5 of the 33 surgical procedures included in the study were carried out by a surgeon who had slept less than 2 h during the preceding night. The procedures conducted under SD were evaluated as "marked with indecision." Moreover, the authors reported that under SD "poorly planned maneuvers exceeded 30% of the operative time." To the best of our knowledge, this study was the first to quantify a tentative link between sleep quality and surgeons' dexterity.
In surgical settings, the acquisition of many skills, whether technical or not, is moving from the operating room to the surgical skills laboratory through the use of simulations, which allow the trainee to perform interactively in an environment that replicates a real-world clinical scenario. Simulation can include anything from the use of standardized patients and synthetic materials or animal tissues to high-fidelity advanced virtual reality (VR) systems (42). Most of the studies conducted so far on surgical dexterity under SD have employed laparoscopy simulators. However, although a VR simulator may provide useful information about surgical performance, a perfect training model would be one closely resembling the complex reality of the operating room (43). In fact, it has been demonstrated that simulation training based on in vivo porcine models can improve surgical performance (44).
In this regard, the effects of SD on microvascular anastomoses, performed in a murine in-vivo model, were studied by Basaran et al. (45). Only one experienced surgeon was recruited to conduct surgeries in 48 Wistar Hannover rats, which represents the main limitation of this investigation. Anastomoses were performed on the proximal and distal stumps of the femoral artery, previously exposed and isolated. Several parameters were used to evaluate surgical performance: (i) anastomosis time, (ii) error score calculated using the Selber checklist (46), (iii) the widely used global rating system or GRS (47), (iv) an autopsy score measured the day after the procedure, and (v) patency, also tested the day after the procedure using the milking test (also known as the double occlusion test). The results showed a progressive shortening of anastomosis time across the three sessions. The authors suggest that this could be due to an increased urge of the surgeon to conclude the procedure; however, it cannot be ruled out that progressive learning reduced the time needed to perform the experimental task. Performance measured with the Selber error checklist (46) was significantly worsened by SD. In more detail, none of the dexterity scores changed significantly, whereas visuospatial ability and the anastomotic leak score worsened significantly. The overall score assigned during the post-mortem examination was also significantly worse in the post-work and SD sessions than in the pre-call one. Overall, the surgical procedure, as measured by the GRS (47), was significantly worse in the SD condition. Moreover, the authors reported that all GRS items worsened significantly. The GRS score comprises different sub-items evaluating specific aspects of the surgical procedure. In this study, the GRS sub-item evaluating the presence of repeated and unnecessary movements of the surgeon and the one evaluating the degree of time efficiency changed in a highly significant (p < 0.001) manner.
The use of in-vivo animal models certainly has several benefits, as it mimics conditions occurring during live surgery such as bleeding, pulsation, and liquid filling of the vascular tree. Nevertheless, this method is rarely applicable because of its costs, ethical issues, and complexity of design. These reasons led to the development of various types of virtual simulators as an objective means of evaluation, able to reproduce with a high degree of accuracy and content validity several surgical procedures or propaedeutic exercises of basic surgical dexterity (e.g., suturing). Simulation also has several drawbacks, as it is usually not closely linked to the daily tasks that a surgeon has to deal with (e.g., patient supervision outside the operating theater).
A laparoscopic virtual simulator study under SD was designed by Taffinder et al., who directly investigated the effects of SD on surgical dexterity using the Imperial College Surgical Assessment Device (ICSAD) and a MIST-VR laparoscopic virtual reality simulator (48). Six residents took part in the study. Time to completion and number of errors were found to be inversely and linearly related to sleep duration. Compared with the undisturbed sleep condition, surgeons exposed to SD made 20% more errors and took 14% longer to complete the procedure. These impairments in performance were correlated with increases in stress and decreases in arousal levels. The worsening in performance remained significant after arousal was taken into account as a covariate, further highlighting that SD induces stress as an independent variable.
Likewise, other studies have detected a significant increase in the number of errors (45,49-53). Grantcharov et al. assessed the effect of SD on simulated surgical procedures in 14 surgery interns belonging to a gastroenterological surgical unit (50). A MIST-VR (Mentice Medical Simulation, Gothenburg, Sweden) laparoscopic surgery simulator was employed (54). The simulator embeds six different tasks that were selected after a study of the ergonomics of laparoscopic procedures (55). The median sleep duration during the night on call was 1.5 h, with a range of 0-3 h. In SD conditions, subjects took significantly more time to complete all tasks except task 2 (all values in seconds: 5.4 vs. 7.6, 5.6 vs. 7.8, 6.7 vs. 8.1, 15.0 vs. 18.1, and 18.2 vs. 23.8). The number of errors was significantly higher for tasks 1 and 6 (median number of errors, 0.6 vs. 1.0 and 1.4 vs. 3.5). A significant increase in the number of unnecessary movements was detected in task 5 and in one further task (7.8 vs. 9.4 and 6.1 vs. 8.2); the latter was the most complex and the most suitable for investigating surgical performance outcome, as previously tested in a porcine animal model (56). Another study, by Eastridge et al. (51), used the same simulator and tasks adopted by Grantcharov. Thirty-five surgery residents participated. Experimental sessions were administered in a "rested pre-call" and an "SD post-call" condition. The cumulative number of errors, obtained by summing the errors made in each of the three tasks, was significantly higher in the SD condition. Economy of motion of either hand and time to complete the given tasks did not change significantly.
Another simulation study under SD was performed by Tsafrir et al. (52), who took care to monitor and control possible confounding factors. A Lap Mentor (Simbionix) simulator was used. The study population included 26 residents who were divided into expert and naïve groups (those who had attended at least 20 laparoscopic procedures were deemed experts). No statistical difference between the two groups was present in terms of self-reported sleep duration, coffee consumption, handedness, gender, or experience with computer or video games. After an initial supervised training phase, no significant differences in the simulator's performance metrics were found between expert and naïve subjects. In both the novice and the expert group, simulation performance worsened significantly in the SD condition, specifically in the efficiency (e.g., time to complete the task) and safety (number of errors) performance metrics. Novices' performance deteriorated significantly more than that of experts in some tasks. As an example, in the naïve subjects the total time to complete the 30-degree camera manipulation task increased by 12.5% after SD (average 136 s after SD vs. 119 s during baseline), while in the experts there was a significant increase of 8.0% after SD (average 112 s after SD vs. 103 s during baseline).
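The size of a percentage increase depends on which value is used as the denominator. As a minimal arithmetic sketch (the function name and its parameter are hypothetical), the task times reported above can be compared under both conventions:

```python
def pct_change(baseline, after, relative_to="baseline"):
    """Percentage change from baseline to after.

    relative_to="baseline" divides by the baseline value (the usual convention);
    relative_to="after" divides by the post-SD value instead.
    """
    denom = baseline if relative_to == "baseline" else after
    return 100.0 * (after - baseline) / denom


# Average task times in seconds as reported in the text: (baseline, after SD).
times = {"naive": (119, 136), "expert": (103, 112)}

for group, (bsl, sd) in times.items():
    print(
        group,
        round(pct_change(bsl, sd), 1),           # relative to baseline
        round(pct_change(bsl, sd, "after"), 1),  # relative to post-SD time
    )
# naive 14.3 12.5
# expert 8.7 8.0
```

The reported figures of 12.5% and 8.0% appear to correspond to the change expressed relative to the post-SD time; relative to baseline, the same averages would give roughly 14.3% and 8.7%.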
All the studies reported so far have investigated the effects of acute sleep restriction or deprivation on surgical performance. However, a more realistic scenario is one in which surgeons are exposed to the effects of chronic SD rather than to a single night without sleep or with restricted sleep (<1 h). In normal subjects, the cumulative effects of even a mild chronic sleep restriction (6 h of time in bed per night) can lead, in less than 2 weeks, to neurocognitive impairments comparable to those following two consecutive nights of total SD (8). Moreover, it has been shown that the subjective capacity to evaluate the effects of SD is limited and suffers from ceiling effects in perception, leading the subject to underestimate the extent of the detrimental effects of sleep curtailment (8,57). In this context, a study was performed to investigate the effects of being on call for seven consecutive days (49). The study enrolled 21 surgical interns from different specialties. A MIST-VR (Mentice Medical Simulation, Gothenburg, Sweden) laparoscopic surgery simulator was employed. Before the beginning of the study, participants were trained on a series of six different exercises of growing complexity. The maximal deterioration of performance in the two surgical dexterity assessment tasks was registered after the first night on call. The time to complete each of the two tasks (time in seconds for task one: baseline 86 ± 4.5 vs. SD 100 ± 12.3, p = 0.025; and task two: 36 ± 9.1 vs. SD 40 ± 10.2) as well as the number of errors in the first one (task one: baseline 86 ± 4.5 vs. SD 100 ± 12.3, p = 0.025) was significantly increased. Interestingly, the sub-group of emergency medicine interns exhibited a constant and significant decrease in performance throughout the week, as opposed to all other sub-groups. Emergency medicine interns were the ones who slept the least (they underwent total SD during each of the seven nights, self-reported by questionnaire), covered the highest median number of steps per day (6453, range 5042-7905), and attended the largest number of patients (median 13, range 12-15).
Other studies compared the effects of SD on simulated surgical performance with those of mild levels of alcohol intoxication, as the effects of the latter on attention are known to be similar to those of SD (9). In the study of Mohtashami et al., sleep deprivation was defined as a total sleep time of <3 h in the previous 24 h. Alcohol administration was regulated to reach the legal intoxication level of >0.08% mg/mL Blood Alcohol Concentration. Nine experienced gynecologists were included in the study. Technical dexterity was measured using a physical box trainer, on which three laparoscopic exercises of increasing difficulty (cup drop, rope passing, pegboard exchange) were performed. Performance was impaired after a night with <3 h of sleep (53). Performance when intoxicated with alcohol was found to be similar to that in the SD condition. When intoxicated, subjects were faster than in the SD condition, but their performance was significantly poorer in the most difficult task. The same difficult task was the only one in which surgical performance was considered negatively altered after a blind review of the task by three expert reviewers. Another study (58) compared mild SD (average sleep duration 3.75 h; range, 3-5 h) without alcohol to a 0.43 mg/L BAC condition. Five surgeons were recruited and performed three different experimental sessions: baseline (no SD and no alcohol consumption), SD (sleep restriction only), and lastly a combination of SD and alcohol consumption. Each subject was evaluated three times for each of the three conditions. Surgical dexterity was evaluated using a MIST-VR laparoscopic surgery simulator. The authors reported a significant worsening of performance for all metrics (average task duration, average number of errors, average diathermy time, and average diathermy time erroneously applied outside the target).
Lehman et al. carried out a study (61) to better understand the psychomotor and cognitive effects of a 24 h shift on surgical residents. The study population was divided into two groups: 17 surgical residents (test group) and 13 medical students (reference group). The reference group, which included only medical students with no prior experience in laparoscopy, was added to provide a clear picture of possible effects of residual learning occurring during the testing. The study used a surgical simulator called the Virtual Endoscopic Surgery Trainer (VEST); the apparatus embeds haptic feedback and was validated in a previous study (67). Three different tasks were selected, and for each task the authors calculated two performance parameters named scoreTP and scoreE. ScoreTP represented a measure of time and instrument path length, while scoreE summarized error-related scores. Two standard neuropsychological tests were also included in the testing sessions: (1) an attention and concentration performance test, the d2T test, and (2) a visuoperceptual and visual attention test, the trail making test (TMT). All subjects underwent five training sessions of 1 h each before the beginning of the test phase to control for possible confounding factors introduced by learning. No significant performance impairment was found when comparing the scoreTP and scoreE obtained during the pre-call session with those of the post-call session (after SD exposure). Mean self-reported sleep duration during the on-call time preceding the post-call testing was 2.9 ± 1.4 h.
Jakubowicz investigated the effects of 24 h on call on a simulated sinus surgery (66). Unlike in the previously reviewed papers, the simulator chosen for this study (ES3 v2.0.2, Lockheed Martin) was developed to simulate a complete surgical procedure and not only simple psychomotor tasks. All participants were trained with the simulator until saturation of the learning curve before attending the actual experimental session. Each subject was tested twice, before a 24 h on-call period and after it (post-call). The overall surgical dexterity score did not decrease significantly after 24 h on call; moreover, the performance metrics showed learning during the administered experimental sessions.
Veddeng et al. focused their attention on gynecology interns (64). Twenty-eight gynecologists were enrolled in the study; three expert surgeons were used as a reference for the laparoscopic task. Each of the subjects received a training session on the VR simulator. Moreover, the subjects were divided into sub-groups according to their level of expertise. The simulated surgical procedure was a salpingectomy. Participants' skills were evaluated according to three metrics: time taken to complete the procedure, total path length of instrument movements (centimeters), and estimated blood loss (milliliters). A cognitive test, the Cambridge neuropsychological test automated battery (CANTAB), was also administered before or after the VR test. The study showed no significant differences between laparoscopic skills assessed after the on-call shift and the baseline measurements. The CANTAB scores assessed after being on call showed significantly higher reaction times and a significant improvement in the Paired Associates Learning test.
A similar study design was adopted by Elizabeth et al. (60), who investigated the effect of SD while using an ophthalmic surgical simulator (Eyesi, VRmagic, Mannheim). Nine residents were included in the study. Each of them was tested in three different conditions: (i) pre-call (with at least 7 h of sleep during the previous 24 h), (ii) post-work (same sleep condition as the pre-call one, but after an 8 h work day), and (iii) post-call (<3 h of sleep during the previous day). No caffeine consumption was allowed during the 12 h preceding the test. An attempt to control for chronic SD was made, as subjects were asked to sleep at least 6 h a day during the preceding week, but no objective assessment was clearly implemented. Subjects were asked to self-report wake-up time, the number of hours slept during the previous night, and the number of hours slept during the previous week. The average self-reported sleep duration in the three conditions of the study was 7.6 ± 0.6, 7.2 ± 0.4, and 1.9 ± 1.2 h. No statistically significant differences were found in either simulated task in any testing condition.
Another study that failed to identify differences in simulated performance after SD was performed by Yi et al., who investigated surgical dexterity and psychomotor performance before and after one night-float shift (65). Nine surgical residents undergoing a night-float rotation were enrolled in the study and assigned to one of two experimental groups (night-float and 24 h call). A LAP Mentor (Simbionix) laparoscopic surgery simulator was used. Subjects had to fill out a questionnaire about sleep quality and quantity in the previous week and previous night, hours slept on shift, cups of coffee or other stimulants consumed while on shift (e.g., cigarettes), and number of calls from staff during the shift, as well as the Epworth sleepiness scale (ESS). Even though residents on call for 24 h saw more patients (28 vs. 11 for the night shift only, p = 0.008) and walked farther (10,731 vs. 4,990, p = 0.0037, measured using a pedometer), their level of fatigue and sleepiness at the end of the shift was comparable to that of residents working a 12 h night-float shift (ESS scores 15 vs. 11, p = 0.14). Surgical performance was assessed using a set of metrics available from the simulator, capable of measuring the number of instrument movements, the overall accuracy, the economy of movement, the time to complete the task, and the speed of movements. None of the performance metrics revealed any statistical difference between the respective baseline, recorded before the beginning of the shift, and the session administered after the 12 or 24 h call. It might be relevant that the baseline number of hours of sleep was as low as 5.6 h for the 12 h shift group and 5.4 h for the 24 h one. Since this sleep duration is rather low, the experimental results may have been confounded by the effect of chronic sleep restriction accumulated before the study began.
A study conducted by Olasky et al. (62) tried to understand whether the performance of experienced surgeons is affected by SD differently from that of less experienced colleagues. Twenty-two surgical residents and novices were asked to use two different simulators: a box trainer compliant with the Fundamentals of Laparoscopic Skills (FLS) guidelines and the Virtual Basic Laparoscopic Surgical Trainer (VBLaST). Both simulators were used to study performance in the peg transfer task, a standard pick-and-place exercise. To determine the level of surgical expertise, subjects were classified based on post-graduate year and self-reported relevant experience. Moreover, they were asked to fill in a form about their overall fatigue level (well-rested/tired) and hours of sleep during the night before the task. There was no significant difference in sleep hours or fatigue level across experience levels (F = 0.98, p = 0.05; F = 0.91, p = 0.05). Furthermore, no correlation was found between experience level and sleep hours (r = 0.16, p = 0.05), between experience and fatigue level (r = −0.04, p = 0.05), or, remarkably, between sleep hours and fatigue (r = −0.10, p = 0.05). Experience was instead found to be positively related to performance on both the FLS and VBLaST simulators (F = 6.14, p = 0.022, and F = 6.87, p = 0.016).
Schlosser et al. conducted a study on 38 surgeons assessing the effect of acute partial SD on surgical performance, together with an analysis of bio-physiological responses to the different experimental conditions (prior to a 24 h call, post-call, and after 24 h of rest) (63); this is the first study that used objective measures of fatigue. The study employed a LapSim virtual reality laparoscopic surgical simulator. All measurements (Stanford Sleepiness Scale (SSS), saliva cortisol ELISA, pupillography, and the d2 Paper-Pencil Test) were taken at the same time of day, between 09:00 a.m. and 10:00 a.m. To avoid possible bias introduced by different experience levels, subjects were divided into groups of homogeneous experience level. Reported sleep durations in the three experimental conditions were: pre-call 6.7 ± 0.16 h, post-call 4.09 ± 0.28 h, and after rest 6.47 ± 0.15 h. Experienced surgeons felt significantly less sleepy than junior residents and interns in the post-call condition (SSS sleepiness scores: senior residents 2.9 ± 1.2 vs. interns 3.95 ± 0.97 and junior residents 3.89 ± 0.78), despite a similar amount of sleep post-call. No significant changes were found in pupillary unrest indices or in saliva cortisol concentration measured pre-call. However, in the post-call condition, interns' saliva cortisol concentration was significantly higher than that of the other two groups (interns 7.52 ± 4.24 ng/ml, junior residents 4.34 ± 2.6 ng/ml, and senior residents 4.55 ± 4.34 ng/ml). The results of the d2 Paper-Pencil alertness test improved steadily from the first pre-call condition to the last after-rest testing session, indicating a possible effect of learning occurring during the test itself. Surgical performance metrics changed significantly only for the parameters measuring economy of motion, where scores were found to be improved in the post-call session.
Uchal et al. measured the impact of SD after a 24 h on-call shift (post-call group), comparing it with a standard 8-h work shift (post-work group) (59). Product quality and procedure effectiveness were evaluated in the suturing of a perforated ulcer on a laparoscopic foam hollow stomach. Subjects were randomly allocated to each group. No significant differences were detected in any of the parameters considered, and the surgeons operating in the post-call group did not show any worsening in performance. Neither the accuracy of movements nor the number of errors led the experimenters to conclude that SD had interfered with the task. It is noteworthy that this is the only randomized controlled trial (RCT) conducted; RCTs are regarded as the highest level of methodological quality. Despite this care in experimental design, the study did not find significant differences in the simulated surgical procedures. In addition, the SD and control groups had significantly different exposures to SD (self-reported time slept in the 24 h before the experiment, 1.5 vs. 6.5 h), which strengthens the theoretical ability of the study to identify any difference.
Tomasko et al. compared the surgical dexterity of two groups of medical students randomly assigned to a rested control condition (at least 6 h of sleep in the preceding night) or to SD (<2 h of sleep) (68). Thirty-one subjects were recruited for the study. Two virtual reality laparoscopic surgical simulators were employed, i.e., the RapidFire (Verefi Technologies, Elizabethtown, PA) and the EndoTower (Verefi Technologies, Elizabethtown, PA). Both simulators provided performance scores including number of errors, time to complete the task, efficiency of movement, and time spent with an off-axis camera. All subjects underwent a preliminary phase of training. In addition to the surgical dexterity measures, subjects were asked to complete the Epworth Sleepiness Scale (ESS) and the NASA Task Load Index (NASA TLX). Subjective sleepiness was significantly higher only in the SD group: ESS score at baseline 5.07 ± 2.79 vs. SD 12.27 ± 6.77. The between-group comparison of the RapidFire simulator scores (rested vs. SD) revealed no statistically significant difference. Within the SD group, the comparison of scores recorded in the rested and SD conditions revealed a significant improvement during the SD test session only for the simplest task (RapidFire Level 1 scores, 94.0 ± 8.4 vs. 99.7 ± 9.2, p = 0.003). The control group instead registered a significant improvement only in the most difficult task (RapidFire Level 3 scores, 67.2 ± 8.7 vs. 70.4 ± 8.6, p = 0.03). Compared with the rested control group, the SD group showed significantly higher NASA TLX scores for the items total workload, frustration, performance, physical demand, and temporal demand. Overall, SD did not alter surgical dexterity but increased the subjective mental and physical effort required to achieve comparable surgical performance scores.

OBJECTIVE AND LONGITUDINAL TRACKING OF SLEEP AND DROWSINESS
Drowsiness can be measured and tracked using subjective or objective approaches, depending on the aim of the study and on the constraints imposed by experimental design, subject compliance, and so forth. A subjective assessment relies on a personal estimate coded through a validated scale. The most widely used drowsiness scale is the Karolinska Sleepiness Scale (KSS), a nine-point rating scale with verbal anchors attached to each discrete level. This kind of approach has a modest time resolution compared with objective instrument-based techniques. Moreover, each measurement requires the active participation of the subject and hence perturbs his or her inner state (e.g., eliciting an arousal response). Despite these limitations, these methods are easy to apply and can be used to collect data from large pools of subjects (e.g., using mail questionnaires).
Objective measures of drowsiness can be obtained using behavioral or physiological measures. The real-time monitoring of vigilance state has been extensively studied in the transportation industry (e.g., aviation, railway workers) and with various types of driving simulators [for in-depth reviews see: (69-71)]. The standard process of vigilance state scoring requires a system capable of recording six EEG channels, two electrooculogram channels, and two electromyogram channels (72).
Among behavioral measures of sleepiness, simple reaction times, typically assessed with the psychomotor vigilance task (PVT), are well known to be affected by SD and sustained wakefulness [e.g., (8)]. The standard version of the PVT lasts 10 min, although shorter 3-min variants have been studied and compared with the longer one (19). As subjects are required to actively carry out the task, this technique might be inappropriate where absolute unobtrusiveness is required.
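The PVT outcomes referred to in this review (reaction times and lapses) can be summarized with a few lines of code. The following is a minimal sketch, not the scoring procedure of any cited study: the function name `summarize_pvt`, the example data, and the 500 ms lapse cut-off (a commonly used convention that is not stated in the text above) are all assumptions made for illustration.

```python
from statistics import mean, median

# Assumed lapse cut-off (ms); a common convention, not taken from this review.
LAPSE_THRESHOLD_MS = 500


def summarize_pvt(reaction_times_ms):
    """Return basic outcome metrics for one PVT session (hypothetical helper)."""
    if not reaction_times_ms:
        raise ValueError("at least one reaction time is required")
    # A lapse is counted whenever a response is at or above the cut-off.
    lapses = sum(1 for rt in reaction_times_ms if rt >= LAPSE_THRESHOLD_MS)
    return {
        "n_trials": len(reaction_times_ms),
        "mean_rt_ms": mean(reaction_times_ms),
        "median_rt_ms": median(reaction_times_ms),
        "lapses": lapses,
        "lapse_rate": lapses / len(reaction_times_ms),
    }


# Reaction times (ms) invented for illustration only.
example_session = [250, 310, 290, 620, 275, 540, 300, 280]
summary = summarize_pvt(example_session)
print(summary)
```

Reporting the median alongside the mean is a deliberate choice in this sketch, since a handful of very slow responses can dominate the mean of a short session.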
Physiological measures can be used to detect, measure, and track drowsiness and sleep. Miniaturized single-channel EEG systems were able to distinguish sleep and drowsiness with high accuracy [98.3 ± 4.1% vs. standard AASM EEG scoring (72)] in aviation pilots during long-haul flights (10 ± 2.0 h) (73). Another algorithm reached 83.6% accuracy in detecting drowsiness using a neural network model fed with 7 descriptors extracted from wavelet analysis of spectral data calculated from three EEG channels (74). It has also been proposed to use 1D convolutional neural networks to create predictive models of drivers' cognitive performance starting from EEG data (75). Notably, some of the available drowsiness detection systems are computationally efficient and can be used in real-time systems. This kind of approach could be effectively replicated to track sleepiness in surgeons during night shifts or during long periods of sleep curtailment. Currently, there are various proprietary and open systems that can be used and/or adapted to the surgical scenario in order to estimate vigilance state (76) and workload metrics (77,78). Other physiological measures have been repeatedly shown to be useful for predicting drowsiness in specific contexts, such as (i) PERCLOS (79,80), (ii) blinking rate, (iii) eyelid movements (81), (iv) head nodding, (v) lip stretch (82,83), and so forth. Slow eye movements were shown to correlate with EEG theta and delta band power as well as with the nadir of tympanic temperature and with subjective sleepiness (84). However, these relations were proven to be robust only in a closed-eye condition, which severely limits the usefulness of this metric in the field. Some of these physiological markers of drowsiness can be used to effectively prevent accidents caused by low vigilance (85).
Urinary melatonin and salivary cortisol were also found to be significantly different when comparing pre-call with post-call values (86). Heart rate variability is another physiological biomarker that can be used to track the effects of SD on alertness and reaction times (87). The power spectrum of the RR interval in the frequency band between 0.02 and 0.08 Hz classified subject performance at the PVT with sensitivity and specificity similar to those of PERCLOS and superior to those obtained using EEG power spectrum metrics. RR power in the aforementioned frequency band correlated with the 40 h trend of PVT lapses.
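To make the RR-interval band power mentioned above more concrete, the sketch below estimates the power of an RR series between 0.02 and 0.08 Hz. It is an illustration under stated assumptions, not the processing pipeline of the cited study (87), whose details are not given here: the function name `rr_band_power`, the 4 Hz resampling rate, the use of a plain periodogram, and the synthetic data are all choices made for this example. It requires NumPy.

```python
import numpy as np


def rr_band_power(rr_s, f_lo=0.02, f_hi=0.08, fs=4.0):
    """Estimate RR-interval power (in s^2) between f_lo and f_hi (Hz).

    rr_s: successive RR intervals in seconds.
    fs:   resampling rate in Hz (an assumed value for this sketch).
    """
    rr_s = np.asarray(rr_s, dtype=float)
    beat_times = np.cumsum(rr_s)  # time at which each interval ends
    # RR intervals are unevenly spaced in time: resample on a uniform grid.
    t = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    rr_even = np.interp(t, beat_times, rr_s)
    rr_even -= rr_even.mean()  # remove the mean (0 Hz component)

    n = rr_even.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(rr_even)) ** 2 / (fs * n)
    # Convert to a one-sided density (a Nyquist bin, if present, is not doubled).
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0

    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(psd[band].sum() * (freqs[1] - freqs[0]))


# Synthetic series invented for illustration: mean RR of 0.9 s with a
# 0.05 Hz oscillation of 0.05 s amplitude, about 10 min long.
rr, t_now = [], 0.0
while t_now < 600.0:
    interval = 0.9 + 0.05 * np.sin(2 * np.pi * 0.05 * t_now)
    rr.append(interval)
    t_now += interval

low_band = rr_band_power(rr)                         # 0.02-0.08 Hz
high_band = rr_band_power(rr, f_lo=0.15, f_hi=0.40)  # comparison band
print(low_band, high_band)
```

Because the synthetic oscillation sits at 0.05 Hz, almost all of its variance falls in the 0.02-0.08 Hz band and very little in the comparison band. Real recordings would additionally need artifact and ectopic-beat handling, which this sketch omits.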
An indirect source of information regarding sleepiness and drowsiness may come from the longitudinal monitoring of the sleep schedule through actigraphy. Actigraphic recordings have also shown that sleep time was significantly increased during the on-call day and on the first post-call day. On the first post-call day, actigraphic activity was at its lowest level. Actigraphy can be used to track sleep non-invasively over long time spans (88). This technique has been used to track and measure exposure to SD in surgeons and in other groups particularly exposed to SD (e.g., the military, pilots, the transportation industry). McCormick et al. used actigraphic data to objectively track sleep duration and timing (89). The mean sleep duration was 5.3 h, with individual average sleep times that spanned from 2.8 to 7.2 h. The study used sleep data to feed the so-called sleep, activity, fatigue, and task effectiveness (SAFTE) model, which can be used to estimate individual performance in certain cognitive domains (e.g., attention, reaction times, decision making) (90). They found that residents were fatigued for 48% of their waking time and impaired for 27%.
Moreover, the spectrum of non-medical devices validated against their medical-grade counterparts, or at least developed with the aim of tracking sleep variables, is constantly growing. As the current literature still commonly relies on subjective evaluations of sleep metrics, such non-medical devices may add valuable information to future research. However, to guarantee the reliability and reproducibility of measures, non-medical devices used for research purposes should be validated for the specific application of sleep monitoring before being employed in actual research studies [e.g., (91,92)].

COUNTERMEASURES AGAINST DETRIMENTAL EFFECT OF SD
As a general guideline to ensure optimal performance throughout the work shift, proper sleep hygiene and nap scheduling should be used. There is meta-analytic evidence suggesting that napping is an effective countermeasure to mitigate the adverse effects of SD (93). Short naps (5-15 min) can positively affect vigilance in the following 1-3 h (94) and are not followed by significant sleep inertia (SI) impairments (95,96), as they contain almost no slow wave sleep. Naps longer than 30 min can produce positive effects on cognition for several hours. The effects of SI should be thoughtfully considered when scheduling naps in an operational setting (97). The detrimental effects of SI are thought to dissipate within 20 min from awakening (98), but this time course is affected by a multitude of factors, among them the amount of slow wave sleep during the nap, the sleep stage at awakening, and the circadian phase. Interactions among these factors may worsen the detrimental effects of SI and prolong them over time. Waking up suddenly to respond to an emergency might lead to acting in an altered state due to SI, possibly affecting both the cognitive and motor spheres (97). In a population of surgery residents, a nap of self-regulated length had significant positive effects on executive skills (99). The nap group slept on average 147 ± 71.0 min during the intervention night and showed better performance in a task-switching test and better reaction times measured using the standardized Go/NoGo task. When planning naps, another parameter that can be optimized is nap timing. To attain the greatest benefit, individuals who are already sleep deprived should take naps as sleep pressure begins to rise, while rested individuals benefit more from extending wakefulness as long as sleep pressure is not an issue (100).
Concerning sleep hygiene, it has been shown in a population of interns that there is an inverse correlation between total sleep duration and the number of work hours (11). Moreover, it was demonstrated that optimizing the work schedule by removing night shifts of more than 16 h and reducing the total amount of work to 80 h per week resulted in a reduction of the number of attentional lapses during work and of the number of serious medical errors in an intensive care unit (4).
In the context of sleep and performance optimization, the circadian modulation of sleep propensity should also be considered. Different parts of the day are associated with physiologically different levels of sleepiness and sleep propensity (101); with this knowledge, work can be scheduled so as to avoid peaks of sleepiness and to exploit periods of naturally low sleepiness for complex and difficult tasks. In contexts where such scheduling is impractical or impossible to implement, bright light therapy could be used to counteract to some extent the endogenous drive to sleep (102). Another approach might be the optimization of sleep quality during naps or nocturnal sleep. As an example, subjects exposed to pink noise while sleeping (103) showed better sleep quality, measured as significantly lower sleep fragmentation.
In specific emergency situations, or when fatigue avoidance and sleep optimization approaches cannot be adopted, a possible alternative countermeasure to temporarily alleviate the effects of SD on cognition is the use of stimulants and wake-promoting drugs.
A widely available substance acting as a wakefulness promoter is caffeine [for an in-depth review on the topic see, e.g., (104)]. Caffeine intake was shown to reduce sleepiness and waking EEG theta activity, and to reduce 0.75-2 Hz activity during recovery sleep after SD, even when caffeine saliva concentrations were extremely low (below the detection limit for some subjects, population average concentration 1.8 ± 1.0 µmol/l) (105). The relationship between caffeine dosage and response is still unclear: attention performance seems to be affected even by relatively low doses, but the maximal effect might be reached with medium-high doses of about 200 mg (106). At this same medium-high dosage, caffeine can induce several undesired side effects that can potentially harm surgical procedures, such as anxiety, nausea (107) and, more importantly for dexterity, an increase in hand tremor: in surgeons, 200 mg of caffeine was found to increase tremor by 33% (108,109). While it is recognized that caffeine enhances performance in simple tasks, its impact on complex tasks is still controversial. Moreover, the effects of caffeine need to be considered in relation to the pre-dose level of arousal (110,111). The effect of caffeine is also known to be mediated by specific genetic variations. A variation of the adenosine deaminase gene (112), expected to occur in 8-12% of individuals, was found to affect both the sleep and the waking EEG. Subjects with this variation showed higher-amplitude delta oscillations as well as theta-low alpha activity. The latter two EEG frequency bands are known biomarkers of sleepiness and overall homeostatic sleep regulation (113,114). Moreover, it was shown that differences in subjective sensitivity to caffeine are paired with significantly different responses to SD (115). Adenosinergic mechanisms were found to modulate the effects of sleep deprivation both on behavioral measures (specifically the PVT) and on the same EEG frequency bands mentioned above.
Moreover, subjects showing more marked detrimental effects of SD were also those who obtained the greatest benefits (as shown by the PVT) from caffeine consumption, independently of their sensitivity to caffeine.
A study conducted by Aggarwal et al. investigated how the administration of caffeine plus taurine may affect surgeons' performance on the Minimally Invasive Surgical Trainer VR (MIST-VR) in a sleep-deprived condition (116). The authors compared a rested condition (baseline) with SD plus placebo and with SD plus caffeine and taurine given as stimulant support. Eighteen medical students were recruited. Participants underwent a preliminary phase of supervised training on the laparoscopic simulator. Subjects were randomly allocated to one of the three study arms: (i) SD-placebo (vitamin C and calcium tablets), (ii) SD-caffeine (150 mg), and (iii) SD plus caffeine and taurine (150 mg + 2 g, respectively). During the experimental sessions, the tasks were administered in the following order: (i) surgical simulator assessment, (ii) PVT, Stroop task, (iii) Wisconsin card sorting test (WCST), and (iv) mental arithmetic test. Surgical simulator performance scores significantly worsened in the SD-placebo condition when compared to baseline (time taken to complete the task, median 41 vs. 35 s; p = 0.016; economy of movement 3.25 vs. 2.95 m; p = 0.016; number of errors 66 vs. 59; p = 0.021). When only caffeine was given, there was no statistical difference between the simulator metrics of the baseline and SD plus caffeine conditions, except for the number of errors in the manipulate diathermy task, which was significantly higher in the SD-placebo condition (63 vs. 59; p = 0.046). PVT reaction times were significantly longer in the SD-placebo condition (377 vs. 299 ms; p = 0.008), whereas in the caffeine and caffeine plus taurine conditions reaction times did not differ from those recorded at baseline (307 vs. 299 ms; p = 0.214 and 326 vs. 299 ms; p = 0.110, respectively). Subjective sleepiness was higher than at baseline in all conditions, including those with stimulant administration.
Scores obtained in the Stroop, WCST, and mental arithmetic tasks did not change significantly in any of the experimental conditions. The data collected in this study suggest that lack of sleep negatively influences simulated laparoscopic psychomotor skills. This holds true despite the administration of psychostimulants such as caffeine and taurine.
Among stimulant drugs, modafinil is used for the treatment of excessive sleepiness associated with shift work (117) and with narcolepsy (118). Of particular importance for surgeons' performance, this drug does not induce the tremor, anxiety, and nausea that are instead associated with high dosages of caffeine (107). Moreover, it has been observed that the administration of 200 mg of modafinil to sleep-deprived physicians can improve performance in CANTAB neuropsychological tests but not in basic procedural tasks (119).

CONCLUSION AND DISCUSSION
Current evidence emerging from the literature does not provide compelling grounds to draw consistent conclusions about the effects of SD on surgical dexterity. A number of studies found no significant variations following either acute SD or sleep restriction (59, 61, 62, 64-66, 68, 116, 120), while others found a worsening in the quantity and quality of performance (45, 48-52) or even an improvement in the SD condition (63, 121). More specifically, a technical limitation for a systematic overview is the array of heterogeneous performance metrics used to evaluate dexterity, which prevents a robust statistical analysis or meta-analysis. Moreover, the inclusion criteria concerning the extent of sleep deprivation vary considerably between studies, and some of these criteria overlap with the broader condition of fatigue, which does not result only from lack of sleep. Unfortunately, no study in the field included sleep extension as a control condition, further limiting the available experimental tools for the investigation of sleep-related performance. Sleep extension could be implemented by asking experimental subjects to comply with a predefined daily time in bed of about 10 h for several consecutive days. The well-established PVT could be paired with this sleep extension approach to objectively measure and define the sleep extension duration for the specific experimental population. Such an experimental methodology can dissipate the previous detrimental effects of chronic sleep restriction and acute SD on neurobehavioral and physical performance (122). A good example of this methodological approach can be found in (123), where sleep extension was paired with the study of objective and subjective sleep metrics and with the assessment of both the cognitive and motor spheres. The observed inconsistency in results could also be explained by several other factors (see Table 1 for more details).
A methodological issue found in many studies is the lack of objective and longitudinal assessment of sleep quantity and quality, both before the study onset and during its execution. Sleep metrics are often measured relying only on subjective methods (e.g., questionnaires), which are known to be relatively unreliable estimators of the effects of SD (8,57,124). Moreover, the extent of chronic sleep restriction, which adds to the acute SD imposed in many experimental designs, is usually not measured or taken into account. This can lead to underestimating the contribution of the superimposed chronic sleep restriction, thus confounding any sleep-related measure of performance. Moreover, a possible ceiling effect could limit changes in surgical dexterity performance, as chronic exposure to SD could impair baseline performance, thus masking the effects of acute SD. The actual exposure of experimental subjects to SD (e.g., calculating the percentage of wakefulness during the SD phase using an objective examination such as polysomnography) and the effect of brief episodes of sleep during the SD phase are key variables that should be collected and properly analyzed in future work. To address this issue, we discussed several available methods to track SD and drowsiness (see section Objective and Longitudinal Tracking of Sleep and Drowsiness). Specifically, the literature on fatigue modeling in the workplace gives many valuable and detailed clues on how to properly track and measure SD-related risk in safety-critical workplaces (125,126). Non-invasive, longitudinal, and objective tracking of SD should also be considered a standard feature of future studies, with the specific aim of gathering normative data about sleep hygiene and SD in surgeons, which are now lacking. Good examples of the use of objective measures to track sleep among physicians are the studies of Basner et al. (20) and of McCormick et al. (89).
The aforementioned methods could be implemented in future experimental paradigms in which surgical errors, defined a priori as violations of approved procedural checklists, should also be included, in line with an approach already successfully implemented in the investigation of aviation accidents (127). This methodological care could pave the way to a better understanding of the relationship between a surgeon's sleep history and medical errors. Remarkably, only one study (49) directly assessed the effects of chronic sleep restriction on surgical dexterity. Since sleep restriction is the most prevalent condition to which subjects are exposed, future research should focus specifically on implementing experimental paradigms that can shed light on the effects of this condition on surgical dexterity.
Another common confounding factor that is frequently not taken into account is the use of stimulants such as caffeine (see Table 1), which can by itself alter neurobehavioral performance and attention, as discussed in detail in section Countermeasures Against Detrimental Effect of SD. The effect of caffeine on surgical dexterity is still a matter of debate, as specifically reviewed by Fargen et al. (109), and more research is needed before it can be used safely and knowingly in the field. In those cases where the use of caffeine cannot be avoided, its intake should be treated as a confounding factor, also given the power of newly emerging modeling tools capable of predicting caffeine effects on cognition and attention (128,129).
It is also well-known that resistance to the effects of SD is a trait-like individual characteristic (130), and that some genetic polymorphisms modulate the individual neurobehavioral response to sleep curtailment and deprivation (131-135). Understanding the genetic basis of individual vulnerability to SD could be useful to identify the most vulnerable subjects, enabling early intervention to optimize the work schedule and sleepiness of these individuals according to their physiological characteristics.

Table 1 notes. Papers published in the last 5 years are marked with an asterisk. The effect of SD on surgical dexterity is summarized with the following symbols: ↔, no effect; ↓, detrimental effect; ↑, positive effect; NA, not applicable. Statistical power for null shown: indicates whether a statistical power analysis was explicitly reported in the study, i.e., whether the analyses performed had enough statistical power to detect significant negative changes.

Frontiers in Neurology | www.frontiersin.org
A relevant open research question that should be addressed in future studies is whether the effects of SD differ between experienced and naïve subjects. Some studies addressed this question directly (6,52,63). Tsafrir et al. found significantly worse surgical performance under SD only in the inexperienced group (n = 14 vs. n = 12 for the experienced group). The study of Lehman and associates found no differences in the SD condition between experts and naïve subjects (n = 9 and n = 8, respectively). In the same study, the two groups scored differently only in the first two of the five training sessions that took place before the baseline assessment. The same conclusion was reached in the study of Schlosser et al., where no differences in surgical performance scores were found between the intern (n = 19), junior resident (n = 9), and senior resident (n = 10) sub-groups in any of the experimental conditions. In the study of Olasky et al., participants' experience was classified using Post Graduate Year (PGY): there were 5 PGY1, 5 PGY2, 2 PGY3, 3 PGY4, 3 PGY5, and 3 surgical fellows. Experience correlated significantly only with surgical performance scores. No differences in sleep hours or fatigue level were found across levels of experience. The rather small sample size of the groups in these studies may have hampered the capacity to reach adequate statistical power, although an a priori formal test of this hypothesis is difficult, as the effect size for the experience variable is largely unknown. Two other studies (66,136) reported in their discussion the hypothesis of a higher impact of SD on naïve subjects. This hypothesis could perhaps be supported by evidence from basic research that has highlighted the relevance of sleep in motor learning. Sleep may play a role in mediating the changes needed to link neuron-scale behavior to the actual movement (137-139).
Sleep is thought to contribute to establishing favorable plastic changes that ultimately link sparse neural activation patterns to specific motor behaviors (140). Moreover, consolidation of motor engrams may occur during sleep (137, 138), resulting in an offline learning process that ultimately enhances the learned motor skill. Thus, exposure to SD should be actively reduced during the initial phase of interns' training, as motor learning, as well as academic performance (141), may otherwise be sub-optimal.
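The concern raised above about the small size of the experience sub-groups can be illustrated with a rough power calculation. The sketch below uses a normal approximation for a two-sided comparison of two group means; it is not an analysis performed by any of the reviewed studies. The function name `approx_power_two_groups` and the standardized effect sizes are hypothetical, since the true effect size for the experience variable is unknown; only the group sizes (n = 9 and n = 8) come from the text.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_groups(effect_size_d, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison of means.

    effect_size_d: assumed standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the assumed effect size.
    ncp = effect_size_d * sqrt(n1 * n2 / (n1 + n2))
    # Probability of exceeding the critical value in either tail.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Group sizes as reported above (n = 9 and n = 8); effect sizes are hypothetical.
for d in (0.5, 0.8, 1.2):
    print(d, round(approx_power_two_groups(d, 9, 8), 2))
```

Under this approximation, groups of 9 and 8 subjects give a power below 0.4 even for an assumed effect size of d = 0.8, well short of the conventional 0.8 target. The normal approximation is slightly optimistic for samples this small, so an exact t-based calculation would give even lower values.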
Experience, and especially the effects of learning, may also be a confounding factor in measures of surgical dexterity. A simple solution to this problem is to enroll only proven experienced surgeons in future studies. A drawback of this approach may be the paucity of such skilled surgeons available to join such studies. Alternatively, authors should rely on validated and repeatable training curricula [e.g., laparoscopy (142), robotically assisted surgery (143)] to ensure that each participant reaches the plateau of the learning curve for the study task before the actual experimental trials. As different experimental tasks may show different learning curves, the selection and/or aggregation of experimental tasks should be carefully considered during the definition of the experimental protocol. Regarding learning, several studies included in this review opted to include various forms of training procedures (48-52, 60, 61, 64, 66, 68) or recruited experienced surgeons (45), while others did neither (53,59,62,63,65). All studies that did not employ participant training either failed to find any difference between the SD and baseline conditions or recorded better performance after SD. It is noteworthy that even with scheduled training some studies found no difference while subjects were sleep deprived. Hence, it may be speculated that different learning curves interact in different ways with task training and, more importantly, with the effects of SD. To counteract this confounding factor associated with "on-line" learning, future studies should use only reproducible surgical dexterity measures taken with validated surgical simulators [e.g., Lap Mentor (144), LapSim (144), MIST-VR (55,56), Eyesi (145,146)] or during in-vivo procedures using reproducible and standardized animal models.
The surgical population may also suffer from a self-selection bias, as individuals who are particularly vulnerable to SD do not pursue surgical specialization or intentionally avoid participating in this kind of study. Taken together, these lines of evidence may play a role in the population selection of many studies in this field and significantly bias the results. Some of the individual peculiarities in the response to SD can be tracked and measured using objective data, e.g., using the widely validated PVT. Some bio-mathematical models of alertness (built on PVT data) can account for individual variability in performance (147,148) and hence could be progressively used and developed in order to classify subjects as resistant or vulnerable to SD effects, within certain skill competencies. Moreover, models of this kind should be included in the development of novel or experimental work schedules, as they provide a safe way to simulate the impact of various schedules on physicians' performance (120), thus emphasizing the managerial implications.
From the SD literature we know that higher levels of SD (either chronic or acute) are linked to increased state instability (149). Several fMRI studies showed that SD is linked to a reduction of the signal coming from the dorsolateral prefrontal cortex (150,151), an anatomical area particularly important for the execution of sustained attention tasks. This reduction in cortical activity was also found during visuo-spatial attention tasks in the intraparietal sulcus and extrastriate visual cortex (150). During prolonged wakefulness, a progressive rise in theta and delta EEG power can be observed (152); this change occurs together with a modification of cortical and corticospinal excitability (153). The increase in EEG theta band power can be observed in both waking and sleep EEG, suggesting that sleep homeostasis processes affect both sleep and wake in a closely related fashion. Moreover, the raised theta power could underlie compensatory mechanisms against the effects of prolonged wakefulness. During wakefulness, the raised theta power can be used as an objective measure of sleepiness (152). Cortico-spinal excitability and the standardized motor threshold were measured experimentally using transcranial magnetic stimulation (153): in SD conditions, cortical excitability was found to be lowered and a rise in the standardized motor threshold was observed. In the same study, the authors were able to link the raised power of delta and theta EEG activity with cortical excitability. It is also noteworthy that sleep can occur locally, even though it is generally considered a global state of the brain (154). Local sleep occurs when part of the brain transitions to a more sleep-like state, regardless of the conscious and awake state of the subject. Recent findings, based on intracerebral recordings in humans, showed local and selective alteration of single-neuron spiking after SD (155).
Moving on to the analysis of dexterity and fine movements, we did not find comprehensive agreement between studies on the effects of SD on these skills. Bimanual fine motor coordination measured using the Purdue pegboard task was found to be impaired during SD in one study (156); in another, pegboard performance was substantially unchanged, with a significant change only in the bimanual subtask (157). Another study, using the O'Connor dexterity apparatus (158), found that finger dexterity worsened after a 24-h call. A previous study used a comparable hand-eye coordination test, but no differences were detected in the scores obtained across the two conditions (159). Regarding movement dexterity per se, we found that the majority of the studies that used the MIST-VR virtual laparoscopy simulator (5 out of 6; see Table 1) found a worsening in objectively measured surgical dexterity. To put things in perspective, the overall number of studies that found a worsening of surgical dexterity was 6 out of the 16 included studies; by comparison, 8 out of 16 did not find any significant difference. The MIST-VR simulator is characterized by a steep learning curve, can differentiate surgeon from non-surgeon users (55), and provides surgically relevant metrics validated in animal models (56). It would therefore be worth investigating in depth why this simulator was able to measure significant differences in surgical dexterity while others were not. To further explore these topics, the interested reader may consult a comprehensive review of currently available VR surgical simulators (160) and a classification of embedded tasks (161). Another major point to understand, as partially explained earlier, is the impact of subject experience and expertise level on the final measurement, which may be mitigated by the relatively short time needed to become familiar with the MIST-VR system (55).
Overall, the available literature does not provide a clear understanding of the effect of sleep deprivation on surgical dexterity. Available research presents inconsistencies in the measures of surgical dexterity and in the experimental designs employed. Known confounding factors acting on surgical dexterity measures, such as surgical expertise level, use of stimulants, and possibly circadian performance variations, are not consistently and objectively measured and included in the analysis of the results. Moreover, current studies did not directly and objectively measure sleep quantity and, equally importantly, sleep quality, relying instead on subjective sleep measures that are known to be affected by subjective bias. These heterogeneities preclude a meta-analysis of the available studies in the field and hence prevent us from reaching an objective understanding of the effects of sleep deprivation on surgical dexterity, which calls for further investigation.

AUTHOR CONTRIBUTIONS
TB and EC performed the literature review and wrote the manuscript. PdA, PD, and AM contributed insightful discussions during manuscript preparation and revision. UF and GC supervised the entire review activity and contributed to the organization of the manuscript contents, actively participating with insightful discussions during manuscript preparation and revision.