Development and Initial Validation of an Acute Readiness Monitoring Scale in Military Personnel

Personnel in many professions must remain “ready” to perform diverse activities. Managing individual and collective capability is a common concern for leadership and decision makers. Typical existing approaches for monitoring readiness involve keeping detailed records of training, health and equipment maintenance, or, less commonly, data from wearable devices that can be difficult to interpret and that raise privacy concerns. A widely applicable, simple psychometric measure of perceived readiness would be invaluable in generating rapid evaluations of current capability directly from personnel. To develop this measure, we conducted exploratory factor analysis and confirmatory factor analysis with a sample of 770 Australian military personnel. The 32-item Acute Readiness Monitoring Scale (ARMS) demonstrated good model fit, and comprised nine factors: overall readiness; physical readiness; physical fatigue; cognitive readiness; cognitive fatigue; threat-challenge (i.e., emotional/coping) readiness; skills-and-training readiness; group-team readiness; and equipment readiness. Readiness factors were negatively correlated with recent stress, current negative affect and distress, and positively correlated with resilience, wellbeing, current positive affect and a supervisor’s rating of soldier readiness. The development of the ARMS facilitates a range of new research opportunities by enabling quick, simple and easily interpreted assessment of individual and group readiness.


INTRODUCTION
Maintaining capabilities spanning physical, cognitive, emotional, skill and team domains is common to many professions, for example: construction (Yip and Rowlinson, 2006; Nahrgang et al., 2011); nursing (Kuiper and Pesut, 2004); medicine/surgery (Elfering et al., 2017); sales/marketing (McFarland et al., 2016), and many more. Research in occupational psychology offers frameworks such as the Job Demands-Resources (JD-R) model, which specifies that the balancing of these capabilities against job demands predicts: job burnout (e.g., Demerouti et al., 2001; Bakker et al., 2005), organizational commitment, work enjoyment (Bakker et al., 2010), connectedness (Lewig et al., 2007), and work engagement. At present, the majority of this balancing (through careful management, planning, training and recruitment) remains a challenging, time-demanding and slow (or infrequent) task, with diverse approaches and methods deployed in different contexts (Chuang et al., 2016; Der-Martirosian et al., 2017; Shah et al., 2017; Sharma et al., 2018). Against this backdrop, offering a brief, psychometrically validated, easily interpreted and widely applicable assessment of acute performance readiness may substantially improve the performance planning of personnel and their managers.
In the military, just as in other occupations, individuals and groups are developed, supported and assessed in relation to their integrated capabilities, encompassing the physical, cognitive, and psychological, as well as role-specific skills, team functioning, and equipment (Peterson et al., 2011; Reivich et al., 2011; Training and Doctrine Command, 2011). Indeed, military personnel around the world perform important roles extending beyond direct combat, to include: remote operations; peacekeeping (Bester and Stanz, 2007); protecting key assets (borders, heritage sites, endangered species); cyber activities; emergency responding (e.g., floods, bushfires, earthquakes); and collaborating with other nations (Rollins, 2001; Faber et al., 2008; Apiecionek et al., 2012; Roy and Lopez, 2013). In this way, the diversity of military roles reflects many professional and occupational settings. This diversity comprises a complex array of duties, priorities, and constraints, requiring individuals to be highly capable, strategically adaptive, and prepared to the highest standards (Rutherford, 2013). Accordingly, personnel must remain ready to perform their job role, across a wide array of capabilities, often for prolonged periods of time.

Readiness Profiles as Indicative of Resilience
As well as seeking to facilitate improved management of role readiness in acute timeframes, we also sought to respond to recent critiques of the concept of resilience, which has been adopted in workplaces around the world (Robertson et al., 2015; Crane, 2017). Current research has tended to focus on stable individual attributes that predict positive adaptations to adversity ("stable protective or mitigating personal factors"; e.g., Rutter, 1987; Connor and Davidson, 2003). This focus has constrained the ability to also examine resilience's theoretical emphasis on the acute interaction of person-in-environment and the dynamic, interactive nature of resilience (Windle et al., 2011; Pangallo et al., 2015). Further, most current research in resilience characterizes the "positive adaptations" as exclusively referring to mental health or subjective wellbeing, when the positive/desirable response might also refer to performance (e.g., in sport) or physical recovery. Responding to the research priorities of the Australian Army, we set out to develop an instrument that would assess short-term performance capability, to capture the desired resilience profile of responding to change-and-adversity that their personnel should demonstrate (Gilmore, 2016). Accordingly, this conceptualisation of resilience should encapsulate performance capability: (a) persisting/thriving through challenging situations; (b) rebounding following setbacks; (c) mitigating detrimental effects if-and-when they occur; and (d) learning and improving from challenges, including deliberate and planned challenges (e.g., training) as well as unexpected or unplanned challenges (ranging from the theater-of-war to family issues, interpersonal conflict and equipment failures; cf. Richardson, 2002; see also Forces Command Resilience Plan, 2015).
This focus on the dynamic maintenance of individual and group/team capability through various circumstances may represent a different definition of resilience to some offered in the literature, some of which focus on relatively stable attributes of the individual. Indeed, the Australian Army currently defines resilience as "the capacity of individuals, teams and organizations to adapt, recover and thrive in situations of risk, challenge, danger, complexity and adversity" (Forces Command Resilience Plan, 2015). Another key difference between "resilient force capability" and previous work on resilience is the emphasis on overall performance capability as opposed to health, mental health, and stress coping, the traditional foci of resilience research. Nevertheless, the elements of a dynamic process (Luthar et al., 2000) responding to different forms of challenge or adversity (Rutter, 1987; Lee and Cranford, 2008) and a repertoire of resources and tendencies (Agaibi and Wilson, 2005) all remain consistent with resilience research.
To better reflect this emphasis on resilient performance capability, we focused instead on the concept of "readiness," to reflect an acute, situational state, assessed through perceptions of what can be achieved or attempted in the immediate future (cf. Grier et al., 2012). In this research, we proposed that changes in such a readiness state in response to recent events may, over time, be used to infer individual and group resilience. For example, an individual who maintains high levels of readiness and capability through significant challenges could be viewed as more resilient than someone whose readiness was impaired by the same, or even less significant, adversity (similar logic could apply for inferring physical or emotional resilience from acute response profiles). An instrument assessing readiness in this way would not only reflect a response to recent critiques of resilience measurement (e.g., Windle et al., 2011; Pangallo et al., 2015), but would also better facilitate the comparison of situational psychological perceptions with objective indices of physiological stress and resilience indicators such as cortisol, testosterone, and heart-rate variability (e.g., Hellewell and Cernak, 2018).

Defining and Operationalizing Readiness
Given this proposition that acute subjective ratings of "readiness" represent a potentially useful tool for supporting performance and training, the next step is to conceptualize acute readiness. We operationally defined "readiness" as an acute state of preparation and capability to perform any key task or role in the immediate future. We expected this readiness state to fluctuate in response to recent tasks, for example those causing fatigue. This conceptualization differs from some established approaches, detailed below, which reflect a more general, chronic development of skills or capability. To be useful for informing immediate performance management and key decision-making, the instrument we developed needed to focus on the acute assessment of readiness.
As an example of how readiness has previously been monitored, the United States Army defined unit readiness as "the ability of a unit to perform as designed" (Dabbieri, 2003, p. 28), assessed in four areas: personnel, equipment-on-hand, equipment serviceability, and training (Army Regulation 220-1, 2003). Personnel readiness indicated the extent to which key roles in the unit were occupied by trained and capable individuals (e.g., percentages of fulfilled positions, whether personnel were available for deployment, and whether they were qualified/trained for their assigned positions). Equipment-on-hand referred to the extent that the necessary equipment was available to perform the unit's role/mission, typically measured as a percentage of available-versus-specified equipment. Equipment readiness indicated the extent to which the equipment-on-hand was functional and operational. Training readiness indicated how well soldiers individually, and the unit collectively, were prepared to execute assigned tasks and missions; example indices might include a commander's rating of soldiers' individual performance of wartime tasks, a rating of the unit's collective performance, and the estimated number of training days needed for soldiers and the unit to be ready to perform such tasks (Griffith, 2006). Under this approach to monitoring readiness, numerous bureaucratic and administrative records had to be combined, often taking significant time and resources, to build a representation of a unit's readiness. These evaluations would be, by necessity, infrequent and after-the-fact, relying on a combination of information from diverse sources. Such a combination of measures is not uncommon, and military contexts are often viewed as strong examples of organization and structure. Nevertheless, approaches like this may limit the ability of readiness monitoring tools to inform key decision-making.
A more immediate and more readily integrated approach to monitoring may be available through the frequent use of short and psychometrically sound measures.
Where reviews and research have attempted to conceptualize readiness, their focus has been both broad and longer term, spanning multiple constructs, including: self-efficacy; commitment; perceived organizational support; physical fitness; sense-of-community; technical competence; family-life; and job satisfaction (Adams et al., 2009; Blackburn, 2014). While these narrative reviews offered a broad overview of readiness, they included more stable traits and attributes, and did not offer clear advice for measurement, noting for example: "Measures that do exist are generally tailored to a specific domain and their generalizability is unclear. The models related to individual readiness are still relatively few and often lack empirical support. These have not been broadly validated" (Adams et al., 2009, p. iii). Hence the necessity of developing a suitable psychometric measure of acute readiness is increasingly clear.

Existing Psychometric Measures of Readiness
Psychometric research studies have examined different forms of readiness, including: exercise readiness (e.g., freshness/energy and fatigue; Strohacker and Zakrajsek, 2016; Strohacker et al., 2021); readiness to return to sport following injury (skills/fitness and confidence/self-efficacy; Conti et al., 2019); cognitive readiness (e.g., operational and strategic; Grier, 2012; Grier et al., 2012); and, considering a military context, readiness for combat (e.g., discipline and "military climate"; Bester and Stanz, 2007; see also Wen et al., 2014). None of the existing scales span all the different roles fulfilled by military personnel, reaching beyond combat, which necessitated the development of a new specific measure. In reviewing such scales, it is clear that "readiness" can be conceptualized as multidimensional, spanning multiple domains including physical, cognitive, emotional, social, skills/training, and even equipment/resources. In combination, these different components may form an overall indication of acute readiness. Further, important distinctions can be made between acute readiness and attributes developed over time, such as skills-and-training, as well as between group-level constructs such as climate and individual states such as fatigue or freshness. We set out to develop a scale to focus on acute, individual, multidimensional readiness in order to (when implemented) detect short-term changes in individual and group capability.

Objective Measures and Wearables
Aside from time-consuming and resource-heavy medical examinations, the main alternative to a psychometric approach for monitoring readiness would be the use of wearable devices to monitor physiological signals such as heart rate, heart-rate variability, skin temperature, and galvanic skin response (Domb, 2019; Seshadri et al., 2019). Biochemical markers are also available through the sampling of sweat and sometimes blood to capture metabolites and electrolytes (Bandodkar and Wang, 2014; Lee et al., 2018). Decreasing production costs, increasing portability and the opportunity for real-time monitoring have made these popular options. Nevertheless, these devices can demonstrate inconsistent reliability between different circumstances, and are often dependent on communications networks, with implications for battery life and data-security (Evenson et al., 2015; Baig et al., 2017; Peake et al., 2018; Seshadri et al., 2019). In many settings, including the military, construction, food processing, nursing and allied health professions, complications are caused via: (a) exposure to wide variations in temperatures; (b) frequent heavy usage; (c) impacts; and (d) challenges such as water, sweat, grit and sand. Keeping such devices operational over long periods can generate a significant impost, for example, through charging, maintenance and software updates. The information extracted from the user is often removed, analyzed and stored elsewhere, which may not 'empower' the user in terms of increasing awareness or facilitating immediate decision making (Seshadri et al., 2019). Without access to suitable network connectivity or local information processing capabilities, the raw information gathered by these devices is often uninterpretable by the user, and over extended periods may simply be deleted or become outdated before it is uploaded.
There are also reported occasions where a user's own perceptions and performance differ substantially from the interpretation offered by wearables, such as athletes being removed from competition despite feeling good and performing well (Blair et al., 2017; Coyne et al., 2018). For these reasons, coupled with the inability of wearables to evaluate all dimensions of readiness (for example skills/training, team-functioning and equipment), a "low-tech" psychometric instrument may be beneficial, at least to complement wearable technology: both to compensate in instances when conditions limit the effectiveness of the wearable, and to facilitate better alignment between subjective perceptions and objectively measured data.

Delineating Acute Readiness From Related Concepts
Finally, in reviewing existing psychometric measures of relevant and related concepts, several key issues support the development of a new instrument. Measures such as stress-recovery (e.g., Kölling et al., 2015; Nässi et al., 2017), daily hassles (e.g., Holm and Holroyd, 1992) and the Task-Load Index (e.g., Hart and Staveland, 1988; Byers et al., 1989) all assess subjective perceptions of recent past events. These measures do not assess perceptions of what a person is able to do in the immediate future, and while we may expect a close relationship, performers in many settings must often complete challenging tasks but remain "ready" for another challenge; indeed this is a core requirement of the role. Measures of current subjective state such as affect (e.g., Watson et al., 1988; Crawford and Henry, 2010), anxiety or wellbeing (Ryff, 1989; Van Dierendonck, 2005) likewise do not assess the perception of what may be attempted in the immediate future: what one is ready for. Similarly, while there may be a relationship, many personnel (especially in the military) are asked to perform regardless of affect or anxiety, and so a more meaningful question would be: "what are you ready for right now?" The ability to gain insights into this "readiness" state would be invaluable to individuals and their managers, for whom planning of actions and responses is often time-limited (for example, 2-to-4 h was typical in Burr, 2018). The other related concept might be self-efficacy (Bandura, 1977; Sherer et al., 1982), which can be applied to specific tasks or broader skillsets. While clearly relevant to the concept of readiness, especially with reference to roles and skills, we argue that immediate readiness has a more specific focus than the broad judgment of being capable that typically informs self-efficacy.
Likewise, self-efficacy is more likely to be relatively stable over time (Ryckman et al., 1982; Lane et al., 2004; Chesney et al., 2006), indicating broad perceptions of capability, but not reflecting immediate judgments of situational readiness as a function of recent events such as sleep, nutrition, trauma, or physical and mental fatigue. Responding to these concerns, we developed a simple instrument that would be meaningful to users and their immediate line-managers, thus necessitating the use of plain language in both the questions/items and the higher-level constructs.

Present Research
A systematically developed measure of acute readiness, with items that are applicable across diverse professions and job roles, is necessary for timely, informative and psychometrically sound assessments of readiness. We aimed to develop and evaluate the initial evidence of validity for an instrument we called the Acute Readiness Monitoring Scale (ARMS). The ARMS was designed as a multidimensional measure assessing individuals' perceptions of readiness for imminent challenges, drawing on capabilities from the domains of physical, cognitive, emotional, social, skills-training and equipment. We collected data from personnel across all three phases of the Army's "Force Generation Cycle," spanning diverse roles and ranks, and then conducted two analyses to assess the internal structure (i.e., the extent to which the items of a measurement instrument are in line with the construct of interest, via factor analyses; Chan, 2014). We also sought to evaluate the correlations of ARMS factors with other related variables. These steps are in accordance with the Standards for Educational and Psychological Testing (The Standards; developed by the American Educational Research Association [AERA] et al., 2014). Additionally, we sought to examine evidence for reliability and discriminant validity of the subscales of the ARMS.

OVERVIEW OF STUDY 1
The aim of Study 1 was to first develop a pool of items to assess acute readiness in the form of: (a) overall readiness; (b) physical readiness; (c) cognitive readiness; (d) emotional readiness (termed "readiness for threat-and-challenge" to be more acceptable to the user-group); (e) the social readiness of the unit, group or team; (f) the suitability of one's training and skills for immediate tasks; and (g) equipment readiness. Second, we set out to evaluate the internal structure, internal consistency, and discriminant validity of the subscale scores of the new measure.

Participants
We targeted a sample of five to ten participants per item (Anthoine et al., 2014), acknowledging that Comrey and Lee (1992) asserted that a sample size of 300 is typically viewed as appropriate. The total final sample consisted of 770 Australian Army personnel (N male = 677, N female = 93), with a mean age of 26.5 years (SD = 7.0 years). Seven participants declined to participate at the informed consent stage, and any corresponding data were destroyed. One participant spoiled their answers and was excluded. Participants were drawn from all three phases of the "Force Generation Cycle" (N ready = 358; N readying = 186; N reset = 226) and from a wide variety of trades, spanning: infantry, artillery, engineers, signals, chefs, armored divisions, clerks, and more, but not including special forces. Participants were drawn from a wide range of ranks (N PR = 572; N PR(P) = 15; N LCPL = 46; N CPL = 71; N SGT = 18; N WO2 = 16; N WO1 = 1; N LT = 10; N CAPT = 6; N MAJ = 13; N LTCOL = 1). Career length in the Army ranged from 0.5 to 42.5 years (mean = 5.2 years, SD = 5.5 years). Ethnicity was self-reported using the Australian Bureau of Statistics reporting system (Australian Bureau of Statistics, 2016), coded as follows: (a) Aboriginal and Torres Strait Islander (n = 40); (b) Arab, North Africa and Middle East (n = 11); (c) Africa (n = 5); (d) Americas, North and South (n = 4); (e) Asia (n = 31); (f) Caucasian and Western European (n = 570); (g) Melanesia, Polynesia and New Zealand Peoples (n = 12); (h) South and Eastern European (n = 14); (i) Mixed/Other (n = 24); and (j) no answer (n = 59); see Table 1. The total sample was randomly divided into n = 500 for the Exploratory Factor Analysis (Study 1) and n = 270 for the Confirmatory Factor Analysis (Study 2), and this split was checked to ensure no significant differences in composition. The subsequent evaluation of convergent and divergent validity was conducted using the whole sample.
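The random division of the sample described above can be illustrated with a short sketch. This is not the authors' procedure, which is not specified beyond "randomly divided"; the participant identifiers, the function name `split_sample` and the seed are assumptions made purely for illustration. The sketch also computes the participants-per-item ratio against the 88 candidate items, for comparison with the five-to-ten target.

```python
import random


def split_sample(participant_ids, n_efa, seed):
    """Randomly partition participants into an EFA subset and a CFA subset."""
    rng = random.Random(seed)          # seeded so the split is reproducible
    shuffled = list(participant_ids)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_efa], shuffled[n_efa:]


# Hypothetical identifiers 1..770; the study's real identifiers are not given.
participants = list(range(1, 771))
efa_ids, cfa_ids = split_sample(participants, n_efa=500, seed=2019)

# Participants-per-item ratios for the 88 candidate items.
ratio_full = len(participants) / 88   # full sample
ratio_efa = len(efa_ids) / 88         # EFA subset only
```

Under these assumptions the full sample gives 8.75 participants per candidate item and the EFA subset roughly 5.7, both within the targeted range.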

Acute Readiness Monitoring Scale
The ARMS items were designed to assess participants' perceptions of readiness for immediate challenges and tasks that could be demanding physically, cognitively, emotionally, drawing on specific skills/training, teamwork and team functioning, or equipment. Items were also developed to provide an overall appraisal of readiness. An initial pool of items was developed based upon the operational definition of acute readiness. These items were then reviewed by the rest of the research team, who made suggestions for improvements and/or proposed alternative items. Items were kept brief (but not single word items), were not double-barreled in syntax, and did not borrow heavily from any one existing measure. Reverse-scored items were included. The content of items was informed by existing self-report measures of readiness, affect, perceived task load, stress-recovery, coping, and fatigue (e.g., stress recovery: Kölling et al., 2015; Task-Load Index: Byers et al., 1989; affect: Watson et al., 1988; wellbeing: Ryff, 1989; self-efficacy: Sherer et al., 1982). The initial item pool is listed in Supplementary File 1. The items were scored on a seven-point Likert scale from zero (does not apply at all) to six (fully applies), and introduced with the phrase "Please answer the following questions in relation to how ready you feel for any upcoming task or challenge." The seven-point response format is common in sport and performance psychology (e.g., Bartholomew et al., 2011; Ng et al., 2011), is consistent with survey takers' preferences, and performs well in terms of discriminative power (Preston and Colman, 2000). Through this process, our team generated 12-14 items assessing perceptions of each construct, with the intention of selecting approximately four items per factor/subscale in the final product.
The proposed items were subsequently evaluated by a user group (n = 23) and an international panel of experts (n = 7), with feedback leading us to (for example) de-emphasize resilience concepts (initially a focus from the industry stakeholders), and remove references to "right now," which was already implied in the question stem. We also presented and discussed the proposed approach with a group of senior stakeholders at a workshop held on 8th April 2019.
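The response format described above (a zero-to-six scale that includes reverse-scored items) can be sketched as follows. The item labels, the example responses and the choice to summarize a subscale by its mean are all assumptions for illustration; the text does not state how ARMS subscale scores are computed.

```python
SCALE_MIN, SCALE_MAX = 0, 6  # seven-point scale: 0 = does not apply at all, 6 = fully applies


def reverse_score(value):
    """Mirror a response around the scale midpoint (0 <-> 6, 1 <-> 5, ...)."""
    return SCALE_MAX + SCALE_MIN - value


def subscale_score(responses, items, reversed_items=frozenset()):
    """Mean of the reverse-corrected responses for one subscale (assumed scoring rule)."""
    corrected = []
    for item in items:
        value = responses[item]
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{item}: response {value} is outside {SCALE_MIN}-{SCALE_MAX}")
        corrected.append(reverse_score(value) if item in reversed_items else value)
    return sum(corrected) / len(corrected)


# Hypothetical item labels and responses, for illustration only.
responses = {"item_1": 5, "item_2": 4, "item_3": 1, "item_4": 6}
score = subscale_score(
    responses,
    items=["item_1", "item_2", "item_3", "item_4"],
    reversed_items={"item_3"},  # a negatively worded item
)
```

Here the reversed item contributes 5 rather than 1, so a low rating on a negatively worded item counts toward higher readiness.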

Procedure
Ethical approval was obtained for both the expert panel consultation (HREC, 1739) and the main data collection (DST LD 08-19 and HREC 2193). Subsequently, task orders were issued from Australian Army Headquarters to make troops available for data collection visits lasting approximately 1 h. Data collection took place in person, at locations across Australia, using paper-and-pen surveys, typically in groups of between 15 and 100 personnel in one sitting. Commanding officers were not required to be present, but some chose to attend and complete the task with their units (see Participants, above). Prior to survey administration, participants were advised of the wider intention of the project to generate a readiness monitoring instrument and shown mock-ups of how such a system could look and be used in practice. Participants were assured that there were no right or wrong responses, reminded of the anonymity of their responses, and encouraged to respond honestly. We recorded the time of day that survey completion began, the most recent task that the participant had completed, and how demanding they found that task, using the NASA Task Load Index (TLX). Only personnel who were on-base at the time of data collection were included. Participation in the study was voluntary, and this was emphasized through the informed consent process as well as in the small presentation preceding the data collection. All participants completed a written informed consent form prior to taking the survey, which was administered in person immediately prior to the data collection. Participants were informed they could return to other tasks if they did not wish to participate, although, as implied above, several chose to complete the survey with their team and then withheld consent for the data to be used.

Data Analyses
Exploratory factor analysis (EFA) identifies the dimensionality of constructs by examining relations between items and factors (Netemeyer et al., 2012). For this reason, EFA is typically performed in the early stages of developing a new or revised instrument (Wetzel, 2012). In this study, seven candidate factors were developed: (a) overall readiness; (b) physical readiness; (c) cognitive readiness; (d) threat-challenge readiness; (e) skills readiness; (f) group readiness; and (g) equipment readiness. This hypothesis was used to inform how we explored the structural pattern of the preliminary scale, along with a scree plot and eigenvalues (Thompson, 2004). Scree plots are useful to estimate where a significant drop occurs in the strength of possible factors (Cattell, 1966; Netemeyer et al., 2003).
We developed the factorial structure of the new measure using EFA, Exploratory Structural-Equation Modeling (ESEM) and CFA. ESEM incorporates aspects of the CFA process, such as specifying item-combinations and relationships, within the exploratory phase of the scale development (Asparouhov and Muthén, 2009). As such, ESEM is useful in clarifying key issues such as cross-loading and potential shared error variance before moving to the CFA stage. Statistical analyses were conducted in SPSS (IBM Corp, 2016; used for data cleansing, collating emerging factors, and checking internal reliability) and MPlus 8.3 (Muthén and Muthén, 2012; used for running models and assessing each factor, model, and fit index). We worked from the Pearson correlation matrix that is the default in MPlus, and we used oblique "geomin" rotation in line with guidance for EFA (Dien, 2010). For CFA modeling, latent factors were permitted to correlate, with cross-loadings of items on unintended factors being constrained to zero. Similar to CFA, as the analysis progressed into evaluating ESEM models, items could load on their predefined latent factors, while cross-loadings were also estimated and considered in the development of the evolving model.

Item Distribution
Prior to the factor analyses, data were screened for univariate normality, an assumption of the maximum likelihood estimation method. Median values for skewness and kurtosis for the 88 candidate items were 0.69 and 0.11, respectively, and ranged from −2.21 to 1.36 for skewness, and −1.01 to 5.16 for kurtosis. Two heavily skewed items were removed from the analysis ("I am sick today" and "I am injured today", both very frequently scored as zero or "zero inflated"). We observed up to 2% missing data in some variables, and so data were analyzed using a robust maximum likelihood estimator (MLR). MLR yields robust fit indices and standard errors in the case of non-normal data and operates well when categorical variables with a minimum of five response categories are employed (Rhemtulla et al., 2012; Bandalos, 2014).
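The item-level screening described above can be sketched in a few lines. This sketch uses simple moment-based estimators of skewness and excess kurtosis, which differ slightly from the bias-adjusted statistics that SPSS reports; the flagging threshold of 2.0 and the example data are illustrative assumptions, since the text does not state the cutoff that was applied.

```python
from statistics import fmean


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = fmean(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness (0 for a symmetric distribution)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant item")
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant item")
    return central_moment(values, 4) / m2 ** 2 - 3.0


def flag_skewed_items(item_responses, threshold=2.0):
    """Names of items whose absolute skewness exceeds an (assumed) threshold."""
    return [
        name for name, values in item_responses.items()
        if abs(skewness(values)) > threshold
    ]


# Hypothetical responses on the 0-6 scale; the second item is "zero inflated".
items = {
    "symmetric_item": [0, 1, 2, 3, 4, 5, 6],
    "zero_inflated_item": [0] * 18 + [6] * 2,
}
flagged = flag_skewed_items(items)
```

With these made-up data only the zero-inflated item is flagged, mirroring the kind of item that was removed before the factor analyses.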

Configurations of the Proposed Factors
The exploratory factor analysis, using the subset of n = 500 respondents, began with an initial run to obtain eigenvalues for each factor in the data. Next, the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy (KMO = 0.962) and Bartlett's Test of Sphericity (χ²(3655) = 37578.3, p < 0.001) were employed to determine that the data collected were appropriate for an exploratory factor analysis. Having established the data were suitable for EFA, we continued to explore the data. The initial analysis suggested 13 factors with an eigenvalue over one, whereas the scree plot suggested a firm transition ("scree") after nine factors. As such, we used MPlus to generate factor solutions between 7 and 14 factors, and reviewed the solutions in relation to: (a) the hypothesized factor structure; (b) feedback from the expert panel (prioritizing conceptual clarity); and (c) user-group workshop and stakeholder inputs (for example prioritizing interpretability, brevity and parsimony). By reviewing the multiple factor solutions simultaneously, we recorded which items consistently loaded together, and which items consistently cross-loaded, or failed to load meaningfully. Items largely loaded consistently into the expected factors, but two proposed factors in particular (physical readiness and cognitive readiness) contained two clusters of items loading meaningfully onto separate factors, which we interpreted as distinguishing between "readiness" and "fatigue." We investigated this development further using the process detailed below. The proposed threat-challenge factor split into items clearly pertaining to "readiness" alongside others capturing simply "affect," leading these latter items to be discarded as they were considered inconsistent with the targeted emphasis on performance capability.
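Two of the quantities reported above lend themselves to a quick arithmetic check. Bartlett's test of sphericity on p items has p(p − 1)/2 degrees of freedom, and the eigenvalue-over-one (Kaiser) rule simply counts eigenvalues of the correlation matrix that exceed 1. The sketch below assumes that the test was run on the 86 items remaining after the two skewed items were removed, which is an inference from the reported degrees of freedom rather than something the text states; the eigenvalues are invented for illustration and are not the study's values.

```python
def bartlett_df(n_items):
    """Degrees of freedom for Bartlett's test of sphericity: p(p - 1) / 2."""
    return n_items * (n_items - 1) // 2


def kaiser_count(eigenvalues):
    """Number of factors retained under the eigenvalue-greater-than-one rule."""
    return sum(1 for value in eigenvalues if value > 1.0)


# Assumption: 88 candidate items minus the 2 removed for heavy skew.
n_items = 88 - 2
df = bartlett_df(n_items)

# Hypothetical eigenvalues for an 8-variable correlation matrix (they sum to 8);
# these are NOT the eigenvalues obtained in the study.
example_eigenvalues = [3.1, 1.6, 1.2, 0.8, 0.5, 0.4, 0.3, 0.1]
retained = kaiser_count(example_eigenvalues)
```

Under that assumption the computed degrees of freedom match the reported value of 3655. The Kaiser rule often retains more factors than a scree plot suggests, which is consistent with the 13-versus-nine discrepancy described above.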
At this stage, we examined each of the hypothesized factors using EFA, ESEM and CFA to systematically remove problematic items, and then re-ran the resulting ESEM model with the best performing items. For these analyses, we included CFA steps as they were helpful in comparing models and selecting items with strong primary factor loadings to ultimately inform the final ESEM model (see also, Bhavsar et al., 2020). Model misspecification was identified through assessments of standardized factor loadings and modification indices, in a manner similar to item reduction approaches used in previous scale development procedures (e.g., Rocchi et al., 2017). Alongside these statistical criteria, we also considered the conceptual coverage of the items. Items with standardized factor loadings below 0.30, as well as items with multiple (two or more) moderate-sized or large modification indices (over 10), were reviewed and considered for deletion. As such, 56 of the 88 items were deleted in a systematic manner over several iterations. The resulting nine one-factor models each had excellent fit (see Table 2). After removing one item that cross-loaded (SK3), the revised combined/overall model demonstrated good fit [χ²(428) = 1034.867, p < 0.001; χ²/df = 2.4; CFI = 0.95; TLI = 0.95; SRMR = 0.05; RMSEA = 0.05 (90% CI: 0.05, 0.06)]. The result of EFA in Study 1 was a refined measure that included 32 items arranged into nine factors (see Table 3).
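The reported fit statistics for the combined model can be cross-checked with standard formulas. The sketch below is an approximation under stated assumptions: the degrees-of-freedom count assumes a simple-structure model with correlated factors (each item loading on one factor, no correlated residuals), the RMSEA uses the common unscaled formula rather than the MLR-scaled version that MPlus reports, and the sample size is taken to be the Study 1 subset of n = 500.

```python
import math


def cfa_degrees_of_freedom(n_items, n_factors):
    """df for a simple-structure CFA with freely correlated factors."""
    observed = n_items * (n_items + 1) // 2       # unique variances and covariances
    free = (
        n_items                                   # one loading per item
        + n_items                                 # one residual variance per item
        + n_factors * (n_factors - 1) // 2        # factor covariances (variances fixed at 1)
    )
    return observed - free


def rmsea(chi_square, df, n):
    """Unscaled point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


df = cfa_degrees_of_freedom(n_items=32, n_factors=9)
normed_chi_square = 1034.867 / df
approx_rmsea = rmsea(1034.867, df, n=500)
```

Under these assumptions the degrees of freedom come to 428, the normed chi-square rounds to 2.4 and the RMSEA rounds to 0.05, all in line with the values reported above.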
Standardized factor loadings were significant and above 0.30 (range 0.49 to 0.90; see Table 3). Four cross-loadings greater than 0.20 on unintended factors were present, but these were not viewed as a concern warranting deletion of the items (see also Bhavsar et al., 2020). Subscale correlations ranged from 0.26 to 0.77 and were in the expected directions (see Tables 4, 5). Cronbach's alpha reliability coefficients, also reported in Tables 4, 5, were over 0.70 for all factors.
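For readers less familiar with the reliability coefficient reported here, the following is a minimal sketch of Cronbach's alpha, α = (k / (k − 1)) · (1 − Σ item variances / variance of total scores). The function name and the small response matrix are hypothetical and do not reproduce the study's data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 5],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, above 0.70 threshold: {alpha > 0.70}")
```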

OVERVIEW OF STUDY 2
The aims of Study 2 were: first, to test the revised items and factor structure from Study 1 with a previously unused sample (n = 270 from the original sample of 770); and second, to test the nomological network of the readiness scale by examining the relations of its subscales with indicators of recent stress and demands, as well as protective traits such as resilience and self-control. Based on the theoretical rationale that resilience represents an array of traits that mitigate the effects of stress and demand on current readiness (e.g., Rutter, 1987; Connor and Davidson, 2003), we proposed that current readiness would be influenced by: (a) job demands, daily hassles, and perceived load in the most recent task; and (b) mitigating traits including resilience, stress mindset and self-control. We further predicted that (c) current readiness would be correlated with current affect, current distress, and a supervisor's rating of current soldier readiness. Specifically, perceived readiness should be positively correlated with positive affect and supervisor's ratings of soldier readiness, but negatively correlated with reported levels of distress and negative affect.

Method -Study 2
Confirmatory Factor Analysis (CFA) was conducted on the remaining sample of 270 participants, randomly separated from the data used in Study 1. Subsequently, using the full 770 participants, correlational analysis was conducted using Pearson correlations, their 95% bootstrapped confidence intervals (

Measures
We selected a range of measures to assess convergent and discriminant validity of the newly developed ARMS. In the following paragraphs, we report the Cronbach alpha reliability from the original validation or a cited revalidation; for internal reliability scores in the current study, see Tables 4, 5. To assess recent stress and load experienced by participants, we included three measures: (a) chronic load, using the 22-item Demand-Induced Strain Compensation Questionnaire to assess both cognitive and emotional occupational strain (Bova et al., 2015; α = 0.55-0.78); (b) recent low-level stresses, using the 117-item Daily Hassles Scale (Kanner et al., 1981; Holm and Holroyd, 1992; α = 0.80-0.88); and (c) perceptions of how demanding the most recent task was, in terms of mental demand, physical demand, time-pressure, frustration and effort, using the NASA Task Load Index (NASA-TLX; Hart and Staveland, 1988; Xiao et al., 2005; α > 0.80).
To assess other potential indices of acute readiness, we assessed:
Table note: Cells include r-value with 95% confidence-interval values in parentheses. * denotes p < 0.05 and ** denotes p < 0.01. †† denotes correlations that were statistically significant but with confidence intervals close to including zero. Gray font denotes no meaningful correlation. Black and bold indicates a significant correlation.
rating by the supervisor, using a Visual Analog Scale (VAS) from 1 to 20 (1 = "Not ready for likely duties"; 10 = "Suitably ready for likely duties"; 20 = "Beyond capable in relation to likely duties"). This VAS was developed specifically for this study. Participants gave additional consent for the supervisor's rating to be sought.

Analysis
Concurrent validity would be demonstrated by a significant correlation (p < 0.05 and confidence intervals not including 0) between corresponding factors (e.g., overall readiness and positive affect). Discriminant validity would be supported when two criteria were met: (1) the 95% CI of the correlation between ARMS factors and the other factors did not include 1 or −1 (Bagozzi et al., 1991; Chan et al., 2018); and (2) either the Average Variance Extracted (AVE) of each factor was larger than its shared variance (i.e., the square of the correlation) with the other factors (Fornell and Larcker, 1981), or a visual inspection of the scatter plot between the two variables did not suggest any linear or curvilinear relationship.
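The criteria above can be sketched in a few lines. The example below assumes a percentile bootstrap for the confidence interval of a Pearson correlation (the specific bootstrap variant used in the study is not stated here) and computes AVE as the mean of squared standardized loadings, following Fornell and Larcker (1981). All function names, the toy data and the loadings are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation; returns None if either variable has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for Pearson r (assumed variant)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample respondents
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples
            estimates.append(r)
    estimates.sort()
    m = len(estimates)
    lower = estimates[int((1 - level) / 2 * m)]
    upper = estimates[int((1 + level) / 2 * m) - 1]
    return lower, upper


def average_variance_extracted(loadings):
    """AVE: mean of squared standardized factor loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def fornell_larcker_ok(loadings_a, loadings_b, r):
    """True when both factors' AVE exceed their shared variance (r squared)."""
    shared = r * r
    return (average_variance_extracted(loadings_a) > shared
            and average_variance_extracted(loadings_b) > shared)


# Hypothetical scores for two scales from 30 respondents.
x = list(range(30))
y = [2 * v + (v * 7) % 5 for v in x]
low, high = bootstrap_ci(x, y)
print("CI excludes 0:", low > 0 or high < 0)          # concurrent validity check
print("CI excludes +/-1:", -1 < low and high < 1)     # discriminant criterion (1)
print("AVE criterion met:", fornell_larcker_ok([0.8, 0.7, 0.75], [0.8, 0.7, 0.75], 0.5))
```

Under criterion (2), a pair of factors that fails the AVE comparison would still pass if inspection of the scatter plot showed no linear or curvilinear relationship, which is a visual judgment that this sketch does not attempt to automate.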

Results -Study 2
Using CFA, the model specified in Study 1 showed acceptable fit to the new data [χ²(428) = 902.363, p < 0.001; χ²/df = 2.1; CFI = 0.94; TLI = 0.93; SRMR = 0.06; RMSEA = 0.06 (90% CI: 0.06, 0.07)]. These fit indices are similar to those from Study 1, and are broadly interpreted as lying on the boundary between "acceptable" and "good" model fit. Given the intended implementation by the stakeholders, we agreed that no further refinements were necessary to improve model fit at this stage. For concurrent validity, ARMS factors were positively associated with each other and with positive affect, while negatively correlating with K10-distress and negative affect (see Table 4). For discriminant validity, the 95% CI of the correlation spanned zero between the ARMS factors and most aspects of the NASA-TLX, consistent with the notion that recent task-load alone is not a sufficient predictor of immediate readiness. Nevertheless, TLX-frustration demonstrated a small correlation with ARMS subscales, suggesting that the emotional experience of frustration may be more relevant in determining perceptions of immediate readiness than other forms of task-load (see Table 4). Similarly, DISC-Q scores for emotional strain were negatively correlated with ARMS factors, although DISC-Q cognitive strain showed only small correlations. Thus, emotional load and emotional strain seem to be more promising indicators of readiness perceptions than recent cognitive load and/or strain. As expected, and noting that fatigue items were reverse coded, all ARMS subscales were positively correlated with the resilience-promoting traits of resilience (CD-RISC), stress-mindset, self-control, and psychological wellbeing, while being negatively correlated with the total severity of recent hassles. The correlations of ARMS subscales with positive affect, negative affect and K10-distress also met the criteria for discriminant validity.
The final Cronbach alpha for the overall ARMS scale was 0.949.

DISCUSSION
We developed and validated a psychometric instrument suitable for assessing acute readiness: the ARMS. The intent behind this tool was to facilitate rapid, reliable indications from personnel themselves of current individual and group capabilities for immediate tasks, as well as the ability to monitor how individuals and groups respond to training and deployment challenges. One data sample, collected from Australian Army personnel, was divided into two analyses, with findings supporting the key aspects of construct validity in this new psychometric tool.

Summary of Findings
In Study 1, a total of 32 items for the ARMS received support in the forms of user-group endorsement, expert panel clarification, and then the demonstration of a factor structure that was largely consistent with expectations, including good model fit indices. The inclusion of four items to indicate "overall readiness" as a brief scale helps to facilitate the relatively unintrusive monitoring of day-to-day readiness. The factor structure was kept simple: the nine factors were modeled as specified, with no sharing of error variances or other additional modeling. In Study 2, the factor structure developed in Study 1 was supported, showing acceptable fit in a fresh sample. Further, the concurrent validity of the ARMS was evaluated by examining the correlations between subscales, as well as with targeted constructs such as recent task-load, time-of-day, affect, distress and supervisor ratings of readiness. The subscales of the ARMS showed small-to-moderate intercorrelations, as might be expected, and were also moderately associated with affect and K10-distress. The subscales "overall readiness" and "equipment readiness" showed small but statistically significant correlations with the supervisor's rating of readiness. Most aspects of recent task-load were not associated with ARMS scores, although frustration in the most recent task was consistently correlated with ARMS scores. Additionally, ARMS subscales were largely correlated with recent stress, as indicated by severity of recent "daily hassles" and evaluations of occupational emotional demands (DISC-Q; De Jonge and Dormann, 2003). The initial validation of a psychometric tool for monitoring readiness across a range of contexts and job-roles provides the basis for further cross-sectional and longitudinal evaluations.
The main divergence from expectations was the emergence of separate "readiness" and "fatigue" factors under both the "physical" and "cognitive" subscales. While correlated, these items could not be modeled within single factors: i.e., they were not two ends of the same spectrum. A similar observation was made in a recent study of exercise readiness (Strohacker and Zakrajsek, 2016), with "freshness" and "fatigue" modeled as separate factors. Informal reviews with the user-group supported the interpretation that, while perhaps counter-intuitive, one can be fatigued by recent commitments and yet still "ready-to-go-again." Likewise, one can feel neither fresh nor fatigued. As such, a respondent could indicate physical or mental readiness-versus-fatigue to be any configuration of: (a) high:high; (b) high:low; (c) low:high; or (d) low:low. In the studied population, perhaps faced with frequent combinations of physical and mental load, as well as certain cultural norms around admitting to weakness or vulnerability, it is possible that the observed pattern is unique to the military; if so, that would support the need for further research to assess the suitability of the ARMS for different contexts. Upon returning to the wider literature, however, we did find examples of this pattern in other research. For example, Boolani and colleagues have characterized different correlates of both trait (Boolani and Manierre, 2019) and state fatigue-versus-energy (Boolani et al., 2018), as well as different responses to physical activity (Boolani et al., 2021). As such, our findings may add to a growing awareness that the experiences of energy and fatigue may be separate.

Comparison to Previous Findings
We developed the ARMS as a highly usable and easily interpreted scale. In comparison to the bureaucratic-administrative exercise of monitoring training, equipment servicing and medical reporting processes (cf. Dabbieri, 2003), the ARMS represents a new and complementary opportunity. This new psychometric instrument allows leaders/managers to quickly request validated, meaningful readiness scores from an individual, team or wider group: for example, in rapidly planning a new project, tender or crisis response. Not only would the resulting scores from the ARMS be almost immediate and in a consistent format, they would also be provided by personnel with direct access to the relevant information. Further, information provided from the ARMS is readily integrated, synthesized and interpreted. These are all advancements complementing the existing management practices, which rely on finding, studying and synthesizing information from a wide range of records that are generated infrequently and stored in different formats. Further, in comparison to the data from wearable technology that monitors physiological signs such as heart-rate variability, sleep or physical activity logs (cf. Domb, 2019; Seshadri et al., 2019), data from the ARMS should be more readily interpreted with minimal additional processing, and more readily collated at the group level to give information that planners and decision-makers may need. For example, information about lack of sleep or sustained physical exertion during operations may simply represent quite typical role requirements from long-distance travel or night-shift work, and so may tell the receiver little about the individual's or group's capabilities for further actions. Likewise, in a situation with limited connectivity and/or constraints on battery usage, the ARMS could be completed quickly using pen and paper, then collated and interpreted in situ, without computational processing.
While previous psychometric instruments have been developed, the ARMS is arguably the only instrument to balance brevity (i.e., 4 core items + 28 expansion items) with capturing the multidimensional nature of acute readiness, across various roles and performance capabilities (cf. Bester and Stanz, 2007). Existing measures have either focused on specific aspects of readiness (e.g., cognitive readiness, Grier, 2012; exercise readiness, Strohacker and Zakrajsek, 2016; return-from-injury readiness, Conti et al., 2019) or on broad overall appraisals of self-efficacy or anxiety. These measures have a much higher ratio of items per component, representing a higher burden on respondents. Combining multiple different scales is also less suitable for immediate integration into a multidimensional model representing acute readiness. Similarly, psychometric measures of recent hassles, stress recovery, self-efficacy, affect, or anxiety are typically either: (a) too long; (b) not specific enough to be used and interpreted by users (i.e., broader academic concepts); (c) focused on past events, not the immediate here-and-now; or (d) some combination of a-c. To our knowledge, the ARMS is the only instrument designed to be brief, readily interpreted by users themselves, and specifically focused on acute readiness.

Next Steps
We recognize that the scale may require further validation in other contexts and groups in order to assess its generalizability. The sample in this study was relatively young, well-educated and predominantly Caucasian, which may necessitate caution in seeking to apply these findings in other contexts/populations. Likewise, the emergence of separate "readiness" and "fatigue" factors in the physical and cognitive domains warrants further examination in future research, although it is consistent with at least one previous study (Strohacker and Zakrajsek, 2016). Further, while the ARMS has been developed to be implemented into a regular program of monitoring, for example through an app or regular reporting practices, it has not been tested in that context. Hence, it would be important to determine how the instrument behaves when completed frequently, and whether there are ways of optimizing the utility of the instrument: for example, through carefully managing how often, at what time of day, and after what events each subscale should be completed. This type of research would allow for the estimation of test-retest reliability, in circumstances where consistent scores would be expected, and also of sensitivity to change. While one aim of developing the ARMS was to facilitate a novel method for evaluating resilience, by observing acute psychological responses to challenges, adversity and training activities, the next logical step would be to evaluate the extent to which the instrument enables this evaluation. Future research assessing the use of the ARMS as an indicator of resilience is needed and would require repeated administration of the ARMS before and after a defined challenging event, or series of events.

Limitations
The entire sample in this study comprised military personnel who were "on-base," and so much less likely to be experiencing the fatigue generated by sustained operations and time away from home (or, for example, the fatigue caused by working from home over an extended period). Hence, it would be important to test the ARMS under diverse circumstances, both in more civilian-relevant professions and in more demanding military circumstances, potentially evolving items or factors to capture the effects of those activities. Finally, from the conduct of this study, including talking to respondents and reviewing qualitative comments, coupled with the literature reviewed in preparing the ARMS, it is possible that there are other dimensions of readiness not assessed in the current measure. These might include social support from family and friends, rather than focusing on one's unit or team, and also behavioral readiness such as having good routines throughout one's day, or good diet and sleep patterns.

CONCLUSION
The present study has developed and provided initial validation of an instrument to assess acute multidimensional readiness with a focus on performance capability. The resulting instrument offers opportunities both to resolve or circumvent limitations in other measures and to open up new avenues of research in further refining the scale and optimizing its implementation in real-world settings. Furthermore, the development of the ARMS has advanced the conceptual understanding of this topic, by identifying a potentially important distinction between readiness as "freshness" versus "fatigue" as separate but correlated constructs. In addition to facilitating a new avenue of research on acute readiness itself (a previously under-researched concept), the development of the ARMS opens up opportunities to resolve real-world problems around readiness monitoring, with implications for inferring resilience beyond cross-sectional trait measures, and with potential to be adopted and applied in other contexts such as sport, occupational settings and health monitoring. Finally, by offering the ability to capture subjective perceptions of readiness, the new scale may also facilitate research on how wearable technology and objectively measured indices relate to these subjective perceptions, and how this "nexus" of different forms of information might be useful in managing training, performance and recovery in a wide array of performance contexts. Overall, the ARMS makes it possible to assess acute readiness, for the group and the individual, in a way that is unintrusive and easily interpreted, yet psychometrically validated. As such, numerous new possibilities open up following the development of this scale.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Defense (Australia) Human Research Ethics Committee (Low Risk) and the University of Canberra Human Research Ethics Committee. The participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
RK conceptualized the project alongside co-authors and stakeholders, collected and analyzed data, and prepared the written report. AF, BR, and MS provided important conceptual and methodological inputs in designing, implementing and interpreting the study (BR also participated in data collection). TN and MW are statisticians who provided critical oversight in planning, implementing and interpreting the analysis of data. LM was the Army liaison who ensured access to data collection opportunities and oversaw the secure collection and storage of data, as well as assisting in the design and interpretation phases of the project. DC was the lead contributor from Defense Science Technology Group, playing a crucial role in the design, planning and facilitation of the study, spanning conceptualization, ethical approvals, methods and the planning of data collection. All authors contributed to the article and approved the submitted version.