Construction Industry Training Assessment Framework

The construction industry suffers from a lack of structured assessment methods to consistently gauge the efficacy of workforce training programs. To address this issue, this study presents a framework for construction industry training assessment that identifies established practices rooted in evaluation science and developed from a review of archival construction industry training literature. Inclusion criteria for the evaluated studies are: archival training studies focused on the construction industry workforce and integration of educational theory in training creation or implementation. Literature meeting these criteria are summarized and a case review is presented detailing assessment practices and results. The assessment practices are then synthesized with the Kirkpatrick Model to analyze how closely industry assessment corresponds with established training evaluation standards. The study culminates in a training assessment framework created by integrating practices described in the identified studies, established survey writing practices, and the Kirkpatrick Model. This study found that two-thirds of reviewed literature used surveys, questionnaires, or interviews to assess training efficacy, two studies that used questionnaires to assess training efficacy provided question text, three studies measured learning by administering tests to training participants, one study measured changed behavior as a result of training, and one study measured organizational impact as a result of training.


INTRODUCTION
Formal learning and training have been shown to increase an employee's critical thinking skills and informal learning potential in any given job function (Choi and Jacobs, 2011). Evaluating training through appropriate assessment is an important aspect of any educational endeavor (Salsali, 2005), especially for assessing training efficacy in real-world studies (Salas and Cannon-Bowers, 2001). Examples of training assessment abound in literature across disciplines, for both professionals and non-professionals. For example, bus drivers who attended an eco-driving course achieved a statistically significant 16% improvement in fuel economy (Sullman et al., 2015); recording engineers with technical ear training achieved a statistically significant 10% improvement in technical listening (Sungyoung, 2015); and automatic external defibrillator training of non-medical professionals resulted in a statistically significant reduction in the time to initial defibrillation by 34 s, translating to a 6% increase in survival rate (Mitchell et al., 2008).
Many advancements have been made in construction education assessment at the university level (e.g., Mills et al., 2010; Clevenger and Ozbek, 2013; Ruge and McCormack, 2017). However, within the industry itself, the dearth of workforce training research (Russell et al., 2007; Killingsworth and Grosskopf, 2013) extends to the assessment of construction industry training, particularly assessments of how learning major construction tasks affects project outcomes (Jarkas, 2010). Love et al. (2009) found that poor training and low skill levels are commonly associated with rework, which is a chronic industry problem, representing 52% of construction project cost growth (Love, 2002). Given the potential for loss within the construction industry, in both economic and life safety terms (Zhou and Kou, 2010; Barber and El-Adaway, 2015), it is reasonable to expect that integration of construction industry training assessment practices across the industry would yield improved effectiveness amongst those trained.
To understand and improve current practices for industry training assessment, the following research questions are addressed:
• What practices have been used to assess construction industry training?
• How closely do construction industry training assessments adhere to established training evaluation standards?
• What survey science practices are typically not integrated in construction industry training?
• What practices (i.e., optimal standards) are appropriate for implementation in construction industry training program assessment?
This paper presents a framework for construction industry training assessment that identifies established practices rooted in evaluation science and developed from a review of archival construction industry training literature. The Kirkpatrick techniques (Kirkpatrick, 1959) for training evaluation serve as the foundation for the framework and relevant survey science best practices are identified and integrated. Assessment methodologies contained within the studies that meet the inclusion criteria are summarized through comprehensive case review and categorized according to the Kirkpatrick Model (Kirkpatrick, 1959) levels. The identified assessment methods are then linked with Kirkpatrick Model guidelines to analyze how closely construction industry training studies have adhered to established training evaluation standards. By analyzing the identified studies and established survey science literature, optimal standards for assessing construction industry training programs are extracted and presented within a construction industry training assessment framework.
The contribution of this research is the creation of a framework with guidelines for assessing industry training that align with the Kirkpatrick Model and have been distilled from published industry training literature and survey science best practices. The case review results and synthesis provide a current snapshot of professional construction industry training assessment criteria, identifying how closely established evaluation standards are met, and more critically, what survey science practices are integrated in assessments. This allows for the integration of established evaluation science into training assessment practices. The intended audience of this paper is construction education and training researchers, professionals, organizations, and groups. The practical implications of this framework are its direct implementation by those conducting training, basis in sound assessment science, and practices extracted from literature.

Overview of Evaluation Techniques
The reported efficacy of training has been shown to differ depending on the assessment methodology (Arthur et al., 2003), underlining the importance of aligning assessment levels and methods with outcome criteria. Kirkpatrick and Kirkpatrick (2016) define training efficacy as training that leads to improved key organizational results. Studies often use questionnaires after training for assessment; however, participant evaluations and learning metrics evaluate different aspects of success. Questionnaires administered directly following training tend to measure only immediate reaction to the training; therefore, to effectively evaluate training impacts beyond participant satisfaction, an assessment model is recommended. Kirkpatrick's (1959) Techniques for Evaluating Training Programs, known as the Kirkpatrick Model, is likely the most well-known framework for training and development assessment (Phillips, 1991) and remains widely used today (Reio et al., 2017). It comprises four assessment levels: 1) Reaction, 2) Learning, 3) Behavior, and 4) Results.
Kirkpatrick asserts that training be evaluated using the four assessment levels described, and that these are sufficient for holistic training evaluation (Kirkpatrick, 1959). However, since its introduction, several other important evaluation models have been developed, many of which stem from the Kirkpatrick Model. For example, the input-process-output (IPO) model (Bushnell, 1990) begins with an input stage that identifies pre-training components (e.g., training materials, instructors, facilities) that impact efficacy. The process stage focuses on the design and delivery of training programs. Finally, the output stage essentially covers the same scope as the Kirkpatrick Model. Brinkerhoff's (1987) six-stage evaluation model goes beyond assessment into training design and implementation. The first stage identifies the goals of training and the second stage assesses the design of a training program before implementation. The remaining four stages fall in line with Kirkpatrick's four levels. Kaufman and Keller (1994) present a five-level evaluation model where Level 1 is expanded to include enabling, or the availability of resources, as well as reaction; Levels 2 through 4 match the corresponding levels in the Kirkpatrick Model; Level 5 goes beyond the organization and presents a method of evaluating the training program on a societal level. Phillips (1998) presents a five-level model that adopts Kirkpatrick's first three levels and expands the fourth level by identifying ways that organizations can assess organizational impact. A fifth level is added that evaluates the true return on investment (ROI) by comparing the cost of a training program with the financial gain of organizations implementing training.
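The ROI comparison in the fifth level of the Phillips model can be illustrated with a short calculation. The sketch below assumes the common formulation of ROI as net program benefits divided by program costs, expressed as a percentage; the function name and the monetary figures are hypothetical and are not drawn from any of the reviewed studies.

```python
def training_roi_percent(program_benefits: float, program_costs: float) -> float:
    """Return ROI as a percentage: net benefits divided by costs, times 100."""
    if program_costs <= 0:
        raise ValueError("program_costs must be positive")
    net_benefits = program_benefits - program_costs
    return 100.0 * net_benefits / program_costs


# Hypothetical figures: a program costing 40,000 that yields 50,000 in gains.
example_roi = training_roi_percent(50_000, 40_000)
print(f"ROI: {example_roi:.1f}%")
```

A positive value indicates that the monetary gain attributed to training exceeded its cost; isolating the gain that is actually attributable to training is the harder part and is not addressed by this sketch.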
While developing and designing effective programs are important, these criteria fall outside the scope of this study, which focuses on training assessment implementation and not on evaluating the suitability of aspects of the training programs reviewed. Therefore, the Bushnell and Brinkerhoff models have no advantage over the Kirkpatrick Model for this analysis. Similarly, there is not enough information provided in the identified studies regarding social implications as a result of training to warrant use of Kaufman and Keller's or Phillips's five-level models as a basis. From an assessment aspect, the reviewed models essentially stem from and adhere to the four levels found in the Kirkpatrick Model. Because the focus of this research is the assessment of construction industry training programs, and not the design and development of training, the Kirkpatrick Model is well-suited for robust synthesis and extraction of optimal standards for training evaluation methodologies and is therefore used in this study. Kirkpatrick (1996) asserts that the 1959 model is widely used because of its simplicity. Amongst the population of training professionals, there is little interest in a complex scholarly approach to training assessment. Definitions and simple guidelines are presented in the model to facilitate straightforward implementation (Figure 1). The following paragraphs describe each level in more detail.

The Kirkpatrick Model
Level 1: Reaction Within the first level, overall trainee satisfaction with the instruction they have received is measured. While all training programs should be evaluated at least at this level (Kirkpatrick, 1996), learning retention is not measured here. Participant reactions are perceived to be easily measured through trainee feedback or survey question answers (Sapsford, 2006); therefore, surveys are a common means of assessment. From a robust reaction analysis, program designers assess training acceptance and elicit participant suggestions and comments to help shape future training sessions.
Level 2: Learning Within the second level, trainee knowledge gain, improved skills, or attitude adjustments resulting from the training program are measured. Because measuring learning is more difficult than measuring reactions (Level 1), before-and-after evaluations are recommended. These may include written tests or demonstrations measuring skill improvements. Analysis of learning assessment data and use of a control group are recommended to determine the statistical significance of training on learning outcomes, when possible.
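A before-and-after evaluation of this kind can be sketched as a paired comparison of test scores for the same trainees. The example below is a minimal illustration using only the Python standard library; the function name and the scores are hypothetical, and the paired t statistic is one reasonable choice of analysis rather than a method prescribed by the Kirkpatrick Model. It also omits the recommended control group.

```python
import math
import statistics


def paired_gain_summary(pre_scores, post_scores):
    """Summarize before-and-after test scores for the same trainees."""
    if len(pre_scores) != len(post_scores) or len(pre_scores) < 2:
        raise ValueError("need two equal-length lists with at least two trainees")
    # Per-trainee gain: post-training score minus pre-training score.
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    mean_gain = statistics.mean(gains)
    sd_gain = statistics.stdev(gains)  # sample standard deviation
    if sd_gain == 0:
        raise ValueError("all gains are identical; t statistic is undefined")
    # Paired t statistic: mean gain divided by its standard error.
    t_stat = mean_gain / (sd_gain / math.sqrt(len(gains)))
    return {"n": len(gains), "mean_gain": mean_gain, "t": t_stat, "df": len(gains) - 1}


# Hypothetical scores out of 15 for six trainees.
pre = [8, 9, 7, 10, 6, 9]
post = [11, 12, 9, 12, 10, 11]
summary = paired_gain_summary(pre, post)
print(summary)
```

The resulting t statistic would then be compared against a t distribution with the reported degrees of freedom to judge statistical significance; a statistics package such as SciPy offers a paired test (`scipy.stats.ttest_rel`) that also returns the p-value.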
Level 3: Behavior Within the third level, the extent to which training participants change their workplace behavior is measured. For behavior to change, trainees must recognize shortcomings and want to improve. Evaluation consists of participant observation at regular intervals following the training, allowing ample time for behavior change to occur. External longitudinal monitoring is more difficult than assessment practices in the previous two levels. A control group is recommended.
Level 4: Results Within the fourth level, the effect that training has on an overall organization or business is measured. Many organizations are most interested (if not only interested) in this level of evaluation (Kirkpatrick, 1996). In fact, "The New World Kirkpatrick Model" (Kirkpatrick and Kirkpatrick, 2016) asserts that training programs should be designed in reverse order from Level 4 to Level 1 to keep the focus on what organizations value most. Common assessment metrics are improved quality, increased production, increased sales, or decreased cost following training. A control group is recommended.

FIGURE 1 | Kirkpatrick model levels and guidelines (Kirkpatrick, 1996).

Survey Science Best Practices
Multiple studies have focused on proper formulation of survey questions that can be used across industries. Lietz (2010) summarized the literature regarding questionnaire design, focusing on best practices such as question length, grammar, specificity and simplicity, social desirability, double-barreled questions, negatively worded questions, and adverbs of frequency. With regard to question length, Lietz (2010) recommends short questions to increase respondents' understanding. Complex grammar should be minimized and pronouns should be avoided. Simplicity and specificity should be practiced to decrease respondents' cognitive effort. Complex questions should be avoided and instead separated into multiple questions. Definitions should be provided within the question to give context. For example, a question about a "chronic" health condition could define the term as seeing a doctor two or three times for the same condition (Fowler, 2004). The scale used to gauge responses should also follow the concept of simplicity. Taherdoost (2019) found that while scales of 9 and 10 points are thought to increase specificity, reliability, validity, and discriminating power were indicated to be better with scales of 7 points or fewer. Social desirability may result in respondents answering questions based on their perception of a position favored by society. To remedy this bias, Brace (2018) suggests asking questions indirectly, such as "What do you believe other people think?", where respondents may be more likely to admit unpopular views. "Double-barreled" questions contain two verbs and should be avoided. Negatively worded questions should similarly be avoided to keep the meaning clear. This is particularly the case when the words "no" or "not" are used together with words that have a negative meaning such as "unhelpful." Finally, adverbs such as "usually" or "frequently" should be avoided and replaced with actual time intervals such as "weekly" or "monthly."
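Several of these recommendations can be screened for mechanically before a questionnaire is distributed. The sketch below is a rough heuristic check, not a validated instrument: the function name, the 20-word length cutoff, and the use of "and"/"or" as a proxy for double-barreled questions are assumptions made for illustration, while the word lists and the 7-point scale ceiling follow the examples given above.

```python
import re

# Word lists come from the examples in the text; extend them as needed.
VAGUE_FREQUENCY = {"usually", "frequently"}
NEGATIONS = {"no", "not"}
CONJUNCTIONS = {"and", "or"}   # assumed proxy for double-barreled questions
MAX_WORDS = 20                 # assumed cutoff for a "short" question
MAX_SCALE_POINTS = 7           # ceiling attributed to Taherdoost (2019)


def check_question(text, scale_points=None):
    """Return heuristic warnings for one survey question."""
    words = re.findall(r"[a-z']+", text.lower())
    warnings = []
    if len(words) > MAX_WORDS:
        warnings.append("long question")
    if VAGUE_FREQUENCY.intersection(words):
        warnings.append("vague frequency adverb")
    if NEGATIONS.intersection(words):
        warnings.append("negative wording")
    if CONJUNCTIONS.intersection(words):
        warnings.append("possible double-barreled question")
    if scale_points is not None and scale_points > MAX_SCALE_POINTS:
        warnings.append("scale exceeds 7 points")
    return warnings


flagged = check_question("How frequently do you not wear gloves and goggles?", scale_points=9)
clean = check_question("Do you wear gloves weekly?", scale_points=5)
print(flagged)
print(clean)
```

A check like this only flags candidates for review; issues such as social desirability bias or missing definitions still require a human reader.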

Methodology
The methodology consists of three steps:
1. Relevant literature is identified through inclusion criteria; case review is performed to extract and summarize key assessment aspects.
2. Identified construction assessment methodologies are evaluated against the corresponding Kirkpatrick Model level guidelines.
3. An assessment framework is constructed that integrates optimal assessment standards aligned with the Kirkpatrick Model.

Study Selection and Evaluation
A structured literature review is implemented to collect data describing construction industry training assessment for current industry professionals. The objective is to understand how various construction industry training programs that have embedded established educational theory in their design or implementation assess training efficacy. Educational theory-embedded training was selected because it is indicative of a more robust training assessment. Peer-reviewed archival literature is searched to determine the state of construction industry training studies that have been documented in scholarly works. The main search keywords were "construction industry," "education theory," and "training." The main search engines used to identify relevant studies were EBSCOhost library services and Google Scholar. The following inclusion criteria were established to identify recent, relevant peer-reviewed construction industry training studies published after 2005 for investigation in this study:
1. The training focuses on the current construction industry workforce, including construction workers (W), project managers (M), and designers (D).
2. The training incorporated educational theory in its creation or implementation.
Using the keywords mentioned above, a literature search was conducted resulting in 475 research studies, which increased to 483 through identification of other sources referenced in the initial search results. After removing duplicates and applying the inclusion criteria and additional quality measures, 15 publications were identified for the review, indicating limited research conducted in this area. The selection process is illustrated in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Moher et al., 2009) flow chart in Figure 2.
The following information was recorded from the relevant publications that met the inclusion criteria: location (i.e., country) where the study took place, educational theory employed, training subject, assessment level corresponding to the Kirkpatrick Model, and assessment methodology. Assessment tools were often referred to as questionnaires, surveys, or interviews. Each of these assessment types was recoded as "questionnaires." A case review summarizes the methods, assessment criteria, and results of the studies identified. The case review is created to provide context of the studies.

Kirkpatrick Model Synthesis
The assessment methodologies within the identified studies were linked to the corresponding guidelines established by the Kirkpatrick Model. The assessment methods within each training program study were evaluated, first to determine the corresponding Kirkpatrick Level, and second to identify adherence to the Kirkpatrick guidelines (Kirkpatrick, 1996) for each level.
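The two-step evaluation described above (assign a level, then check adherence to that level's guidelines) can be pictured as a simple checklist comparison. In the sketch below, the guideline labels are hypothetical identifiers that paraphrase the Level 2 and Level 3 descriptions given earlier in this paper; they are not a transcription of Figure 1, and the example entry reflects the practices reported for Choudhry (2014) in the case review.

```python
# Guideline labels paraphrase the level descriptions in this paper; the
# identifiers themselves are hypothetical and chosen only for this sketch.
LEVEL_GUIDELINES = {
    2: ["before_and_after_evaluation", "test_or_demonstration",
        "statistical_analysis", "control_group"],
    3: ["observation_over_time", "external_observer", "control_group"],
}


def guideline_adherence(level, practices_used):
    """Split a level's guidelines into those met and those missing."""
    guidelines = LEVEL_GUIDELINES[level]
    met = [g for g in guidelines if g in practices_used]
    missing = [g for g in guidelines if g not in practices_used]
    return met, missing


# Level 3 example: external observers and repeated observation, no control group.
met, missing = guideline_adherence(3, {"observation_over_time", "external_observer"})
print("met:", met)
print("missing:", missing)
```

Recording the result per study in this form makes it straightforward to tabulate how many guidelines each study met at its assigned level.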

Survey Science Synthesis
The identified studies that provided the text of the questionnaires administered to training participants were evaluated against the survey science best practices summarized by Lietz (2010). The total occurrence of each practice is enumerated so that more common practices are identified.

Construction Industry Training Assessment Framework
The assessment review culminates in the presentation of a framework of optimal practices identified through the synthesis of assessment criteria used in the construction industry training studies and survey science best practices, aligned with the Kirkpatrick Model. The framework includes a summary of Kirkpatrick Model guidelines and practices resulting from the synthesis of identified construction literature and established survey science.

RESULTS, ANALYSIS, AND DISCUSSIONS

Study Selection and Evaluation
Fifteen studies describing education theory-integrated construction industry training met the inclusion criteria selected, listed in alphabetical order in Table 1. A short summary of assessment criteria used in each study is provided in the following case review and corresponding ties to the Kirkpatrick Model are established.

Study Number 1
Akanmu et al. (2020) implemented a virtual reality (VR) training focused on reducing construction worker ergonomic risks. The primary assessment method was participant feedback through a questionnaire with both rated questions (1 = strongly disagree, 5 = strongly agree) and open-ended questions, meeting Level 1 standards. Rating questions gauged whether the user interface for the postural training program interfered with the work surface (mean 2.4), whether the virtual reality display affected performance (mean 2.7), whether the display was distracting (mean 1.3), and whether the avatar and color scheme enhanced their understanding of ergonomic safety (mean 1.2). In open-ended questions, 9 out of 10 participants reported that the VR training helped adjust posture. Two out of 10 participants complained that the wearable sensors obstructed movement. The study did not publish the assessment questions directly and only provided results; therefore, they were not analyzed for survey science best practices outlined by Lietz (2010). It should be noted that mean scores of 1.3 and 1.2 do not appear to be positive as they favor the strongly disagree rating based on the key provided. Additionally, the exact open-ended question text is not provided, and the article states that the questions were asked to encourage improvement of training in the future. This does not follow established survey guidelines, as such a question will not yield quantifiable results.

Study Number 2
Begum et al. (2009) administered a survey to local contractors in Malaysia to measure the attitudes and behaviors of contractors toward waste management, categorizing this assessment as Level 1. The results found a positive regression coefficient (β 2.006; p 0.002) correlating education to contractor waste management attitude, making education one of the most significant factors found in the study. The study did not provide the actual questions asked on the questionnaire, but instead stated that the following "attitudes" were assessed: general characteristics, such as contractor type and size; waste collection and disposal systems; waste sorting, reduction, reuse and recycling practices; employee awareness; education and training programs; attitudes and perceptions toward construction waste management and disposal; behaviors with regard to source reduction and the reuse and recycling of construction waste. With this information, it is difficult to determine how closely questionnaire guidelines were followed.

Study Number 3
Bena et al. (2009) assessed the training program delivered to construction workers working on a high-speed railway line in Italy. The assessment analyzed injury rates for workers before and after training and found that the incidence of occupational injuries fell by 16% for the basic training module, and by 25% after workers attended more specific modules. This is a Level 4 evaluation because the overall organizational outcomes were assessed.

Study Number 4
Bhandari and Hallowell (2017) proposed a multimedia training that integrated andragogy (i.e., adult learning) principles to demonstrate the cause and effect of hand injuries during construction situations, focusing on injuries caused by falling objects and pinch-points. A questionnaire, distributed both before and after the training simulation, asked participants to rate the intensity of different emotions using a 9-point Likert scale. Overall, workers reported a statistically significant increase in negative emotions such as confusion (p 0.01), fear (p 0.01), and sadness (p 0.01) after they had been trained. Statistically significant decreases in positive emotions such as happiness (p 0.01), joy (p 0.01), love (p 0.01), and pride (p 0.01) were also reported by trainees. Because gauging trainee response is the main assessment tool, this is classified as a Level 1 evaluation. In total, eighteen emotions were assessed, making the survey rather lengthy and possibly inducing cognitive fatigue or confusion. Additionally, a 9-point Likert scale offers more response options than the maximum of seven recommended by Taherdoost (2019). A shorter survey with fewer options might improve the results generated by this study.

Study Number 5
Bressiani and Roman (2017) used andragogy to develop a training program for masonry bricklayers. Questionnaires used to assess participant feedback found that andragogical principles were met in more than 92% of responses. Because gauging trainee response is the main assessment tool, this is classified as a Level 1 evaluation. The study presented training participants with a 24-question survey found in the appendix of their study. The questions themselves are short, simple, and pertain to a singular topic, complying with survey best practices. However, the response options are given on a 0-10 scale. Similar to Bhandari and Hallowell's 9-point scale, this number of response choices can add confusion and complexity when respondents answer the questions.

Study Number 6
Choudhry (2014) implemented a safety training program based on behaviorism. Safety observers monitored the use of personal protective equipment (PPE) such as safety helmets, protective footwear, gloves, ear defenders, goggles or eye protection, and face masks over a 6-week period. Safety performance in the form of utilization of PPE increased from 86%, measured 3 weeks after training, to 92.9%, measured 9 weeks after training. This is classified as a Level 3 evaluation because behavior changes were observed and noted. Further, external observers were used and data were collected over time, adhering to Kirkpatrick Level 3 guidelines.

Study Number 7
Douglas-Lenders et al. (2017) found an increase in self-efficacy of construction project managers after a leadership training program was administered. This assessment was conducted through a questionnaire that presented questions on a 5-point Likert scale, which was used to gauge trainee self-perception as a result of training. Learning confidence, learning motivation, and supervisor support received average scores of 4.23, 3.86, and 3.84, respectively, from training participants. Because surveys are the main assessment tool, this is classified as a Level 1 evaluation. The study did not publish the assessment questions directly and only provided results; therefore, they were not analyzed for survey science best practices.
Note: ✓ indicates the best practice was met; 'x' indicates the best practice was not met; '-' indicates adherence to the best practice could not be assessed.
Frontiers in Built Environment | www.frontiersin.org August 2021 | Volume 7 | Article 678366

Study Number 8
Eggerth et al. (2018) evaluated safety training "toolbox talks," which are brief instructional sessions on a jobsite or in a contractor's office. The study involved a treatment group that received training and a control group; both answered a questionnaire. The trained group rated the importance of safety climate statistically significantly higher than the control group (p 0.026). Because gauging trainee response is the main assessment tool, this is classified as a Level 1 evaluation. Sample questions are recorded in the study, but the questionnaire in its entirety is not presented. Based on the sample questions, it is likely that the questionnaire generally falls in line with survey standards.

Study Number 9
Evia (2011) evaluated computer-based safety training targeted toward Hispanic construction workers. Based on interviews with the participants, a positive reaction to the training with significant knowledge retention was achieved. This study also did not present the questionnaire in its entirety; however, it is mentioned that the evaluation measured reaction. Workers were able to give ratings such as "very interesting" and "easy" with regard to a video watched during the training; however, no numerical assessment was given. Because gauging trainee response is the main assessment tool, this is classified as a Level 1 evaluation. The study did not publish the assessment questions directly; therefore, they were not analyzed for survey science best practices.

Study Number 10
Questionnaires that were administered to the training participants indicate demonstrated improvements in safety knowledge. The results found a statistically significant knowledge gain for the questions regarding fall prevention and grounding from the pre-training and post-training questionnaires (p 0.0003). This type of evaluation is classified as Level 2 because the learning outcomes of training were measured. The pre-training and post-training testing guidelines appear to have been met throughout this study.

Study Number 11
Goulding et al. (2012) present the findings of an offsite production virtual reality training prototype. Feedback on the training was requested and was summarized as being positive. Because gauging trainee response is the main assessment tool, this is classified as a Level 1 evaluation. No numerical assessment was provided and the study did not publish the assessment questions directly; therefore, they were not analyzed for survey science best practices.

Study Number 12
Mehany et al. (2019) evaluated a confined space training program administered to construction workers. A test was administered to the training participants and the results found that the participants scored below average, even after attending the training on the subject. A score of 11/15 is taken to be the United States national average. The participants scored an average of 9.3/15. This average was further broken into a nonstudent sample (industry professionals) that scored a mean of 8.3 and a student sample that scored a mean of 9.5. This is classified as a Level 2 evaluation because the learning outcomes of training were measured. Diversity in the population of examinees provided the authors with interesting analysis opportunities and the ability to speculate on the difference in scores between the two groups, which is desirable in learning evaluations.

Study Number 13
Lin et al. (2018) used a computer-based three-dimensional visualization technique, designed by adult education subject matter experts, to train Spanish-speaking construction workers on safety and fall fatality. Interviews were conducted to evaluate the training program. The intended results were achieved by 64-90% of English-speaking workers and 73-83% of Spanish-speaking workers. All Spanish-speaking workers (100%) reported that they would recommend the training materials to others, while only 46% of English-speaking workers reported that they would do so. Because both interviews and tests were conducted, this is classified as a Level 1 and Level 2 evaluation. From a Level 1 perspective, the study presents the results in an "evaluation of validation" format without referencing the exact questions asked. This makes it difficult to assess how closely question format guidelines were followed. From a Level 2 perspective, a set of questions to assess knowledge gain is presented. Both English- and Spanish-speaking participants were tested. Six questions were included on the test to assess participant knowledge gain after the training. Similar to the previous study, the diversity in the populations provides analysis opportunities to assess learning outcomes as a result of training.

Study Number 14
Lingard et al. (2015) evaluated the use of participatory video-based training to identify safety concerns on a construction jobsite. As a result of this training, new health and safety rules were generated by participants. The training was based on viewing the recordings and success was measured by workers' ability to establish new safety guidelines to enable compliance. Because feedback was taken into consideration, this is classified as a Level 1 evaluation. This study culminated in the participants sharing their reactions to the training in a group setting. While the reactions were captured, the study did not publish the assessment questions directly; therefore, they were not analyzed for survey science best practices.

Study Number 15
Wall and Ahmed (2008) explored training delivered to Irish construction project managers on computerized construction management tools. Participants reported that the program increased their understanding of construction problems and decisions. Because participant feedback was gathered, this is classified as a Level 1 evaluation. However, the study did not capture participant responses explicitly; it reported only that feedback was favorable, and no numerical assessments were presented.

Case Review Summary
This case review found that ten studies (67%) used surveys, questionnaires, or interviews to assess the training programs, three studies (20%) measured learning by administering tests to training participants, one study measured changes in behavior resulting from training, and one study measured organizational impact as a result of training. Attributes of the assessment methodologies that complied with Kirkpatrick standards or established survey science best practices were noted as positively complying with Level 1 assessment standards, which are summarized in the survey science synthesis. Studies that complied with Level 2-4 standards typically complied with the guidelines set forth by Kirkpatrick; however, it is surprising that so few studies utilized these methodologies. This is especially the case with Level 4 evaluation standards. Organizations ultimately seek to understand how training might impact performance on an organizational level; yet of the 15 studies analyzed, only one complied with this standard of evaluation. Gaps identified in the review of the studies inspired the guidelines outlined in the Construction Industry Assessment Framework presented in this paper.

Kirkpatrick Model Synthesis
Although the first two Level 1 guidelines were excluded from the analysis, amongst the remaining three Level 1 guidelines, one study (Akanmu et al., 2020) included all three assessment guidelines, while seven studies met two Level 1 guidelines, and one study met one Level 1 guideline. The three studies that met Level 2 guidelines were identical in that they excluded the use of a control group and adhered to all other guidelines. Similarly, the only study (Choudhry, 2014) that met Level 3 guidelines excluded the use of a control group and adhered to all other guidelines. One study (Bena et al., 2009) provided a Level 4 evaluation that met all associated guidelines. This information is shown in Table 2.

Survey Science Synthesis
Of the studies that used Level 1 criteria for their assessment methodology, two (18%) provided the text of the survey questions presented to training participants. The remaining studies did not publish the assessment questions directly. Bressiani and Roman (2017) presented the questionnaire in its entirety. All survey science recommendations summarized by Lietz (2010) were met except for guarding against social desirability, implementing a reasonable response scale, and allowing for additional comments. Eggerth et al. (2018) presented only sample questions from the questionnaire distributed to participants; however, all survey recommendations that could be analyzed were met. Analysis of the response scale reveals that of the five studies that provided their scales, two (40%) adhered to the optimal scale standard of seven points or fewer. Of the studies analyzed, 64% provided results that could be quantified, and 25% of the studies that could be analyzed for allowing additional comments were found to have done so.
Each percentage was derived by dividing the number of studies that met a practice by the number of studies in which that practice could be assessed (that is, the number that met it plus the number that did not). When a practice could not be assessed for a study, that study was excluded from the calculation. This information is shown in Table 3.
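The percentage calculation can be illustrated with a minimal Python sketch. It assumes the percentage is the share of assessable studies that met a practice, which is consistent with the response-scale figure reported above (two of five studies, 40%). The function name `adherence_rate` and the number of non-assessable entries in the example are hypothetical and serve only as illustration.

```python
from typing import Optional, Sequence


def adherence_rate(outcomes: Sequence[Optional[bool]]) -> Optional[float]:
    """Return the percentage of assessable studies that met a practice.

    Each entry is True (practice met), False (practice not met), or
    None (practice could not be assessed; excluded from the calculation).
    """
    assessed = [o for o in outcomes if o is not None]
    if not assessed:
        return None  # no study could be assessed for this practice
    return 100.0 * sum(assessed) / len(assessed)


# Response-scale practice: five studies provided their scales and two met
# the standard. The two None entries are a hypothetical stand-in for
# studies whose scales could not be assessed.
scale_outcomes = [True, True, False, False, False, None, None]
rate = adherence_rate(scale_outcomes)
print(f"{rate:.0f}%")  # prints "40%"
```

Representing non-assessable studies explicitly, rather than dropping them beforehand, keeps the exclusion rule visible in the data.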

Construction Industry Training Assessment Framework
Survey results may be skewed by the questions asked (Dolnicar, 2013), and poorly written questions often result in flawed data (Artino, 2017). When one considers that most construction industry training studies evaluate efficacy by attempting to collect the reaction of participants, it is important that the questions asked be made available for future study and analysis. For this reason, the framework provides extensive recommendations to improve Level 1 analyses. Additionally, because only 20% of studies that used questionnaires as their means of assessment provided the questionnaire text, the current level of adherence to Level 1 construction industry training assessment best practices remains largely unknown. Moving forward, it is of the utmost importance that this information be provided to support robust Level 1 assessment. Furthermore, Taherdoost (2019) recommends a 7-point Likert-type scale so as not to overwhelm participants with a high number of response options. When composing open-ended questions, efforts should be made to frame the questions in a way that will yield results that are quantifiable.
While analysis of open-ended questions is rare, the results can be very valuable (Roberts et al., 2014). Because most studies did not include the complete survey question text, it is recommended that survey questions be included in training studies so that the results can be fully analyzed. The simplest method for analyzing learning development as a result of a training program is an evaluation administered before and after the training program (Kirkpatrick, 1996). Kirkpatrick recommends the use of a control group. However, control groups were rarely used in the reviewed literature. Cost, resources, and time could be contributing factors; however, for the sake of analysis, these circumstances should be made clear. The study presented by Mehany et al. (2019) measured learning outcomes against an industry-wide average, which provides a benchmark for the results of a given training program. If possible, this should be the norm, as it gives a standard by which a given training program can be analyzed. Several studies analyzed the evaluation results for statistical significance. This should be done when possible to lend more credibility to the results.
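As an illustration of a before-and-after learning evaluation, the following Python sketch tests whether a change in paired test scores is statistically significant. The reviewed studies do not specify a common test, so the exact sign-flip permutation test used here is an assumption chosen because it needs only the standard library. The scores are hypothetical, the function name `paired_permutation_p` is illustrative, and without a control group the result does not by itself isolate the effect of training.

```python
from itertools import product
from statistics import mean


def paired_permutation_p(pre, post):
    """Two-sided exact sign-flip permutation test on paired differences.

    Enumerates all 2**n sign patterns, so it is practical only for small
    groups of participants. Integer scores keep the comparison exact.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must cover the same participants")
    diffs = [after - before for before, after in zip(pre, post)]
    observed = abs(sum(diffs))
    # Under the null hypothesis of no change, each difference is equally
    # likely to be positive or negative.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical scores for eight participants on a six-question test.
pre_scores = [3, 4, 2, 5, 3, 4, 2, 3]
post_scores = [5, 5, 4, 6, 4, 6, 3, 5]

gain = mean(post_scores) - mean(pre_scores)
p_value = paired_permutation_p(pre_scores, post_scores)
print(f"mean gain = {gain}, p = {p_value}")
```

The same function could be applied to each test question separately to see which learning outcomes improved, in line with the per-question analysis recommended in the framework.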
To measure the extent to which training participants change their workplace behavior, observations are collected over time. Similar to the learning level, a rationale should be provided when a control group is not used. The study presented by Choudhry (2014) details the intervals at which observations are made. This should be standard practice, and measurements at these intervals should be reported so that a progression can be seen. Additionally, it is known that people may change their behavior unexpectedly if they know that they are being observed (Harvey et al., 2009), and for this reason, observations should be made as inconspicuously as possible.
When measuring organizational performance, a training study should take the same care to rationalize the lack of a control group, as recommended for the Learning and Behavioral Change levels. While Kirkpatrick includes common metrics for measuring training effectiveness at this level, such as decreased cost or increased revenue, these metrics are not always clearly defined. The metric by which an organization would like to measure effectiveness should be clearly identified in a training study. To accurately measure organizational change, pre-training levels must be noted. Kirkpatrick (1996) notes that factors other than training may also affect overall organizational performance. These factors should be identified and noted in a training study.
With this information in mind, the construction industry training framework (Table 4) is aligned using Kirkpatrick Model guidelines with the additional knowledge acquired by the synthesis of the identified studies and survey science best practices. Gaps found in the studies, such as the lack of information surrounding how survey questions were chosen, contribute to the framework by emphasizing this type of information that was notably missing across all studies analyzed.

CONCLUSION
This study provides a comprehensive literature review of educational theory-integrated construction industry training focusing on assessment methodologies used in construction industry training literature. Assessment practices identified through case review were compared against the Kirkpatrick Model, a well-known and widely used assessment model. Assessment methodologies in the literature were synthesized with corresponding levels found in the Kirkpatrick Model to analyze how closely the industry adheres to established training evaluation standards. The studies that utilized questionnaires as their means of assessment and provided the text of the questions asked were evaluated against survey science best practices. This study culminates in the creation of a training assessment framework by extracting the practices used in the identified studies so that future assessment methodologies can be implemented, tested, and presented effectively, thus advancing construction industry training. The specific findings of this study are that two-thirds (67%) of identified studies used surveys, questionnaires, or interviews to assess training efficacy. Of the studies that met the inclusion criteria, 73% (11/15) were designed to assess reaction, 20% (3/15) assessed learning, and 7% (1/15) each assessed behavior and organizational impact. Kirkpatrick Levels 2 to 4 assessments implemented in construction literature typically met the Kirkpatrick guidelines; however, Level 1 guidelines were met by 18% (2/11) of the studies. Two of the ten studies (20%) that used questionnaires to assess training efficacy provided question text, and of these, one study followed survey science best practices completely. The following survey science best practices are typically not integrated: accounting for social desirability, implementing a reasonable response scale, and allowing for additional comments.
Finally, archival construction industry training literature and survey science best practices were synthesized and aligned with Kirkpatrick (1959) Techniques for Evaluating Training Programs to create a framework for construction industry training assessment.

Level 1: Reaction
Design survey questions that will ensure the collection of relevant data from participants in a manner that can be quantified, allowing for anonymity and additional participant feedback.
• To provide justification for survey results, present the process of identifying relevant information to be gathered by the surveys
• Generate questions that will encourage training participants to provide information that is relevant to the training designers
• Adhere to survey science best practices outlined in this paper
• Develop questions so that results may be quantified. Likert-type scales should be no more than seven points to avoid confusing participants
• While open-ended questions are encouraged, they should be framed so that the responses are quantifiable
• Include survey question text in descriptions of the training (e.g., journal publications) to add to the body of knowledge

Level 2: Learning
Create evaluations for training participants that can be completed before and after a given training to measure learning progress. Analyze the results and determine the statistical significance of changes in knowledge.
• Rationalize the lack of a control group if one is not utilized
• If possible, determine an industry average of test results to compare the results of trainees to the average of the overall industry
• Analyze the learning outcomes for statistical significance for each individual question so that specific learning outcomes can be identified, and improvement can be made where no significance is found

Level 3: Behavioral Change
After the allotment of ample time for participants to change their behavior following training, conduct observations and interviews with regular observers to quantify the change in behavior, repeating the evaluation at appropriate intervals.
• Rationalize the lack of a control group if one is not utilized
• Provide the time intervals at which behavioral observations occur so change in behavior can be monitored over time
• If possible, monitor behavioral changes discreetly so that participants are not changing their behavior only when they are being observed

Level 4: Organizational Performance
After allowing ample time for results to be achieved, measure the output before and after training.
• Rationalize the lack of a control group if one is not utilized
• Generate a metric for organizational performance prior to training implementation so data can be more easily collected
• Be sure to note pre-training performance levels so changes in performance can be measured
• Identify other factors that may contribute to changes in performance to isolate the effect of training as a factor