Defining a screening tool for post-traumatic stress disorder in East Africa: a penalized regression approach

Background: Scalable PTSD screening strategies must be brief, accurate and capable of administration by a non-specialized workforce. Methods: We used PTSD as determined by a structured clinical interview, the Mini International Neuropsychiatric Interview (MINI), as our gold standard and considered predictor sets of (a) the Posttraumatic Stress Checklist-5 (PCL-5), (b) the Primary Care PTSD Screen for DSM-5 (PC-PTSD) and (c) the combined PCL-5 and PC-PTSD questions, to identify the optimal items for PTSD screening in public sector settings in Kenya. A logistic regression model using the LASSO was fit, with the shrinkage parameter chosen to minimize the average squared error in the validation data. Area under the receiver operating characteristic curve (AUROC) measured discrimination performance. Results: Penalized regression analysis suggested a screening tool that sums the Likert scale values of two PCL-5 questions: intrusive thoughts of the stressful experience (item #1) and insomnia (item #20). In hold-out test data, this tool had an AUROC of 0.85 for predicting PTSD as evaluated by the MINI, outperforming the PC-PTSD. The AUROC was similar in subgroups defined by age, sex, and number of categories of trauma experienced (all AUROCs ≥0.83), except among participants with no trauma history (AUROC 0.78). Conclusion: In some East African settings, a 2-item PTSD screening tool may outperform longer screeners and is easily scaled by a non-specialist workforce.


Introduction
Mental health and trauma disorder treatment gap in Sub-Saharan Africa
The Global Burden of Disease (GBD) studies, launched in 1996, were among the first studies of health-related disability and illuminated the massive worldwide health impact of mental disorders (1). Although mental disorder disability is driven overwhelmingly by common mental disorders such as depression, anxiety and trauma-related conditions (2, 3), which have well-established treatments, access to treatment is so limited in low- and middle-income countries (LMICs) that an extraordinary 75% of people with serious mental disorders never receive any treatment at all (the "treatment gap") (4). The situation in Sub-Saharan Africa (SSA) is particularly extreme, with the treatment gap exceeding 90% in some regions (5, 6). Epidemiologic models predict that the disability burden from mental disorders in SSA will increase by 130% over the next 40 years (7, 8).
Posttraumatic stress disorder (PTSD) in SSA is driven by the high incidence of traumatic stressors, including armed conflict, political violence, traumatic bereavement and domestic violence (9). Estimates of probable PTSD in the general population in SSA reach as high as 30% (9). Reducing PTSD at the population level requires both scalable models of evidence-based treatment and practical screening tools. Sustainable strategies for improving public sector access to first-line PTSD care delivered by locally available, non-specialist providers have progressed in recent years, including studies in SSA (10, 11). However, the ability to scale up care is hampered by the lack of pragmatic, scalable PTSD screening measures that have been validated in these settings (12). Screening tools alone are not expected to improve mental health conditions. Rather, they serve to identify individuals in need of treatment, the first crucial step toward recovery.

Study goal
The goal of this study was to develop a practical screening instrument to identify adults with probable PTSD in East Africa, a first step toward closing the PTSD treatment gap. We leveraged data from our implementation research study in Kenya. Using a structured diagnostic interview as the gold standard, we tested items from the Posttraumatic Stress Checklist-5 (PCL-5, 20 items) (13) and the Primary Care PTSD Screen for DSM-5 (PC-PTSD, 5 items), a PTSD screen commonly used in high-income countries (HICs) (14).

Screening for post-traumatic stress disorders in East Africa
We conducted a large implementation science study of scalable strategies for delivering treatment for major depression and/or PTSD in western Kenya (n = 2,162): the Sequential, Multiple Assessment Randomized Trial (SMART) for non-specialist treatment of common mental disorders in Kenya: Leveraging the Depression And Primary care Partnership for Effectiveness-implementation Research (DAPPER) (15). As part of SMART DAPPER activities, we sought to identify a practical PTSD screening instrument that could be used by existing clinical staff in regional hospitals seeking to initiate their own mental health treatment programs.
Cultural differences are well known to affect the experience and expression of mental disorders, and trauma-related disorders show some of the highest variability (16-18). SMART DAPPER uses three different measures of PTSD and assesses their convergent validity. All measures were translated into the local languages, Dholuo and Kiswahili, using established methodology (19).
Mini International Neuropsychiatric Interview (MINI 7.0.2), PTSD module (20): The current version of the PTSD module queries PTSD symptoms over the past month according to DSM-5 diagnostic criteria. While we regard the MINI as our gold standard for assessment of PTSD, it is too lengthy to be used as a screening instrument at scale.
Posttraumatic Stress Checklist-5 (PCL-5) (13): The PCL-5 is a self-report questionnaire that assesses symptoms of PTSD based on DSM-5 criteria. It includes 20 questions that measure DSM-5 Criteria B-E over the past month; each question rates symptom severity on a Likert scale from 0 (not at all) to 4 (extremely), giving a total score ranging from 0 to 80.
Primary Care PTSD Screen for DSM-5 (PC-PTSD-5) (14): The PC-PTSD-5 is a short PTSD screen based on DSM-5 criteria. It includes 5 questions that measure DSM-5 Criteria B-E over the past month, each answered on a binary scale (1 = Yes, 0 = No). Items are summed to give a total score ranging from 0 to 5.
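The total-score rules for the two screeners can be written as a minimal Python sketch. It assumes item responses arrive as integer lists in questionnaire order; the function names `score_pcl5` and `score_pc_ptsd5` are hypothetical and are not part of either instrument's published materials.

```python
def score_pcl5(items):
    """Total PCL-5 severity: 20 items, each rated 0 (not at all) to 4 (extremely)."""
    if len(items) != 20:
        raise ValueError("the PCL-5 has 20 items")
    if any(v not in (0, 1, 2, 3, 4) for v in items):
        raise ValueError("PCL-5 items are rated 0-4")
    return sum(items)  # total ranges from 0 to 80


def score_pc_ptsd5(items):
    """Total PC-PTSD-5 score: 5 binary items (1 = Yes, 0 = No)."""
    if len(items) != 5:
        raise ValueError("the PC-PTSD-5 has 5 items")
    if any(v not in (0, 1) for v in items):
        raise ValueError("PC-PTSD-5 items are scored 0 or 1")
    return sum(items)  # total ranges from 0 to 5


print(score_pcl5([2] * 20))             # 40
print(score_pc_ptsd5([1, 0, 1, 1, 0]))  # 3
```

Validating the response range up front catches data-entry errors (for example a PCL-5 item coded 5) before they silently inflate a total.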
Trauma History Questionnaire (THQ) (21): The THQ consists of 24 items and assesses lifetime exposure to potentially traumatic events in the following categories: crime, general disaster, physical/sexual assault, and other. Given the association of the number and type of trauma exposures with PTSD risk (22, 23), we scored the THQ by totals, sub-types and the number of different types of lifetime trauma (0, 1, 2, 3 or more) (Table 1).
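The grouping of trauma exposure into 0, 1, 2, or 3 or more categories could be computed as below. This is a sketch under an assumption: each participant's THQ responses have already been mapped to the set of categories with at least one endorsed item (the item-to-category mapping is not reproduced here), and `trauma_category_group` is a hypothetical helper name.

```python
# Assumption: each participant's THQ items have already been mapped to the
# four categories below; the item-to-category mapping is not shown here.
THQ_CATEGORIES = {"crime", "general disaster", "physical/sexual assault", "other"}


def trauma_category_group(endorsed_categories):
    """Return '0', '1', '2' or '3+' distinct trauma categories endorsed."""
    distinct = set(endorsed_categories)  # repeated exposures in one category count once
    unknown = distinct - THQ_CATEGORIES
    if unknown:
        raise ValueError(f"unknown THQ categories: {sorted(unknown)}")
    return "3+" if len(distinct) >= 3 else str(len(distinct))


print(trauma_category_group([]))                                      # 0
print(trauma_category_group(["crime", "crime"]))                      # 1
print(trauma_category_group(["crime", "other", "general disaster"]))  # 3+
```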

Sample population
SMART DAPPER enrollment required a positive diagnosis of major depression and/or PTSD using the corresponding MINI modules. The PCL-5 and PC-PTSD-5 were collected at baseline, 6 weeks and 3, 6, 9, 12, 18, 24 and 30 months post-baseline. To evaluate the screeners on a distribution with more negative diagnoses than at enrollment, the data set for this project included 13,099 records from these repeated assessments, collected between September 2020 and March 2022.

Analysis
We first randomly divided our dataset into a 30% test dataset and a 70% development dataset. We then randomly subdivided the development dataset into training (2/3) and validation (1/3) subsets. The validation subset was used to choose the optimal value of the shrinkage parameter in each of the regression models. After evaluating the performance of the models in the development dataset, we chose a small number of models that balanced performance and brevity of the screener. The test dataset was reserved to measure the performance of this small set of final models in an unbiased way. We used the LASSO (least absolute shrinkage and selection operator) to select our models because it is a modern machine learning method that performs variable selection and coefficient estimation simultaneously. We preferred the LASSO over other machine learning methods (e.g., random forests) because its results are easy to interpret and easy to apply in low-resource settings.
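The three-way split described above can be sketched in Python as follows. This is an illustration, not the authors' SAS code: the seed value is arbitrary, and the sketch splits at the record level because the text does not say whether repeated visits from the same participant were kept together.

```python
import random


def split_records(records, seed=2022):
    """Random 70/30 development/test split, then a 2/3 vs 1/3
    training/validation split of the development set (record level)."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(0.30 * len(shuffled))
    test, development = shuffled[:n_test], shuffled[n_test:]
    n_train = round(2 / 3 * len(development))
    train, validation = development[:n_train], development[n_train:]
    return train, validation, test


train, validation, test = split_records(range(13099))
print(len(train), len(validation), len(test))  # 6113 3056 3930
```

Because the records include follow-up visits, one possible refinement (an assumption, not something the study reports) is to shuffle participant IDs instead of records, so that all visits from one person fall in the same subset and the test data stay fully independent.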

We used PTSD as determined by the MINI PTSD module as our gold standard outcome and considered three predictor sets: (a) the 20 PCL-5 questions, (b) the 5 PC-PTSD-5 questions (since this is an accepted short screen by itself) and (c) all 25 questions from both instruments. We used the individual questions as predictors to give the fitting maximum flexibility and to allow consideration of screening tools with very few questions. A logistic regression model using the LASSO was fit, with the shrinkage parameter chosen to minimize the average squared error (ASE) in the validation data.
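The fitting strategy (one LASSO logistic model per shrinkage value, keeping the value with the lowest validation ASE) can be sketched in pure Python. This is not the authors' SAS 9.4 implementation: it uses proximal gradient descent (ISTA), one standard way to fit an L1-penalized logistic model, centres the items internally, and treats ASE as the mean squared difference between the 0/1 MINI outcome and the predicted probability. The learning rate, iteration count and function names are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(b0, w, x):
    """Predicted probability of PTSD for one row of item responses x."""
    return sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, x)))


def fit_lasso_logistic(X, y, lam, lr=0.05, n_iter=1000):
    """Minimise mean log-loss + lam * sum(|w_j|) by proximal gradient descent
    (ISTA). Columns are centred internally; the intercept is not penalised."""
    n, p = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[xj - mj for xj, mj in zip(xi, means)] for xi in X]
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(Xc, y):
            r = predict(b0, w, xi) - yi  # d(log-loss)/d(linear predictor)
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        for j in range(p):
            step = w[j] - lr * g[j] / n
            # Soft-thresholding: the proximal operator of the L1 penalty.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    # Undo the centring so the model applies to raw item scores.
    b0 -= sum(wj * mj for wj, mj in zip(w, means))
    return b0, w


def average_squared_error(b0, w, X, y):
    """ASE: mean squared difference between 0/1 outcome and predicted probability."""
    return sum((yi - predict(b0, w, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)


def select_by_validation_ase(X_train, y_train, X_val, y_val, lambdas):
    """Fit one LASSO model per shrinkage value; keep the lowest validation ASE."""
    best = None
    for lam in lambdas:
        b0, w = fit_lasso_logistic(X_train, y_train, lam)
        ase = average_squared_error(b0, w, X_val, y_val)
        if best is None or ase < best["ase"]:
            best = {"lambda": lam, "ase": ase, "intercept": b0, "coef": w}
    return best
```

In this formulation, the number of questions in a fitted model (the horizontal axis of Figure 1) is the count of nonzero entries in `coef`; larger shrinkage values drive more coefficients exactly to zero.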
We then examined the best-fitting models and considered simplified versions, either by rounding coefficients to integers to make them easier to use in practice or by making items binary (above or below a cut-point). We evaluated their performance in the validation data by calculating the area under the receiver operating characteristic curve (AUROC) and the sensitivities and specificities at various cutoffs.
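The AUROC and the cutoff-specific sensitivity and specificity can be computed directly from screener scores. The sketch below uses the Mann-Whitney formulation of the AUROC, which equals the area under the empirical ROC curve when ties count one half; the function names are hypothetical and this is not the SAS procedure the authors used.

```python
def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a randomly chosen case (label 1)
    scores higher than a randomly chosen non-case (label 0); ties count one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for nc in non_cases:
            if c > nc:
                wins += 1.0
            elif c == nc:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def sensitivity_specificity(scores, labels, cutoff):
    """Treat score >= cutoff as screen-positive; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def cutoff_table(scores, labels):
    """Sensitivity and specificity at every observed score value."""
    return {c: sensitivity_specificity(scores, labels, c) for c in sorted(set(scores))}
```

The pairwise loop is quadratic in sample size, which is acceptable for a few thousand records; a rank-based computation gives the same value faster for larger data sets.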
Finally, we carried forward models that balanced ease of use and performance and assessed their performance using the reserved test dataset.The performance was assessed both overall and by subgroups defined by sex, age and trauma exposure.All analyses were conducted using SAS Version 9.4.
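Subgroup performance amounts to recomputing the AUROC within each level of a grouping variable. A minimal sketch, assuming each record carries a subgroup label (for example sex, an age band, or the number of trauma categories); the helper names are hypothetical, and the function returns `None` for a subgroup with no cases or no non-cases, where the AUROC is undefined.

```python
from collections import defaultdict


def auroc(scores, labels):
    """Mann-Whitney AUROC; returns None if a group lacks cases or non-cases."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        return None
    wins = sum(1.0 if c > nc else 0.5 if c == nc else 0.0
               for c in cases for nc in non_cases)
    return wins / (len(cases) * len(non_cases))


def auroc_by_subgroup(scores, labels, groups):
    """AUROC computed separately within each subgroup label."""
    by_group = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    return {g: auroc(s, y) for g, (s, y) in by_group.items()}
```

Applied to the hold-out test data only, this keeps the subgroup estimates as unbiased as the overall test AUROC.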

Results
Overall, participants ranged in age from 18 to 85 years, with a mean age of 35.8 years (SD 11.0), and were predominantly female [n = 1,785 (91.1%)]. The training, validation and hold-out test datasets were very similar (Table 1).

LASSO fitting
Figure 1 shows the average squared error (ASE) as questions were added to the model for the PC-PTSD questions (1A), the PCL questions (1B) and the combined set of questions (1C). Each dot in Figure 1 is a separate LASSO model fit with a different shrinkage parameter. The number of questions in the model is indicated on the horizontal axis. For the combined set of questions (1C), the optimal model contained 22 of the 25 questions, but the first nine questions entered into the model all came from the PCL. For the PCL questions only (1B), the optimal model contained 17 of the 20 questions, and for the PC-PTSD questions, the optimal model contained all five questions. Table 2 gives the details of the three sequential LASSO fits.
Area under the ROC curve for selected models using the validation data (Table 3).
Because the analysis using the combined set of questions did not enter any PC-PTSD question until the 10th question, we considered basing the screener on the PCL question set alone. Also, because the curves in Figures 1A, C showed the fastest reduction in ASE with very few questions in the model, good performance might be achievable with very few questions. Accordingly, we calculated the AUROC for a number of models: (a) the best two-, four-, six- and 12-question screeners based on the PCL questions, and (b) the best two- and four-question screeners and the full set of PC-PTSD questions. The AUROC values are given in Table 3 under the headings PCL and PC-PTSD. Consistent with the LASSO fits, the screeners based on the PCL performed much better than those based on the PC-PTSD; even the full five-question PC-PTSD achieved an AUROC of only 0.79.
Table 3 also shows that very little performance is lost by using a short screener. The model using only 2 questions had an AUROC of 0.84, only slightly less than the model using 12 questions (AUROC of 0.86). We therefore explored simplified versions of the PCL screeners, either adding the Likert values of the selected questions ("PCL additive" in Table 3) or rounding the LASSO coefficients to the nearest integer ("PCL rounded" in Table 3). In all cases, simply adding the question values performed nearly as well as using the LASSO coefficients. We also explored counting how many of the questions were rated 3 or above ("PCL 3 or above" in Table 3) or 2 or above ("PCL 2 or above" in Table 3). These performed less well than adding the Likert scale values.
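The simplified scoring rules compared above (additive, rounded-coefficient, and item-count versions) can be sketched as follows. The item labels and the coefficients 1.3 and 0.8 are hypothetical placeholders, not values from Table 2 or Table 3, and no screening cutoff is implied.

```python
import math


def additive_score(responses, items):
    """Unit-weight screener: sum of the 0-4 Likert values of the selected items."""
    return sum(responses[i] for i in items)


def rounded_score(responses, coefficients):
    """Weighted screener: LASSO coefficients rounded to the nearest integer
    (halves rounded up, unlike Python's round-half-to-even)."""
    return sum(math.floor(c + 0.5) * responses[i] for i, c in coefficients.items())


def count_at_or_above(responses, items, threshold):
    """Number of selected items rated at or above `threshold` (e.g. 2 or 3)."""
    return sum(1 for i in items if responses[i] >= threshold)


# Hypothetical responses keyed by hypothetical item labels.
responses = {"memories": 3, "sleep": 2, "avoidance": 4}
two_item = ["memories", "sleep"]
print(additive_score(responses, two_item))                        # 5
print(rounded_score(responses, {"memories": 1.3, "sleep": 0.8}))  # 5
print(count_at_or_above(responses, two_item, 3))                  # 1
print(count_at_or_above(responses, two_item, 2))                  # 2
```

When the rounded coefficients are all 1, the rounded and additive scores coincide, which is one way the additive version can lose little performance relative to the weighted one.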
Assessment of final models using the hold-out test dataset
The excellent performance of the simplified versions of the short screeners meant that we had very few final models to assess using the hold-out test data: the two- and four-question versions using the PCL and the corresponding additive and rounded versions. The AUROCs for those models are given in Table 3. The simple screener, which adds the Likert scale values of two PCL-5 questions, "Repeated, disturbing, and unwanted memories of the stressful experience" (PCL-5 item #1) and "Trouble falling or staying asleep" (PCL-5 item #20), had excellent performance, with an AUROC of 0.85, slightly better even than in the validation data (0.84).

Assessment of final model by subgroup
Ideally, a screening tool works well across different subgroups of a population. We therefore calculated the AUROCs in the hold-out test data separately for key subgroups. Men and women had AUROCs of 0.86 and 0.85, respectively. Across four age categories (spanning 18 to 85 years), the AUROCs were 0.84, 0.85, 0.86 and 0.84, respectively. By number of categories of trauma (0, 1, 2, 3 or more), the AUROCs were 0.78, 0.83, 0.86 and 0.86, respectively. Except for the no-trauma group, these were all comparable to the overall performance. Because PTSD is highly comorbid with depression (24, 25), evaluating participants with a PTSD diagnosis only (major depressive episode [MDE] negative) could provide useful information on the performance of the algorithms; because diagnostic criteria overlap, PTSD instruments also capture several symptoms of depression. We therefore conducted a subgroup analysis comparing participants with PTSD and no MDE to all other combinations (PTSD and MDE, MDE alone, and neither MDE nor PTSD). The screener performed strongly both in the PTSD-only group (AUROC of 0.864) and in the remainder (AUROC of 0.849).

Discussion
Penalized regression analysis suggested that a pragmatic and simple screening tool that adds the Likert scale values from two PCL-5 questions pertaining to intrusive thoughts of the stressful experience and insomnia worked well across subgroups defined by age, sex, and number of categories of trauma experienced. Intrusive thoughts and insomnia may be strong predictors of PTSD in this population.
Interestingly, these findings align with emerging data on risk factors associated with PTSD. An observational study of emergency department patients in Oxford, UK, showed that sleep disruption immediately following trauma exposure was significantly associated with more intrusive memories and a higher risk of PTSD 2 months later (26). A recent meta-analytic review of eight experimental studies involving planned trauma exposure and sleep manipulation found that sleep reduced intrusive memory frequency (27). Researchers hypothesize that sleep disruption interferes with memory consolidation, leading to more intrusive memories and a higher risk of PTSD.
A priori, we expected the PC-PTSD to perform well, given its strong validation data and widespread use, including in LMIC settings. In the SMART DAPPER Kenyan primary care population, however, the PC-PTSD discriminated PTSD as diagnosed by the MINI less well than the PCL-5 items (AUROC of 0.79 using all five questions).

Implications
Within the past few years, the full PCL-5 has been validated in Rwanda (28). Given the strong discrimination observed in this study with 2 items from the PCL-5 in Kenya, this brief PTSD screening tool may be generalizable within the region and may also be useful in LMICs outside SSA.

Limitations
There are limitations to consider. Most notably, the algorithms were trained and tested using SMART DAPPER study data and may be biased toward the population enrolled in the study. For example, given the variability of PTSD symptom expression across cultures (16), the results may not be generalizable beyond this study population. Further research on the proposed PTSD screener in other parts of Sub-Saharan Africa and in international settings would provide valuable information on its generalizability to other contexts and populations. We also note that the SMART DAPPER sample was predominantly female, so the results may not generalize to male populations. The study was open to both men and women; aiming to match the "real life" conditions of those seeking treatment in a primary care setting, we refrained from enriching the sample to achieve gender balance. The effect of gender on health-seeking behavior is well established, with psychological, sociological and programming biases cited as potential drivers of low engagement among men (29-31). Future evaluations of this screener should include populations in which men seek healthcare more often.

Conclusion
A 2-item short version derived from the PCL-5 had excellent performance for identifying probable PTSD in our study population and discriminated PTSD better than a commonly used PTSD screening instrument, the PC-PTSD. This tool has the potential to improve screening for PTSD in high-burden SSA clinical populations. Accurate, efficient screening could help narrow the current PTSD treatment gap and improve population health.

Figure 1. Average squared error vs. number of questions included in the LASSO fit. (A) PC-PTSD questions only. (B) PCL questions only. (C) Combined PCL and PC-PTSD questions.
Table 1. Descriptive statistics for individuals included in the training, validation, and hold-out test data.
Table 3. Values of the area under the receiver operating characteristic curve (AUROC) using validation data and hold-out test data.