Pemphigus Oral Lesions Intensity Score (POLIS): A Novel Scoring System for Assessment of Severity of Oral Lesions in Pemphigus Vulgaris

Background: Many patients with pemphigus vulgaris (PV) in India present with predominant/exclusive oral mucosal lesions. Current validated scoring systems for pemphigus do not adequately represent the clinical variability of oral lesions. Objective: To develop and validate a novel scoring system exclusively for oral lesions in PV. Methods: In this cross-sectional study, the Delphi method was used to build an initial scale that was administered in 115 patients with PV. Exploratory factor analysis was used to examine the underlying factor structure of the new scale. The psychometric properties of the new scale were studied. Correlations between the new scale and Autoimmune Bullous Skin Disorder Intensity Score (ABSIS), Pemphigus Disease Area Index (PDAI), and Physician Global Assessment (PGA) were also assessed. Results: Content validity of the initial scale was established with an average content validity index (CVI) of 0.8. Exploratory factor analysis resulted in a 3-factor structure with a total of 9 items. Corrected item-total correlation, a measure of data quality, was more than 0.30 for all items in the new oral mucosal scale—Pemphigus Oral Lesions Intensity Score (POLIS). Significant correlations were observed between POLIS and oral ABSIS (r = 0.85, p < 0.001), mucosal PDAI (r = 0.70, p < 0.001), and PGA (r = 0.60, p < 0.001). POLIS was also reliable with good internal consistency (Cronbach's α = 0.86) and strong inter-rater agreement. Limitations: The study cohort included participants from a single center. Usability and time taken to administer the scale were not assessed. Conclusions: The new scale, POLIS, has adequate validity and reliability. It includes both quality of life and clinical disease severity parameters, assessing disease severity holistically. Further studies evaluating the scale's responsiveness to change are in progress.


INTRODUCTION
Pemphigus is common in the Indian population, and they display distinct clinical and epidemiological qualities compared to western society (1). Oral mucosa is often the first and the only site to be involved in a significant proportion of patients with pemphigus vulgaris (PV) (1). Deep crater-like lesions, erosions with lichenoid hue, and lesions located on the retromolar trigone and occlusion line of buccal mucosa are refractory to treatment (2). Oral lesions tend to persist longer, require prolonged treatment course, and are associated with increased morbidity, leading to impaired quality of life. Additionally, oral lesions often heal with a reduction in depth or size. This is in contrast to cutaneous lesions, which usually improve with a decrease in lesion count and show early response to treatment.
Validated scoring systems (like standardized laboratory values) are essential for objective and accurate assessment of clinical severity, prognostication of disease, deciding therapeutic options, and maintaining the homogeneity of outcome measures in clinical trials. Autoimmune Bullous Skin Disorder Intensity Score (ABSIS) (3) and Pemphigus Disease Area Index (PDAI) (4) are validated composite scales, evaluating both cutaneous and mucosal disease severity in PV. The development of a scoring system is as crucial as establishing its validity. Poorscale construction brings into question the reliability and validity of the results even though the study is carefully planned. ABSIS and PDAI scales were developed based on the consensus reached through open discussion by a group of dermatologists with expertise and a special interest in pemphigus (4). The oral ABSIS subscale is a modified version of the previously described grading system by Saraswat and Kumar which consists of two subscales categorizing extent and severity of oral lesions (3,5).
The Autoimmune Bullous Disease Quality Of Life (ABQOL) questionnaire was developed using standard and validated methods to measure the disease burden associated with autoimmune bullous diseases like pemphigus, bullous pemphigoid, mucous membrane pemphigoid, epidermolysis bullosa acquisita, and linear IgA bullous dermatoses (6). Low correlations were observed between ABQOL, PDAI, and ABSIS, advocating that impairment in quality of life is independent of visible disease severity (6). However, evaluation bereft of clinical disease severity assessment may not be adequate to grade a physical disease.
Oral Disease Severity Score (ODSS), which was published after the initiation of our study, has been validated for use in pemphigus patients with oral lesions (7). ODSS has its drawbacks of not being adapted from its original development for use in oral lichen planus and confusion arising out of usage of terms "erosion" and "ulceration" since the lesions cannot be deeper than the thickness of epithelium in pemphigus (8). There are no validated scoring systems that measure the severity of oral lesions exclusively except for ODSS which has drawbacks as mentioned earlier. Our study aimed to develop and validate an independent scoring system for assessment of the severity of oral lesions in PV.

METHODOLOGY Settings
We conducted this study in the immunobullous disease clinic, Department of Dermatology, Postgraduate Institute of Medical Education and Research, Chandigarh, from October 2017 to November 2018. Institute Ethics Committee approved the study (INT/IEC/2017/948).
Scale development is a multistep process (Figure 1), that includes three basic steps-item generation, data collection, and validation.

Item Generation
We generated the initial list of candidate items for evaluating disease severity through-a methodical review of existing literature from search engines like Pubmed and Google Scholar; informal discussion with 10 practicing dermatologists; a previous study from our center on factors influencing treatment responsiveness/refractoriness of oral lesions (2); and informal discussion with 10 patients of PV having oral mucosal lesions. The list included items representing both patients' perception of disease burden and physicians' observation.

Item Selection and Construction of the Initial Scale
As per readability and comprehensiveness, we structured 34 items to develop the initial questionnaire using https://docs. google.com/forms/u/0/. The initial questionnaire was mailed to experts (practicing dermatologists and experts in pemphigus working in India and elsewhere) for face and content validity. We maximized the follow-up reminders for expert responses with the help of reminder email followed by telephone call, if necessary. Experts rated the 34 items on a 5-point Likert scale (strongly disagree-disagree-neutral-agree-strongly agree) according to importance in determining disease severity of oral lesions in PV. They also had the option of adding new items to the list during the first round.
The four new items suggested by experts in round 1 were resent in round 2, to be rated similarly to round 1. In round 3, a questionnaire with 38 items and consolidated responses from previous rounds was sent to the same experts to reach a consensus. Only experts who completed the previous round continued to participate in subsequent rounds. Twenty-eight, 19, and 18 experts participated in rounds 1, 2, and 3, respectively. We weighted the items from (-2) to (+2) for "strongly disagree, " "disagree, " "neutral, " "agree, " and "strongly agree" categories. Finally, we developed an initial scale consisting of 16 items with score of 24 or more (maximum of 36) on weighted analysis, and we arranged the items systematically for administration in patients of PV with oral mucosal lesions.

Data Collection
We enrolled all diagnosed and consenting patients (either sex with age ≥ 18 years) of PV with oral mucosal lesions during the study period. PV was diagnosed based on clinical findings, Tzanck smear, histopathology, and direct immunofluorescence of oral mucosal biopsy. We photographed oral lesions of all participants and graded them using the initial scale, mucosal PDAI subscale, oral ABSIS subscale, and Physician Global Assessment (PGA) scale. The mucosal PDAI subscale evaluated the number and size of erosions at 12 anatomical sites (mucous membranes), and the severity score ranged from 0 to 120. Oral ABSIS subscale included an extent score and a severity score. The extent score was based on the absence (0) or presence (1) of lesions at 11 different anatomical sites in the oral cavity. The severity score graded the discomfort associated with certain foods. The scores ranged from 0 to 11 for extent and 0 to 45 for severity equating to a total maximum score of 56 for oral ABSIS subscale. PGA of disease severity included four categories-none, mild, moderate, and severe oral mucosal disease (0-3). We did not include ODSS as it was not published when we designed our study protocol. Patient responses to the 16 items in initial scale were graded from 0 to 4 (none-mild-moderate-severe-very severe) as mentioned below.
Item(I)-1-Number of relapse(s) of the disease: Total number of relapse(s) of the disease since onset. It can be the same as the number of relapse(s) of oral lesions in patients with disease restricted to the oral mucosa. Relapse was patientreported and defined by the appearance of new lesions that do not heal spontaneously within 1 week in patients who were in remission, on or off therapy.   We asked patients to rate their difficulty in eating normal food on a 5-point Likert scale-none/mild/moderate/severe/ very severe. I-8-Difficulty in eating food according to its consistency: We questioned patients if they had "no difficulty" (none) or difficulty in eating-raw solid diet/cooked solid diet/semisolid diet/liquid diet (mild/moderate/severe/very severe). Raw solid diet includes raw carrot, pear, peanut, and almond. Cooked solid diet includes "chapatis, " cooked mutton chicken, cooked vegetables, and steamed gram/corn. Semisolid diet consisted of porridge, "kheer, " mashed potatoes, and boiled pulses/sago. Liquid diet includes milk, buttermilk, fruit juices, and water. I-9-Difficulty in speaking: we asked patients to rate their difficulty in speaking on a 5-point Likert scalenone/mild/moderate/severe/very severe. I-10-Difficulty in brushing teeth: we asked patients to rate their difficulty while brushing teeth on a 5-point Likert scalenone/mild/moderate/severe/very severe. I-11-Difficulty in swallowing: we asked patients to rate their difficulty in swallowing on a 5-point Likert scalenone/mild/moderate/severe/very severe. I-12-Restricted mouth opening: the difficulty or discomfort associated with opening mouth fully was rated on a 5-point Likert scale-none/mild/moderate/severe/very severe. I-13-Number of mucosae involved: we counted the total number of mucosal surfaces involved, viz., ocular mucosa, nasal mucosa, oral mucosa including oropharynx, genital mucosa, and anal mucosa.
• None: no mucosal lesions • Mild: single mucosal involvement restricted to the oral cavity • Moderate: 2 mucosae involved • Severe: 3 mucosae involved • Very severe: ≥ 4 mucosae involved I-14-Number of sites involved in the oral cavity: we counted the total number of sites involved in the oral cavity, viz., hard palate, soft palate, oropharynx, tongue, floor of the mouth, upper and lower gingiva, upper and lower labial mucosa, and both buccal mucosa separately (total 11 sites).
• None: no oral lesions • Mild: 1 to 2 sites • Moderate: 3 to 5 sites • Severe: 6 to 8 sites • Very severe: 9 to 11 sites I-15-Overall size of erosions: we measured the size of oral erosions (longest dimension) using a ruler manually marked on disposable wooden spatula (Figure 2), and the total sum of sizes of individual erosions was taken as the overall size of erosions/ulcers. We estimated the size of erosion(s) located in hard to reach areas (like the oropharynx) by clinical examination alone. of pemphigus pathogenesis that the deepest part of erosion cannot go below the basal layer of the epithelium. For example, while evaluating erosions on the buccal mucosa at almost the same location in two different patients (Figure 3), we would grade the erosion in Figure 3A as superficial and the one in Figure 3B as deep. We further counted the number of superficial and deep erosions.

Validation
We ascertained the validity and reliability of the initial scale and eliminated items that did not meet the preset criteria if needed, to construct the final scale. Content validity refers to the extent to which the items/scale measures the behavior for which it was intended. We assessed the content validity index (CVI) for each item by calculating the average of experts who rated the item as "agree" or "strongly agree." CVI for the scale (S-CVI) is the average of CVI of all items. Researchers recommend an S-CVI of 0.80 or higher (9).
We assessed data quality by checking the corrected item-tototal correlation. It is the correlation between individual item and the total score. For a reliable scale, all the items should correlate with the total score. We eliminated items with correlation of <0.30 with the total scale score. "Cronbach's alpha if item is deleted" is the value to which Cronbach's alpha would increase or decrease to, following that item's deletion from the scale.
We used exploratory factor analysis (EFA) and oblimin rotation, depending on the interpretation of factor loading to evaluate the construct validity. EFA is a statistical technique within factor analysis applied to uncover the underlying structure of a relatively large set of variables or items and to identify the underlying relationship between these variables. The underlying or latent variable explaining the correlation among multiple items is known as "factor." Bartlett's test and Kaiser-Meyer-Olkin (KMO) criterion assessed the strength of relationship among items and sampling adequacy, respectively. We used parallel analysis, scree test, and Very Simple Structure (VSS) for factor selection, as each one has its own merits and demerits.
In the factor matrix, factor loading is the correlation between items and factors, and factor loadings of 0.32 or higher meet the minimal level for interpretation of structure (10). Item's communality (h 2 ) is the estimate of its common variance with other items, and specific variance (u 2 ) is the unique variance not shared with other items. We deleted items having crossloading (more than one significant factor loading) and items having low communalities (<0.40) (10). Items with low factor loadings have high unique variance that is not shared with other items and vice versa. After deleting such items, we carried out the re-specification of factor analysis to attain the final scale. We calculated absolute fit indices like root mean square error of approximation (RMSEA) and root mean square residual (RMSR) to verify how well the model fits or reproduces the data. RMSEA and RMSR values range from 0 to 1, with smaller values closer to 0.05 indicating better model fit. Tucker Lewis Index (TLI) is a relative fit index with values over 0.90 considered acceptable (11).
Convergent validity measures the correlation between two tests that are supposed to be measuring the same construct. It was measured by assessing the Spearman's rho correlation coefficients between the final scale scores, mucosal PDAI, oral ABSIS, and PGA scores. We evaluated the reliability of the final scale in terms of internal consistency using Cronbach's alpha. Cronbach's alpha values range from 0 to 1, and values between 0.60 and 0.70 are considered as the lower limit of acceptability (12). The final scale was administered in a new set of patients (n = 13) by two independent dermatologists (raters A and B), separately. We plotted the total scores of 13 patients using the Bland-Altman plot to assess the inter-rater agreement. We used the Psych package in R (R version 3.4.4) to analyze the data.

RESULTS
Response rates in the first, second, and final rounds of the Delphi questionnaire were 67.9, 94.7, and 100%, respectively. The strength of agreement between expert responses in rounds I (round 1+ round 2) and II (round 3) was measured using linear weighted kappa (K). The observed kappa as a proportion of maximum possible linear weighted kappa was 0.5227, indicating a moderate degree of agreement between experts (13). Kappa values of zero indicate agreement not better than chance, while positive values indicate agreement better than chance, and a negative value indicates agreement worse than chance. Despite moderate kappa for the overall agreement, we wanted to assess whether responses in desired categories such as "agreeagree" between rounds are better than responses expected by chance. The observed agreement between the experts cannot be attributed to chance alone except for the "strongly disagreestrongly disagree" category where values (expert responses) expected by chance (0.8) was marginally higher than the observed value (0) ( Table 1). S-CVI of the initial scale with 16 items was 0.8 signifying strong inter-rater agreement between the experts (9).

Clinical and Demographic Characteristics
The study cohort included 115 patients with new onset of oral erosions and those with persistent and/or recurrent oral erosions irrespective of the presence or absence of cutaneous lesions. The mean (± standard deviation) age (in years) at presentation was 42.71 ± 12.99. Women represent 62.5% of the study group. The median duration from the appearance of oral erosions to enrollment in the study was 5.5 months (IQR 2-12). We used the Sparkline chart, which is a hybrid chart having both the properties of a table (individual and accurate data) and a graph (visualize trend) to present the frequency of patient responses for  each item (Figure 4). Responses to most of the items were in the "mild" to "moderate" categories except for the item "change in the size of existing oral lesions in the last 1 week, " for which most of the responses were in the "very severe" category. There were no missing responses for any of the items. Study participants with extreme values, i.e., outliers, were identified using the Mahalanobis distance (D 2 ), based on a chi-square distribution. It measures how many standard deviations away a point is from the distribution in multivariate space, i.e., the relative distance between two variables with respect to the mean of the sample. Hence, the farther the variable is from the mean, the larger the D 2 is. The study participant 11 lies away from the mean with higher D 2 and is an outlier (Figure 5).
The median scores of oral ABSIS subscale, mucosal PDAI subscale, and PGA were 6 (IQR 3-22), [8][9][10][11][12][13], and 1 (IQR 1-2), respectively. As per our clinical experience, ABSIS and mucosal PDAI scores fail to change with treatment response. In a patient with a relatively deep erosion over the buccal mucosa (Figure 6A), the ABSIS and PDAI scores were 1/11 and 1/120, respectively. With treatment, the pain or difficulty associated with eating subsided entirely within a month. ABSIS and PDAI did not capture further improvement in terms of decrease in size and depth of the erosion on monthly follow-up visits (Figures 6B,C) as the scores remained the same.

Validation
Parallel analysis, scree plot, and VSS suggested 4, 3-5, and 4 factors, respectively. Hence, we extracted matrices with 3, 4, and 5 factors to identify the best structure. The overall measure of sampling adequacy using the KMO test was 0.74, indicating appropriateness. The Bartlett test of sphericity showed significant relationship among items (χ 2 = 701.49; p < 0.001). We deleted items I-4, I-5, I-6, and I-13 as they had low loadings (<0.32) and communalities (<0.40) (also I-6 was cross loading) in the matrices and had low corrected itemtotal correlations (<0.30). We recalculated the corrected itemtotal correlations for the remaining 12 items and deleted items with correlations of <0.30-I-1, I-2, and I-3. We further performed the re-specification of factor loading for the remaining nine items.

Factor Analysis Re-specification Results
The overall measure of sampling adequacy increased to 0.81, and the strength of the relationship among items was significant (χ 2 = 483.59; p < 0.001). The appropriate number of factors to be retained as indicated by parallel analysis, scree plot, and VSS was 4, 2-4, and 3 factors, respectively. Based on factor loadings, communalities, RMSR, RMSEA, and TLI, the threefactor structure was considered a good model fit ( Table 2). Items I-10, I-12, and I-16 had low communalities, yet these items were retained in the final scale as they had significant factor loadings and corrected item-total correlation.
Based on the factor loadings ( Table 2), we grouped the final nine items under three distinct variables-symptoms due to oral mucosal involvement (4 items), symptoms due to adjacent mucosal involvement (2 items), and features observed on clinical examination (3 items). Each item in the final scale-Pemphigus Oral Lesions Intensity Score (POLIS)-was graded from 0 to 4 using simple weighting technique (14) ( Table 3), adding up to a total scale score of 36.

Reliability
Cronbach's alpha of POLIS was 0.857, suggesting that the items have good internal consistency. We used the item-deleted alpha analysis to assess whether or not the exclusion of any particular item would increase the internal consistency reliability of the scale ( Table 4). All items in POLIS had correlations higher than 0.30 ( Table 4). We used the Bland-Altman plot to estimate the inter-rater agreement as the inter-rater kappa agreement cannot be applied to continuous data. It is a scatter plot of the difference between the two measurements by raters A and B (Y-axis) against the average of the two measurements (X-axis). It provides a graphical display of the mean difference between the two raters or techniques with 95% agreement limits. All points in the plot were positioned within ± 2 SD, indicating a strong agreement between the raters (Figure 7).

DISCUSSION
Owing to the lack of validated, independent scoring system for oral lesions that can precisely reflect disease activity, a 9item scale (Pemphigus Oral Lesions Intensity Score, POLIS) was developed and validated for oral lesions in pemphigus. A three-factor structure emerged, capturing features of both patients' perspective and clinicians' assessment of disease severity. The study analyses provided adequate evidence of internal consistency, content, construct and data validity. Though validated, ABQOL does not assess clinical disease severity and is not specific for oral lesions of pemphigus. Using two scoring systems, i.e., ABQOL for assessing quality of life and PDAI/ABSIS for clinical disease activity increases the time required for evaluation. The strength of the new scale is that it includes 6 items which directly addresses quality of life impairment due to oral erosions and 3 items for assessing clinical disease severity.
A significant strength of this study is the use of qualitative methods to generate items for evaluating oral lesions in pemphigus. The Delphi method was used for initial item generation as it minimizes the biasing effects of dominant individuals, irrelevant discussions, and group pressure toward agreement as is seen in group discussions (15). The Internetbased Delphi questionnaire was chosen to reduce the time, costs, problems in communication, and participant attrition. A structured questionnaire was used in all the rounds to save time and decrease the dropout rate. The study cohort included both new and follow-up patients incorporating a relatively large set of pemphigus patients with most of them having mild to moderate disease burden.
In POLIS, patient-perceived symptoms like difficulty in eating, brushing teeth, speaking, swallowing, and mouth opening were found to be important in grading severity of oral lesions. While the symptoms associated with eating had been addressed in oral ABSIS subscale, symptoms associated with other activities mentioned above were not included. Gingival erosions are often associated with difficulty in brushing rather than difficulty in eating. Extensive or deep erosions over the buccal mucosa particularly over the retromolar trigone and labial commissures are associated with difficulty in mouth opening. Difficulty in speaking and swallowing indicates involvement of the larynx, pharynx, and esophagus (adjacent mucosae) and thereby disease extent and severity.
The mucosal component of PDAI includes the number and size of erosions at predefined anatomical locations while the oral involvement score of ABSIS includes the number of sites involved in the oral cavity and symptoms of discomfort associated with eating or drinking. We observed that these variables were not adequate to capture the clinical variability of oral erosions over time (Figure 6). POLIS includes size and depth of erosions and accounts for these changes.
Along with the number of sites involved in the oral cavity, POLIS also includes the number of erosions, further categorizing them as superficial or deep. Presence of any number of deep erosions was considered as severe disease as they are relatively refractory to treatment (2). By incorporating both size and number of erosions, POLIS effectively assesses the clinical severity of oral lesions in PV where erosions tend to coalesce and extend particularly over the gingival margins. The size and number of erosions were graded at intervals of 10 cm and 10 erosions, respectively, which makes POLIS simpler and comfortable to use. POLIS showed better correlation with oral ABSIS than with mucosal PDAI, probably owing to assessment of patientperceived symptoms along with clinical severity in the oral ABSIS subscale.
Considering the drawbacks of the existing scales and the use of standard statistical methods in the development and validation of POLIS, it can be considered as a useful tool for the accurate assessment of severity of oral lesions in PV. Limitations of the study include study cohort from a single center, absence of data regarding usability, and time taken to administer the scale. Further studies evaluating the correlation between POLIS, ODSS, and ABQOL are required.

CONCLUSION
Due to the remarkable clinical variability of oral lesions in pemphigus, several parameters are needed to precisely reflect disease activity. Given the absence of disease severity scoring systems developed using standard and objective methods at the time of initiation of our study, we report a new disease severity scale for assessing severity of oral lesions in pemphigus. It combines both clinical disease severity and quality of life assessment, thus measuring the severity of oral mucosal PV holistically. The scales' responsiveness to change is being assessed, and the results will be published subsequently in another study.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Institutional ethics committee Post Graduate Institute of Medical Education and Research #INT/IEC/2017/948. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
TS and KK: had full access to all the data in the study and took responsibility for the integrity of the data, and the accuracy of the data analysis. TS, DD, SG, and KK: study concept and design. TS, DD, SH, RM, SG, and KK: acquisition, analysis, or interpretation of data. TS: drafting of the manuscript. DD, SH, SG, KK, and RM: critical revision of the manuscript for important intellectual content, administrative, technical, or material support, and study supervision. KK and SG: statistical analysis. All authors contributed to the article and approved the submitted version.