The Geospatial Distribution of Myositis and Its Phenotypes in the United States and Associations With Roadways: Findings From a National Myositis Patient Registry

Background Little is known about the spatial distribution of idiopathic inflammatory myopathies (IIM) in the United States (U.S.), or their geospatial associations. Methods We studied a national myositis patient registry, with cases diagnosed in the contiguous U.S. from 1985–2011 and comprised of dermatomyositis (DM, n = 484), polymyositis (PM, n = 358), and inclusion body myositis (IBM, n = 318) patients. To assess the association of myositis prevalence with distance from roads, we employed log-Gaussian Cox process models, offset with population density. Results The U.S. IIM case distribution demonstrated a higher concentration in the Northeast. DM, IBM, and cases with lung disease were more common in the East, whereas PM cases were more common in the Southeast. One area in the West and one area in the South had a significant excess in cases of DM relative to PM and of cases with lung disease relative to those without lung disease, respectively. IIM cases tended to cluster, with between-points interactions more intense in the Northeast and less in the South. There was a trend of a higher prevalence of IIM and its major phenotypes among people living within 50 m of a roadway relative to living beyond 200 m. Demographic characteristics, rural-urban commuting area, and female percentage were significantly associated with the prevalence of IIM and with major phenotypes. Conclusions Using a large U.S. database to evaluate the spatial distribution of IIM and its phenotypes, this study suggests clustering in some regions of the U.S. and a possible association with proximity to roadways.


INTRODUCTION
Idiopathic inflammatory myopathies (IIM) are rare, life-threatening, systemic autoimmune disorders characterized by chronic proximal muscle inflammation and weakness. The most common IIM are dermatomyositis (DM), with characteristic skin rashes of Gottron's papules and heliotrope rash; polymyositis (PM), with an immune attack on myofibers; and inclusion body myositis (IBM), with progressive weakness and muscle atrophy in older patients (1). The overall prevalence of IIM has been estimated to range from 14.0 to 21.4 cases per 100,000 population in the United States (U.S.) (2)(3)(4)(5). Among the subgroups, the prevalence of DM has been reported as 13 cases (6) and that of IBM as 5.05 cases (7) per 100,000 population. Prevalence estimates have varied, depending on the database utilized, enrollment in health systems, and coding of IIM diagnoses.
Little is known about the spatial distribution of IIM and its associations with environmental exposures (8). There is strong empirical evidence that environmental causes contribute to the development of systemic autoimmune diseases (9)(10)(11). For rheumatoid arthritis, significant regional differences have been identified in the geospatial distribution (12)(13)(14). European data suggest a latitudinal gradient of IIM, with increased risk of DM relative to PM in southern Europe, likely related to increased exposure to ultraviolet B radiation (15). An association of ultraviolet radiation exposure with DM has also been suggested in other studies (16,17).
Adult rheumatoid arthritis and juvenile idiopathic arthritis prevalence have both been linked to exposure to air pollution, identified by residential location near airports or road networks (18)(19)(20)(21). Only one study has examined air pollution as a risk factor in myositis, and found a residential association with air emissions in clinically-amyopathic DM, but not classic DM (22). Exposure to air pollutants and tobacco smoking during fetal development may contribute to juvenile DM risk (23). Increasing risk of PM was also suggested for tobacco smoking in Caucasians (24).
To further evaluate the relevance of environmental exposures to the development of IIM, we aimed to examine the geospatial characteristics of IIM and its subgroups in the U.S. using a national patient registry. The geocoded locations of cases were analyzed using the methods developed in spatial point processes. We also examined whether the prevalence rates of IIM and its subgroups are associated with distance to roadways, with the hypothesis that air pollution exposure from road traffic may be involved in pathogenesis.

Study Population, Design, and Data Source
We extracted data from a U.S. national myositis patient registry database called 'MYOVISION' (17,25), which contains IIM patients who enrolled between December 2010 and July 2012 by completing a patient questionnaire after signing informed consent. The present study cohort consisted of 1,247 adult and juvenile DM, PM, and IBM cases, diagnosed between 1985 and 2011, residing in one of the 48 contiguous U.S. states where road network locations were available. The present study was restricted to patients who had the same residential zip code at diagnosis and enrollment (Table 1).

Geospatial Distribution
We used ArcGIS (26) to geocode the latitude-longitude coordinates of the zip code centroid of each patient's residence at diagnosis, to map the prevalence of the study population across the U.S. (Figure 1A). We treated case location (longitude and latitude) as a spatial point process (see Myositis Supplement Methods for details), employing the inhomogeneous J-function (27,28) to examine whether there is any clustering of myositis cases. Directions of higher concentration of IIM cases and its subgroups (DM, PM, IBM) were determined using the nearest-neighbor orientation density methods (29). Spatially adaptive bandwidth selection methods were used to address inhomogeneity in the distribution of myositis cases when evaluating spatial trends in increased prevalence (30) relative to other subgroups. The independence of IIM subgroups was tested using the mark connection function (31). Space-time separability tests (31,32) were used to evaluate the independence of spatial and temporal processes in generating cases. Based on the tests of separability between the spatial point and temporal processes using the nearest-neighbor and variogram, we concluded that a spatial point process was appropriate for the entire study period.
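The nearest-neighbor orientation idea can be illustrated with a minimal Python sketch. It assumes planar (projected) x/y coordinates and uses hypothetical function and variable names. It returns only the raw angles, measured counterclockwise from east, without the edge correction or kernel smoothing that a full orientation density estimate would apply. It is not the procedure used in this study, which relied on the "spatstat" R package. Coincident points (for example, cases sharing a zip code centroid) would yield a zero-length vector and would need separate handling.

```python
import math

def nearest_neighbor_angles(points):
    """For each point, return the direction (degrees, counterclockwise
    from east, in [0, 360)) of the vector to its nearest neighbor.

    `points` is a list of (x, y) tuples in planar coordinates (assumed).
    """
    angles = []
    for i, (xi, yi) in enumerate(points):
        best_d2, best_j = None, None
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d2 = (xj - xi) ** 2 + (yj - yi) ** 2
            if best_d2 is None or d2 < best_d2:
                best_d2, best_j = d2, j
        xj, yj = points[best_j]
        # atan2 returns radians in (-pi, pi]; map to [0, 360) degrees.
        angles.append(math.degrees(math.atan2(yj - yi, xj - xi)) % 360.0)
    return angles

# Hypothetical toy coordinates, not study data.
toy_points = [(0, 0), (1, 0), (5, 5), (5, 7)]
toy_angles = nearest_neighbor_angles(toy_points)
```

A histogram or kernel density of such angles, drawn on a circle, is what a Rose diagram summarizes.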

Air Pollution Exposure
Since the location of IIM cases and subgroups suggested various patterns of clustering, the log-Gaussian Cox process (LGCP) (33) with population density as an offset was used for assessing the effect of the air pollution exposure surrogate, distance to roads, on the prevalence of IIM cases. Two separate LGCP models, unadjusted and adjusted, were fitted first for all IIM cases, then separately for DM, PM, IBM, and for IIM cases with and without lung disease, since the spatial distributions of cases and subgroups were independent. Unadjusted models included only distance from roads as the independent variable, while adjusted models added all covariates as detailed below. The fitted LGCP models were examined using two standard approaches: (1) assessing goodness-of-fit by Monte Carlo tests; and (2) examining whether the empirical J-function lies within the 95% confidence band for the model estimated J-function (27,28). Exposure to traffic-related air pollution was approximated by the minimum distance from the residential zip code at diagnosis to major or minor road networks. We used the Topologically Integrated Geographic Encoding and Referencing (TIGER) shape files from the 2010 U.S. census to define major and minor road networks (https://catalog.data.gov/dataset/tiger-lineshapefile-2010-series-information-file-for-the-2010-censusblock-state-based-shapefi). The minimum distance to a road network was categorized as <50 m, 50-199 m, and ≥200 m. In assessing the effect of living in the <50 m or 50-199 m distance categories, we used the distance ≥200 m as a reference category.
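The exposure surrogate described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GIS workflow used in the study: coordinates are assumed to be planar and in meters, each road is a hypothetical list of (x, y) vertices, and the "50-199 m" category is treated as the half-open interval from 50 m up to (but not including) 200 m.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def min_distance_to_roads(p, roads):
    """Minimum distance from p to any segment of any road polyline."""
    return min(
        point_segment_distance(p, a, b)
        for road in roads
        for a, b in zip(road, road[1:])
    )

def distance_category(d_m):
    """Map a distance in meters to the three exposure categories."""
    if d_m < 50:
        return "<50 m"
    if d_m < 200:
        return "50-199 m"
    return ">=200 m"  # reference category in the models

# Hypothetical toy road network and residence location (meters).
toy_roads = [[(0, 0), (1000, 0)], [(0, 500), (0, 1500)]]
toy_category = distance_category(min_distance_to_roads((500, 120), toy_roads))
```

With real longitude-latitude data, the coordinates would first need to be projected, or the planar distance replaced with a geodesic one.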

Table 1 | Demographic and clinical characteristics, by minimum distance from the residential zip code to the major or minor road networks (in m).

Covariates
Geographic and demographic variables were considered as covariates. The geographic variables are the longitude and latitude of the residential zip code centroid where the patient was living at diagnosis. Demographic variables included census tract level measures of rural-urban commuting area (RUCA) codes, median household income, female percentage of population, percentage of population by age groups (7-17 years, 18-44, 45-64, and 65 years or older), and white percentage of population; and county level estimates of smoking percentage of population. The RUCA code was acquired at the census-tract level for the census years 2000 and 2010 from the U.S. Department of Agriculture (https://www.ers.usda.gov/data-products/ruralurban-commuting-area-codes.aspx). The county level estimates of cigarette smoking prevalence for the years 2000 and 2010 were acquired from a study (34) that used the data from Behavioral Risk Factor Surveillance System. All other demographic variables were acquired from the U.S. Census Bureau for the same census years. We averaged the demographics over the census years to get a unique value for each census tract (or, county) representing the whole study period represented by the patients' diagnosis dates. For all computation and graphics, we used the R computing software (R Foundation, Vienna, Austria) (35): the "spatstat" R package (36) computes statistics related to spatial point processes such as nearest-neighbor orientation density, mark connection function, J-function, and fits LGCP models, and the "sparr" R package (37) computes areas of increased prevalence of cases relative to other subgroups.

RESULTS
In general, the density of IIM cases and its subgroups correlated positively with the general population density (Figure 1). To examine whether myositis prevalence was higher in any direction, the Rose diagram in Figure 1A (Supplementary Table S1) shows the nearest-neighbor orientation density curve, using the counterclockwise-from-east convention. More IIM cases were located in the East, particularly in the Northeast, and fewer in the Midwest region (Figure 1A); the orientation density peaked at 63.4°. Among the subgroups, DM and IBM were more common in the East (Figures 1B,D, with maximums at 24.0° and 357.2°, respectively) and PM was more common in the Southeast, with the maximum at 292.4° (Figure 1C). Patients with and without lung disease had higher prevalence in the East (Figure 1E, maximum at 8.5°, and Supplementary Table S1, maximum at 336.8°, respectively), similar to DM and IBM patients.
The spatial distribution of increased prevalence of DM cases relative to PM cases (Figure 2A) shows an area in the West (San Francisco Bay area) and an area in the Northwest of the U.S. (Seattle-Tacoma-Bellevue Metro Area) that had statistically significant excess and lower prevalence of cases, respectively. The Seattle-Tacoma-Bellevue Metro Area also had statistically significant lower prevalence of cases of DM relative to IBM (Figure 2B). Similarly, relative to cases without lung disease (Figure 2C), the Dallas-Fort Worth-Arlington Metro Area within Texas and the Seattle-Tacoma-Bellevue Metro Area within Washington had a statistically significant excess and lower prevalence of cases with lung disease, respectively. There were no areas with significant excess or lower prevalence of PM cases relative to IBM.
The inhomogeneous J-function estimates for the myositis cases in the U.S. after adjusting for population density (Figure 3A) were consistently <1, indicating that the myositis cases in the U.S. have a clustering pattern. The curve was almost flat beyond about 120 km, indicating that the interaction between points is limited at greater distances. This clustering pattern varied by geographic region (Figure 3B), with the curves for the Northeast flattening around 65 km and for the West around 240 km. Similarly, the clustering patterns varied among myositis subgroups (Figure 3C), with interpoint distances of DM and PM cases differing from IBM cases, and with interpoint distances differing between cases with lung disease compared to cases without lung disease (Figure 3D).
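For readers less familiar with this summary statistic, the standard (homogeneous) J-function is shown below as background. Here G(r) denotes the nearest-neighbor distance distribution function and F(r) the empty-space function. The inhomogeneous version used in this study (27,28) additionally reweights points by the estimated intensity and is not reproduced here.

```latex
J(r) = \frac{1 - G(r)}{1 - F(r)}, \qquad
\begin{cases}
J(r) = 1 & \text{consistent with complete spatial randomness (Poisson)},\\
J(r) < 1 & \text{suggests clustering},\\
J(r) > 1 & \text{suggests regularity (inhibition)}.
\end{cases}
```

A curve that stays below 1 and then flattens with increasing r is therefore read as clustering whose range of interaction is limited to roughly the distance at which the curve levels off.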
The average density of IIM was ∼3.8 cases per 10,000 square miles within the 48 contiguous U.S. states, and by subgroups, the density of cases per 10,000 square miles was 1.6 for DM, 1.2 for PM, and 1.0 for IBM. There were 1.0 cases with lung disease per 10,000 square miles and 2.9 cases without lung disease. After adjusting for the effect of average population density in an LGCP model, the density of IIM was ∼2.6 per 10,000 square miles and the prevalence of IIM was ∼2.8 cases per 1 million population. These estimates from the LGCP model take into consideration that the areas with high population densities are also the areas with high numbers of IIM cases. The prevalence of myositis subgroup cases per 1 million population within the 48 contiguous U.S. states was 1.1 for DM, 0.8 for PM, 0.7 for IBM, 0.7 for IIM with lung disease, and 2.0 for IIM without lung disease.
Subgroup analyses included patients with DM, PM, IBM, and patients with and without lung disease (Supplementary Figure S2). Since the four estimated mark connection functions for the pairs DM and PM (A), DM and IBM (B), PM and IBM (C), and IIM with and without lung disease (D) are within the bounds (shaded area) created by Monte Carlo simulation, this indicates that the spatial distributions of case subgroups were independent of each other.
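The logic of a Monte Carlo envelope under random labelling can be sketched in Python. This sketch deliberately swaps in a much simpler statistic than the mark connection function: the fraction of points whose nearest neighbor carries a different label. All names and the toy data are hypothetical, and the code is not the "spatstat" implementation used in the study. An observed value inside the simulated envelope is read as compatible with independent labelling; a value outside it suggests segregation or attraction between subgroups.

```python
import random

def cross_type_nn_fraction(points, marks):
    """Fraction of points whose nearest neighbor has a different mark."""
    n = len(points)
    differ = 0
    for i in range(n):
        xi, yi = points[i]
        j_best = min(
            (j for j in range(n) if j != i),
            key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2,
        )
        differ += marks[i] != marks[j_best]
    return differ / n

def random_labelling_envelope(points, marks, n_sim=199, seed=0):
    """Min and max of the statistic over random permutations of the marks."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(marks)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        sims.append(cross_type_nn_fraction(points, shuffled))
    return min(sims), max(sims)

# Hypothetical toy data: two well-separated groups with segregated labels.
toy_points = [(x, 0) for x in range(20)] + [(1000 + x, 0) for x in range(20)]
toy_marks = ["DM"] * 20 + ["PM"] * 20
observed = cross_type_nn_fraction(toy_points, toy_marks)
lower, upper = random_labelling_envelope(toy_points, toy_marks)
inside_envelope = lower <= observed <= upper
```

In this toy example the labels are fully segregated, so the observed value falls below the envelope, the opposite of the independence reported for the study subgroups.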
The relative changes in log-intensity of IIM cases due to minimum distance from the residential location at the time of diagnosis to the major or minor road network, after adjusting for population density as an offset value, are shown in Table 2. From the unadjusted model for all IIM, the prevalence of cases was 2.9 times higher if living within 50 m of major or minor road networks compared to living beyond 200 m. After adjustment for geographic and demographic covariates, the prevalence of IIM cases was 3.3 times higher among those living within 50 m of the road network. The adjusted effect of living within 50 m of a road network from the subgroup analysis ranged from a 2.2-fold increase in prevalence for DM to a 6.5-fold increase in prevalence for IBM. For both unadjusted and covariate-adjusted models, however, the effect estimates for the distance categories relative to living ≥200 m from the road network were not statistically significant. The coefficient estimates with 95% CI for all covariates in adjusted models for all IIM and subgroups are presented in Supplementary Table S2, indicating that RUCA and percentage of females have significant negative and positive effects, respectively, on the prevalence of IIM cases. For the major phenotypes, the percentage of females had significant negative effects on the prevalence of DM cases and cases with and without lung disease, whereas RUCA had significant positive effects on the prevalence of IBM cases and cases with lung disease. The model goodness-of-fit test for all myositis cases from the global Monte Carlo test was significant (p = 0.002), indicating an overall good fit of the LGCP model to all IIM cases. The model validation results for myositis subtypes are very similar (Supplementary Figure S3).
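To make the link between log-intensity coefficients and fold-changes in prevalence explicit, a generic form of an LGCP model with a population offset is written out below. The notation is an assumption introduced for illustration (the exact parameterization fitted in this study is not given in the text): λ(s) is the case intensity at location s, ρ(s) the population density offset, d(s) the minimum distance to a road, x(s) the covariate vector, and Z(s) a Gaussian random field.

```latex
\log \lambda(s) = \log \rho(s) + \beta_0
  + \beta_1 \,\mathbf{1}\{d(s) < 50\}
  + \beta_2 \,\mathbf{1}\{50 \le d(s) < 200\}
  + \mathbf{x}(s)^{\top} \boldsymbol{\gamma} + Z(s),
\qquad
\frac{\lambda_{<50}(s)}{\lambda_{\ge 200}(s)} = e^{\beta_1}
```

Under this form, and holding the other terms fixed, a reported 2.9-fold higher prevalence within 50 m would correspond to a coefficient of about log 2.9 ≈ 1.06 on the log-intensity scale, if the fold-changes are read as exponentiated coefficients.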

DISCUSSION
Based on the residential data of a large U.S. national myositis patient registry, the spatial distribution of myositis cases and the population density in the U.S. have a similar pattern. After adjusting for population density, however, we consistently observed that the myositis cases have clustering patterns for all IIM, by regions, and by subgroups. The prevalence of myositis cases is higher in the Northeast of the U.S. Interestingly, the Northeast region has also seen a higher incidence of Lyme disease, and a previous study suggested that myositis can be caused by Lyme borreliosis (38). As blood samples to confirm this etiology through serologic testing were not available in this patient questionnaire registry, subsequent studies could examine Lyme disease as a potential etiologic factor. The IIM subgroups DM and IBM, as well as patients with lung disease, were more common in the East, whereas PM was more common in the Southeast. Moreover, when we made relative comparisons of case prevalence among the subgroups, an area in the West, located in the San Francisco Bay Area, had significant excess cases of DM relative to PM. Earlier studies suggested that the increased prevalence of cases of DM relative to other types of IIM was possibly related to increased exposure to ultraviolet radiation (15,17), but the reasons for the clustering seen in this study are not clear and require further investigation. In addition, we also observed an area in the South, located in the Dallas-Fort Worth-Arlington Metro Area, with a statistically significant excess prevalence of cases with lung disease relative to cases without lung disease. According to the American Lung Association, the Dallas-Fort Worth-Arlington Metro Area ranked 17th for high ozone days out of 229 metropolitan areas, 42nd for 24-h particle pollution out of 216 metropolitan areas, and 50th for annual particle pollution out of 204 metropolitan areas (39). We suspect that these exposures, and possibly multiple other environmental factors, may relate to the excess prevalence of cases in certain regions.
The higher prevalence of myositis cases in the Northeast probably suggests that the interactions between points are more intense in that region than in the South. Urbanization may be a contributing factor for the increased number of cases in certain areas. From all adjusted LGCP models, we observed a statistically significant negative effect for the RUCA, implying that rural areas had a lower prevalence of cases. As might be expected from the female predominance in myositis, we also observed a positive effect for female presence, meaning that areas with a high percentage of females in the population have a higher prevalence of myositis cases in the MYOVISION study sample. These results are also reflected in Table 1, in that cases from metropolitan areas and from areas with high proportions of females are predominant.
This study has several limitations. First, it used a convenience sample of U.S. IIM cases that may be nonrepresentative of the general myositis population in the U.S. This may be reflected in our prevalence estimate, which was low compared to an earlier published meta-analysis of global data, in which the prevalence of inflammatory myopathies ranged from 2.4 to 33.8 per 100,000 population (40). The incidence data by year may not reflect actual changes in the annual incidence of myositis, but rather a bias toward enrollment of more recently diagnosed patients and inadequate representation in the number of cases. Another limitation is the lack of specific residential street addresses (due to privacy concerns) that would have allowed us to specify latitude and longitude coordinates. However, by socioeconomic and demographic characteristics such as urbanization and gender, this study population was similar to two earlier studies (2,3). A second limitation is the use of patient registry data, with a study questionnaire completed by enrolled patients that did not include physician or medical record confirmation. In addition, serum samples were not available for myositis autoantibody testing. The study design, therefore, limited our ability to discern certain subgroups of myositis. The PM subgroup, for example, may include patients with antisynthetase syndrome and immune-mediated necrotizing myopathies (1), and thus, the spatial distribution of PM, as reported in this study, may not be fully accurate. In addition, from the patient questionnaire data, it cannot be determined if patient-reported lung disease is specifically interstitial lung disease, which would have greater prognostic importance (1). These issues should be further addressed in a subsequent study.

Table 2 | The unadjusted and adjusted relative risk estimates (with 95% CI) of the minimum distance from the residential location at diagnosis to major or minor road networks of developing IIM (from the log-Gaussian Cox process (LGCP) model).

Another limitation may be that only 2010 road network data were used for calculating the distance from the residence zip code. The major and minor road network system in the 48 contiguous states of the U.S. probably did change during the patients' years of diagnosis, but it was difficult to define an average road map for this large time period. Presumably, road networks for the census years 1990 and 2000 were less dense than noted in the census year 2010, and therefore we may have overestimated exposure levels at the time of diagnosis. The census tract level covariates were derived from the averages of 2000 and 2010 census data. The estimates of exposure effect were sensitive to the cut-points used for defining exposure categories. However, the cut-points were determined in consultation with subject matter experts and from the literature (21). Our data also showed the distribution of cases (in percentages) over exposure categories by demographic and myositis subgroups to be almost equal. Also, the study may have been underpowered to detect differences. For example, a relatively low percentage of patients lived within 50 m of a roadway network. We also did not examine other sources of pollution, such as industrial emissions, but plan to examine this in a subsequent study. Finally, although air pollution correlates with proximity to roadways, additional influential exposures, such as noise levels, lack of greenery, stress, and income levels, may also correlate with roadway density (41).
This is the first national-level study using a large U.S. database and spatial point process modeling techniques to illustrate the spatial characteristics that could be related to potential risk factors for myositis and its subgroups. Although application of spatial point processes is more common in ecology (42), it is an emerging tool in epidemiological research (43). We observed clustering of IIM and its phenotypes in some regions, and a statistically nonsignificant effect of distance from road network on myositis prevalence, but a significant effect of RUCA, possibly indicating exposure to higher levels of air pollutants, consistent with previous studies that also found a significant correlation of clinically amyopathic DM with airborne pollutants (22) and urban dwellings (7). More research is needed to understand the role of proximity to roadways and air pollutants, as well as other exposures and risk factors, in the development of myositis, given the geospatial distribution of myositis patients nationwide.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Cincinnati Children's Hospital and Medical Center. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
MH: participated in research design, performance of the research, data analysis and in the writing of the paper. JW, JM, PF, CB, MK, BG, HB, and FM: participated in performance of the research, interpretation of data for the work and in the revising of the paper. MM: participated in performance of the research, interpretation of data for the work and in the writing of the paper. LR: participated in research design, in the performance of the research, interpretation of data for the work, and in the writing of the paper. All authors contributed to the article and approved the submitted version.