Structural Equation Model (SEM) of Stroke Mortality in Spanish Inpatient Hospital Settings: The Role of Individual and Contextual Factors

Introduction: Traditionally, predictive models of in-hospital mortality in ischemic stroke have focused on individual patient variables, to the neglect of in-hospital contextual variables. In addition, frequently used scores are better predictors of risk of sequelae than of mortality, and, to date, the use of structural equations in elaborating such measures has only been anecdotal. Aims: The aim of this paper was to analyze the joint predictive weight of the following: (1) individual factors (age, gender, obesity, and epilepsy) on the mediating factors (arrhythmias, dyslipidemia, hypertension), and ultimately death (exitus); (2) contextual in-hospital factors (year and existence of a stroke unit) on the mediating factors (number of diagnoses, procedures and length of stay, and re-admission), as determinants of death; and (3) certain factors in predicting others. Material and Methods: Retrospective cohort study through observational analysis of all hospital stays of Diagnosis Related Group (DRG) 14, non-lysed ischemic stroke, during the time period 2008–2012. The sample consisted of a total of 186,245 hospital stays, taken from the Minimum Basic Data Set (MBDS) upon discharge from Spanish hospitals. MANOVAs were carried out to establish the linear effect of certain variables on others. These formed the basis for building the Structural Equation Model (SEM), with the corresponding parameters and restrictive indicators. Results: A consistent model of causal predictive relationships between the postulated variables was obtained. One of the most interesting effects was the predictive value of contextual variables on individual variables, especially the indirect effect of the existence of stroke units on reducing number of procedures, readmission and in-hospital mortality. Conclusion: Contextual variables, and specifically the availability of stroke units, made a positive impact on individual variables that affect prognosis and mortality in ischemic stroke. Moreover, it is feasible to determine this impact through the use of structural equation methodology. We analyze the methodological and clinical implications of this type of study for hospital policies.


INTRODUCTION

Prevalence of Ischemic Stroke
According to the WHO, ischemic stroke (IS) is the third leading cause of death in Western countries, and the first cause of disability in adults, in addition to having a high morbimortality load (1). In the USA alone, there are 800,000 persons every year who experience a stroke incident, either first-time or recurrent. The age-adjusted mortality rate in the most recent American studies has shown that stroke is a direct, underlying cause in 36.2 of every 100,000 exitus per year (2).
In Europe, as of today, the age-standardized incidence of stroke falls between 95 and 290 episodes per 100,000 inhabitants, with 1-month mortality between 10 and 35%; stroke represents the second leading cause of morbidity and disability (3). Incidence in Europe is currently rising among young adults, despite the decreasing trend worldwide. Mortality is not the only parameter of interest; 33% of patients will require readmission to hospital, 7-13% will have another episode, 35-47% will be affected by moderate cognitive decline, and 7-23% by dementia (3). Consequently, both morbidity load and mortality are pressing concerns in this population; they have important repercussions today, and in the case of Europe, can only be expected to worsen in coming years.
In Spain, mortality due to cardiovascular causes and stroke in particular began to decline in 1973, thanks to improved attention to cardiovascular risk factors associated with greater stroke mortality, as well as to diagnostic and therapeutic advances in the earliest phases of care. Very heterogeneous values of incidence in Spain have been reported, as seen in the study by López-Pousa et al. (4). Subsequently, the Iberictus study, led by the Spanish Society of Neurology, allowed access to more up-to-date, quality data, showing an incidence of 118 cases per 100,000 inhabitants per year. In-hospital mortality was also reported as 4% (5,6). Nonetheless, rising mortality rates are to be expected in the future, due to pronounced aging of the population and the increased prevalence of risk factors in an increasingly elderly population (5). Currently, ischemic stroke is the second leading cause of death in Spain in the general population and the first cause of death in women (6); according to clinical records in our country, it represents 12.9% of total deaths (7).

Risk Factors for Developing a Stroke
The risk factors associated with stroke incidence and mortality are well-known. These factors can be divided into personal factors (related to the patient, regardless of modifiability) and contextual factors, which are usually associated with availability of specific resources, shorter time to care, and the establishment of specific plans for stroke care (8,9).
The most notable, prevalent individual risk factors for developing a stroke include hypertension (HTN), diabetes mellitus (DM), abnormal heart rhythm (especially atrial fibrillation), hyperlipidemia and hypertriglyceridemia, liver disease, smoking, sedentary lifestyle, and finally nutritional and genetic factors (2,10). Sleep apnea and certain psychosocial factors have also been associated. The factors mentioned not only increase incidence, but also subsequent mortality (11). Predictors of poor evolution include the severity of the initial stroke, measured on the National Institutes of Health Stroke Scale (NIHSS) or the Canadian Neurological Scale (CNS); existence of diabetes mellitus; large or pronounced drops in blood pressure; body temperature; certain coagulation markers; and inflammation and glycemia at hospital admission (12).
In addition to individual factors, there are other important prognosis factors that have seldom been studied in conjunction with the individual factors; we will call these contextual risk factors. The existence of a comprehensive plan of action, which maximizes and optimizes patient care from the time of hospital arrival, has been shown to have beneficial results for patients who have suffered an acute stroke, increasing their probability of recovery (13). Over the past 20 years, not only the change in preventive action, but also early, regulated response that follows the most advanced quality standards, and the creation of specific stroke care units, have been shown to bring about a significant decrease in stroke mortality and sequelae.

The Construction of Probabilistic Prediction Models
Extensive work has been done in detecting the risk factors of developing an ischemic event and in estimating the likelihood of death or of sequelae (7). Specifically, work by Smith et al. (14) produced predictive models of in-hospital mortality, whether for ischemic or hemorrhagic stroke, using a limited number of variables; excellent estimated discriminative capacity was attained. Other highly interesting work has shown a successful methodology for elaborating predictive models of stroke (15).
Since the creation of stroke units, there have been numerous studies where these units demonstrate a decrease in mortality and disability, in comparison to the administration of conventional care (Cochrane Database of Systematic Reviews, 2013). More recently, their cost-effectiveness and a shortened average length of stay have also been demonstrated (16). A large part of the literature has focused on individual prognosis factors, while other authors have assessed isolated contextual elements, especially the availability of stroke units. To date, there is insufficient evidence that combines both types of variables and explores their interrelations using a structural, hierarchical equation methodology. Consequently, our main objective was to establish interdependent and predictive relationships among the variables that are most often identified in association with pathogenesis and development of stroke, and the main dependent variables (mortality and readmission to hospital). Specifically, and original to this study, we evaluated the role of certain process and context variables, and how they acted as intermediate, modulating variables in the non-linear relationship between predictive variables and outcome variables.

Aims and Hypotheses
In order to address the main objective, the initial hypothesis states that each individual variable defined in the linear model (primarily age, gender, obesity, and epilepsy) and each contextual variable (year, existence of stroke units) would have a statistically significant effect on the intermediate variables of the previously established linear model, whether individual variables (arrhythmias, dyslipidemia, and hypertension) or contextual (length of stay, number of diagnoses, and procedures). These in turn would have a significant effect on the two final, dependent variables, namely, readmissions, and mortality.

Type of Study
This was a retrospective cohort study using analytical observation of all hospital stays in Diagnosis Related Group (DRG) 14, non-lysed ischemic stroke, during the time period 2008-2012. All hospital stays of patients age 24 or older were included.

Scope
The study was carried out within the Spanish National Healthcare System (NHS, Spain), a decentralized structure across 17 autonomous regions with their respective regional healthcare systems. Each of the Autonomous Systems has its own structure, with Basic Healthcare Zones grouped in turn into Primary Care Districts and Hospitals. This system is the same throughout the country, despite the drawback of frequent failures in inter-region communication. Healthcare within this network is free of charge; costs are borne by the different regional governments.

Information Source, Sample, and Case Selection
The source of information was the Spanish Minimum Basic Hospital Discharge Dataset, made available by the Ministry of Health, Consumerism and Social Policies. A total of 186,245 hospital stays were analyzed. Diagnostic and procedural coding followed the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). The selection criteria consisted of identifying the patient stays that were discharged under DRG 14 (AP-DRG classifier, version 21). This diagnostic group includes exclusively those patients admitted for ischemic stroke who undergo medical treatment, but not fibrinolysis or mechanical reperfusion; consequently, this DRG defines a very specific, select group of patients. As in the relevant bibliography, the total group of hospital stays was then limited to patients over the age of 24, given the small incidence and prevalence of these events in younger persons. Additionally, outlier hospital stays were filtered out according to the classical method that defines outliers with the formula T2 = Q3 + 1.5 (Q3 - Q1), where Q3 and Q1 are the third and first quartiles of length of stay and T2 is the maximum value of the stay that results from applying the formula. Using this methodology, patients with stays longer than 21 days were identified and excluded.
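The outlier rule can be illustrated with a minimal Python sketch. The function names and the example data are hypothetical, and the quartile convention used here (the "inclusive" method of the standard library) is an assumption; the text does not say which convention was applied, and different conventions can shift the cut-off slightly.

```python
import statistics

def upper_fence(stays):
    """Return T2 = Q3 + 1.5 * (Q3 - Q1) for a list of lengths of stay (days).

    Quartiles use the 'inclusive' convention, which is an assumption here.
    """
    q1, _median, q3 = statistics.quantiles(stays, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)

def drop_long_stays(stays):
    """Keep only the stays that do not exceed the upper fence T2."""
    fence = upper_fence(stays)
    return [s for s in stays if s <= fence]

# Hypothetical lengths of stay in days (not taken from the study data).
example_stays = [2, 3, 4, 5, 6, 7, 8, 9, 10, 45]
kept_stays = drop_long_stays(example_stays)
print(upper_fence(example_stays), kept_stays)
```

In this toy example only the 45-day stay exceeds the fence; applied to the study data, the same rule yielded the 21-day cut-off reported above.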

Procedure
This project has been approved by the Clinical Ethics Committee of the Province of Almeria, Complejo Hospitalario Torrecardenas, Andalusian Health Service, Ministry of Health, Andalusia (Spain).

Variables and Analysis Schema
The schema of analysis identified two axes for studying relations and associations between variables. On one hand, variables were classified into two large dimensions in each episode: individual and context dimensions. The context variables were identified as year, existence of a stroke unit, length of stay, total count of diagnoses and procedures at discharge, and any readmissions; the remaining variables were considered individual variables (Table 1). On the other hand, our second axis of analysis classified variables as independent variables, intermediate/process variables, or outcome/dependent variables, regardless of the dimension to which they belonged.
The main dependent variable in the individual dimension was in-hospital mortality. Secondarily, readmissions were also analyzed as a dependent variable in the context dimension. According to the second axis of analysis, both individual and contextual variables were classified as outcomes (exitus and readmission), intermediate or process variables (arrhythmias, dyslipidemia, HTN, length of stay, NDX, and NPR) or initial variables (age, gender, obesity, epilepsy, year, stroke unit) (Table 1). One must keep in mind that the variables that make up the secondary diagnoses cannot always be identified differentially as complications that occurred during hospitalization or as preexisting patient comorbidities, such as epilepsy.
In order to make the Year variable (6 categories) more homogeneous, the derived variable "Year Gp" was obtained by establishing three bienniums.
Sociodemographic information was obtained from the variables year, age, gender, and Autonomous Region. Administrative elements were assessed through the variables length of stay, readmission within 30 days for the same DRG, type of admission (emergency vs. scheduled), and type of discharge (alive vs. exitus). We used the number of diagnoses at discharge (NDX) as a proxy variable for the patient's comorbidity, and the number of procedures at discharge (NPR) to estimate the procedural complexity of each episode and the main clinical comorbidities associated with ischemic processes (Table 2). For each of the hospitalization episodes, the total number of diagnoses was calculated (including both new comorbidities and pre-existing comorbidities at the time of admission) and coded into 14 fields of variables assigned for that purpose. In this way, diagnosis number 1 is the one that motivates the admission and the rest of the diagnoses are recorded sequentially, some as derivatives of others, until completing the entire spectrum of pathology that existed in each event.
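As an illustration of how the NDX and NPR proxies can be derived from a discharge record, the following Python sketch counts the non-empty coded fields. The record layout, the function names, and the example code strings are hypothetical; only the figure of 14 diagnosis fields comes from the text, and the number of procedure fields is not stated there.

```python
from typing import Optional, Sequence

N_DIAGNOSIS_FIELDS = 14  # number of diagnosis fields stated in the text

def count_recorded(fields: Sequence[Optional[str]]) -> int:
    """Count the fields that hold a non-empty code."""
    return sum(1 for code in fields if code is not None and code.strip() != "")

def ndx(diagnosis_fields: Sequence[Optional[str]]) -> int:
    """Number of diagnoses at discharge (proxy for comorbidity)."""
    if len(diagnosis_fields) != N_DIAGNOSIS_FIELDS:
        raise ValueError("expected 14 diagnosis fields")
    return count_recorded(diagnosis_fields)

# Hypothetical episode: field 1 is the diagnosis motivating admission,
# the remaining fields are secondary diagnoses; unused fields are empty.
example_dx = ["434.91", "401.9", "427.31", "272.4"] + [None] * 10
# Hypothetical procedure fields (their real number is not given in the text).
example_px = ["87.03", "88.91", ""]

print(ndx(example_dx), count_recorded(example_px))
```

Counting fields in this way cannot separate pre-existing comorbidities from complications arising during the stay, which is the limitation the text acknowledges for secondary diagnoses.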

Statistical Analysis
For the statistical analysis, variables were treated as follows, according to the dimension being analyzed: (1) first, the initial variables were the independent variables (IV), and the process and outcome variables were dependent (DV); (2) second, the independent process variables were the IV and the outcome variables exitus (death) and readmission were the DV.
Two types of analysis were carried out in order to determine which variables to include in the structural linear model. First, bivariate analysis was carried out; Student's t-test for independent samples or analysis of variance was used to test the equality of means hypothesis. In cases where these could not be applied, the non-parametric Mann-Whitney U or Kruskal-Wallis test was applied, as appropriate. The Chi-square test was used for comparison of qualitative variables. Relationships between quantitative variables were determined through Pearson or [...] Once the variables were identified, the empirical model of structural equations was finally developed. AMOS (v. 23.0) for Windows was used to construct the structural prediction model, specifically to verify the structural linear prediction hypothesis (path analysis). To interpret the confirmatory factor analysis (CFA) and the structural equation model (SEM) fit, we focused on the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA). CFI values equal to or greater than 0.90 and 0.95 were taken to indicate acceptable and close fit to the data, respectively (17). RMSEA values equal to or below 0.05 and 0.08 were taken to indicate close and acceptable levels of fit, respectively (18). Keith (19) proposed the following beta coefficients as research benchmarks for direct effects: less than 0.05 is considered too small to be meaningful, above 0.05 is small but meaningful, above 0.10 is moderate, and above 0.25 is large. For indirect effects, we used Kenny's (20) definition of an indirect effect as the product of two effects; using Keith's benchmarks above, we proposed a small indirect effect = 0.003, moderate = 0.01, and large = 0.06, values that are significant in the sphere of education.
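The benchmarks in this paragraph can be written down as a short Python sketch. The function names are hypothetical, the handling of values exactly on a cut-off is an assumption, and reading the indirect-effect benchmarks as the squares of the direct-effect cut-offs is an inference from the "product of two effects" definition rather than something the text states outright. The fit values passed to `classify_fit` are made up for illustration and are not the model's reported indices.

```python
DIRECT_CUTOFFS = {"small": 0.05, "moderate": 0.10, "large": 0.25}

def classify_direct(beta: float) -> str:
    """Label a standardized direct effect using the benchmarks cited above."""
    b = abs(beta)
    if b > DIRECT_CUTOFFS["large"]:
        return "large"
    if b > DIRECT_CUTOFFS["moderate"]:
        return "moderate"
    if b > DIRECT_CUTOFFS["small"]:
        return "small"
    return "too small to be meaningful"

def classify_fit(cfi: float, rmsea: float) -> str:
    """Label model fit from CFI and RMSEA using the thresholds cited above."""
    if cfi >= 0.95 and rmsea <= 0.05:
        return "close"
    if cfi >= 0.90 and rmsea <= 0.08:
        return "acceptable"
    return "poor"

# Assumed derivation: an indirect effect is a product of two direct effects,
# so squaring each cut-off gives roughly 0.003, 0.01 and 0.06.
indirect_cutoffs = {name: cut ** 2 for name, cut in DIRECT_CUTOFFS.items()}

print(classify_direct(0.259), classify_direct(-0.111), classify_direct(0.008))
print(classify_fit(cfi=0.96, rmsea=0.04))  # hypothetical fit values
print(indirect_cutoffs)
```

Under this reading, 0.05 squared is 0.0025 and 0.25 squared is 0.0625, which round to the 0.003 and 0.06 quoted in the text.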

Basic Descriptive Results
The sample was composed of 186,245 hospital stays between the years 2008 and 2012. There were a total of 12,800 exitus during hospitalization. Over the study period, the death rate declined from 7.3% in 2008 to 6.5% in 2012, for an average rate of 6.9% for the whole period. Mean age of the sample was 79.92 (SD 12.54) years, with a mean hospital stay of 7.54 (SD 4.54) days, and 3.27 (SD 2.45) was the mean number of procedures applied. The mean number of diagnoses at discharge was 6.91 (SD 2.95), and 4.8% of the sample were in a readmission situation under the same DRG. Table 3 shows the distribution of the main variables by year.
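The overall in-hospital death rate can be checked directly from the two counts reported above. This is only an arithmetic check in Python; the variable names are hypothetical.

```python
total_stays = 186_245   # hospital stays analyzed, 2008-2012
deaths = 12_800         # exitus during hospitalization

# Crude in-hospital mortality for the whole period, as a percentage.
crude_rate_pct = 100 * deaths / total_stays
print(round(crude_rate_pct, 1))
```

The result agrees with the 6.9% average rate given for the whole period.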

Inferential Relations Among Variables
First, we assessed the effects of the individual-related IVs (age, gender, obesity, and epilepsy) and context-related IVs (year and existence of a stroke unit) on the intermediate individual variables (arrhythmias, dyslipidemia, and HTN) and intermediate contextual variables (length of stay, NDX, and NPR).
Findings showed a significant effect from each of the IVs (both contextual and individual) on the intermediate variables mentioned, except in the case of gender. No statistically significant, main interaction appeared. There were also numerous significant partial effects of each independent variable on the dependent variables; these are marked with an asterisk to the right in Table 2.
Afterward, we analyzed the effect of the individual IVs and the contextual IVs on the main dependent, individual variable (exitus). Uni-and multi-variate analyses showed a significant main effect of all the individual and contextual factors mentioned, except gender; in addition, evidence showed that the discrete factors with the greatest effect on mortality were age and epilepsy, followed by the existence of stroke units. Moreover, certain variables produced several significant interaction effects on mortality (Table 4), with the greatest observed power detected for the interactions of (Year * Stroke unit), (Age * Epilepsy * Year), and finally the interaction of (Year * Gender * Obesity * StrokeUnit), the latter demonstrating great explanatory power. The most relevant variable, common to two of the interactions detected, was the existence of stroke units. Likewise, the intermediate or process variables, whether related to the individual (arrhythmias, dyslipidemia, and HTN) or to the context (length of stay, NDX, and NPR), had a clear effect on mortality as DV.
By observing the pathologies coded for each hospital stay, we detected a significant main effect from multiple intermediate (or mediating) variables on mortality. This effect was shown for arrhythmias, dyslipidemia, and hypertension; however, the most powerful interaction in determining exitus was the joint effect of the interaction (Dyslipidemia * Hypertension) (Table 5).
Regarding the discrete contextual variables analyzed, those with the greatest effect were length of stay and NPR, along with readmissions. However, the variables with the greatest explanatory power were the interactions of (Length of Stay * NDX * NPR), (NDX * NPR * Readmissions), (Length of Stay * NDX * Readmissions), and (Length of Stay * NDX * NPR * Readmissions).

Linear Relations of Structural Prediction
The results of structural analysis or pathway analysis (SEM) showed an acceptable model of relationships. The relationship parameters of both models are presented below (Table 6).

Standardized Direct Effects
In the case of the personal variables, the predictive linear model establishes that the variable GENDER was predicted by AGE (0.259). OBESITY was negatively predicted by AGE (-0.111) and positively by GENDER (0.078). EPILEPSY was positively predicted by GENDER (0.008), and UNIT was positively predicted by AGE (0.013).
The variable ARRHYTHMIAS was significantly predicted by AGE (0. [...] Table 7 shows the direct effects of the variables included in the model.

Standardized Indirect Effects
The model also revealed multiple indirect predictions among the variables. With respect to personal variables, the predictive linear model establishes that AGE was a positive, significant, indirect predictor of OBESITY (0.020). The variable EPILEPSY was not predicted indirectly by any other variable. [...] (Table 8).
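The indirect effect of AGE on OBESITY can be reproduced as a product of standardized path coefficients, using the direct effects reported in the previous subsection. The Python sketch below assumes that the indirect path runs AGE to GENDER to OBESITY and that the total effect is the sum of the direct and indirect effects, which is the usual path-analysis convention; the text does not spell out either point, and the variable names are hypothetical.

```python
# Standardized direct effects reported for the personal variables.
age_to_gender = 0.259
gender_to_obesity = 0.078
age_to_obesity_direct = -0.111

# Indirect effect along the assumed path AGE -> GENDER -> OBESITY:
# the product of the two coefficients on the path.
age_to_obesity_indirect = age_to_gender * gender_to_obesity

# Total effect under the usual convention: direct plus indirect.
age_to_obesity_total = age_to_obesity_direct + age_to_obesity_indirect

print(round(age_to_obesity_indirect, 3), round(age_to_obesity_total, 3))
```

The product is about 0.020, matching the indirect effect reported above, while the direct effect remains dominant, so the implied total effect of AGE on OBESITY stays negative.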

Graphic Representation of the Structural Model
The final model is graphically represented in Figure 1.

Empirical Evidence
This investigation began with the hypothesis that each variable defined in the linear model, whether individual or contextual, would have a statistically significant effect on the intermediate variables of the established model, at the individual level and at the contextual level. These in turn would have a significant effect on the two dependent outcome variables, namely, readmissions and mortality. This hypothesis was in large measure confirmed, having verified in our SEM model that the individual variables made a differential, statistically significant impact on the intermediate (mediating) variables, and these in turn on exitus. This is not an every-variable-to-every-variable relationship; the particular predictions are made explicit below, as well as some paradoxical relationships that deserve a detailed explanation. The inferential results presented here show effects from combined variables, similar to what has been reported with prior evidence. The clearest effects were produced by the combination of multiple variables.

Individual Variables as Predictors
As seen in other studies, different variables were found to be statistically significant predictors of the presence of arrhythmias as comorbidity in this group of patients. In this context, arrhythmias were significantly, positively predicted by age (21,22), obesity (23,24) and the presence of epilepsy among the secondary diagnoses. Some of these linear associations were known previously, but had not been demonstrated to date using a predictive structural model. The literature reflects an association between epilepsy and arrhythmias, whether direct or mediated by antiepileptic treatment (25)(26)(27). In the same way, age was associated with the presence of dyslipidemia (28) and HTN (29). The association found between epilepsy and dyslipidemia is consistent with the known effect on lipids from treatment with certain anti-epileptic drugs (30,31).
One paradoxical result is the negative prediction of dyslipidemia as a function of age. A possible explanation would be that stroke-affected patients suffer from vasculopathy and often arteriopathy; they are affected by different types of pathologies that are treated fundamentally with statins. Prior research has demonstrated that the use of statins increases with age. Thus, age might be negatively associated with dyslipidemia through the use of this pharmacological group in the type of patient most prevalent in this study: older people with a background of cardiovascular pathology.
On the other hand, gender (being a woman) positively predicts arrhythmias and HTN, but not dyslipidemia. The association between the female gender and the existence of certain types of arrhythmias is well-documented (32), and probably accounts for our findings. However, the limitations of our information source (MBDS) do not allow us to identify the subtypes of arrhythmias prevalent in our study sample. As for HTN, it is known to be more prevalent in men than in women at younger ages, but in the situation that concerns us, several elements might explain an association with the female gender. On one hand, we are working with patients affected by an ischemic stroke and not the general population; on the other hand, the oldest age groups, with higher prevalence of HTN, are also mostly female in our sample and in the general population, due to the longer life expectancy of women. Another noteworthy result is the positive predictive role of epilepsy with respect to HTN. According to the established literature, HTN is an obvious, crucial risk factor for ischemic stroke, in the same way that stroke itself is a risk factor for developing epileptic crises. We may then suppose, in full agreement with other authors (33), that the relationship between HTN (especially if not properly controlled) and epilepsy can also develop directly, that is, even prior to development of an ischemic event.
These results concur with prior medical evidence showing that age positively predicts arrhythmias (21,23) as well as HTN (29). Although the evidence is not as clear, age influences hemodynamic regulatory mechanisms, which in turn have consequences in blood pressure and brain self-regulation (29). A paradoxical result is the negative prediction of dyslipidemia.
Obesity, for its part, negatively predicts arrhythmias, but positively predicts dyslipidemia and HTN. A consistent model of obesity as a positive predictor of dyslipidemia and HTN is evident and well-documented (34,35); this falls in line with the relatively new concept of obesity as a chronic, recurring, progressive disease, as suggested by Bray et al. (36). Finally, in our understanding to date, there seems to be no clear association of obesity and arrhythmias, or at the least, it would occur through some mechanism not yet understood.
There are substantial associations between variables of individual characteristics and the two main dependent variables (readmissions and exitus); in a few cases, they seem paradoxical or difficult to explain, thus indicating a need to investigate some of the predictive effects that were found. Both dyslipidemia and NPR are significant, negative predictors of readmissions, while NDX is a significant, positive predictor. Increased procedural effort or therapeutic intensity can explain the direction of the NPR-Readmissions prediction, such that where greater effort is applied, there is less likelihood of being readmitted to hospital for the same reason in the 30 days following discharge. Similarly, when patients have a greater number of diagnoses (greater comorbidity), prediction of readmissions is positive, demonstrating that the patient's overall complexity undoubtedly influences his or her prognosis.
Elsewhere, the evidence showed dyslipidemia and NPR as negative predictors of mortality, while age and the existence of arrhythmias were positive predictors. It seems logical that more elderly patients, and patients affected by arrhythmias (also more frequent at advanced ages), would have greater mortality. The negative association between dyslipidemia and mortality, to our understanding, can only be understood in that dyslipidemic patients receive greater procedural effort, and probably undergo more frequent medical checks. This assertion is supported by the direct, significant, negative prediction that occurs between the number of procedures applied, and mortality.
To complete this section, we must make note of the central, core prediction between the two dependent variables. Just as each different individual variable on its own has been related through different mechanisms to each of the dependent variables, there is an obvious, significant, and very powerful prediction between readmissions and mortality. This association has been cited in many studies on a variety of pathologies, and we believe it lends even greater biological plausibility to the structural model (37)(38)(39).

Contextual Variables as Predictors
There were also statistically significant effects from the contextual variables.
Year was confirmed to have a negative effect on length of stay and on in-hospital mortality. The effect on mortality was mediated by NDX and NPR, variables that in turn depend directly on the existence of stroke units and the ongoing creation of such units during the study period. The period analyzed in this study was a time of marked change, where improved stroke care, both in therapeutic terms and in organization of care, prompted a drop in average length of stay and in short-term mortality, and consequently in the in-hospital mortality that we are analyzing here (16,40).
In this context, where there is higher patient comorbidity (with NDX as the proxy variable for comorbidity), there are higher levels of 30-day readmissions, and secondarily, there are the above-mentioned increases in mortality. As for NPR, considered a proxy variable for the degree of therapeutic effort applied to the patient, we find that with greater effort, there is a decrease in readmissions and in mortality. Both variables are closely related to the existence of stroke units, such that procedural effort is objectively greater within these units than in conventional hospitalization (41).
Although the moment in time (Year) predicted shorter hospital stays, within stroke units there was greater likelihood of longer stays throughout the whole study period. The most important effect found was that the existence of stroke units positively predicted length of STAY, as commented. These units admit the more complex patients in particular (greater comorbidity or NDX), and apply greater therapeutic and procedural effort (higher NPR), which would explain the decrease in both mortality and in readmissions; according to other authors, however, a paradoxical effect can occur due to the patient's own complexity (42,43). The contextual variable that most clearly affects decreased mortality is procedural effort (NPR), which in turn is higher in stroke units and in patients with greater complexity (NDX); both of these variables (diagnoses and procedures) are associated with the stroke units themselves, due to the type of cases that are admitted in these units (44).
Another noteworthy result is the predictive effect of individual mediating variables on context variables. The most interesting result, from the point of view of how the healthcare system affects disease in subjects, is that the number of DIAGNOSES negatively predicts length of STAY, but positively predicts an increased number of PROCEDURES. This may be interpreted as more complex patients having shorter hospital stays because of the high levels of accumulated mortality in this group. The patient's diagnostic complexity (NDX) itself would lead to greater procedural effort (NPR), but there may also be mechanisms that limit therapeutic effort at the most advanced ages (45). In any case, according to our criteria, the model has the capacity to explain these complex associations that are made evident through structural models and that underlie the clinician's thinking and the physiopathology of disease in a stroke.

Clinical Implications
Regarding the importance of the proposed illustrated algorithm, the present analysis yields an empirical model that incorporates a macro and micro view of predictive relationships between the independent, mediating, and outcome factors of the subjects' health in interaction with the contextual, organizational factors. In our view, this model has unquestionable epidemiological value for revealing probabilistic predictive relationships between personal and contextual factors, thereby enabling healthcare organizations to understand and make decisions regarding the detection of diseases that bring increased likelihood of others. It also enables large-scale assessment of the adequacy of resources deployed as a function of the pathologies analyzed, opening the way to cost-benefit analyses. Some previous analyses have contributed evidence in this line of work, using different methodologies (8,46).
The results of this study are also relevant from the point of view of clinical management. Attention to the value of contextual elements (mainly managerial and organizational elements like stroke units) would unquestionably contribute to improved clinical care for the patient and to organizational efficiency itself. An understanding of how the individual and contextual elements of stroke are related to each other gives us a broad, ambitious view of this scenario, now supported by a structural model that provides empirical evidence, in contrast to the formerly fragmented or nonexistent evidence in prior contributions to our understanding of this disease.

Methodological Contributions
Contributions from this type of analysis of large clinical-administrative databases are obvious. First, this approach goes beyond the classic correlational methodology, which establishes covariation relationships between study variables but has many limitations with respect to establishing causal relationships. In fact, certain prior studies have shown that when empirical models are based on associations between variables, and an SEM is later developed, some of the previous associations are not sustained in the new structural model, because of accumulated measurement errors. Second, while carrying out prior inferential analyses ensures that interdependence (or causal) relationships between variables are consistent, this type of analysis cannot present such relationships in a combined, multidirectional manner, but only within the limits of each multivariate analysis. Third, the SEM makes it possible to establish structural multi-directionality of causation through path analysis. Consequently, this type of analysis would be appropriate to an R&D&I Department (47) within the hospital context, where it would be possible to test the efficiency of hospital interventions and healthcare resources (48,49).
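To illustrate how path analysis with observed variables combines direct and indirect effects, the following minimal sketch may be useful. It is not the model estimated in this study: the data are synthetic, the three variables (x, m, y) and the path values are hypothetical, all variables are treated as continuous, and each path is fitted by ordinary least squares rather than by dedicated SEM software. Under those assumptions, the indirect effect of x on y through the mediator m is the product of the two paths, and the total effect is the sum of the direct and indirect effects.

```python
import numpy as np

def ols_slopes(X, y):
    """Least-squares slopes of y on the columns of X (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(0)
n = 50_000

# Synthetic data only: hypothetical path values, NOT estimates from the study.
a_true, b_true, c_true = 0.5, -0.3, 0.2
x = rng.normal(size=n)                             # exogenous variable
m = a_true * x + rng.normal(size=n)                # mediating variable
y = b_true * m + c_true * x + rng.normal(size=n)   # outcome variable

# Path x -> m
a_hat = ols_slopes(x.reshape(-1, 1), m)[0]
# Paths x -> y (direct) and m -> y, estimated jointly
c_hat, b_hat = ols_slopes(np.column_stack([x, m]), y)

indirect = a_hat * b_hat   # effect of x on y transmitted through m
total = c_hat + indirect   # direct plus indirect effect

print(f"direct={c_hat:.3f} indirect={indirect:.3f} total={total:.3f}")
```

In this linear setting the total effect coincides with the slope obtained by regressing y on x alone, which is why a purely bivariate analysis cannot separate the direct path from the mediated one.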

Limitations and Prospects
A first methodological limitation is that no latent variables have been defined in the model. Latent variables can establish a generic relation between constructs, but not the specific ones that we wanted to find. In our case, we have tried to define the causal relationships between observable variables. From our point of view, this precise relationship is very important.
The data are taken from non-lysed strokes. The clinical situation today is a different one (intravenous fibrinolysis and mechanical thrombectomy), where the role of the stroke unit is even more critical. However, given the high prevalence of this subtype of stroke (ischemic and not subject to reperfusion), we think that establishing a predictive empirical model with personal and hospital variables is of great relevance. It would be interesting to replicate the study with patients who have received treatment for acute stroke, when enough data become available. We believe that the future inclusion of patients subjected to mechanical or chemical reperfusion would probably modify the outputs in the sense of less sequential morbidity, decreased length of stay, and lower mortality in stroke units and even in general hospitalization. We also consider that the contextual dependent variable "readmissions" would be favorably diminished by the inclusion of these new therapeutic techniques. Even in the case of non-lysed stroke, this replication in a real cohort would make it possible to simplify and further divide up the elements of the final model. We could learn more precisely which elements might be implemented in routine clinical care in order to optimize outcomes.
Working with these massive clinical-administrative databases has the advantage of the great statistical power of a large sample size, but such databases are not free from significant drawbacks. First, the data reflect the in-hospital situation exclusively, possibly leading to an external validity issue; in our particular case, acute stroke patients are rarely managed on an outpatient basis, so we consider this bias to be minimal. Second, the information is limited by the quality of the diagnostic and procedural coding itself, and this quality is rather uneven, not only geographically (different healthcare regions) but also over time (the study period); fortunately, the latter tends toward improvement.
In addition, we must consider the limitation implied by variables such as "epilepsy", where we cannot identify whether it occurs as a result of the stroke or whether the patient has suffered this pathology for some time. This obvious database limitation, the failure to differentiate certain secondary diagnoses as complications or as comorbidities, is only partially compensated by the large sample size and the diagnostic position: epilepsy encoded in the second diagnostic position is understood to be an acute complication, while in later positions it is more likely to be a preexisting comorbidity.
Finally, we must take into account the very critical patients who die shortly after admission: their chronic pathologies are often underrecorded, possibly distorting the statistical results and even provoking paradoxical results. This well-known Jencks bias, a phenomenon described in Jencks et al. (50), has been confirmed in multiple studies. The studies by Dahlin et al. (51) are most noteworthy: underrecording was shown to be a constant, even when as many as 25 diagnoses had been reported upon discharge. Such biases in the information source are difficult to control, but given the sample size, power, and level of detail, this source provides extremely valuable information for patient care and for improved organizational management.

AUTHOR CONTRIBUTIONS
JdlF: development of the conceptual idea; statistical methodology; global drafting of the manuscript. JG-T: general review of the manuscript; introduction and review-writing of the discussion. MI-E: general review of the manuscript and partial wording of the discussion. GS: global review of the manuscript; statistical review. AG-U: review of the design evaluation process used. JF-P: review of the global English level of the manuscript.