Individuals with long-term illness, disability or infirmity are more likely to smoke than healthy controls: An instrumental variable analysis

Despite the prevalence of smoking cessation programs and public health campaigns, individuals with long-term illness, disability, or infirmity have been found to smoke more often than those without such conditions, leading to worsening health. However, the available literature has mainly focused on the association between long-term illness and smoking, which may be confounded by bidirectional influence, while few studies have examined the potential causal effect of long-term illness on smoking. This gap in knowledge can be addressed using an instrumental variable analysis, which uses a third variable as an instrument for the endogenous independent variable and allows the direction of causality to be identified under the assumptions discussed. Our study analyzes the 2006 UK General Household Survey, which covers a nationally representative sample of 13,585 households. We used the number of vehicles as the instrumental variable for long-term illness, disability, or infirmity: vehicle numbers may be related to illness because these individuals are less likely to drive, whereas vehicle numbers may have no direct relationship to the likelihood of smoking. Our results suggest that chronic illness status leads to a statistically significant 28% higher probability of smoking. The findings have wide implications for public health policymakers, who can design more accessible campaigns around smoking, and for psychologists and doctors, who can provide targeted care for the welfare of individuals with long-term illnesses.


Introduction
It is well-documented that smokers have poorer physical and mental health than nonsmokers, as measured by several subjective and objective indicators (1,2). Numerous studies have examined the link between cigarette smoking and certain types of illnesses, impairments, morbidity, and low health-related quality of life, suggesting that smokers are more likely to develop lung cancer and cardiovascular diseases and to report low subjective health than non-smokers (3,4). Present research also indicates that individuals with long-term illness, disability, or infirmity (LSI), defined as the conditions for chronic sickness in the 2006 British General Household Survey, could be more prone to smoking and less likely to undergo smoking cessation or screening for tobacco use than healthy people (5-7). From a psychological perspective, high levels of substance abuse could be attributed to the higher subjective utility smokers receive relative to the perceived health drawbacks of tobacco consumption (8-10). Specifically, for people with LSI, smoking is a common strategy to cope with the social isolation they face and the mental health issues that come with it (11,12). In the US, people with LSI had a substantially higher smoking rate than the national average of 12.5% in 2020. Nearly 20 of every 100 American adults with a physical disability (19.8%) are smokers, whereas only 11.8% of their counterparts without disabilities smoke (13). Beyond the descriptive statistics, recent research has empirically examined smoking prevalence in persons with activity limitations using a regression framework (14,15). For example, using logistic regression on cross-sectional survey statistics in the US, Fitzmaurice et al. (14) found that individuals who reported mobility limitations were more likely to be overweight and to become smokers. Another study confirmed the finding that disabled Americans smoked more cigarettes per day than people without disabilities and also found that they were less able to fully participate in all preventive services offered (16). Similar results were found in the UK. British adults with physical impairments were more likely than their counterparts to smoke 20 or more cigarettes per day, holding age and gender constant (17).
Beyond the Western evidence on smoking addiction among disabled people, previous studies suggest a similar pattern in Asia. Lee et al. (18) revealed that South Koreans with physical impairments listed on the National Disability Registration System had higher smoking rates and were less likely to quit smoking than the general population in South Korea.
In brief, the literature reviewed above suggests that individuals with physical disabilities have higher smoking rates than healthy individuals. This association has been found in various geographical locations and across cultural contexts. Nonetheless, existing studies fail to address reverse causality, which arises when the dependent variable also influences the independent variable, producing a bidirectional relationship. Smoking has been linked to the deterioration of people's physical condition, which consequently restricts individuals' daily activity (16,19,20). Conversely, individuals' mobility-restricting conditions might also trigger smoking behavior, which leads to a bidirectional association (21). When reverse causality is ignored, the estimated correlation might be biased (22,23). Moreover, with limited covariates included in the studies reviewed above, there might be Omitted Variable Bias (OVB), which occurs when the model fails to control for a variable that is correlated with both the outcome and the independent variable. In the presence of OVB, the regression estimate will be systematically biased (24). Hence, the correlations reported by the studies reviewed above do not imply a causal link between physical disability and smoking.
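The direction of omitted variable bias can be illustrated with the textbook two-variable case. The symbols below (an outcome y, a regressor x, an omitted variable w, and an error term uncorrelated with x) are assumptions introduced for illustration and do not appear in the studies reviewed above.

```latex
% Assumed illustration: w is an omitted variable, \varepsilon is uncorrelated with x.
y_i = \alpha x_i + \gamma w_i + \varepsilon_i
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\alpha}_{\mathrm{OLS}}
  = \alpha + \gamma\,\frac{\operatorname{Cov}(x_i, w_i)}{\operatorname{Var}(x_i)}
```

Under these assumptions, the estimate from regressing y on x alone equals the true coefficient only if the omitted variable has no effect on the outcome (γ = 0) or is uncorrelated with the regressor.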
Our research aims to examine the causal relationship between LSI and smoking, namely whether people with LSI are more likely to smoke. Identifying causal links between long-term sickness or disability and smoking addiction is an important issue. If smoking is the cause, smoking itself could be the primary target of effective intervention; conversely, if smoking is the result, it may be more effective for smoking cessation programs to target the identified causes, including LSI. Individuals with chronic diseases or disabilities face numerous obstacles to accessing smoking cessation programs and other public health campaigns (25). These obstacles include lack of patient adherence, limited accessibility (e.g., transportation), and limited knowledge of how to accommodate people with disabilities, infirmity, and long-term illness, some of which may not be visible (26,27). Hence, although individuals with disabilities have a smoking prevalence that is 50% higher than that of the general population, the types of tobacco cessation measures offered were not available to more than 40% of smokers with impairments who had sought assistance (6).
Using the instrumental variable approach, we aim to fill the current literature gap and study whether individuals with LSI are more likely to be current smokers. The instrumental variable method has been widely used in economics to evaluate causal effects when reverse causality is present (28,29). A variable is exogenous if it does not correlate with unobserved factors in the regression and is endogenous if such a correlation exists. The instrument is an exogenous variable that correlates with the potentially endogenous regressor and influences the dependent variable only through that regressor. In other words, the instrumental variable affects the response variable only indirectly through the endogenous variable and has no direct relationship with the response variable (30,31). As such, it avoids the bidirectional problem in the current literature studying the associations between chronic illness and smoking (14-16). Herein, we implemented the instrumental variable approach with the number of vehicles owned as the instrumental variable. Vehicle numbers can be related to LSI based on the notion that these individuals are less likely to drive; meanwhile, vehicle numbers have no relationship to the likelihood of smoking after controlling for a range of sociodemographic factors (32). We hypothesize that individuals with long-term illness, disability, or infirmity are more likely to smoke than healthy individuals.
The data used in this study were collected from the General Household Survey (GHS), a repeated cross-sectional study conducted annually (except for 1997-1998 and 1999-2000 due to significant survey redevelopment) by the Office for National Statistics in the UK starting in 1971. The GHS is now known as the General Lifestyle Survey after it was renamed in 2008 and incorporated into the Integrated Household Survey.
The GHS aims to provide different government departments and organizations with data regarding an array of characteristics of British private households for monitoring and policy purposes, as well as to present a general picture of the lives of people in the UK. The factors collected by the survey include both general private household information and individual traits. General private household information includes items such as the number of household members, while individual traits consist of factors such as education, individual income, and marital status. For the current analysis, we used cross-sectional data collected for the GHS in 2006, the latest GHS available for access. A sample of ∼13,585 households was randomly selected, and interviews were subsequently conducted with all adults aged 16 and above in every responding household (33). A total of 9,731 households consisting of 22,924 individuals responded to the interview invitation (including full and partial interviews) for a response rate of 73.1%. We excluded 5,666 individuals from the analysis because they were under 18 years old. In the United Kingdom, a person must be 18 or older to purchase cigarettes; hence, smoking-related questions are not given to anyone under the age of 18. We then applied complete case analysis to the remaining 17,258 observations, removing 3,925 with missing values for any selected variable. We have a final sample size of 12,297 from the GHS in 2006, which is equivalent to 78% of the original sample.
A total of 12 relevant variables of household and individual characteristics were chosen from the GHS in 2006 for the purpose of this study, with a focus on three particularly important variables. First, the outcome is a binary variable that indicates whether an individual currently smokes, denoted by smoking. This measure was coded as one if an individual was a smoker and zero otherwise. Second, our independent variable of interest, illness, was coded as one if an individual was suffering from chronic sickness (long-term illness, disability, or infirmity) that limits the individual's daily activity and zero otherwise. Third, the instrumental variable, denoted by the number of vehicles, is the number of vehicles (cars and vans only) owned by the individual's household. The nine other variables are demographic and socioeconomic factors, including age, sex, marital status, number of family units in the household, minority, natural log of individual income, natural log of household income, socioeconomic group, and education level. These variables are used as control variables in the econometric analysis in a later section. Among these variables, it is important to note that age, socioeconomic group, marital status, and education each consist of more than one binary indicator to identify different groups within a single variable; that is, they are factor variables. We also present the analysis stratified by the factor variables.
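The coding scheme described above can be sketched in a few lines. This is a minimal illustration, not the survey's actual recoding: the function names, the "yes" answer label, and the grouping of vehicles into 1, 2, and 3 or more (with 0 as the reference group) are assumptions made for the example.

```python
def code_binary(answer, positive="yes"):
    """Code a yes/no style answer as 1/0 (the 'yes' label is a hypothetical example)."""
    return 1 if answer.strip().lower() == positive else 0


def vehicle_dummies(n_vehicles):
    """Return indicators for 1, 2, and 3+ vehicles; 0 vehicles is the reference group.

    The 1 / 2 / 3+ grouping is an assumption made for this sketch.
    """
    if n_vehicles < 0:
        raise ValueError("number of vehicles cannot be negative")
    return {
        "veh_1": int(n_vehicles == 1),
        "veh_2": int(n_vehicles == 2),
        "veh_3plus": int(n_vehicles >= 3),
    }


# Hypothetical respondent: a smoker with a limiting illness in a two-car household.
respondent = {
    "smoking": code_binary("Yes"),
    "illness": code_binary("yes"),
    **vehicle_dummies(2),
}
print(respondent)
```

A factor variable such as marital status or education would be expanded into indicator variables in the same way, with one category left out as the reference group.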
The summary of descriptive statistics is presented in Table 1. Among the 12,297 individuals, ∼40.7% suffered from long-term illness, disability, or infirmity. Given that the variable sex takes the value of one if an individual is male and two if an individual is female, it can be seen from the descriptive data that ∼54.5% of the sample in this study were female. In addition, 4.8% of the sample were considered ethnic minorities (i.e., nonwhite) in the UK. In terms of the proportion of individuals who smoked, the data from the GHS in 2006 indicated that 20.6% of the sample were current smokers, though the average smoking rate decreased to around 14% in 2020 (34). Finally, the individuals in the sample owned an average of at least 1.4 vehicles, and the mean number of family units in the household was 1.08.
Before quantitatively analyzing the causal impact of LSI on the probability of smoking, we note in Table 2 that the proportion of smokers does not differ significantly by LSI status according to the difference in means and the odds ratio. However, one cannot draw a causal conclusion from the raw difference in means or odds ratios due to the presence of the reverse causality and OVB discussed earlier in the introduction. Hence, an instrumental variable approach will be applied in the later section to determine the causality between illness and smoking.
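For readers less familiar with the two raw comparisons mentioned above, the following sketch computes a difference in smoking proportions and an odds ratio from a 2x2 table. The counts are hypothetical and are not taken from Table 2; the function name is likewise an assumption.

```python
def smoking_comparison(smokers_lsi, nonsmokers_lsi, smokers_no_lsi, nonsmokers_no_lsi):
    """Return (difference in smoking proportions, odds ratio) for LSI vs. no LSI."""
    p_lsi = smokers_lsi / (smokers_lsi + nonsmokers_lsi)
    p_no_lsi = smokers_no_lsi / (smokers_no_lsi + nonsmokers_no_lsi)
    # Odds ratio: odds of smoking with LSI divided by odds of smoking without LSI.
    odds_ratio = (smokers_lsi * nonsmokers_no_lsi) / (nonsmokers_lsi * smokers_no_lsi)
    return p_lsi - p_no_lsi, odds_ratio


# Hypothetical counts: 30 of 100 people with LSI smoke, 20 of 100 without LSI smoke.
diff, odds_ratio = smoking_comparison(30, 70, 20, 80)
print(f"difference in proportions: {diff:.3f}, odds ratio: {odds_ratio:.3f}")
```

Neither quantity has a causal interpretation on its own; both describe only the raw association in the cross-section.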

Methods
We aimed to investigate whether individuals with LSI were more likely to smoke than those without such conditions, that is, the causal effect of LSI on smoking. By controlling for a range of demographic, socioeconomic, educational, medical and family-related variables, the following model (Equation 1) is first proposed for each individual i, where smoking is the binary variable of smoking status in 2006; illness indicates whether an individual suffers from long-term illness, disability or infirmity; c_1 is a constant; and X' denotes the set of controls. The coefficient α_1 is expected to identify the effect of long-term illness, disability, or infirmity on smoking. This equation indicates that smoking may be a function of illness; however, it is also possible that the opposite is true.
Equation 2 shows how illness may be a function of smoking. That is, not only does this group of individuals tend to smoke, but smoking may also cause long-term illness, disability, or infirmity, as discussed previously (37-40).
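The two equations referred to in the text did not survive in this version of the article. A hedged reconstruction from the surrounding definitions is given below. Only c_1, α_1, and X' are named in the text; the remaining symbols (the coefficient vectors β_1 and β_2, the error terms ε_i and u_i, and c_2 and α_2) are assumptions introduced to complete the notation.

```latex
% Equation 1: smoking as a function of illness (c_1, \alpha_1, X' as defined in the text).
\text{smoking}_i = c_1 + \alpha_1\,\text{illness}_i + X_i'\beta_1 + \varepsilon_i
\tag{1}

% Equation 2: the reverse direction (c_2, \alpha_2, \beta_2, u_i are assumed symbols).
\text{illness}_i = c_2 + \alpha_2\,\text{smoking}_i + X_i'\beta_2 + u_i
\tag{2}
```

If both α_1 and α_2 are nonzero, illness is correlated with the error term in Equation 1, so estimating that equation by OLS would not isolate the effect of illness on smoking.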
We decided to implement an instrumental variable regression to rule out this simultaneous effect (41). We used the number of vehicles owned as the instrumental variable. This instrument was chosen because individuals with long-term illness, disability, and infirmity are less able to drive, making them less likely to purchase vehicles and their families likely to own fewer vehicles (42-44). The relevance condition is testable; the rule of thumb is that if the first-stage F statistic exceeds 10, the relevance condition is satisfied, meaning that there is a statistically significant association between the instrument (vehicle ownership) and the explanatory variable (illness). The exogeneity assumption is not testable, but it is plausible given our comprehensive set of controls denoted by X', which limits the potential omitted variables that may result in biased estimators. That is, the instrument is assumed to be uncorrelated with the error term and to influence smoking status only indirectly through illness. Therefore, in our first-stage equation, illness is regressed on Vehicle' and X', where Vehicle' denotes a vector consisting of the dummies of the vehicles owned. We used the predicted values of illness from the first stage as the regressor of interest in the second-stage equation of the instrumental variable specification.
We first chose to use a linear probability model (LPM) in both stages as our "baseline" regression (24). Because both smoking and illness are binary variables, we further used a recursive bivariate probit model (RBPM), in which a probit model is used in both stages, to check the consistency of results. Generally, we prefer the RBPM because it is designed for binary outcomes (both smoking and LSI have only two possible outcomes). We additionally implemented the endogeneity test (Table 3), which indicated that the illness variable is endogenous, again supporting the need for instrumental variable regression. The null hypothesis is that the specified endogenous regressors can actually be treated as exogenous.
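The first-stage and second-stage equations of the linear probability specification are not legible in this version of the article. One way to write them, consistent with the description above, is sketched below. The coefficient vectors π, β_3, and β_4, the constants c_3 and c_4, the coefficient δ, and the error terms v_i and e_i are assumed notation, and the equation numbers are left out because they are not recoverable from the text.

```latex
% First stage: illness on the vehicle dummies and the controls (assumed notation).
\text{illness}_i = c_3 + \text{Vehicle}_i'\pi + X_i'\beta_3 + v_i

% Second stage: smoking on the first-stage prediction of illness and the controls.
\text{smoking}_i = c_4 + \delta\,\widehat{\text{illness}}_i + X_i'\beta_4 + e_i
```

Here δ is the instrumental variable estimate of the effect of illness on smoking. Relevance requires that π is not zero, and exogeneity requires that the vehicle dummies are uncorrelated with e_i conditional on the controls.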
The endogeneity test is a robustness check on our claim that an IV regression is needed: an insignificant test statistic would indicate that the generic OLS regression (Equation 1) is sufficient and that IV regression is unnecessary. All analyses were completed in Stata 16.
Table 3 presents the results of our instrumental variable specifications with both first-stage and second-stage equations. For the LPM IV specification, in the first-stage equation, significant and negative results are reported for at least 1 vehicle owned (−0.06, t = −4.22), at least 2 vehicles owned (−0.11, t = −6.91), and at least 3 vehicles owned (−0.14, t = −7.13), with 0 vehicles owned as the reference group. In the second-stage equation, the main outcome of interest in the instrumental variable specification, a large positive causal impact of long-term illness, disability, or infirmity on the probability of smoking is reported (0.90, t = 5.74). The F statistic in Table 3 indicated the relevance of our instruments. For the RBPM IV specification, in the first-stage equation, significant and negative results are reported for at least 1 vehicle owned (−0.06, t = −4.16), at least 2 vehicles owned (−0.11, t = −6.69) and at least 3 vehicles owned (−0.14, t = −6.81), with 0 vehicles owned as the reference group. In the second-stage equation, the main outcome of interest in the instrumental variable specification, a large positive causal impact of long-term illness, disability, or infirmity on the probability of smoking is reported (0.28, t = 6.96). The F-statistic is 60.89 in the first-stage equation, supporting our claimed relevance condition.
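The logic of the two-stage procedure and of the first-stage F statistic can be sketched on simulated data. This is a stylized illustration only: it uses one continuous instrument, no controls, continuous variables, and made-up parameter values, so it is not the Stata specification used in the study and it does not reproduce the estimates in Table 3. All function names are assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_simple(x, y):
    """Intercept and slope from regressing y on a single regressor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def first_stage_f(z, x):
    """F statistic for a single instrument z in the first stage (x on z)."""
    a, b = ols_simple(z, x)
    mx = mean(x)
    ss_res = sum((xi - (a + b * zi)) ** 2 for zi, xi in zip(z, x))
    ss_tot = sum((xi - mx) ** 2 for xi in x)
    return (ss_tot - ss_res) / (ss_res / (len(x) - 2))


def two_stage_ls(z, x, y):
    """Stage 1: predict x from z. Stage 2: regress y on the prediction."""
    a, b = ols_simple(z, x)
    x_hat = [a + b * zi for zi in z]
    return ols_simple(x_hat, y)[1]


def simulate(n=20000, seed=0):
    """Simulated data with true effect 1.0 and an unobserved confounder u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)            # instrument, independent of u
        ui = rng.gauss(0, 1)            # unobserved confounder
        xi = 0.5 * zi + ui + rng.gauss(0, 1)
        yi = 1.0 * xi + 2.0 * ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
ols_slope = ols_simple(x, y)[1]   # biased upward by the confounder
iv_slope = two_stage_ls(z, x, y)  # should be close to the true effect of 1.0
f_stat = first_stage_f(z, x)      # compared against the rule of thumb F > 10
print(f"OLS: {ols_slope:.2f}  2SLS: {iv_slope:.2f}  first-stage F: {f_stat:.1f}")
```

Running the second stage by hand as above gives the two-stage point estimate but not valid standard errors, which is one reason dedicated routines are normally used in practice.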

Discussion
This study intended to find the causal impact of being long-term ill, disabled, or infirm on smoking status. Our study implemented an instrumental variable regression using a linear probability model and a recursive bivariate probit model. We used a heteroskedasticity-robust version of the regression model for robustness. In the first-stage regression, we found negative associations between having 1, 2 or 3+ vehicles, relative to having none, and long-term illness. This supports our claim that individuals with long-term disease, disability, or infirmity are less likely to drive or are not allowed to drive and therefore own fewer vehicles. Alternatively, the association may also indicate that individuals with more vehicles are less likely to have long-term illness, disability, or infirmity. The second-stage regression reported that individuals with long-term illness, disability, or infirmity were 28% more likely to smoke. As IV estimates capture only the unidirectional effect, this study provides empirical evidence that individuals with long-term illness, disability, or infirmity are more likely to smoke than healthy individuals.
Admittedly, our findings have some limitations. Although we estimated a positive causal impact of LSI on smoking, there is insufficient quantitative data and evidence for us to disentangle the reasons why individuals with long-term illnesses are more likely to be smokers. One possible explanation is that smoking could be a method of coping with psychological issues due to long-term illness; studies have found that tobacco consumption can control stress levels, alleviate signs of depression, and enhance wellbeing in the short term (45,46). Although smoking may be a way to ease anxiety and distress in the short term for people with chronic mental health issues, no concrete research suggests that smoking has a positive influence on subjective mental health over the long term (12, 47-49). Additionally, disabled persons generally experience greater discrimination than the general population. Such perceived discrimination is also connected with increased stress and a higher likelihood of tobacco use (50,51). However, the extent to which these factors affect the smoking rate remains quantitatively unclear, warranting further investigation in the future.
Also, our sample consists only of UK residents, implying that the study may identify only a local treatment effect of having disability, infirmity, or long-term illness. That is, the causal effect found in our study may differ in other countries, such as those "south countries" (41). Further, another potential limitation arises from the self-reported data, since individuals might provide a socially acceptable response rather than the truth. Given that smoking may not be regarded as an appropriate practice in certain places, respondents may conceal their smoking status by responding that they do not smoke. However, as the data collection is anonymous and we have used the instrumental variable framework, this bias should be largely mitigated. In addition, a potential relationship between car ownership and smoking might exist irrespective of LSI status, as car ownership could be an indicator of socioeconomic status, which would violate the IV exclusion assumption. Although this relationship is not testable, it is less of a problem because we have a rich set of controls, including income and socioeconomic status.
Despite these limitations, this study can inform social care workers, psychologists, and public health practitioners, encouraging them to pay closer attention to the health of this marginalized group (individuals with LSI). Due to their current health state, they may face subtle or overt discrimination and have smaller social networks, making them anxious and lonely. Regardless of why these individuals smoke more, the bidirectional relationship between smoking and illness creates a vicious cycle. Thus, the measures most commonly implemented by healthcare workers to discourage smoking and improve health have not been sufficient, as those with chronic illness are much more likely to be smokers. As such, it is necessary to take further care of those individuals and to target them directly with interventions to break the vicious cycle, which may alleviate the burden on National Health Service (NHS) hospitals (13). This study fills a gap in the current literature, which includes only theoretical studies by psychologists (39) and lacks empirical studies due to the simultaneous effect. Our study supports a causal interpretation of this association; that is, individuals with chronic disease are 28% more likely to be smokers, ceteris paribus. The present data suggest that physical impairments increase smoking, and if psychological problems play a role, it may be helpful for doctors and psychologists to target those processes. In this way, the smoking frequency of disabled persons could decline, leading them to have better health, a greater life expectancy, and a lower burden of health expenditures (52).
Frontiers in Public Health, frontiersin.org
In the future, we aim to extend our research to other countries where people with disabilities, infirmities, and long-term illnesses have fewer social-care resources, and to compare how the results vary. We also aim to investigate the psychological reasons why individuals with chronic diseases smoke more. Currently, the psychological explanations are discussed only theoretically, and we aim to test these theories empirically in future studies.

Conclusion
This study estimated the causal impact of long-term illness, disability, or infirmity on individual smoking behavior using the instrumental variable strategy on UK General Household Survey data. Although the descriptive statistics did not reveal a statistically significant correlation between long-term illness or disability and individual smoking behavior, our instrumental variable estimates showed that individuals' long-term impairments could lead to a sizable increase in the tendency to smoke.
The public policy implications of these findings are, to some extent, straightforward. Because people with disabilities and chronic diseases are more likely to be smokers, conventional targeted smoking cessation interventions were not very effective. In other words, we have been trying to address the result rather than the cause of smoking, so it might be best to redirect intervention efforts toward individuals with long-term impairment. Beyond conventional smoking cessation programs, policies for disabled people should also consider reducing the inequalities between them and healthy individuals, such as by providing accessible information for persons with cognitive impairments, promoting equal opportunities for jobs, and making accessible facilities or technologies readily available (53,54).

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

Author contributions
XZ and YL conceived, designed the analysis, and collected the data. XZ, YL, and YX contributed to the data or analysis tools and performed the analysis. XZ, YL, TZ, and YX wrote the paper. All authors contributed to the article and approved the submitted version.