How Do We Define What Is Bad for Your Health? The Role of Epidemiology

Ever wonder how we have found that tobacco, alcohol, and even too much sugar are harmful? Epidemiology is a branch of medicine that studies how often diseases occur in different groups of people and why. By following and studying a large number of people, it is possible to identify behaviors that will have an impact on health. This impact can be negative (as with alcohol, tobacco, and junk food) or positive (as with doing sports and eating fruits and vegetables). In this article, we will explain how epidemiological studies can be used to identify substances or behaviors that impact health, and how to know whether we can trust the results of such studies when we see them on the news.


Figure
The two main types of epidemiological studies. Case-control studies begin with the disease and work backwards toward identifying the exposure factors that led to the disease, so we call them retrospective. In cohort studies, we start from the exposure and separate the exposed and non-exposed subjects, then wait to see whether the disease appears. This is called a prospective study.

POPULATION
An identified group of people that will be investigated in a study.

and to adopt measures to increase the health of the population
The first thing to do before starting any epidemiological study is to identify behaviors that are either potential risk factors for health or potentially protective for health. Although some risk factors, like tobacco or alcohol consumption, appear quite obvious to us today, this has not always been the case: there was a time when these links were not so obvious. Currently, plenty of research is studying the effects of electronic cigarettes, Wi-Fi, pesticides, and many other potential risks to human health. Once a behavior or a substance that might influence human health has been identified, we need to design a study to evaluate its impact on health. In this article, we will explain the principles of epidemiology using the example of the association between an exposure (tobacco) and an outcome variable (lung cancer). Tobacco is probably the best example to use, because it has a long history. The consumption of cigarettes began in the early s and increased for many years afterwards, because the dangers of smoking had not yet been identified. Some early studies on the effects of tobacco were done during the s and s, but it was not until the s that the first large-scale studies were performed [ ].

OUTCOME VARIABLE
The object of the study. In most studies, the outcome of interest is a disease or death.

DESIGN A STUDY
The first step toward determining if something is helpful or harmful to human health is to design a study. Basically, if you want to assess the link between a risk factor and a particular disease, you have two possibilities: you can start from the risk factor and try to connect it to the disease, or start from the disease and try to work back to the risk factor that caused it (Figure ).
kids.frontiersin.org February | Volume | Article |

Table
Table showing calculations from the results of case-control and cohort studies.

Risk or exposure            Cases (cancer)    Controls (no cancer)
Exposed (tobacco)
Non-exposed (no tobacco)

Case-control study: odds ratio (OR)

For the case-control study, the OR is the ratio of exposed (smoker) to non-exposed (non-smoker) participants among the cases, divided by the same ratio among the controls. So, an OR of 6.0 means that the odds of being a smoker are six times higher for a patient with lung cancer than for a control. For the cohort study, the RR is the ratio of the incidence of lung cancer between tobacco-exposed and non-exposed subjects. So, an RR of 5 means that smokers are five times more likely to get lung cancer compared with non-smokers.

Let us first discuss starting from the disease and trying to identify the risk factor. We will continue with our example of tobacco. If you think that tobacco causes lung cancer, you can go to the hospital and ask patients with lung cancer if they smoke or used to smoke [ ]. This information by itself is not very useful, because we need to compare it to the same information obtained from people without lung cancer. To do this, we need to find healthy people whose other characteristics (like age, gender, profession, hobbies, etc.) are as similar as possible to those of the group of lung cancer patients. This group is called the control group. Once we have collected the data from both groups, we can put the results into a table, separated by whether the people were exposed to the risk factor (tobacco, in our case) or not and whether they had lung cancer or not (Table ).

CONTROL GROUP
Group of people used as a comparison. Depending on the type of study, it can be a group of healthy people (case-control study) or a group not exposed to the risk factor (cohort study).

We can then compute a first pair of ratios (number of exposed/number of non-exposed): one for the cases (patients with lung cancer) and one for the control group. Finally, we can compute the ratio of these two ratios, referred to as the odds ratio (OR), which tells us how much higher the odds of being a smoker are for a patient with cancer than for a control. For example, an OR of 6.0 means that the odds of being a smoker are six times higher for a patient with lung cancer than for a control.
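To make this calculation concrete, here is a minimal Python sketch of the odds ratio as described above. The function name `odds_ratio` and all the counts are invented for illustration; they do not come from any real study.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a case-control table."""
    # First ratio: exposed / non-exposed among the cases (lung cancer patients).
    ratio_in_cases = exposed_cases / unexposed_cases
    # Second ratio: exposed / non-exposed among the controls.
    ratio_in_controls = exposed_controls / unexposed_controls
    # The odds ratio is the ratio of these two ratios.
    return ratio_in_cases / ratio_in_controls


# Hypothetical counts: 100 patients with lung cancer and 100 controls.
example_or = odds_ratio(
    exposed_cases=60,       # smokers with lung cancer
    unexposed_cases=40,     # non-smokers with lung cancer
    exposed_controls=20,    # smokers without lung cancer
    unexposed_controls=80,  # non-smokers without lung cancer
)
print(f"OR = {example_or:.1f}")
```

With these made-up counts, the first ratio is 1.5 for the cases and 0.25 for the controls, so the OR comes out at 6.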
We call this type of analysis a case-control study, because we are comparing patients to a control group. This approach seems simple, but we will see below that there are some limitations to this type of study.

CASE-CONTROL STUDY
Type of study in which the outcome (for example, having lung cancer) is used to define the groups (cases and controls) and epidemiologists try to identify the exposure that led to the outcome (tobacco).

The second option is to start from the risk factor and wait for the disease. Again, we need to have a control group with which to compare the results of the exposed group. This is what epidemiologists started doing in , to study the effect of tobacco on lung cancer [ ]. At the beginning of the study, both groups must be as similar as possible, with the only difference between them being the risk factor, such as whether or not they use tobacco. From an ethical point of view, we obviously cannot force people to do something potentially harmful, so we have to find people who are already willingly exposed to the risk factor. After a certain period of time, which varies greatly depending on the disease studied, we can compute the incidence, or the number of new cases of disease over a period of time, for both groups (number of cases/number of people in the group). Finally, we can compute the risk ratio (RR), which is the risk of developing the disease for the exposed people divided by the risk for the non-exposed people. For example, an RR = 5 indicates that smokers are five times more likely to get lung cancer compared with non-smokers.
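The same steps can be sketched in a few lines of Python. This is only an illustration: the function names `incidence` and `risk_ratio` and the counts are hypothetical, chosen so that the arithmetic is easy to follow.

```python
def incidence(new_cases, group_size):
    """Number of new cases divided by the number of people in the group."""
    return new_cases / group_size


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence in the exposed group divided by incidence in the non-exposed group."""
    return incidence(exposed_cases, exposed_total) / incidence(unexposed_cases, unexposed_total)


# Hypothetical cohort: 1,000 smokers and 1,000 non-smokers followed over time.
example_rr = risk_ratio(
    exposed_cases=50,      # smokers who developed lung cancer
    exposed_total=1000,
    unexposed_cases=10,    # non-smokers who developed lung cancer
    unexposed_total=1000,
)
print(f"RR = {example_rr:.1f}")
```

Here the incidence is 5% among the smokers and 1% among the non-smokers, which gives an RR of 5.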
This kind of study is called a cohort study, because we are following a group of people over time. Usually, cohort studies are more powerful than case-control studies, because they are less likely to be influenced by biases, which will be discussed below.

COHORT STUDY
Type of study in which the exposure (for example, tobacco use) is used to define the groups (smokers, who are exposed, or non-smokers, who are non-exposed) and epidemiologists then wait to see if the disease occurs.

There are other study designs, but case-control studies and cohort studies are the most frequently used. Now we will discuss a very important point in the current context of fake news and misinformation: can we believe the results of epidemiological studies?

QUALITY OF THE STUDY … AND THE RESULTS!
For someone who is not familiar with epidemiology and is reading the results of a study, the best way to determine whether the results can be interpreted with confidence or not is to look at … the confidence intervals! A confidence interval is a range of values around the estimate from the study that likely contains the true value for the whole population.

CONFIDENCE INTERVALS
A range of values around the measured value that likely contains the true value of a variable in the population. It is a mathematical way to determine whether the results of a study can be viewed with confidence.

In a good study, the OR and RR are never presented alone, but together with their confidence intervals (usually a 95% confidence interval, meaning that if we repeated the same study 100 times, about 95 of the intervals computed this way would contain the true value). We will not go into the details of calculating confidence intervals, but to keep it simple, a well-conducted study with enough subjects and not too much variation in the results will give narrow confidence intervals, indicating that the estimate is precise. In Figure , you can see the risk of having lung cancer for a former smoker and a current smoker, compared with a non-smoker. This study has reasonably narrow confidence intervals, so the estimates can be considered precise.
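For curious readers, the sketch below shows one common textbook approximation for the 95% confidence interval of an odds ratio (often called the log or Woolf method). It is not necessarily the method used in the study shown in the figure, and the function name `odds_ratio_ci` and all counts are invented. The point is only to show how the number of participants changes the width of the interval.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval (log method).

    a: exposed cases, b: non-exposed cases,
    c: exposed controls, d: non-exposed controls.
    z=1.96 corresponds to a 95% confidence interval.
    """
    estimate = (a / b) / (c / d)
    # Standard error of the natural logarithm of the odds ratio.
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * standard_error)
    high = math.exp(math.log(estimate) + z * standard_error)
    return estimate, low, high


# Two hypothetical studies with the same proportions but different sizes.
small_study = odds_ratio_ci(6, 4, 2, 8)          # 20 participants
large_study = odds_ratio_ci(600, 400, 200, 800)  # 2,000 participants

for name, (estimate, low, high) in [("small", small_study), ("large", large_study)]:
    print(f"{name} study: OR = {estimate:.1f}, 95% CI from {low:.2f} to {high:.2f}")
```

Both invented studies give the same OR, but the interval for the small study is very wide and even includes 1 (no difference between smokers and non-smokers), while the interval for the large study is much narrower.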

BIAS
We have just seen that confidence intervals can be used to assess the quality of a study. They are a good indicator, but studies with narrow confidence intervals can still be totally biased! What is bias? It is a type of error that leads to incorrect conclusions from the data. There are plenty of possible biases, but the two most important kinds are called selection bias and information bias. To put it simply, selection bias occurs when the wrong patients or controls are selected for the study. In the study of tobacco and lung cancer, for example, a selection bias would occur if some of the controls actually had undiagnosed lung cancer.
Information bias occurs when the information from the two groups is not obtained in the same way. For example, the investigator of a study might ask patients with lung cancer many more questions about tobacco consumption than the healthy controls. A common type of information bias is called recall bias: in some studies, we need to go back or years, so it is often difficult for people to remember. Sometimes they are ashamed of their past behaviors and do not tell the truth, or they do not think that smoking a few cigarettes years ago counts as smoking.
Biases can seriously affect the outcome of a study, so it is important to be aware of them and to discuss the potential influence of the various types of bias when presenting the results of a study!

Figure
Results of a study on the influence of smoking on lung cancer. The odds ratio compares the risk of having lung cancer among non-smokers, former smokers, and current smokers; n is the number of participants followed in each group. The scale at the bottom of the figure shows the OR. This kind of representation makes the results easier to interpret and lets us see directly how important each studied risk factor is.

PRINCIPLE OF CAUSALITY
We have discussed the design of an epidemiological study, the confidence intervals, and the biases. If everything has been done correctly, does that mean that tobacco causes lung cancer? Not necessarily! We still have to think about causality: the relationship between causes and effects. Just because two things, like tobacco use and lung cancer, seem to be associated with each other does not necessarily mean that one causes the other. For example, if we consider lung cancer again, the OR or RR is higher for alcoholics than for people who do not drink, but that does not mean that alcohol causes lung cancer! Here, tobacco is a confounding factor: alcoholics tend to smoke more than non-alcoholics, which explains this association.
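The alcohol example can be illustrated with a small Python sketch. All the numbers below are invented so that the effect is easy to see, and the names `data`, `pooled`, and `risk_ratio` are hypothetical. The idea is to compute the risk ratio for drinking once for everyone together, and then separately for smokers and for non-smokers.

```python
# Hypothetical cohort. Each entry is (lung cancer cases, number of people).
data = {
    "smokers":     {"drinkers": (80, 800), "non_drinkers": (20, 200)},
    "non_smokers": {"drinkers": (4, 200),  "non_drinkers": (16, 800)},
}


def risk_ratio(exposed, unexposed):
    """Risk ratio from two (cases, group size) pairs."""
    (cases_e, total_e), (cases_u, total_u) = exposed, unexposed
    return (cases_e / total_e) / (cases_u / total_u)


def pooled(group):
    """Add up cases and people for one drinking group, ignoring smoking."""
    cases = sum(stratum[group][0] for stratum in data.values())
    total = sum(stratum[group][1] for stratum in data.values())
    return cases, total


# Ignoring smoking: drinkers seem to have a higher risk of lung cancer.
crude_rr = risk_ratio(pooled("drinkers"), pooled("non_drinkers"))

# Looking at smokers and non-smokers separately: the difference disappears.
rr_by_smoking = {
    name: risk_ratio(stratum["drinkers"], stratum["non_drinkers"])
    for name, stratum in data.items()
}

print(f"All participants: RR = {crude_rr:.2f}")
for name, rr in rr_by_smoking.items():
    print(f"{name}: RR = {rr:.2f}")
```

In this made-up cohort, drinkers appear to have more than twice the risk when everyone is pooled together, but within each smoking group the RR is 1. The apparent effect of alcohol comes entirely from the fact that most drinkers in the example are also smokers.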
So, how do we know whether an association is truly causal or whether the two things are merely associated with each other? Criteria have been proposed to help determine if observed epidemiological associations are causal [ ]. The most important are the strength of the association (the higher the OR or RR, the more likely the association is to be causal), temporality (exposure must precede the onset of disease), and the biological gradient (greater exposure, for example more cigarettes per day, leads to a higher risk of lung cancer). Together, these criteria provide the clearest evidence of a causal relationship.

CONCLUSION
In this article, we have described the different steps of an epidemiological study: identifying a risk factor, choosing the appropriate study design, and trying to minimize the influence of bias as much as possible. Once the data has been obtained and the results computed, the results must then be interpreted and causality must be established. Finally, once the conclusion has been reached and a risk factor has been identified, the most important part is to inform the public and politicians, so that everyone can work together to establish preventive measures to decrease or minimize the impact of this factor on public health.