CAUSALITY: USING MATH TO UNDERSTAND THE SCIENCE OF CAUSE AND EFFECT

Some people say mathematics is the most important subject in science because it is the language of nature. In this article, we illustrate this by explaining causality. Causality is an important concept because it influences essentially all areas of science and society. In simple terms, causality is the principle that examines the link between a “cause” and an “effect”. This allows us to study important practical questions. For instance, in medicine, biology, or law, one can ask “What medication can be used to treat this disease?”, “What protein activates a certain gene?”, or “What criminal act caused the harm?” To answer these and similar questions, methods from probability, statistics, and graph theory are needed to quantify the meaning of causality. Here, we provide an overview of this fascinating topic.


WHAT IS CAUSALITY?
The study of causality has a long history, dating back to the philosophers Aristotle (384–322 BC) and David Hume (1711–1776). While their work is important for addressing philosophical questions about causation, like "what it means for something to be a cause," mathematical models are needed to quantify causality, that is, to allow a form of measurement. Such causal models go back to Sewall Wright (1889–1988), Donald Rubin (born 1943), and Judea Pearl (born 1936).

CAUSATION
The relationship between cause and effect, where one event (the cause) brings about or influences another event (the effect).
Causality refers to the relationship between a "cause" and an "effect," where the effect is a result of the cause. It is the link between an event and the outcome of that event, and it helps us understand how things are related and how they change.
For example, imagine you are playing with a toy car, and you push it across the floor. The cause of the toy car moving is you pushing it. The effect is the toy car moving across the floor. Without the cause (you pushing the car), the effect (the car moving) would not have happened.
Another example is when you plant a seed in the ground. The cause is you planting the seed. The effect is the seed growing into a plant. The cause and effect are closely related to each other: without planting the seed (cause), the plant would not grow (effect).
Causality is very important in scientific research, where scientists try to understand how things work by studying causes and effects. For example, a scientist may conduct an experiment to find out how a certain medication affects a person's health. The medication is the cause, and the effect is the change in the person's health.
Causality requires three parts: a cause, an effect, and a relationship between the two (Figure A). Causality can be visualized using a graph (sometimes also called a network) [ ]. In the mathematical language of graph theory, a graph consists of two building blocks: nodes and edges. In a causal graph, the cause and the effect are drawn as nodes, and the relationship between them is drawn as an edge, an arrow pointing from the cause to the effect.

Now that you know that causality is useful for describing the relationship between "cause" and "effect," the next question is: how do we measure causality? Unfortunately, there is no physical device that directly measures a causal relationship between two things, the way a thermometer measures temperature or a barometer measures atmospheric pressure. Instead, a causal relationship can only be "measured" using a combination of mathematical tools from the fields of probability, statistics, and graph theory. To see why, let us first look at what can happen if we try to use only a statistical approach.

WHAT IS THE PROBLEM WITH STATISTICAL MEASURES?
To demonstrate why it is difficult to measure causality, let us start with a simple example. Assume it is summer and very hot. On the beach you can see many people wearing shorts and eating ice cream. Could you say that wearing shorts makes you eat ice cream? It is probably obvious to you that, even though many people are both wearing shorts and eating ice cream, one factor does not cause the other. Still, the two go together, and there is a statistical measure that allows us to quantify such an association precisely. This association measure is called correlation. Mathematically, it is denoted by $r_{xy}$, where the subscripts x and y indicate that it is evaluated for two variables, x and y.
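The formula the article uses for correlation appears in one of its figures, which is not reproduced in this text. Assuming it is the standard Pearson correlation coefficient (the most common choice, but an assumption on our part), for n paired observations $(x_1, y_1), \dots, (x_n, y_n)$ it can be written as:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

Here $\bar{x}$ and $\bar{y}$ are the averages of the two variables. The numerator becomes large when x and y tend to be above (or below) their averages at the same time, and the denominator rescales the result so that $r_{xy}$ always lies between −1 and 1.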

Emmert-Streib and Dehmer
To understand how to estimate correlation from data, let us consider a second example. Suppose that, for each season of the year, we have information about ice cream sales. We assign "ice cream sales" the variable $x_i$, where i denotes a season, i.e., $i \in \{\text{fall}, \text{winter}, \text{spring}, \text{summer}\}$. Here "$i \in$" means that i can take each of the values in the set {fall, winter, spring, summer}. The values of $x_i$ are shown in Figure B. Suppose we also have information about the number of motorbike drivers seen on the streets per season, and we call "motorbike drivers" $y_i$. The values of $x_i$ and $y_i$ are visualized in Figure C. The correlation $r_{xy}$ computed from these values turns out to be very high; for comparison, the maximal value a correlation can assume is 1. So from this analysis, it looks like ice cream sales and the number of motorbike drivers seen on the road are strongly associated. Based on this observation, one could formulate the following question: Do we see more motorbikes on the streets when we eat more ice cream?
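The calculation behind such a correlation value can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard Pearson correlation coefficient; the four seasonal values below are made up for illustration and are not the data shown in Figure B.

```python
import math

# Hypothetical values, made up for illustration only (not the article's data).
seasons = ["fall", "winter", "spring", "summer"]
ice_cream_sales = [40.0, 10.0, 50.0, 90.0]    # x_i
motorbike_drivers = [30.0, 5.0, 45.0, 80.0]   # y_i


def pearson_r(x, y):
    """Pearson correlation coefficient r_xy of two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the averages (numerator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Spread of each variable around its average (denominator).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(ice_cream_sales, motorbike_drivers)
print(f"r_xy = {r:.3f}")
```

With these made-up numbers the correlation comes out very close to 1. Since Python 3.10, the standard library also offers `statistics.correlation`, which computes the same Pearson coefficient.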
You would probably say the answer to this question is "no." This means that, even though the statistical technique gave us a high correlation, that does not ensure that "ice cream sales" is the cause of "lots of motorbikes." In other words, a high correlation does not ensure that there is a causal relation. If you think carefully, you can come up with an alternative explanation for this situation. Instead of assuming that either "ice cream sales" is a cause of "motorbike drivers" or vice versa, it is more reasonable to think that a third variable plays a role: the season of the year. Figure D shows "seasons" as the cause and "ice cream sales" and "motorbike drivers" as two effects.
Overall, the two examples we have shown tell you something very important: correlation is not causality.

WHY IS CAUSALITY IMPORTANT FOR SCIENCE?
Understanding causality allows us to make predictions about what will happen in the future, based on past experience, and helps us to identify the factors that contribute to certain outcomes.
In medicine, causality is used to understand how various factors contribute to an individual's health. This helps doctors and researchers develop effective treatments and preventative measures.

An example for the use of causality in biology is the identification of gene regulatory networks (GRNs) [ , ]. A GRN is a network similar to the ones shown in Figures A and D, meaning it consists of nodes and edges. In a GRN, the nodes correspond to genes and the edges to the interactions between genes. Humans have tens of thousands of genes, which means GRNs are considerably larger than the small networks shown in our figures. Such networks provide important information about the functioning of cells because cellular functions are controlled by gene activity. This means a GRN helps researchers to discover which genes turn on (or off) other genes. This knowledge helps researchers understand genes better, and it can also help them understand what causes certain diseases.
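A GRN can be stored on a computer as a directed graph. The sketch below is a minimal illustration, assuming a tiny made-up network with hypothetical gene names (real GRNs have far more nodes, and their edges may stand for activation or repression). It stores, for each gene, the genes it influences, and answers two simple questions: which genes directly regulate a given gene, and which genes lie downstream of it.

```python
from collections import deque

# Hypothetical toy gene regulatory network (gene names are made up).
# An edge "geneA" -> "geneB" means: the activity of geneA influences geneB.
GRN = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneD"],
    "geneC": ["geneD"],
    "geneD": [],
}


def regulators_of(grn, gene):
    """Genes with an edge pointing directly at `gene`."""
    return sorted(src for src, targets in grn.items() if gene in targets)


def downstream_of(grn, gene):
    """All genes reachable from `gene` by following edges (breadth-first search)."""
    seen = set()
    queue = deque(grn.get(gene, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(grn.get(current, []))
    return sorted(seen)


print(regulators_of(GRN, "geneD"))  # ['geneB', 'geneC']
print(downstream_of(GRN, "geneA"))  # ['geneB', 'geneC', 'geneD']
```

Storing only the outgoing edges of each gene (an adjacency list) keeps the structure small, which matters when a network has thousands of nodes but each gene regulates only a few others.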
In psychology, causality is used to understand how various factors contribute to an individual's behavior and mental health. For example, researchers study the relationship between a person's emotions and their behavior, or the relationship between a person's environment and their behavior. Similarly, in economics, causality is used to understand how various factors impact the economy and to make predictions about how the economy will behave in the future.
In summary, causality is studied because it helps us to understand how things happen, how things change, and how various factors contribute to certain outcomes. This knowledge is important for making predictions, designing experiments, and developing effective interventions and treatments.

POTENTIAL OUTCOMES AND THE RUBIN CAUSAL MODEL
So, now you know that there is a problem with using correlation as a measure of causality.A solution to this problem is provided by the Rubin causal model (RCM) [ ].To understand the basic idea underlying the RCM, let us consider a hypothetical experiment.
Suppose we have a new medication, and we want to test whether it can treat a disease. You could think of aspirin for treating headache, or cough syrup for treating cough, for example. The RCM defines a causal effect, represented as δ, as the difference between the outcome after having received the treatment and the outcome after not having received it:

$$\delta = Y_1 - Y_0$$

Here $Y_1$ corresponds to the outcome when having received the treatment, and $Y_0$ corresponds to the outcome when not having received the treatment. You could think of $Y_1$ and $Y_0$ as measures for the severity of the headache or the number of coughs within an hour. The preceding description corresponds to a hypothetical experiment because in the real world an individual cannot both receive the treatment and not receive the treatment at the same time. For this reason, the variables $Y_1$ and $Y_0$ are called potential outcomes: both could potentially be observed, but only one is actually observed.
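The idea of potential outcomes can be made concrete with a small data structure. In this sketch, a hypothetical patient carries both potential outcomes, which is only possible in a thought experiment or a simulation; the class name, fields, and numbers are our own illustrative assumptions. The individual causal effect δ can be computed only when both values are known, while `observe` mimics reality, where one of the two outcomes is always missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical patient with both potential outcomes (thought experiment)."""
    name: str
    y1: float  # potential outcome WITH treatment (e.g. headache severity)
    y0: float  # potential outcome WITHOUT treatment

    def causal_effect(self) -> float:
        """Individual causal effect delta = Y_1 - Y_0 (needs both outcomes)."""
        return self.y1 - self.y0

    def observe(self, treated: bool) -> tuple[Optional[float], Optional[float]]:
        """What we actually see: only the outcome for the chosen condition."""
        return (self.y1, None) if treated else (None, self.y0)


# Hypothetical patient; in reality we would never know both numbers.
p = Patient("patient 1", y1=2.0, y0=7.0)
print(p.causal_effect())         # -5.0
print(p.observe(treated=True))   # (2.0, None): Y_0 is missing
```

Here a lower number means a milder headache, so a negative δ means the medication helped this patient.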
The good news is that there are some (statistical) tricks that can be used to estimate such a causal effect. The underlying idea is quite simple: patients are assigned randomly to two groups, one group that receives the treatment and a second group that does not. Assuming patients in both groups are similar, one can estimate a causal effect for the groups. Of course, in the real world, not all patients are identical to each other. For this reason, researchers try to find patients that are similar to each other, with respect to age and general health, for example.
You might have noticed that there is a difference between the causal effect δ defined above and the description given here. The difference is that δ is defined for an individual patient, whereas the causal effect estimated from a randomization is for a group of patients. In statistics, this means we estimate a causal effect for a population of patients. In summary, the randomization of patients allows us to estimate a causal effect between two groups of patients, under the assumption that the patients in both groups are similar to each other. This approach is the underlying concept of randomized controlled trials (RCTs), which are routinely used for approving new medications or treatments.
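The idea behind randomization can be checked in a simulation where, unlike in a real trial, every patient's individual effect δ is known. This is a toy sketch with made-up numbers (baseline severity, size of the benefit, and a 50/50 coin flip for group assignment are all assumptions): the difference between the average outcome in the treated group and in the untreated group is compared with the true average of the individual effects.

```python
import random
import statistics


def simulate_rct(n_patients, seed=1):
    """Toy randomized controlled trial with made-up potential outcomes.

    Each hypothetical patient has a severity without treatment (Y_0); the
    treatment lowers it by a patient-specific amount, so Y_1 = Y_0 - benefit.
    """
    rng = random.Random(seed)
    true_effects, treated_outcomes, control_outcomes = [], [], []
    for _ in range(n_patients):
        y0 = rng.gauss(6.0, 1.5)          # severity without treatment
        benefit = rng.gauss(2.0, 0.5)     # individual improvement (made up)
        y1 = y0 - benefit                 # severity with treatment
        true_effects.append(y1 - y0)      # delta_i, hidden in a real trial
        if rng.random() < 0.5:            # random assignment, like a coin flip
            treated_outcomes.append(y1)   # we only observe Y_1 ...
        else:
            control_outcomes.append(y0)   # ... or only Y_0
    true_ate = statistics.mean(true_effects)
    estimated_ate = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
    return true_ate, estimated_ate


true_ate, estimated_ate = simulate_rct(10_000)
print(f"true average effect:      {true_ate:.2f}")
print(f"estimated from the trial: {estimated_ate:.2f}")
```

With many patients, the two numbers come out close to each other, even though the trial data never contain both outcomes for any single patient. This is the sense in which randomization estimates a causal effect for a population rather than for an individual.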

CONCLUSION
We hope that our brief overview showed that causality is a fundamental concept that allows us to tackle some of the most interesting and important problems in society and science. However, measuring causality requires a combined approach, using mathematical methods from probability, statistics, and graph theory; the field that brings these methods together is nowadays called data science.
Last, we want to emphasize an aspect of this article that relates to the mathematical language itself. As you can see above, there are various complicated-looking symbols and relations. However, remember that it is completely normal not to understand everything immediately. This is not only the case for pupils in high school, but also for data scientists and statisticians working at universities. In fact, it is very common to study a problem for years before a solution is found. So, do not be intimidated when encountering obstacles; view them as a motivation to work hard to find solutions, even if it takes years. This is the kind of fearlessness that is required for becoming a data scientist. We hope this article helps you realize your potential.
Figure C. Ice cream sales ($x_i$) and motorbike drivers ($y_i$) for each season. The colors of the points correspond to the seasons, as shown in Figure B. The black line is called a regression line; it shows that the data points fall almost on a straight line, which corresponds to a very high correlation $r_{xy}$ between ice cream sales and motorbike drivers.

REGRESSION LINE
A straight line that shows the average relationship or trend between two sets of data points.