Challenges and Opportunities
- Motivation
One of the main endeavours of many sciences is to identify causal relations. Given a phenomenon, researchers are interested in finding out which variables have causal influence on that phenomenon. Furthermore, scientists want to know how that phenomenon will change when one manipulates those variables. For instance, when a new drug is tested, it is examined whether it has a causal effect on improving the health of the patients.
Big data analytics, machine learning, and deep learning have garnered significant interest in the health sciences. Because of their excellent predictive accuracy, they are increasingly employed for disease diagnosis and risk prediction. However, in many biomedical applications, achieving high prediction accuracy in and of itself is not the primary goal; the primary research question is often to discover a risk factor or mechanism that can be altered.
Like many scientific concepts, causal relations are not features that can be directly read off from the data, but have to be inferred. The field of causal discovery is concerned with this inference and the assumptions that support it.
Today’s machine learning applications are largely based on associations. Even though a risk factor may be associated with a disease, it does not necessarily follow that changing the risk factor will alter the disease process. In early 2018, a Phase 3 trial called “TOMMORROW” tested the effect of a diabetes drug on reducing Alzheimer’s disease (AD) dementia risk. The study measured amyloid deposition, which is an early sign of Alzheimer’s disease and is also associated with diabetes. However, since diabetes does not cause amyloidosis, the study failed at the interim analysis. For an intervention to succeed, the risk factor we intervene on should have a causal (rather than merely associative) relationship with the disease outcome.
- Causal Relationships for Clinical Research
Clinical research is predominantly focused on causal relationships. Hypothesis-driven clinical research, for example, often assumes a causal structure (a set of causal relationships among biomarkers and outcomes), and researchers estimate the effect size of these relationships (i.e., causal inference). In such research, drawing a causal conclusion is valid because prior knowledge establishes that the relationships are indeed causal. However, when there is no prior knowledge of the causality, the causal structure itself needs to be discovered from data through a process known as causal structure discovery. A commonly used but incorrect practice is to assume a partial causal structure and then adjust it based on output statistics of the fitted model, using methods such as structural equation models (SEM).
- Cardio-Vascular Disease
We might have measurements, obtained from, say, a cross-sectional study, of the amount of wine consumed (per some unit of time) and the prevalence of cardio-vascular disease, and be interested in whether wine consumption is a cause of cardio-vascular disease (positively or negatively), and not just whether it is correlated with it. That is, we would like to know whether the observed dependence between wine consumption and cardio-vascular disease (suppose there is one) persists even if we change, say, in an experiment, the amount of wine that is consumed. The observed dependence between wine consumption and cardio-vascular disease may, after all, be due to a common cause, such as socio-economic status (SES): people with a higher SES consume more wine and can afford better health care, whereas people with a lower SES consume less wine and have poorer health care. The example illustrates the common mantra that “correlation does not imply causation” and suggests that causal relations can be identified in an experimental setting, such as a randomized controlled trial in which each individual is randomly assigned to either the treatment or the control group (in this case, to different levels of wine consumption) and the effect on cardio-vascular disease is measured.
The randomized assignment makes wine consumption independent of its normal causes (at least in the large sample limit) and thereby destroys the “confounding” effect of SES. Naturally, there are many concerns about such an analysis, starting with the ethics of such a study, compliance with treatment, the precise treatment levels, and the representativeness of the experimental population with respect to the larger population. Still, this is the general methodological reason, explicitly emphasized in R.A. Fisher’s well-known work on experimental design, why randomized controlled trials are useful.
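The wine example can be illustrated with a small simulation. The sketch below is only an illustration under invented assumptions: SES is modelled as a standard normal variable, the coefficients 0.8 and -0.8 are arbitrary, and wine is given no causal effect on cardio-vascular risk at all. The function name `simulate_wine_cvd` and all variable names are hypothetical. With these choices, the observational data show a clear negative correlation created entirely by SES, while random assignment of wine consumption makes the correlation vanish.

```python
import numpy as np


def simulate_wine_cvd(n=100_000, randomize=False, seed=0):
    """Return the correlation between wine consumption and CVD risk.

    Assumed (hypothetical) data-generating process:
      SES -> wine, SES -> CVD risk, and NO arrow from wine to CVD risk.
    """
    rng = np.random.default_rng(seed)
    ses = rng.normal(size=n)  # common cause (confounder)

    if randomize:
        # Randomized trial: wine level is assigned independently of SES.
        wine = rng.normal(size=n)
    else:
        # Observational study: higher SES leads to more wine consumption.
        wine = 0.8 * ses + rng.normal(size=n)

    # CVD risk depends on SES only (better health care), never on wine.
    cvd_risk = -0.8 * ses + rng.normal(size=n)

    return np.corrcoef(wine, cvd_risk)[0, 1]


observational_corr = simulate_wine_cvd(randomize=False)
randomized_corr = simulate_wine_cvd(randomize=True)

print(f"observational correlation: {observational_corr:.3f}")
print(f"randomized correlation:    {randomized_corr:.3f}")
```

Under these assumed coefficients the population correlation in the observational setting is about -0.39, even though wine has no effect on the outcome; in the randomized setting it is zero up to sampling noise. A real analysis would of course have to deal with the practical concerns listed above, which this toy model ignores.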
[More to come ...]