COVID-19 and the UK Biobank—Opportunities and Challenges for Research and Collaboration With Other Large Population Studies

Large population studies such as the UK Biobank provide great opportunities for understanding the pathophysiology, health impact and prognostic factors associated with COVID-19, a condition that has had significant impact on almost everyone around the world. We highlight the vast opportunities, challenges and limitations for research and collaboration from the UK Biobank and other large population studies in helping us better understand and manage both current and potential future pandemics.


BACKGROUND
The Coronavirus disease 2019 (COVID-19) pandemic has had a profound impact on health and the way people live globally. Our knowledge of the disease is increasing at a fast pace and thus far has largely been from observational studies and registries (1,2), with an increasing number of clinical trials underway assessing treatment options, vaccination and other preventative strategies to limit the morbidity and mortality associated with it (www.covid-trials.org).
There have been reports that the disease has worse outcomes in those who are older or have cardiovascular disease, and that outcomes may potentially be linked to certain medications and differ between socially disparate groups. The studies to date, whilst essential given the extraordinary circumstances, are prone to the limitations inherent in clinical observational studies, which generally lack systematic assessment and initially included mostly those who had been moderately or severely affected by COVID-19 and thus required hospitalization (3). The main presentations have been cough and fever, in some cases leading to respiratory failure, and confirmed cases were initially based on positive nasal and throat swabs for SARS-CoV-2. Oxygen support and non-invasive or invasive ventilation have been the mainstay of treatment to date, with reports of a propensity to thromboembolic complications and potential cardiac manifestations (4).

LARGE POPULATION STUDIES
Large longitudinal population studies provide a powerful way of tracking the health of a large group of the population over time (5). The impact of environmental, genetic and lifestyle factors on health and outcomes can be assessed to enable researchers to better understand the drivers of health and potential differences between groups of people, with the ultimate aim of improving health through public health policies and their delivery. A number of large population studies are under way around the world, including the UK Biobank study, the China Kadoorie Biobank, the USA Million Veteran Program, and the Prospective study of 500,000 adults in Chennai, India (6–9). The number of people enrolled varies between studies, although each of these aims to involve 500,000 or more adults. The studies also vary in the populations enrolled (including age and ethnicity) and in the extent of factor measurement (imaging and genetic testing, for example). For the purpose of this manuscript we will discuss the UK Biobank study and other population studies to assess the opportunities and challenges in relation to the recent COVID-19 pandemic.

UK Biobank Cohort Study
The UK Biobank is a prospective cohort study with deep phenotype and genotype data collected for over 500,000 individuals aged between 40 and 69 years at recruitment between 2006 and 2010, from across England, Scotland and Wales (6). The rich dataset contains biological measurements, lifestyle questionnaires, health-related information, and blood and urine biomarkers for all participants. Genome-wide genotype data collected on all participants are providing opportunities for genetic association discoveries and for understanding the genetic basis of complex traits, which could guide future therapeutic targets (10).
Additional information is available, or in the process of being collected, for a large subset of participants, such as deep imaging (MRI of the heart, brain and abdomen, carotid ultrasound scanning and bone densitometry) in 100,000 participants with a target completion in 2023 (11,12). Almost half of these participants have already been scanned. There is also funding confirmed to allow follow-up scanning in about 10,000 of these volunteers.
The number of UK Biobank participants scanned pre-COVID was just under 50,000. The imaging centers stopped scanning participants on 13th March due to COVID-19 and will resume scanning when deemed safe. Although only one fifth of the UK Biobank participants are planned to have imaging, this still provides detailed imaging information on 100,000 individuals, which is substantial and unprecedented for any national biobank. Another advantage is the on-going rescanning effort, which will enable the assessment of pre- and post-COVID changes.
Follow-up health information is provided by robust linkage to primary care electronic health records, death and cancer registries and hospital admission records. With increasing outcome information generated over time the epidemiological opportunities of the UK Biobank study will be vast.
The open-access nature of the UK Biobank study is novel and allows any researcher to benefit from the size and scope of the study through an application process. This is particularly commendable given that longitudinal studies are notoriously expensive and logistically challenging to execute.

UK Biobank and COVID-19
With the COVID-19 pandemic affecting so many people, the UK Biobank study provides a great opportunity for epidemiological analysis and allows us to explore characteristics that are associated with poorer outcomes in COVID-19 patients along with those that may be protective. The association of lifestyle, comorbidities, medication and phenotypic information with outcomes will become an invaluable resource as more data become available on those who are tested for COVID-19, especially as the UK government plans to ramp up targets for testing in the general population and not just in those admitted to hospital or in health care workers (Figure 1).
Results of COVID-19 tests for UK Biobank participants are provided by Public Health England for participants residing in England. These are being updated on a weekly basis and include both positive and negative test results. On a monthly basis, information directly linked to primary care data, hospital inpatient data, and death data will be made available along with critical care data for those individuals that have been confirmed as having COVID-19. Table 1 provides examples of large population studies and which studies are actively collecting COVID-19 related information. Even at the time of revising the manuscript for the journal it was clear that a large number of the Biobanks were taking active steps in increasing the COVID-19 related data to help us better understand the disease.
The UK Biobank, for example, has now also initiated a coronavirus antibody study in which it will invite a representative sample of 20,000 of the total 500,000 participants who express an interest in participation. They will be asked to self-collect 0.5 ml of blood by finger prick for antibody testing. This will be repeated monthly for at least 6 months. Children and grandchildren of the participants who are over the age of 18 years will also be invited to provide blood samples for both antibody testing and genetic testing, to additionally assess genetic susceptibility in young adults.

OPPORTUNITIES
The growing COVID-19 related information for a cohort with rich phenotype and genotype assessment, along with regular outcome measure updates, will allow researchers to define the relevance of wide-ranging genetic and non-genetic factors to severity and outcomes based on age, lifestyle, co-morbidities, prescribed medications, and environmental and regional factors. The outcome data now and in the future will allow a comprehensive analysis of the mortality rates and associated morbidity in the UK cohort, particularly where the data help identify risk factors that predispose to poorer outcomes and those that could be protective, thus guiding lifestyle and prevention recommendations. This creates a major opportunity for detailed analysis of the cohort and of the impact of the disease on the longer term health and well-being of survivors, which will guide future research and public health policies.
In those who have already undergone deep imaging phenotyping, follow-up scanning will provide novel insights into the downstream, long-term effects of COVID-19 exposure on biological systems. Analysis of the subset of participants undergoing follow-up imaging could also provide a better understanding of pathophysiology using the pre- and post-COVID-19 imaging data.
The UK Biobank is already one of the largest contributors to an international consortium to investigate the genetic determinants of vulnerability to COVID-19, disease severity and outcomes (https://www.covid19hg.org). The second-round meta-analyses of the genome-wide association studies of COVID-19 status have been released. This initiative may not only enrich our knowledge of COVID-19 biology but also provide genetic evidence for drug targets and assist in the development of genetically informed risk assessment of COVID-19 susceptibility. The genetic data also allow the conduct of Mendelian randomization studies, which permit evaluation of causality in observational settings (13).
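As a purely illustrative aside, the simplest Mendelian randomization estimator for a single genetic variant is the Wald ratio: the variant-outcome association divided by the variant-exposure association. The sketch below is not taken from any UK Biobank analysis. The function name, the example effect sizes and the first-order standard error approximation (which ignores uncertainty in the variant-exposure estimate) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WaldEstimate:
    effect: float     # estimated causal effect of exposure on outcome
    std_error: float  # first-order approximation of its standard error


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> WaldEstimate:
    """Single-variant Wald ratio estimate (hypothetical helper).

    beta_exposure: variant-exposure association (e.g. per-allele effect)
    beta_outcome:  variant-outcome association (e.g. per-allele log odds)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    effect = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    std_error = se_outcome / abs(beta_exposure)
    return WaldEstimate(effect, std_error)


# Hypothetical summary statistics, chosen only to show the arithmetic.
estimate = wald_ratio(beta_exposure=0.20, beta_outcome=0.05, se_outcome=0.01)
print(estimate)
```

In practice, such analyses combine many variants and test the instrumental variable assumptions. A single ratio of this kind only conveys the underlying idea.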

CHALLENGES AND LIMITATIONS AND FUTURE PERSPECTIVES
There is already considerable interest from researchers globally in the UK Biobank study, which will lead to healthy competition for research and publication. As large groups of researchers may be working in silos on similar projects, there may be substantial duplicated effort, with those who are quickest securing publication. Due to the need for timely submissions for publication, there is a potential risk of less rigor or fewer quality control checks during data cleaning and analysis (14).
The UK Biobank enrolled middle and older aged adults only, with Caucasians making up the vast majority of participants and a limited number from other ethnicities (15). No participants were under the age of 40 at enrolment 10 to 14 years ago. Thus, only those who are about 50 years and older at the time of the COVID-19 pandemic are included. The recently proposed inclusion of children and grandchildren of participants for antibody testing will partly reduce this limitation. There is also evidence of healthy-volunteer bias in the UK Biobank cohort. Therefore, although the UK Biobank data are valid for the investigation of biological associations given the large sample size and the heterogeneity of measurements, they cannot be used to ascertain true disease prevalence in the population (16).
The delayed uptake of population screening through swabbing in the UK in those with milder disease, along with the lack of systematic symptom data, may limit the research potential. There is also a chance that key findings may only be generated once we have passed the worst period of the pandemic.
Data sharing that allows the combination of large cohorts from around the world, including the UK Biobank study and other large population initiatives, will increase the richness of the data and allow better assessment of geographical variations and ethnic differences and similarities, to better guide public health policies and ways of managing future pandemics.

CONCLUSIONS
COVID-19 has had a global impact and will change our health care approaches in the future. The UK Biobank population study can offer great opportunities given the detailed, systematic nature of the assessments along with the growing linkage to current COVID-19 testing and outcome data. The true potential of the UK Biobank and other large population-based research studies will become evident as the data accumulate over time, and may be enhanced further by linking large population-based studies, which can help address limitations such as ethnic and geographical differences and guide optimisation of public health policies.