# Precision Public Health

Edited by: Tarun Weeramanthri, Hugh Dawkins, Gareth Baynam, Matthew Bellgard, Ori Gudes and James Semmens

Published in: Frontiers in Public Health

### Frontiers Copyright Statement

© Copyright 2007-2018 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88945-501-0 DOI 10.3389/978-2-88945-501-0


# Precision Public Health

### Topic Editors:

	- Tarun Weeramanthri, Western Australian Department of Health, Australia
	- Hugh Dawkins, Western Australian Department of Health, Australia
	- Gareth Baynam, Genetic Services of Western Australia, Australia
	- Matthew Bellgard, Queensland University of Technology, Australia
	- Ori Gudes, University of New South Wales, Australia
	- James Semmens, Curtin University, Australia

Precision Public Health – an emerging field. Image: 'Precision Public Health Asia 2018' organizing committee, used with permission.

Precision Public Health is a new and rapidly evolving field that examines the application of new technologies to public health policy and practice. It draws on a broad range of disciplines including genomics, spatial data, data linkage, epidemiology, health informatics, big data, predictive analytics and communications. The hope is that these new technologies will strengthen preventive health, improve access to health care, and reach disadvantaged populations in all areas of the world. But what are the downsides and what are the risks, and how can we ensure the benefits flow to those population groups most in need, rather than simply to those individuals who can afford to pay? This is the first collection of theoretical frameworks, analyses of empirical data, and case studies to be assembled on this topic, published to stimulate debate and promote collaborative work.

Citation: Weeramanthri, T., Dawkins, H., Baynam, G., Bellgard, M., Gudes, O., Semmens, J., eds. (2018). Precision Public Health. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-501-0

# Table of Contents

Editorial: Precision Public Health
	- Tarun Stephen Weeramanthri, Hugh J. S. Dawkins, Gareth Baynam, Matthew Bellgard, Ori Gudes and James Bernard Semmens

### Section I Enabling Technologies

	- Malcolm Campbell and Dimitris Ballas

### SECTION II Impacts and Outcomes


Lakkhina Troeung, Nita Sodhi-Berry, Angelita Martini, Eva Malacova, Hooi Ee, Peter O'Leary, Iris Lansdorp-Vogelaar and David B. Preen

Variation in Population Vulnerability to Heat Wave in Western Australia
	- Jianguo Xiao, Tony Spicer, Le Jian, Grace Yajuan Yun, Changying Shao, John Nairn, Robert J. B. Fawcett, Andrew Robertson and Tarun Stephen Weeramanthri

Improving the Estimation of Risk-Adjusted Grouped Hospital Standardized Mortality Ratios Using Cross-Jurisdictional Linked Administrative Data: A Retrospective Cohort Study
	- Katrina Spilsbury, Diana Rosman, Janine Alan, Anna M. Ferrante, James H. Boyd and James B. Semmens

### SECTION III Challenges and Opportunities


### SECTION IV Informing Policy

	- Caron M. Molster, Karla Lister, Selina Metternick-Jones, Gareth Baynam, Angus John Clarke, Volker Straub, Hugh J. S. Dawkins and Nigel Laing

### SECTION V The Future of Precision Public Health


# Editorial: Precision Public Health

*Tarun Stephen Weeramanthri<sup>1</sup>\*, Hugh J. S. Dawkins<sup>1</sup>, Gareth Baynam<sup>2</sup>, Matthew Bellgard<sup>3</sup>, Ori Gudes<sup>4</sup> and James Bernard Semmens<sup>5</sup>*

*<sup>1</sup>Public and Aboriginal Health Division, Western Australian Department of Health, Government of Western Australia, Perth, WA, Australia, <sup>2</sup>Genetic Services of Western Australia, Subiaco, WA, Australia, <sup>3</sup>eResearch Directorate, Queensland University of Technology, Brisbane, QLD, Australia, <sup>4</sup>University of New South Wales, Sydney, NSW, Australia, <sup>5</sup>Curtin University, Perth, WA, Australia*

Keywords: technology, data, GIS, equity, ethics, omics, prevention, policy

**Editorial on the Research Topic**

**Precision Public Health**

# INTRODUCTION—OLD AND NEW

Traditional public health practice has had a central reliance on data, and the core discipline of epidemiology, in order to inform health policy and priority setting, drive health improvement across whole populations, and target disadvantaged populations. Core public health activities include risk factor and disease surveillance, screening, development of interventions, assurance, and evaluation. Since the 1970s, New Public Health has also emphasized community engagement, health promotion, partnerships, and advocacy.

In the last 20 years, and particularly with the sequencing of the human genome and advances in other "-omics," informatics and a range of technologies, new possibilities have opened up for a much more finely delineated view of the "time-person-place" triad that underpins epidemiology, and the balancing of genetic, biological, environmental, and social determinants of disease.

This may lead, we argue in this article, to new preventive and treatment options and the next paradigm shift in public health, namely toward "Precision Public Health" or PPH. However, we also caution against a blind optimism about what technology can achieve on its own, and argue for a solid grounding of PPH on the old verities of public health, namely whole population health improvement and equity.

### USE OF THE TERM "PRECISION PUBLIC HEALTH"

In 2013, building on our experience in the Health Department of Western Australia with genomics, spatial technology in health, and data linkage, and our extensive "policy-practice-academic" partnerships in all three areas, we proposed use of the term "Precision Public Health" to complement the parallel developments in medicine, such as Personalized Medicine and Precision Medicine, a term used in a 2011 US National Academy of Sciences Report, and then the subject of a major US research initiative in 2015, focused on cancer and other diseases (1).

Reservations about the individual and clinical focus of Precision Medicine, its silence on social determinants, and its capacity to improve population health were expressed by Bayer and Galea (2). The new concept of PPH was introduced into the academic literature by Khoury,<sup>1</sup> who called for a modernization of surveillance, epidemiology, and information systems, as well as targeted interventions and a population health perspective (3). Most recently, Khoury has emphasized the historic continuity of PPH to work on public health genomics over recent decades, while acknowledging that PPH encompasses more than genomics (4).

*Edited and Reviewed by: Paul Russell Ward, Flinders University, Australia*

### *\*Correspondence:*

*Tarun Stephen Weeramanthri tarun.weeramanthri@health.wa.gov.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 06 April 2018 Accepted: 12 April 2018 Published: 30 April 2018*

### *Citation:*

*Weeramanthri TS, Dawkins HJS, Baynam G, Bellgard M, Gudes O and Semmens JB (2018) Editorial: Precision Public Health. Front. Public Health 6:121. doi: 10.3389/fpubh.2018.00121*

<sup>1</sup>Khoury M. CDC Blog post March 2, 2015 titled "Precision public health and precision medicine: two peas in a pod." Available from: https://blogs.cdc.gov/genomics/2015/03/02/precision-public/ (Accessed: April 18, 2018).

The first meeting to use the "PPH" term was the Precision Public Health Summit held in San Francisco in June 2016.<sup>2</sup> Though most of the participants were from the US, the meeting had a global health focus and concentrated on data integration and sharing, new partnerships, community engagement, and social justice for better public health outcomes. A subsequent article from Bill and Melinda Gates Foundation authors presented a "back to basics" view of PPH suitable to the developing world: use of data with greater geographic precision to improve disease surveillance; better birth and death registration; building of laboratory capacity; and training in epidemiology (5).

## DEFINITION OF "PRECISION PUBLIC HEALTH"

Though a universal definition of PPH has not been adopted, a number of complementary definitions have been proposed.

In the introduction to this Frontiers Research Topic (RT), we proposed the following definition of "precision public health": "the application and combination of new and existing technologies, which more precisely describe and analyse individuals and their environment over the life course, to tailor preventive interventions for at-risk groups and improve the overall health of the population."

The Precision Public Health Summit had a breakout group session on "Building a Working Definition of PPH,"<sup>3</sup> where divisions emerged between clinicians and academics on one side, and public health practitioners on the other, on whether the goals of PPH were already encompassed under Precision Medicine, and whether an alternative hybrid term such as Precision Health was preferable. There was a clear perception that the PPH term carried an implied criticism of Precision Medicine, the fairness of which was debated.

Khoury has described "precision in the context of public health" as "improving the ability to prevent disease, promote health and reduce health disparities in populations" through the application of technology and the development of targeted programs and health policy (paraphrased) (see text footnote 1).

In this Frontiers RT, Dolley has described PPH as "an emerging practice to more granularly predict and understand public health risks and customize treatments for more specific and homogenous sub-populations, often using new data, technologies and methods."

Baynam et al. have added a descriptor of PPH as a "new field driven by technological advances that enable more precise descriptions and analyses of individuals and population groups, with a view to improving the overall health of populations."

# KEY QUESTIONS

In this RT, we sought articles to kick-start this new concept by posing the following questions.


# RT ARTICLES—BROAD CATEGORIES

The 18 papers in the RT addressed, in the main, the first three questions, as well as the last question, and can be grouped into the following broad and non-exclusive categories:

Genomics, newborn screening, phenomics, or other "omics" (Molster et al., Newnham et al., Baynam et al., Jansen et al.).

Spatial or GIS (Campbell and Ballas, Weeramanthri and Woodgate).

Data, analytics, and informatics (Brown et al., Lwin et al., Mann et al., Spilsbury et al., Gunnell et al., Xiao et al., Bellgard et al., Troeung et al., Preen et al., Dolley).

Case studies in infectious diseases (Inglis and Urosevic, Newnham et al.).

Case studies in cancer prevention, screening, and survival (Gunnell et al., Girschik et al., Troeung et al., Preen et al.).

Population vulnerability, equity, and targeted public health policy (Campbell and Ballas, Weeramanthri and Woodgate, Molster et al., Xiao et al., Newnham et al., Girschik et al., Jansen et al., Troeung et al.).

Ethics and privacy (Brown et al., Molster et al., Jansen et al.).

Surveillance and screening (Lwin et al., Inglis and Urosevic, Molster et al., Jansen et al., Troeung et al., Preen et al.).

Social media, mobiles, community participation, and crowdsourcing (Lwin et al., Girschik et al.).

# RT ARTICLES—SPECIFIC POINTS

Newborn screening can be viewed as an archetypal PPH technology. Despite being introduced more than 50 years ago, Jansen et al. demonstrate there are many unanswered questions around evidence, affordability, policy, and the introduction of new tests as technology improves. Molster et al. show that consideration of preconception carrier screening needs careful balancing of potential harms against benefits.

<sup>2</sup>https://precisionmedicine.ucsf.edu/programs/precision-population-health/summit (Accessed: April 18, 2018).

<sup>3</sup>https://tinyurl.com/yddwgsnq (Accessed: April 18, 2018).

Girschik et al. synthesize data, academic literature, and expert opinion into an explicit and precise process for setting cancer prevention priorities.

Lwin et al. show us how to apply new mobile technologies and crowdsourcing to produce real-time surveillance data for influenza tracking.

Campbell and Ballas and Xiao et al. use complex spatial and other analytic methods to unlock administrative datasets to identify inequity and drive progressive policy.

Gunnell et al. show the value of linking administrative data to well-designed, longitudinal cohort studies, to derive precise measures of physical activity and mortality in cancer survivors.

Preen et al. and Troeung et al. examine colonoscopy data from administrative datasets to predict risk of colon cancer and target policy to particular age groups.

Inglis and Urosevic look at diagnostic and surveillance challenges of antimicrobial resistance in detail, and remind us of the need for validation of tools and tests, and the steps and pitfalls on the route from cell to bench to person to population.

Dolley and Mann et al. test the claims of "Big Data" enthusiasts, and offer alternatives.

The ethical implications of the new precision technologies for consent and privacy are addressed by Brown et al. in their article on data linkage.

Two papers test the value of PPH as a policy framework. Newnham et al. comprehensively examine the biological and social factors behind preterm birth, including evidence-based research in various "-omics" fields, so as to construct multilevel preventive policy. Baynam et al. see 3-D facial analysis as a "prototypical precision public health tool" and show how phenotype complements genotype, and how it links to a traditional public health policy wheel.

Weeramanthri and Woodgate outline a set of recommendations to improve uptake and use of spatial data in the health sector, which could be applied to precision technologies in general. Their recommendations include communication of strong case studies, linkage of spatial data to patient pathways, formal cost-effectiveness analysis of the value added by technology, and training, capacity, and new stakeholder partnerships.

### CONCLUSION AND FUTURE STEPS

Precision public health is a rapidly evolving field.

Any notion of precision must begin with an attention to precise and unambiguous language, which not only underpins definitional, measurement, and classification issues but also aids clear communication with the public and professional groups.

When we look at our original RT proposal, and compare the definition of PPH offered there, to the material in the papers that were submitted and accepted, it is clear that "data and informatics" needs to be front and central in any future consensus definition. It is the combination of data-related skills and technologies (e.g., in epidemiology, data linkage, informatics, and communications) and the ability to aggregate, analyze, visualize, and make available high quality data, larger or linked, in closer to real time, that is at the heart of PPH, much like epidemiology is at the heart of traditional public health.

Another challenge is to build on the work presented in this RT, which mainly comes from countries with developed economies (Australia, US, UK, Singapore), and explore how the concept can be applied in all countries, with varying levels of resources and health investment, struggling to provide universal health coverage.

To this end, the RT editors and others are organizing a Precision Public Health Asia Symposium<sup>4</sup> to be held in October 2018, to further work on a consensus definition, to explore in more detail the ethical and social implications of the concept, and as a launchpad for further collaboration in the region.

This group of RT articles specifically reinforces the importance of embedding old and new technologies within explicit policy frameworks, whether traditional policy cycles or newer frameworks derived from systems biology or complexity theory (Inglis and Urosevic, Bellgard et al.). Such planning is central to operationalizing PPH, which sits at the nexus of precision medicine and public health, moving us from an "*n* of 1" (precision medicine) to an "*n* of many" (precision public health). It is a fundamental choice: new technologies leading by chance to more precise diagnoses and treatments for some fortunate individuals, or planning for and designing a system that offers those same benefits across the population and with a shorter lag time to those most in need.

### AUTHOR CONTRIBUTIONS

TW drafted the Editorial, and all other authors revised and approved the final version.

<sup>4</sup>www.pph2018.com (Accessed: April 18, 2018).

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Weeramanthri, Dawkins, Baynam, Bellgard, Gudes and Semmens. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### REFERENCES

5. Dowell S, Blazes D, Desmond-Hellmann S. Four steps to precision public health. *Nature* (2016) 540:189–91. doi:10.1038/540189a

# 3-Dimensional Facial Analysis— Facing Precision Public Health

*Gareth Baynam<sup>1,2,3,4,5,6,7</sup>\*, Alicia Bauskis<sup>3</sup>, Nicholas Pachter<sup>1,4,8</sup>, Lyn Schofield<sup>1,9</sup>, Hedwig Verhoef<sup>10</sup>, Richard L. Palmer<sup>11</sup>, Stefanie Kung<sup>11</sup>, Petra Helmholz<sup>11</sup>, Michael Ridout<sup>11</sup>, Caroline E. Walker<sup>3</sup>, Anne Hawkins<sup>1</sup>, Jack Goldblatt<sup>1,4</sup>, Tarun S. Weeramanthri<sup>12</sup>, Hugh J. S. Dawkins<sup>3,8,9,13</sup> and Caron M. Molster<sup>3</sup>*

*<sup>1</sup>Genetic Services of Western Australia, Department of Health, Government of Western Australia, Perth, WA, Australia, <sup>2</sup>Western Australian Register of Developmental Anomalies, Perth, WA, Australia, <sup>3</sup>Office of Population Health Genomics, Public Health Division, Department of Health, Government of Western Australia, Perth, WA, Australia, <sup>4</sup>School of Paediatrics and Child Health, University of Western Australia, Perth, WA, Australia, <sup>5</sup>Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA, Australia, <sup>6</sup>Telethon Kids Institute, Perth, WA, Australia, <sup>7</sup>Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia, <sup>8</sup>School of Pathology and Laboratory Medicine, University of Western Australia, Perth, WA, Australia, <sup>9</sup>Centre for Comparative Genomics, Murdoch University, Perth, WA, Australia, <sup>10</sup>Cooperative Research Centre for Spatial Information, Perth, WA, Australia, <sup>11</sup>School of Spatial Sciences, Curtin University, Perth, WA, Australia, <sup>12</sup>Public Health Division, Department of Health, Government of Western Australia, Perth, WA, Australia, <sup>13</sup>Centre for Population Health Research, Curtin Health Innovation Research Institute, Curtin University of Technology, Perth, WA, Australia*

### *Edited by:*

*Rumen Stefanov, Institute for Rare Diseases, Bulgaria*

### *Reviewed by:*

*Aida Mujkić, University of Zagreb, Croatia; Vita Dolzan, University of Ljubljana, Slovenia*

*\*Correspondence:*

*Gareth Baynam gareth.baynam@health.wa.gov.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 08 November 2016 Accepted: 14 February 2017 Published: 10 April 2017*

### *Citation:*

*Baynam G, Bauskis A, Pachter N, Schofield L, Verhoef H, Palmer RL, Kung S, Helmholz P, Ridout M, Walker CE, Hawkins A, Goldblatt J, Weeramanthri TS, Dawkins HJS and Molster CM (2017) 3-Dimensional Facial Analysis—Facing Precision Public Health. Front. Public Health 5:31. doi: 10.3389/fpubh.2017.00031*

Precision public health is a new field driven by technological advances that enable more precise descriptions and analyses of individuals and population groups, with a view to improving the overall health of populations. This promises to lead to more precise clinical and public health practices, across the continuum of prevention, screening, diagnosis, and treatment. A phenotype is the set of observable characteristics of an individual resulting from the interaction of a genotype with the environment. Precision (deep) phenotyping applies innovative technologies to exhaustively and more precisely examine the discrete components of a phenotype and goes beyond the information usually included in medical charts. This form of phenotyping is a critical component of more precise diagnostic capability and 3-dimensional facial analysis (3DFA) is a key technological enabler in this domain. In this paper, we examine the potential of 3DFA as a public health tool, by viewing it against the 10 essential public health services of the "public health wheel," developed by the US Centers for Disease Control. This provides an illustrative framework to gauge current and emergent applications of genomic technologies for implementing precision public health.

Keywords: public health, 3D facial scan, rare diseases, spatial information, genomics and genetics, developmental disabilities

# INTRODUCTION

Rare diseases (RD) are increasingly recognized nationally (1) and globally as a public health priority (2, 3). While individually, RD have a low prevalence, it is estimated that the combined prevalence is between 6 and 8% of the population (2, 4). Most RD have a genetic association and are often severely debilitating, impair physical and mental abilities, and shorten life expectancy (5). These characteristics present clinical and public health challenges. These include the need for early and accurate diagnosis and for identifying emerging technologies to enhance the delivery of clinical and public health practices for affected individuals (1).

The RD community has collectively nominated timely accurate diagnosis and earlier intervention with improved therapeutic options as key issues (6). This context and this challenge also provide opportunities for innovation and creating new knowledge. One such opportunity for improved diagnosis and treatment is through the clarity that can be achieved with detailed analysis and representation of the phenotype of genetic and rare disorders. Broadly, a phenotype is the set of observable characteristics of an individual resulting from the interaction of a genotype with the environment; in medicine, it is used to describe some deviation from normal morphology, physiology, or behavior. Greater phenotypic clarity is being advanced through imaging, the use of standards for phenotypic description, and their combination. This "precision" or "deep" phenotyping affords medicine and science a unique opportunity to generate biological insights.

An emerging deep phenotyping application is 3-dimensional facial analysis (3DFA). In the RD domain, 3DFA has been investigated and is increasingly being implemented, primarily for diagnostic purposes (7–9). 3DFA is also being applied to monitor existing and novel therapies, an area in which it has a nascent role (10, 11). 3DFA involves the investigation of deeply precise 3D facial data that can be acquired with various facial imaging technologies and applied to deliver scientific insights. The technological innovations enabling 3DFA include advances in imaging hardware, analytical techniques, and the combination with other, e.g., text-based, advances.

Approaches such as 3DFA, and other forms of deep phenotyping, mean that RD are providing a fruitful domain for precision approaches to medicine and public health. This is highlighted by a series of targeted precision initiatives in multiple countries, including in the United States, programs based at the National Institutes of Health at the Centers for Mendelian Genetics and the Undiagnosed Diseases Program and Network (12, 13); in Japan, via its Agency for Medical Research and Development under its rare and intractable diseases pillar; and in Western Australia (WA), through a coordinated suite of initiatives being implemented under the *WA Rare Diseases Strategic Framework 2015–2018* (1, 9).

### PRECISION TECHNOLOGIES IN PUBLIC HEALTH

Precision public health has been defined as "the application and combination of new and existing technologies, which more precisely describe and analyze individuals and their environment over the life course, in order to tailor preventive interventions for at-risk groups and improve the overall health of the population." Thus, precision public health complements and extends precision medicine's focus by recognizing that precise interventions are needed at both the individual and population levels.

Herein, we outline the state of play of current and emergent 3DFA applications, specifically within a precision public health paradigm, and using congenital and rare disorders as an exemplar. As an illustrative framework, we use the 10 essential public health services of "the public health wheel" (14). This framework operationalizes the three core functions of public health, namely, assessment, policy-making, and assurance.

### MONITOR HEALTH STATUS TO IDENTIFY AND SOLVE COMMUNITY HEALTH PROBLEMS

Congenital anomalies are an important class of mainly RD accounting for 12–15% of people with RD and are also known as birth defects (15). The causes of these conditions can be divided into genetic (e.g., monogenic disorders), multifactorial (e.g., cleft palate), and environmental exposures [e.g., fetal alcohol syndrome (FAS)]. Congenital anomalies accounted for 732,000 disability-adjusted life years lost, in 2010, in Western Europe alone (16).

A considerable proportion of congenital anomalies are associated with facial dysmorphology (17), either through the presence of congenital anomalies in known syndromes with well-documented facial dysmorphology (e.g., cardiac anomalies in Noonan syndrome), or in the recurrent co-coding of individual congenital anomalies and facial dysmorphism in individuals (17). Furthermore, hundreds of disorders (18), which are collectively and variably described as "dysmorphic syndromes" or "developmental disorders," have characteristic facies. In these instances, 3DFA has potential to contribute to the improved speed and accuracy of diagnosis for a sizeable proportion of the general population. This will contribute to more accurate epidemiological data, including more precise estimates of the incidence, prevalence, and burden of congenital disorders.

### DIAGNOSE AND INVESTIGATE HEALTH PROBLEMS AND HEALTH HAZARDS IN THE COMMUNITY

3-Dimensional facial analysis is being developed for deeply precise diagnostic applications across a broad range of typically rare conditions with well-established facial dysmorphic patterns (8), see **Figures 1** and **2**. Additionally, it is increasingly and objectively unlocking hitherto undetected, or underappreciated, facial diagnostic signatures (7). For example, speech delay is common in rare conditions, and in one study of kindergarten children, approximately 7% had language-specific impairments (19). A potential 3DFA application is using facial signatures as early predictors of language delay, either in those from the general population or in those at high familial risk, e.g., siblings of children with autism. The presence of a group of rare disorders, characterized by severe speech impairment and with overlapping facial features, collectively called Angelman-like syndromes (20), supports the possibility of using facial signatures to predict speech delay. There are also numerous other rare disorders that are associated with variable degrees of speech delay, e.g., Cornelia de Lange syndrome and biologically related disorders (21), that have characteristic, and overlapping facial phenotypes. It is likely that other children, with or without known syndromes, will have facial signatures that are indicative of speech delay that may offer a novel way for early screening for language delay to target early intervention.

FIGURE 1 | Some functionality of 3DFA. The purple line around the facial periphery demonstrates a cropping and facial segmentation tool. White dots are automated landmarking. The vertical purple line demonstrates an application of the measuring tool, in this case showing a 17.1 mm philtral length.
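As a rough illustration of the kind of measurement described in the Figure 1 caption, the sketch below computes a straight-line distance between two 3D facial landmarks. It is a minimal example under stated assumptions, not the software used by the authors: the landmark names (`subnasale`, `labiale_superius`), the helper `landmark_distance`, and the coordinates are hypothetical, and the article does not say which landmarks or distance definition its measuring tool uses.

```python
import math

# Hypothetical 3D landmark coordinates in millimetres. These values are
# illustrative only; they are not taken from the article or from a real scan.
landmarks = {
    "subnasale": (0.0, 0.0, 0.0),
    "labiale_superius": (1.2, -16.9, 2.3),
}


def landmark_distance(points, a, b):
    """Return the straight-line (Euclidean) distance between two named landmarks."""
    return math.dist(points[a], points[b])


# Assumption: philtral length is approximated here as the distance between
# the two landmarks named above.
philtral_length_mm = landmark_distance(landmarks, "subnasale", "labiale_superius")
print(f"Philtral length: {philtral_length_mm:.1f} mm")
```

A straight-line distance ignores the curvature of the facial surface; a tool working on a full 3D mesh might instead measure along the surface, which would give a somewhat larger value.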

# INFORM, EDUCATE, AND EMPOWER PEOPLE ABOUT HEALTH ISSUES

We all have a face. It is our unique expression of who we are, it reflects our life experiences and communicates our emotions to the world. From birth, our faces are a window to our being and our portal of interaction with our world. Our faces speak of the community from whence we came, and of the communities to which we belong, the ultimate expression of our connection as individuals. Our face is a canvas for the arts, a window for education, a living record of the diversity of the environment and our origins. Our face is also a biological billboard that advertises our physical and mental wellness, our aging, and our disease. We commonly say, "you look ill," "you look well," "you look in pain," and we can, for instance, readily recognize a child with Down's syndrome by their facial features. Objectively documenting and harnessing these facial clues that underlie common parlance and innate recognition capacities, can be used to inform, educate, and empower people for health.

A person who has a 3D image taken of their face can almost immediately see the computer-generated image. This recognizable and relatable image enables patients and their families to gain a new perspective of their health, or the health of a relative. Within WA, the technology has recently been used with primary school students who participated in a project to support equitable innovation for Aboriginal health. As the parameters of normal facial contours vary with ethnicity, it is important to compile reference scans for different ethnic groups. The children involved in this project were delighted to be able to view and manipulate their 3D facial images. "They especially loved being able to turn their faces upside down to look up their noses!" (22).

# MOBILIZE COMMUNITY PARTNERSHIPS AND ACTION TO IDENTIFY AND SOLVE HEALTH PROBLEMS

Projects focusing on the delivery of novel ways to diagnose and monitor rare disorders have been undertaken with the key support of patient advocacy organizations in Australia (e.g., Rare Voices Australia, Fabry Australia, Mucopolysaccharide and Related Diseases Society Australia, Short Statured People's Association of Australia) and internationally (e.g., International MPS Network, Costello Syndrome Family Association, CFC International). Similarly, projects focused on equitable health innovation to address RD have been developed and delivered in partnership with Aboriginal leaders, health workers, and communities. These projects have been important to the development of interest in and application of 3DFA.

The desire to address RD, together with the multifarious and crosscutting aspects of faces described above, has provided a unique vehicle for mobilizing community partnerships to engage in the identification and solution of health problems. A 2015 event organized as part of an ongoing platform for harmonizing translational research across premier hospitals, research institutes, government, and the community is an illustrative case (23). The "Faces of WA" event coalesced cross-sector interest in 3DFA applications from across science, arts, research, education, and data analytics communities.

# DEVELOP POLICIES AND PLANS THAT SUPPORT INDIVIDUAL AND COMMUNITY HEALTH EFFORTS

In 2015, WA released the state-wide *WA Rare Diseases Strategic Framework 2015–2018* (1), which includes, but is not limited to, the following objectives: build on existing services for RD diagnosis and screening; identify emerging technologies to enhance the delivery of health care for RD, including to rural and remote areas; engage with people living with RD, their carers and families; promote active participation of people living with RD in their health care; build epidemiology and health system evidence for RD by improving diagnosis and disease classification; and strengthen clinical and translational research in RD. The implementation of 3DFA contributes to most of these objectives.

An example of a disease-specific policy in WA is the *Fetal Alcohol Spectrum Disorders (FASD) Model of Care* (24). FASD has been prioritized as a public health issue in Australia and other countries, and a cohesive, multilevel, and community-focused suite of approaches is required to address this preventable disorder group. The *FASD Model of Care* identifies health care and public health prevention strategies as the most important means of reducing FASD. The implementation and use of 3DFA has the potential to address several of these recommendations, including but not limited to screening and early diagnosis. Fetal alcohol syndrome (FAS) is the part of the FASD continuum for which characteristic facial features are obligate for diagnosis. These facial characteristics are known to vary by ethnicity (25, 26), and there is a paucity of data on the Australian Aboriginal population. A pragmatic, but potentially imprecise, approach using African-American facial standards has been implemented for FAS diagnosis. Should these be unfit for purpose, epidemiological and diagnostic data may be inaccurate, with implications for targeted health and prevention strategies. Potentially, the ethnic variation of Aboriginal facial features could be more objectively addressed with the precision of currently available 3D approaches.

# ENFORCE LAWS AND REGULATIONS THAT PROTECT HEALTH AND ENSURE SAFETY

Clinicians and public health practitioners advocate for, review, evaluate, revise, educate, and enforce compliance with laws and regulations. This must include new and existing laws and regulations related to the use of genomic and other technologies, and the information generated from their use (27).

The genetic and genomic information about individuals, and that obtained from 3DFA, forms part of and expands the individual's health information. It may also provide information about the individual's relatives, which may be of interest to them, especially where prevention or treatment is available. Health professionals may inform the individual of how the genetic information relates to their relatives; however, confidentiality requirements prevent the disclosure of this information to the relatives, without the consent of the individual, except in specific circumstances outlined in relevant legislation and regulations. Internationally and variably, legislation has been enacted to regulate the collection, use, and disclosure of health information. In Australia, the 2006 amendments to the *Privacy Act 1988* allow for health practitioners to use or disclose an individual's genetic information, without their consent, where there is a reasonable belief that doing so is necessary to lessen or prevent a serious threat to the life, health, or safety of their relatives. Irrespective of jurisdictional differences in legislation and its implementation, the principle remains that new phenotypic technologies that reveal indications of familial disease may have implications for life insurance, employment, and reproductive choices, so they need to conform with legislated codes for privacy protection, disclosure, and data sharing.

# LINK PEOPLE TO NEEDED PERSONAL HEALTH SERVICES AND ASSURE THE PROVISION OF HEALTH CARE WHEN OTHERWISE UNAVAILABLE

Linking people to needed services includes developing mechanisms to assure the provision of such services to marginalized and underserved populations. In relation to RD, three such population groups in WA are people living with long-standing undiagnosed conditions, those living in rural and remote areas, and Aboriginal Australians. Improved diagnostic services utilizing new genomic technologies and 3DFA are enabling more equitable access to services for these populations.

At the level of state-wide clinical practice, 3DFA has been implemented in services that improve population access to RD diagnostics. The Rare and Undiagnosed Diseases Diagnostic service at Genetic Services of WA (9) integrates genomic diagnostics into a state-wide clinical service that includes outreach clinics. The Undiagnosed Diseases Program Western Australia is a cross-disciplinary service provided within the local children's hospital but also accessible to children across the state. These complementary programs aim to find diagnoses for those with long-standing, undiagnosed conditions.

Given the (increasing) transportability and reducing cost of 3D facial imaging systems, 3DFA is being implemented in remote outreach clinics by initially using a model of periodic deployment of a portable camera. Permanent placement of scanners in key regional locations is planned for the future. This is to facilitate point-of-care diagnostics, treatment, and monitoring and to enhance referral, i.e., by pairing submission of 3D images with text-based referrals and consultation processes.

# ASSURE COMPETENT PUBLIC AND PERSONAL HEALTH-CARE WORKFORCE

Facial gestalt is key to diagnosis for numerous genetic conditions (e.g., velocardiofacial syndrome, Williams syndrome, Noonan syndrome) and non-genetic conditions [e.g., fetal alcohol syndrome (FAS)]. Through the creation of tools that objectively determine facial patterns and unlock knowledge, diagnostic ability and workforce competency can be improved. The increasingly transportable nature of the approach also suits capacity building in remote regions; training in 3DFA could contribute to workforce development. The non-invasive nature of 3DFA, its increasing portability, and the very nature of faces together provide a unique opportunity to engage with this new technology as a bridge to deeper engagement with other (e.g., genomic) technologies.

# EVALUATE EFFECTIVENESS, ACCESSIBILITY, AND QUALITY OF PERSONAL AND POPULATION-BASED HEALTH SERVICES

The value of clarifying a diagnosis is undeniable (28). Improved diagnostic certainty through the precision of 3DFA provides novel opportunities to evaluate health care, for instance, in assessing the diagnostic programs for rare genetic diseases through direct and objective comparison of facial phenotypic and molecular diagnostic approaches. Additionally, by improving certainty of the diagnosis of RD, one could more accurately assess interventions targeted to the reduction of the burden of these conditions. Given the marked disparity between the proportion of the population with RD and their combined health system costs, supporting the need for early diagnosis and intervention has the potential to drive cost savings across the health system (29).

3-Dimensional facial analysis is being used to monitor the effectiveness of drug therapy, which provides new avenues to assess drug response for both localized facial anomaly (10) and systemic disease (11). One example is its application in mucopolysaccharidosis type I (MPS I). This lysosomal storage disease is caused by the body's inability to produce a specific enzyme; it expresses a pattern of progressive facial dysmorphology, and it is treatable by drug therapy. 3DFA was used to monitor a child with MPS I undergoing treatment and demonstrated that the rate at which facial dysmorphology was advancing was reduced (11). 3DFA has also been used to monitor a child receiving a treatment (rapamycin) for a craniofacial anomaly. This child had extensive facial malformations, and monitoring of their facial features using 3DFA showed a progressive improvement in their condition (10). In addition to observing treatment and disease progression in a standard clinical setting, 3DFA may also have a monitoring role in clinical trials.

# RESEARCH FOR NEW INSIGHTS AND INNOVATIVE SOLUTIONS TO HEALTH PROBLEMS

Rare diseases are a hotbed for technological innovation, and discoveries in RD have recurrently delivered innovations for common diseases (30). While 3DFA is yielding translational insights into innovations for diagnosis, treatment, and monitoring in the RD domain, it is also particularly suited to examining the overlap between rare and more common diseases. Notably, population-level studies demonstrated that common genetic variations (polymorphisms) were associated with discrete patterns of facial variation, and these facial signatures recapitulated the characteristic facies of the respective genetic syndrome due to rare genetic variation (pathogenic mutations). This provides further evidence of the overlap between common and rare phenotypes, with implications for possible reciprocal (rare-common) insights.

An example of a common disease that is poised for 3D facial translational research is obstructive sleep apnea (OSA). Through determining the facial signatures of OSA, 3DFA can be used as a complementary tool for OSA screening and classification. Again reflecting the potential for joint insights into rare and common diseases, OSA is a condition seen in RD, where it regularly has an earlier onset than in the general population (e.g., MPS syndromes).

A further promising area is face-to-text conversion. One way to achieve this is to convert a 3D facial image into standardized, text-based descriptive terms drawn from the Human Phenotype Ontology (HPO). These standardized terms are machine readable, so they can be used computationally for report generation and for integration with text-based diagnostics. Face-to-text conversion has been performed for a limited subset of facial HPO terms (31). It needs to be extended to the full set and be further validated by human experts.
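A minimal sketch of the face-to-text idea, assuming a single measurement (philtral length) is compared against a reference distribution and mapped to an HPO term when it lies more than 2 SD from the mean. The reference mean and SD are placeholders rather than normative data, the function names are hypothetical, and the HPO identifiers are given as best recalled and should be checked against a current HPO release. This is not the method used in reference 31.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Reference distribution for one measurement (placeholder values)."""
    mean_mm: float
    sd_mm: float


# Placeholder reference values, NOT real normative data.
PHILTRUM_REFERENCE = Reference(mean_mm=15.0, sd_mm=1.5)


def z_score(value_mm, ref):
    """Number of standard deviations the value lies from the reference mean."""
    return (value_mm - ref.mean_mm) / ref.sd_mm


def philtrum_hpo_term(length_mm, ref=PHILTRUM_REFERENCE, cutoff=2.0):
    """Map a philtral length to an (HPO id, label) pair, or None if unremarkable."""
    z = z_score(length_mm, ref)
    if z > cutoff:
        return ("HP:0000343", "Long philtrum")
    if z < -cutoff:
        return ("HP:0000322", "Short philtrum")
    return None


print(philtrum_hpo_term(19.0))
```

In practice the reference distribution would need to be matched for age, sex, and ethnicity, which is one reason expert validation of each term remains necessary.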

# CONCLUSION

3-Dimensional facial analysis is a prototypical precision public health tool that delivers non-invasive, non-irradiating, transportable, and community engaging deep phenotyping. It enables multisector applications that can be increasingly implemented across the spectrum of public health. It can be applied to individuals as well as for single RD. Finally, insights generated in RD could be investigated in more common diseases.

# ETHICS STATEMENT

The parents/legal guardians have consented to publish the image.

# AUTHOR CONTRIBUTIONS

GB, HD, CM, and AB contributed equally to the coordination of content and drafting the manuscript. GB, NP, JG, and AH contributed expert content to the clinical genomics components. TW, HD, CM, CW, and AB contributed to the expert content on public health genomics. HV, LS, RP, SK, PH, and MR contributed to the informatics and 3-dimensional facial scanning content. GB, HD, JG, TW, and CW are chief investigators of the Australian National Health and Medical Research Council APP1055319, and LS is employed on this grant.

# FUNDING

The authors acknowledge their involvement in the International Rare Disease Research Consortium (IRDiRC) and the support from The Western Australian Government Department of Health as part of its commitment to the goals of the IRDiRC. The authors gratefully acknowledge the combined financial support-in-part from the RD-Connect-European Union Seventh Framework Programme (HEALTH.2012.2.1.1-1-C) under grant agreement number 305444; RD Connect: an integrated platform connecting databases, registries, biobanks, and clinical bioinformatics for rare disease research; and from the Australian National Health and Medical Research Council APP1055319 under the NHMRC–European Union Collaborative Research Grants scheme. GB acknowledges the WA Department of Health Raine Clinical Research Fellowship.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Baynam, Bauskis, Pachter, Schofield, Verhoef, Palmer, Kung, Helmholz, Ridout, Walker, Hawkins, Goldblatt, Weeramanthri, Dawkins and Molster. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# FluMob: Enabling Surveillance of Acute Respiratory Infections in Health-care Workers *via* Mobile Phones

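*May Oo Lwin<sup>1</sup>\*, Chee Fu Yung<sup>2</sup>, Peiling Yap<sup>3</sup>, Karthikayen Jayasundar<sup>1</sup>, Anita Sheldenkar<sup>1</sup>, Kosala Subasinghe<sup>1</sup>, Schubert Foo<sup>1</sup>, Udeepa Gayantha Jayasinghe<sup>4</sup>, Huarong Xu<sup>3</sup>, Siaw Ching Chai<sup>3</sup>, Ashwin Kurlye<sup>4</sup>, Jie Chen<sup>2</sup> and Brenda Sze Peng Ang<sup>3</sup>*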

*<sup>1</sup>Wee Kim Wee School of Communication and Information, Nanyang Technological University (NTU), Singapore, Singapore, <sup>2</sup>KK Women's and Children's Hospital (KKH), Singapore, Singapore, <sup>3</sup>Tan Tock Seng Hospital (TTSH), Singapore, Singapore, <sup>4</sup>Institute of Media Innovation (IMI), Singapore, Singapore*

### *Edited by:*

*Tarun Stephen Weeramanthri, Government of Western Australia Department of Health, Australia*

### *Reviewed by:*

*Annette Regan, Curtin University, Australia; Peter Gregory Markey, Northern Territory Department of Health, Australia*

> *\*Correspondence: May Oo Lwin tmaylwin@ntu.edu.sg*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 18 October 2016; Accepted: 28 February 2017; Published: 17 March 2017*

### *Citation:*

*Lwin MO, Yung CF, Yap P, Jayasundar K, Sheldenkar A, Subasinghe K, Foo S, Jayasinghe UG, Xu H, Chai SC, Kurlye A, Chen J and Ang BSP (2017) FluMob: Enabling Surveillance of Acute Respiratory Infections in Health-care Workers via Mobile Phones. Front. Public Health 5:49. doi: 10.3389/fpubh.2017.00049*

Singapore is a hotspot for emerging infectious diseases and faces a constant risk of pandemic outbreaks as a major travel and health hub for Southeast Asia. With an increasing penetration of smartphone usage in this region, Singapore's pandemic preparedness framework can be strengthened by applying a mobile-based approach to health surveillance and control, and by improving upon existing ideas through addressing gaps such as a lack of health communication. FluMob is a digitally integrated syndromic surveillance system designed to assist health authorities in obtaining real-time epidemiological and surveillance data from health-care workers (HCWs) within Singapore, by allowing them to report influenza incidence using smartphones. The system, integrating a fully responsive web-based interface and a mobile interface, is made available to HCWs using various types of mobile devices and web browsers. Real-time data generated from FluMob will be complementary to current health-care- and laboratory-based systems. This paper describes the development of FluMob, as well as challenges faced in the creation of the system.

Keywords: mobile-health, influenza, mobile phones, application, health-care workers, surveillance

# INTRODUCTION

Seasonal influenza affects an estimated 20–25% of the Singapore population (1). The all-cause mortality attributable to influenza stands at 14.8 per 100,000 person-years, making the burden comparable to that in temperate countries (2). Globally, it is estimated that there were approximately 284,500 respiratory and cardiovascular deaths associated with the 2009 influenza pandemic (3). Due to Singapore's geographical location, pandemic threats from respiratory infectious diseases persist in addition to seasonal influenza, e.g., avian influenza A subtype viruses (H5N1 and H7N9) in Shanghai, China, and the Middle East respiratory syndrome coronavirus in the Middle East. The true impact of influenza often stretches beyond the viral illness itself and contributes to other disease burden by causing complications in patients with preexisting conditions (e.g., cardiovascular or cardiopulmonary disease).

Economic modeling has recently demonstrated that the treatment-only strategy for influenza resulted in a mean number of 690 simulated deaths, 13,950 hospital days, an equivalent of 2.5 million workdays lost, and a mean economic cost of USD\$469.8 million per year (4). Southeast Asia is acknowledged as a hotspot for emerging infectious diseases (5), and Singapore—as a travel and health hub of the region—faces a constant risk of pandemic outbreaks. The 2003 severe acute respiratory syndrome outbreak proved to be a huge burden on Singapore's economy, costing US\$570 million and resulting in unprecedented rates of unemployment at 5.5% (6, 7). Existing and potential threats highlight the importance of having robust surveillance and health communication systems present, which can forewarn people, detect unusual signals and provide health education in an efficient and cost-effective manner.

Given the absence of an efficient surveillance system that addresses challenges within hospitals in Singapore, this paper reports the design and development of a prototype integrated mobile-health participatory influenza surveillance system entitled FluMob. Following a review of literature on information and communication technology (ICT) approaches to addressing influenza tracking and surveillance, we describe FluMob's architecture, followed briefly by the methodologies used to recruit and retain users. Finally, we present the challenges the research team faced in the various phases of the implementation of the intervention, and lessons learnt, which will be useful to public health researchers and practitioners involved in similar initiatives or interventions in the future.

# RELATED LITERATURE

Participatory epidemiology (PE) is a concept that has increasingly been used in health surveillance in recent years. It uses community involvement to improve the understanding and control of diseases and was most prominently brought to attention by work conducted in Africa investigating animal health from information gathered by local farmers (8).

With the proliferation of Internet and mobile phone usage, ICT has played a significant role in the development of PE for disease surveillance, health monitoring, and information sharing; enabling both individuals at the point of care and stakeholders such as health authorities and health providers to be directly linked to the communities they served. Platforms such as "*Outbreaks Near Me*" and "*Ushahidi*" have been effective in optimizing the collaboration between ICT and health surveillance (9). Communication through ICT such as mobile phone messaging has also been used to influence health behaviors by encouraging healthy eating and exercise (10), adhering to medication recommendations (11), and promoting the cessation of smoking (12). With the increase of mobile phone usage, health-care workers (HCWs) in developing countries are now able to effectively collect health data in a quick and economical way (13).

Collecting real-time surveillance data provides the foundation for any pandemic preparedness program, but current approaches continue to rely on traditional methods with minimal use of new technology or social engagement. For example, existing infrastructure for influenza surveillance and epidemiology is focused on health-care institutions providing clinical reports of acute respiratory infections as well as laboratory-based confirmed influenza cases (14). These methods usually rely on the symptomatic person visiting a health-care facility, and such systems can be made less efficient by poor health-seeking behavior and delays in disease notifications. Despite their strengths, the setup and maintenance of these systems can be costly, particularly in developing countries (13). During the 2009 H1N1 pandemic, public health bodies worldwide faced difficulties and delays in ramping up such traditional surveillance systems (15).

To address the limitations of routine surveillance systems during pandemic H1N1 in 2009, a number of countries such as the UK urgently developed Internet-based systems to be used by the public (16). These have shown good results and continue to be used for routine seasonal influenza. Other approaches have included the development and use of population web searches on influenza-related terms to help predict an outbreak of infectious disease (17). However, despite early acclaim during pandemic outbreaks, systems such as *Google Flu Trends* have been shown to be too sensitive to media reports, resulting in difficult-to-control biases, particularly during normal influenza seasons (18, 19).

More recently, Lwin et al. (20) reported the application of the PE approach to the conceptual and technological development of a mobile-based crowd-surveillance application called Mo-Buzz for use by public health inspectors and the general public to address dengue outbreaks in Sri Lanka. Other similar initiatives have adopted this approach to bolster the public health management of asthma, and natural disasters such as earthquakes (9). While most of these efforts send health alerts or enable people to report disease experiences, they offer little by way of telling users exactly how to prevent or protect themselves from the outbreak. Singapore's pandemic preparedness framework, confronted by a significant influenza burden and the looming threat of emerging infectious diseases, can be strengthened by utilizing the mobile-based PE approach and by improving upon existing ideas through addressing clear gaps (such as a lack of health communication).

New and affordable tablet devices, digital applications, and geographic information systems have developed rapidly and are now easily accessible to the Singaporean population, which has nearly 90% smartphone penetration. Singapore is therefore well positioned to spearhead the development of this public health innovation in the region and to scientifically evaluate its impact on population groups at risk from influenza. These technologies can be integrated to design an innovative dynamic system where health authorities obtain real-time epidemiological and surveillance data from HCWs within Singapore who report disease incidence using smartphones.

The data generated from such a system with its significant time advantage could detect clusters of diseases and could be used as early warning signals for emerging influenza outbreaks within the hospital context, allowing public health authorities to initiate further investigations. The above literature emphasizes how real-time surveillance has become increasingly important in investigating infectious diseases such as influenza, which remains a social and economic burden. Given that smartphones are becoming more widespread in developing countries due to decreasing costs and increasing availability, pandemic preventative programs need to focus on integrating social media to streamline influenza surveillance, treatment, and health communication.

# DEVELOPMENT OF FluMob

### Technical Specifications

The FluMob system blends ubiquitous access to the Internet and the simple portability of mobile phones to create a digitally integrated syndromic surveillance system. The system, integrating a fully responsive web-based interface and a mobile interface, is made available to HCWs using various types of mobile devices and web browsers. The ease and convenience of using application software on their mobile phones will allow users to provide reports of non-specific syndromes such as influenza-like illness (ILI) on a weekly basis. The near real-time data generated from the system will be complementary to current health-care- and laboratory-based systems in assisting with streamlining hospital outbreak response among HCWs and informing vaccine policy. **Figure 1** shows the overall system architecture of FluMob. The application supports two channels of data input (web browsers and mobile phones), which feed into a central server where the data are compiled into reports for analysis.

The FluMob application consists of mobile applications for two operating systems (Android and iOS) and a responsive web portal. These applications are integrated with a central database using common web services. Central servers hold the business logic related to the FluMob application and the report analysis module. Once users are registered in the system, they have to log in with user identifications and passwords. No constraints on use of the application have been identified, and the process is simple and user-friendly. All required data will be stored in an encrypted manner for security and confidentiality purposes.

### Operating Environment

The operating environment of FluMob can be divided into two components: *software environment (SE)* and *hardware environment (HE)*.

The SE is the collection of software required to operate the application. The software used for FluMob comprises Windows Server 2008 R2, Apache/2.4.17, PHP Version 5.5.30, MySQL 5.6, Android Studio, and Xcode for iOS development.

The HE refers to the set of hardware required to deploy the application. The FluMob central server is configured with Core2 Intel Xeon Processor with four cores, 8 GB of random-access memory, and 500 GB of storage space. The main server supports any number of web clients. Based on the initial system prototype, more than 100 clients are expected, and the system was tested with 500 dummy clients. The system supported 100 concurrent users without any technical malfunctions. The maximum number of server connections was restricted to 100 connections, which proved to be sufficient, as database servers will be configured to allow connection pooling. There are no specific security mechanisms added to the client application, but predefined private keys to communicate with central servers have been implemented.
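The paper does not describe how the predefined keys are used. One common pattern, shown here purely as an assumption, is for the client to sign each request body with a pre-shared key using an HMAC so that the server can check that the payload is authentic and unmodified. The key value, function names, and payload fields below are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical pre-shared key; a real deployment would never hard-code this.
SHARED_KEY = b"example-key-not-for-production"


def sign_payload(payload, key=SHARED_KEY):
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_payload(payload, signature, key=SHARED_KEY):
    """Check a signature using a constant-time comparison."""
    expected = sign_payload(payload, key)
    return hmac.compare_digest(expected, signature)


report = {"user_id": 42, "week": "2016-W20", "has_ili_symptoms": False}
signature = sign_payload(report)
print(verify_payload(report, signature))
```

A key embedded in a client application can be extracted by a determined attacker, so a scheme like this would normally be combined with transport encryption and per-user authentication.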

# PARTICIPANT ENGAGEMENT

**Figure 2** shows the use case diagram for FluMob. New users are first required to register with the system to define their profiles. The login system provides functionality for users to view the FAQs associated with the system and allows them to make changes to their profile information and reset their passwords. At a predetermined schedule, users are notified to log into the system and carry out the routine survey. At any time, users can view all their past survey returns and changes over time. The accumulated survey results are analyzed and made available to the administrator of the system for further actions.

The FluMob system is being tested and used by consenting participants from Tan Tock Seng Hospital (TTSH) and KK Women's and Children's Hospital (KKH). TTSH has a Communicable Disease Centre and is the designated hospital to handle and manage outbreaks of novel diseases. KKH is a women's and children's hospital, with a large inpatient and outpatient pediatric patient workload. The research is being conducted using standard research practice and ethics guidelines. An optimal sample size of 278 was calculated for the study's statistical validation representing the health-care workforce using G\*Power analysis (21). However, factoring in attrition rates, the researchers aim to recruit 700 HCWs. Participants, who include clinical and non-clinical HCWs across these two hospitals, are required to be at least 21 years old and to own smartphones running either iOS or Android. Hospital staff in all departments were invited to download the app *via* mass emails. Upon responding, users are given a link to the relevant app store to download the free app. Once the app is installed on the mobile phone, each user is first asked to register by filling in a form capturing demographic and lifestyle details and medical history. **Figure 3** shows screenshots of the mobile application on a typical screen.
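The relationship between the two recruitment figures can be made explicit with a short calculation. The sketch below only restates the numbers given above (278 required, 700 targeted). The function name is hypothetical, and the attrition rate the authors actually assumed is not reported.

```python
REQUIRED_SAMPLE = 278     # sample size from the power analysis
RECRUITMENT_TARGET = 700  # number of HCWs the researchers aim to recruit


def tolerable_attrition(required, recruited):
    """Largest fraction of recruits that can drop out while keeping `required`."""
    if recruited <= 0 or required > recruited:
        raise ValueError("recruited must be positive and at least `required`")
    return 1 - required / recruited


rate = tolerable_attrition(REQUIRED_SAMPLE, RECRUITMENT_TARGET)
print(f"Tolerable attrition: {rate:.1%}")
```

In other words, the target of 700 leaves room for roughly three in five recruits to stop reporting before the retained sample falls below the calculated minimum.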

Clinical and social scientists from collaborating institutions developed and collated a range of questions to capture data relating to HCW demographics, lifestyle, influenza virus symptoms, and prevention. FluMob registration requires participants to fill in a form capturing demographic details (e.g., date of birth, sex, and ethnicity), workplace information (e.g., hospital name, job category, and department), information about family (e.g., how many people in different age groups), lifestyle behaviors (e.g., mode of transport to work and frequency of eating at food centers), medical history (e.g., vaccination records and disease profiles), as well as technology use and acceptance (e.g., usage of mobile phone, Internet, and mobile applications). The questions serve as a baseline for researchers to understand the lifestyle patterns and technology consumption among local HCWs. Descriptive analyses could potentially assist in the development of policies for disease monitoring and preventive measures. The data collected at registration can also be used for analytics at a later stage to identify any potential relationship between demographics, lifestyle behaviors, medical history, and vulnerability to influenza.

Health-care workers are prompted to submit weekly health reports on whether they have ILI symptoms. After they have chosen their ward/location of duty, users are first presented with a dichotomous "yes-or-no" question to capture the presence of ILI symptoms. If users answer "no," they receive a "thank you" note for the submission and can immediately resume their daily work tasks or activities. Conversely, users who declare having ILI symptoms are asked to specify their symptoms from a list, which includes fever, cough, muscle/joint pain, vomiting, diarrhea, and others. Users then need to provide further information regarding the illness, such as the date of onset and end of symptoms, body temperature, whether they have fever, medical services visited, and medication taken, as well as answers to some medical leave-related questions. Finally, they are asked to rate their health status on the day itself on a scale of 0–100.
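The branching of the weekly report can be summarized in a short sketch. This is an illustration only, not the FluMob implementation: the class name `WeeklyReport`, the screen names returned by `next_step`, and the use of the onset date as a stand-in for the full set of illness details are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Symptom options as listed in the text.
SYMPTOM_OPTIONS = ("fever", "cough", "muscle/joint pain",
                   "vomiting", "diarrhea", "others")


@dataclass
class WeeklyReport:
    ward: str                               # ward/location of duty
    has_ili: Optional[bool] = None          # answer to the yes-or-no question
    symptoms: List[str] = field(default_factory=list)
    onset: Optional[date] = None            # stands in for the illness details
    health_rating: Optional[int] = None     # self-rated health, 0-100


def next_step(report: WeeklyReport) -> str:
    """Return the (hypothetical) name of the next screen to show."""
    if report.has_ili is None:
        return "ask_ili"
    if not report.has_ili:
        return "thank_you"          # a "no" answer ends the report
    if not report.symptoms:
        return "select_symptoms"
    if report.onset is None:
        return "illness_details"
    if report.health_rating is None:
        return "rate_health"
    return "submitted"              # assumed final state after the rating


def validate(report: WeeklyReport) -> None:
    """Raise ValueError if a field falls outside what the text describes."""
    unknown = [s for s in report.symptoms if s not in SYMPTOM_OPTIONS]
    if unknown:
        raise ValueError(f"unknown symptoms: {unknown}")
    rating = report.health_rating
    if rating is not None and not 0 <= rating <= 100:
        raise ValueError("health rating must be between 0 and 100")
```

Keeping the flow in a single function makes the short path for symptom-free users explicit: a "no" answer reaches the end of the report in one step.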

This component was designed to enhance surveillance efforts with real-time information about ILI episodes among the clinical and non-clinical staff in both hospitals. The reports are submitted on mobile phones or web browsers to assist the research team in detecting potential influenza outbreaks within the hospital. Users are provided with incentives after submitting a certain number of reports. As soon as a user has submitted the report, the information is stored in a data repository, which allows clinicians and researchers to gather real-time crowd-sourced information for clinical analytics so as to inform strategies for disease surveillance, prevention, and management.

# PROGRESS AND STATUS

The Android version of the application was introduced to the health workers at TTSH and KKH in May 2016, and saw over 50 HCWs from TTSH signing up for the study within the first week. The iOS version was launched later in June 2016, and there are currently more than 200 iOS users who have installed the FluMob application. At this stage, the team has steadily recruited almost 700 participants. Of these, approximately 50% are regularly submitting weekly reports.

# CHALLENGES AND LEARNING EXPERIENCES

A number of challenges were faced in the development and implementation of the system. This section describes those challenges and how the team addressed and resolved them. The first challenge arose during the development phase of the application. The most recent data available (22) show that Android (e.g., Samsung S-series) dominates the Singaporean mobile phone market, holding 65.58% of the market share, whereas iOS (i.e., Apple iPhones) holds 27.24%.

Therefore, the research team focused its in-house technical expertise on developing the Android application and outsourced the development of the iOS version to an external development specialist. Due to the demands of the project and other unforeseen circumstances, the study was first launched only with the Android application, and interested iOS users had to be put on a waiting list for more than a month. When the iOS version was finally released and individuals on the waiting list were re-contacted, much of the initial interest had waned, and only 75% of them were successfully recruited into the study.

To prevent the coding and programming issues described earlier, a platform on which both Android and iOS mobile phone applications can be developed simultaneously could be considered in the future. Appcelerator Titanium (23), for example, can be used to create a full-featured iOS application using JavaScript and can automatically convert the JavaScript code into Objective-C code, which is a requirement of coding for iOS mobile applications. Creating the Android version of the same application is also simplified, as the Titanium software will convert the JavaScript code into Java and create an application suitable for the Android Marketplace.

The second challenge pertained to the type and number of survey questions that were to be included in both the registration and the weekly reports sections of the FluMob application. The researchers were faced with the arduous task of filtering through numerous survey questions to find those that effectively measured demographic variables (e.g., socioeconomic status, sex, and age) and the overall health of the participant (e.g., smoking status). Sifting through previously published peer-reviewed literature took time, and numerous meetings were required to settle on the questions to be included.

This issue was resolved by meeting frequently and by using scales that have previously been tested and established as effective at measuring ILI symptoms. The team also resolved differences in opinion in an objective, evidence-based manner, which allowed for more empirical formulation of survey questions. The question list was pilot tested on a small sample (*N* = 10) of participants from TTSH. This allowed for feedback to be collected and amendments made prior to the large-scale implementation of the application.

The final challenge arose from the inter-organizational and transdisciplinary nature of the research. The research team comprises clinician scientists, social scientists, and research engineers from several different institutions: Nanyang Technological University, KKH, TTSH, National University of Singapore, and the National Public Health Laboratory. **Figure 4** shows a flowchart of the workflow involved in developing the FluMob application.

In **Figure 4**, the diamond-shaped boxes with numerical values describe the order of the process. As shown in the chart, the idea for the development of the application is the first step, after which grant writing and submission ensue. After approval, the team splits into two groups: the clinical/social science group (2a) and the research engineering group (2b). After the development of the user interface of the application, the research engineering team should bring the application into its testing phase (3). However, the clinical/social science team made frequent revisions to both the design and the survey questions. This resulted in multiple phases of component design and testing (4), which delayed the implementation of the application (5).

The research team resolved the issue of constant iterations of the survey by completing full-scale testing within 1 week and freezing all changes to the application a week prior to launch. The final version of the survey was fully agreed upon by both clinical and social scientists and measured the full spectrum of variables needed to test all the research hypotheses effectively. Having experts of varied specializations gave the project a larger research scope, not limited to social science or clinical science alone. This is an example of how transdisciplinary research can be both an advantage and a disadvantage to the implementation of such a research project.

# DISCUSSION AND FUTURE DEVELOPMENT

The completion of the study period will see detailed data analysis, which includes an analysis of the weekly reports and cases identified for follow-up. The registration questions will serve as a baseline for researchers to understand the lifestyle patterns and technology consumption among local HCWs. Descriptive analyses will also yield valuable data and could potentially assist in the development of policies for disease monitoring and preventive measures. The data collected at registration can also be used for analytics at a later stage to identify any potential relationship between demographics, lifestyle behaviors, medical history, and vulnerability to influenza.

At the next stage, our plan is to incorporate health education messaging and communication. The present system allows users to enable or disable notifications and avoids broadcasting messages, instead personalizing reminder messages for each user. The research team wants to build on this and is considering including, in a subsequent version of FluMob, a health education messaging service that will send out health education messages to users when they report having flu-like symptoms. For example, if a user were to report fever as a symptom, a notification would be sent to the user to encourage them to wear a mask, avoid contact with others, or see a doctor. Two areas of academic inquiry are being considered by the research team: the first tests the efficacy of more tailored messages, and the second studies the effects of various modalities of communicating health messages.
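The symptom-triggered messaging under consideration could take the form of a simple lookup, sketched below. This is a hypothetical illustration of a feature that has not yet been built: the `ADVICE` table and the function `messages_for` are invented for the example, and only the fever entry paraphrases the example given in the text.

```python
from typing import Dict, List

# Hypothetical advice table; only the fever entry reflects the text's example.
ADVICE: Dict[str, str] = {
    "fever": ("Consider wearing a mask, avoiding contact with others, "
              "or seeing a doctor."),
}


def messages_for(symptoms: List[str], notifications_enabled: bool) -> List[str]:
    """Return the education messages to send for one weekly report."""
    # Respect the user's choice to disable notifications.
    if not notifications_enabled:
        return []
    # Send a message only for symptoms that have an advice entry.
    return [ADVICE[s] for s in symptoms if s in ADVICE]
```

A table-driven design of this kind would let the research team vary message wording between study arms without changing the application logic.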

The FluMob study is currently deployed with participants in both hospitals, where data are being collated; the results will be analyzed in the near future. At the time of writing, recruitment numbers are still increasing, and weekly influenza reports from HCWs are being steadily submitted. The research team is presently building upon the knowledge gained to create a novel integrated syndromic surveillance system for general public use, which they hope will further address the gaps in disease prevention on a wider national and regional scale, and streamline influenza surveillance to reduce the burden of emerging infectious diseases.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of DSRB, National University of Singapore with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the National University of Singapore.

# AUTHOR CONTRIBUTIONS

ML, BA, CFY, PY, and SF were involved in the conceptualization of the paper and the overall editing. KJ, AS, and KS wrote the main sections of the paper. HX and CJ were involved in data collection in their respective hospitals. UJ, AK, and KS were involved in the technical development of the application. SC was the overall coordinator for the project.

# ACKNOWLEDGMENTS

The authors would like to acknowledge the contributions of members of the wider team: Vincent Chow from the National University of Singapore; Raymond Lin and Cui Lin, who are involved in the laboratory work at the National Public Health Laboratory (NPHL); and Gentatsu Lim, who provided research assistance during the early parts of the project.

# FUNDING

This research was supported by the Singapore Ministry of Health's National Medical Research Council under its Communicable Diseases—Public Health Research Grant (CDPHRG13NOV020).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Lwin, Yung, Yap, Jayasundar, Sheldenkar, Subasinghe, Foo, Jayasinghe, Xu, Chai, Kurlye, Chen and Ang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Optimizing Patient Risk Stratification for Colonoscopy Screening and Surveillance of Colorectal Cancer: The Role for Linked Data

*David B. Preen<sup>1</sup>\*, Iris Lansdorp-Vogelaar<sup>2</sup>, Hooi C. Ee<sup>3</sup>, Cameron Platell<sup>4</sup>, Dayna R. Cenin<sup>1,2</sup>, Lakkhina Troeung<sup>1</sup>, Max Bulsara<sup>1,5</sup> and Peter O'Leary<sup>6</sup>*

*<sup>1</sup>Centre for Health Services Research, School of Population and Global Health, The University of Western Australia, Perth, WA, Australia, <sup>2</sup>Department of Public Health, Erasmus University Medical Centre, Rotterdam, Netherlands, <sup>3</sup>Department of Gastroenterology, Sir Charles Gairdner Hospital, Nedlands, WA, Australia, <sup>4</sup>Colorectal Cancer Research Unit, The University of Western Australia, Perth, WA, Australia, <sup>5</sup>Institute for Health Research, University of Notre Dame, Fremantle, WA, Australia, <sup>6</sup>Faculty of Health Sciences, Curtin University, Perth, WA, Australia*

Keywords: colorectal cancer, colonoscopy, screening, clinical guidelines, adenoma, risk stratification

### *Edited by:*

*Matthew Bellgard, Murdoch University, Australia*

### *Reviewed by:*

*Jim Codde, University of Notre Dame Australia, Australia; Michael Black, Pathwest Laboratory Medicine, Australia*

### *\*Correspondence:*

*David B. Preen david.preen@uwa.edu.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 31 January 2017 Accepted: 18 August 2017 Published: 08 September 2017*

### *Citation:*

*Preen DB, Lansdorp-Vogelaar I, Ee HC, Platell C, Cenin DR, Troeung L, Bulsara M and O'Leary P (2017) Optimizing Patient Risk Stratification for Colonoscopy Screening and Surveillance of Colorectal Cancer: The Role for Linked Data. Front. Public Health 5:234. doi: 10.3389/fpubh.2017.00234*

# INTRODUCTION

Colorectal cancer (CRC) is the third most common cancer worldwide, with an estimated 1.4 million new cases and almost 700,000 related deaths globally each year (1). In Australia, CRC is the second most commonly reported cancer and second most common cause of cancer-related death (2). Moreover, Australia has the fourth highest incidence of CRC for men and fifth highest for women internationally (3, 4). Incidence rates of CRC have at least doubled in many countries since the mid-1970s (5–7), although trends vary across countries with stabilizing or declining rates in more recent years reported in Western Europe and the United States (US), respectively. This trend is reversed for high-income nations that have recently made the transition from low-income economies (8, 9).

In the majority of cases, CRC develops from non-malignant precursor adenomatous colonic polyps (adenomas) (10), with the overall adenoma burden dependent on the number, size, villosity, dysplasia grade, and location of adenomas in the colon. Importantly, the average interval from adenoma appearance to development of CRC is >10 years (11), and the removal of adenomas reduces CRC incidence and mortality (12, 13). This affords an excellent opportunity for early detection through screening and regular colonoscopic surveillance, and the condition meets the World Health Organization criteria for diseases suited to screening (14). Patients with prior adenoma are therefore recommended to undergo regular surveillance colonoscopy (15). Increased surveillance, in addition to advances in surgical and adjuvant therapy (16), has been shown to reduce CRC incidence and increase median 5-year survival for CRC from 55.0% in the early 1980s to 65.3% by 2005 (16).

Lifetime prevalence of adenoma is 40–50% (17); however, the majority of adenomas never develop into malignant neoplasms, and only 4–5% of the population eventually develop CRC (18). Consequently, simply identifying the presence of adenomas is not the most efficient basis for recommendations on the need for, and timing of, follow-up colonoscopic surveillance; the overall adenoma burden and specific adenoma characteristics should also be factored into clinical decision making (12, 13).

# USE OF COLONOSCOPY FOR CRC DETECTION

Although some population-based screening programs exist employing fecal occult blood testing (FOBT), colonoscopy remains the "gold-standard" for detection of CRC and precursor adenomas (19). However, others have suggested that colonoscopy is overused as a primary screening and surveillance tool leading to sizable increases in the rates of colonoscopy in many countries (20–22).

In Australia, rising usage of colonoscopy has been seen for over two decades, with Medicare claims for the procedure increasing by 250% in the last 10 years (23). This increase has occurred simultaneously with increased capacity within the private hospital sector (24). Given the current trajectory, and when considered with population aging and the promotion of earlier screening, it is estimated that over 1 million colonoscopies will be performed annually by 2020 in Australia (population 24 million) (25). Similar relative trends have been reported elsewhere, with greater absolute increases, in countries such as the US (26). Such demand is not sustainable for most health systems, both in terms of provider capacity and health-care costs, estimated to be in the multiple billions of dollars annually in western nations (27). Furthermore, if projected increases in demand are realized, access to this service will be compromised, especially in public health systems. Already in Australia waiting times for colonoscopy exceeding 250 days are not uncommon (28, 29).

# RISK STRATIFICATION APPROACHES TO CRC DETECTION AND PREVENTION

Researchers, including our team, have previously called for greater consideration of personalized risk stratification approaches to primary screening for CRC (30); however, less consideration has been given to the potential benefits of such approaches for ongoing surveillance. Targeting colonoscopy to patients who stand to benefit most (i.e., those at higher risk of CRC) through robust risk stratification would reduce the burden of colonoscopies on both patients and the health system, while maintaining the preventive benefits of surveillance colonoscopy. Such targeting could reduce the burden for lower-risk patients, who are less likely to benefit, and reduce waiting times for high-risk patients, who require more regular surveillance. In addition, as most adenoma patients face a lifetime of burdensome colonoscopies, with their associated bowel preparation and procedural risks, targeting surveillance to high-risk patients would also likely increase compliance with recommended follow-up colonoscopy intervals, which is often poor; only 36% of patients comply with clinical guideline recommended intervals for surveillance colonoscopy in Australia (31). Moreover, with increasing incidence of CRC seen in younger age groups (32, 33), especially those under eligibility age thresholds for FOBT programs (34), and differential surveillance colonoscopy compliance based on patient insurance status (35), risk stratification holds additional benefits for particular patient groups.

The literature on risk stratification for CRC prevention primarily incorporates factors such as family history and sociodemographics (age, sex, and socioeconomic status) with some models also incorporating genetic variants associated with CRC susceptibility (36). Where surveillance colonoscopy is considered, adenoma number, size, villosity, and dysplasia grade at the most recent investigation are the more common determinants for recommending future surveillance intervals, whereas other factors including proximal or distal adenoma location, and the total adenoma burden over time are often overlooked as risk factors for future CRC.

# INCORPORATING DATA FROM MULTIPLE PRIOR COLONOSCOPIES

The cumulative burden of prior colorectal adenoma has almost exclusively been omitted from risk stratification approaches for surveillance colonoscopy, often due to unavailability of data. Most research in this area has only incorporated data from the most recent colonoscopy. However, it is likely that the risk of adenoma recurrence or development of CRC is modified by prior adenoma and/or changes in adenoma characteristics over time. Therefore, risk increases are likely conditional on adenoma characteristics from multiple earlier examinations rather than just the most recent investigation.
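To make the idea of cumulative burden concrete, the sketch below aggregates a patient's full colonoscopy history into summary features alongside the most recent findings. It is a minimal illustration under stated assumptions: the record fields and feature names are hypothetical, and no clinical thresholds or risk weights are implied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Colonoscopy:
    exam_date: date
    adenoma_count: int
    largest_mm: float        # size of the largest adenoma, 0.0 if none
    any_villous: bool        # villous histology present
    any_high_grade: bool     # high-grade dysplasia present
    any_proximal: bool       # at least one proximal adenoma


def cumulative_burden(history: List[Colonoscopy]) -> Dict[str, object]:
    """Summarize all prior examinations, not only the latest one."""
    exams = sorted(history, key=lambda c: c.exam_date)
    return {
        "n_exams": len(exams),
        "total_adenomas": sum(c.adenoma_count for c in exams),
        "max_size_mm": max((c.largest_mm for c in exams), default=0.0),
        "ever_villous": any(c.any_villous for c in exams),
        "ever_high_grade": any(c.any_high_grade for c in exams),
        "ever_proximal": any(c.any_proximal for c in exams),
        # Findings at the most recent exam, the usual basis of guidelines.
        "latest_adenomas": exams[-1].adenoma_count if exams else 0,
    }
```

A patient with a clear latest examination but several adenomas at earlier ones would be distinguishable here from a patient with no adenoma history, which a model using only `latest_adenomas` could not capture.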

To date, little published work has considered longitudinal colonoscopy history for risk prediction of CRC. A relatively small study (*n* < 3,000) of Dutch patients investigated the predictive ability of baseline colonoscopy findings for adenoma burden at up to two subsequent colonoscopies (37). The authors reported that optimizing timing of colonoscopy surveillance by incorporating multiple risk factors could result in 20% fewer surveillance colonoscopies being required annually, while maintaining the same level of effectiveness in terms of cancer detection and life-years gained (37). Three other studies have reported on rates of advanced adenoma or CRC incorporating up to two surveillance colonoscopies (38–40), although, as commented by the US Multi-Society Task Force on Colorectal Cancer (41), all have important limitations possibly resulting in selection bias. Despite these weaknesses, findings were consistent across these studies, suggesting that accounting for longitudinal colonoscopy history could provide important information for CRC risk prediction. While these results are encouraging, there is currently a complete lack of findings in the literature beyond the second surveillance colonoscopy. Consequently, the extent to which adenoma burden over a patient's life mediates future CRC risk is largely unknown.

Due to the lack of empirical data in this area, recommended intervals for follow-up colonoscopy in most national clinical guidelines, such as those in the US, UK, Australia, and Europe (15, 41–43), are almost exclusively based on results of the latest examination alone. Consequently, existing international guidelines are arguably a compromise that may not accurately define optimal intervals for repeat surveillance in patients with detected adenomas over multiple prior colonoscopies.

In Australia, clinical guidelines advocate that a risk assessment combining the results at baseline and at least one repeat surveillance examination may be a superior tool for CRC prediction than reliance on findings at the latest examination (15). However, there is no guidance provided on how to use that information other than a general statement that endoscopists should be encouraged to consider previous colonoscopy findings. The authors of the Australian Clinical Guidelines for Colonoscopy Surveillance recognize this limitation and recommend further research to determine CRC risk after a series of surveillance examinations, stratified by risk parameters of the baseline adenomas (15). This has also been highlighted as an important area in an Australian gap analysis (44).

# OPPORTUNITIES IN THE CURRENT DATA ENVIRONMENT

The emergence of whole-population data linkage systems in many countries has afforded the opportunity to combine comprehensive data from a range of health service data collections for large samples over decades. Such linkage systems provide a powerful resource for conducting longitudinal research on large or even entire populations and have benefits for minimizing, if not overcoming, limitations due to sample size, selection bias, response or recall bias, loss-to-follow-up, and ascertainment of accurate health service exposure and outcome measures. The use of such data has become commonplace in health research (45), and linkage of whole-population non-consented service data for research purposes is an accepted ethical approach (46).

Data from such linkage systems could also lay the foundation for more robust risk stratification of populations, incorporating a wide range of sociodemographic, clinical, and genetic factors depending on the data available to be linked. Linkage systems, such as the Western Australian Data Linkage System (47), use widely accepted probabilistic-matching techniques and already have capacity to link decades of cancer registry, inpatient, pathology, and mortality data, combined with the ability to genealogically link patients at the individual-level to derive familial history of disease and "genetic" risk factors. Such data provide a unique platform to investigate different risk stratification models for CRC detection through colonoscopy surveillance. Moreover, due to the extensive observation periods that can be investigated, these systems provide the opportunity to incorporate data based on findings over multiple surveillance colonoscopies, which have been omitted from the literature to date but are likely an important component for precision targeting of ongoing surveillance windows. Additional linkage to National Bowel Cancer Screening Program records and large cohort studies, which may provide information on a range of health behaviors not routinely captured in administrative data such as smoking, alcohol consumption, diet, and physical activity would further enhance the ability to precisely stratify CRC risk and tailor appropriate follow-up intervals. The lack of such behavioral risk factor information, rarely captured in administrative data, is a potential limitation and arguably does not allow all risk factors to be considered in risk stratification models. However, available administrative data do allow targeting of factors most relevant to guideline-based decision making in this area. 
Furthermore, the approach proposed in this paper would still provide an advance on existing risk-stratification models as a result of accounting for the cumulative burden of prior colorectal adenoma which has been omitted from risk stratification approaches to CRC screening and surveillance to date.
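For readers unfamiliar with probabilistic matching, the sketch below shows the field-agreement weighting commonly used in this family of methods (the Fellegi-Sunter approach). It is not a description of how the Western Australian Data Linkage System is implemented: the field names and the m and u probabilities are illustrative assumptions only.

```python
import math
from typing import Dict, Optional, Tuple

# Assumed, illustrative probabilities per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELDS: Dict[str, Tuple[float, float]] = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def match_weight(rec_a: Dict[str, Optional[str]],
                 rec_b: Dict[str, Optional[str]]) -> float:
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for name, (m, u) in FIELDS.items():
        a, b = rec_a.get(name), rec_b.get(name)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total
```

Record pairs whose total weight exceeds an upper threshold would be accepted as links, those below a lower threshold rejected, and those in between sent for clerical review; the thresholds themselves are set by each linkage unit.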

In addition, tools such as MISCAN-Colon, a well-established microsimulation model for CRC (48, 49), make it possible to evaluate the cost-effectiveness of different risk stratification models for informing the timing of ongoing follow-up colonoscopy. Such work can also be tailored to jurisdiction-specific settings, and precedents exist for adapting the MISCAN-Colon model to local settings, such as the Australian-specific variant of MISCAN-Colon (50).

# CONCLUSION

Whole-population data linkage systems are uniquely placed to allow robust longitudinal investigation to develop risk stratification models for CRC surveillance. Systems would require the capacity to link data collections comprising demographic, cancer registry, hospital inpatient, pathology, mortality, and genealogical factors over multiple decades at the whole-of-population level. The ability to link additional behavioral risk factor data (e.g., smoking, alcohol consumption, and dietary intake) from sources such as large cohort studies would also add value. The linking of such data collections would allow relevant risk factors to be accounted for in risk stratification models, including the incorporation of complete colonoscopy history and adenoma burden over time, which represents a potentially important modifying factor for cancer risk but is currently not included in risk modeling for recurrent adenoma or CRC.

In addition to providing greater precision with patient risk profiling, estimates can be used in cost-effectiveness analyses to determine optimal colonoscopy surveillance intervals for patients at different levels of cancer risk. This could reduce costs to the health system without a reduction in the number of CRCs that surveillance colonoscopy prevents. Such information also has capacity to support rational decisions concerning the best strategy for repeat surveillance via colonoscopy for patients at both low and high risk for CRC and reduce excessive delays for surveillance colonoscopy, especially for high-risk patients. Moreover, it creates an evidence-base for recommendations that would be immediately implementable in clinical practice with the potential to influence national colonoscopy surveillance guidelines.

# AUTHOR CONTRIBUTIONS

DP was the lead investigator for the project to which this opinion piece relates and was responsible for concept development, undertaking the relevant literature critique and drafting the initial manuscript. IL-V provided direct health economics and colorectal cancer screening expert input and was involved (along with DP) with developing the overall concept. PO and DC provided genetic and risk stratification for cancer screening input. HE and CP provided colorectal clinical and surgical input. MB and LT provided methodological expertise. All the authors were involved with developing the manuscript and provided detailed feedback and commentary on all iterations of the draft paper.

# FUNDING

This paper arises from a project grant funded by the Australian National Health and Medical Research Council (NHMRC) (APP1123495).

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer JC declared to the handling Editor a shared affiliation, with no collaboration, with one of the authors, MB.

*Copyright © 2017 Preen, Lansdorp-Vogelaar, Ee, Platell, Cenin, Troeung, Bulsara and O'Leary. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Corrigendum: SimAlba: A Spatial Microsimulation Approach to the Analysis of Health Inequalities

*Malcolm Campbell<sup>1</sup>\* and Dimitris Ballas<sup>2</sup>*

*<sup>1</sup>GeoHealth Laboratory, Department of Geography, University of Canterbury, Christchurch, New Zealand, <sup>2</sup>Department of Economic Geography, Faculty of Spatial Sciences, University of Groningen, Groningen, Netherlands*

### *Edited and Reviewed by:*

*Ori Gudes, University of New South Wales, Australia*

*\*Correspondence:*

*Malcolm Campbell malcolm.campbell@canterbury.ac.nz*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 25 November 2017 Accepted: 29 November 2017 Published: 22 December 2017*

### *Citation:*

*Campbell M and Ballas D (2017) Corrigendum: SimAlba: A Spatial Microsimulation Approach to the Analysis of Health Inequalities. Front. Public Health 5:340. doi: 10.3389/fpubh.2017.00340*

Keywords: spatial microsimulation, urban health inequalities, health policy, Scotland, geographic information systems, small area microdata

### **A corrigendum on**

**SimAlba: A Spatial Microsimulation Approach to the Analysis of Health Inequalities** *by Campbell M, Ballas D. Front Public Health (2016) 4:230. doi: 10.3389/fpubh.2016.00230*

In the original article, we neglected to include the Acknowledgments section.

### ACKNOWLEDGMENTS

We acknowledge the contribution of Alison Watkins to cartographic design for Figures 1–7 in this article.

The authors apologize for this error and state that this does not change the scientific conclusions of the article in any way.

The original article has been updated.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Campbell and Ballas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# SimAlba: A Spatial Microsimulation Approach to the Analysis of Health Inequalities

*Malcolm Campbell<sup>1</sup>\* and Dimitris Ballas<sup>2,3</sup>*

*1GeoHealth Laboratory, Department of Geography, University of Canterbury, Christchurch, New Zealand, 2Department of Geography, University of Sheffield, Sheffield, UK, 3Department of Geography, University of the Aegean, Mytilene, Greece*

This paper presents applied geographical research based on a spatial microsimulation model, *SimAlba*, aimed at estimating geographically sensitive health variables in Scotland. *SimAlba* has been developed in order to answer a variety of "what-if" policy questions pertaining to health policy in Scotland. Using the *SimAlba* model, it is possible to simulate the distributions of previously unknown variables at the small area level such as smoking, alcohol consumption, mental well-being, and obesity. The *SimAlba* microdataset has been created by combining Scottish Health Survey and Census data using a deterministic reweighting spatial microsimulation algorithm developed for this purpose. The paper presents *SimAlba* outputs for Scotland's largest city, Glasgow, and examines the spatial distribution of the simulated variables for small geographical areas in Glasgow as well as the effects on individuals of different policy scenario outcomes. In simulating previously unknown spatial data, a wealth of new perspectives can be examined and explored. This paper explores a small set of those potential avenues of research and shows the power of spatial microsimulation modeling in an urban context.

Keywords: spatial microsimulation, urban health inequalities, health policy, Scotland, geographic information systems, small area microdata

# INTRODUCTION

*SimAlba* is a spatial microsimulation model, which has been used to estimate geographically sensitive health variables for Scotland's largest city, Glasgow. Spatial microsimulation is now a well-established method in geography for public policy analysis in a wide range of domains (1, 2). Building on these efforts, *SimAlba*<sup>1</sup> has been developed in order to answer a variety of "what-if" policy questions pertaining to health policy in Scotland. We aim to show how these data could be (and have been) used to create "what-if" policy scenarios. A "what-if" policy scenario is an estimation of what may happen to health outcomes as a result of a hypothetical change in policy using modeled data.

There is a significant body of literature describing the uses of complex statistical models to analyze social and spatial inequalities in a variety of contexts. Specifically, spatial microsimulation models (3–8) provide a new perspective on existing data sources and contribute to the relevant academic literature as well as to applied health policy analysis, offering an opportunity to estimate previously unknown data and to analyze both individuals and areas simultaneously.

### *Edited by:*

*Ori Gudes, Curtin University, Australia*

### *Reviewed by:*

*Grace Yajuan Yun, Government of Western Australia Department of Health, Australia Woohyun Yoo, Dongguk University, South Korea*

*\*Correspondence: Malcolm Campbell malcolm.campbell@canterbury.ac.nz*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 23 June 2016 Accepted: 03 October 2016 Published: 21 October 2016*

### *Citation:*

*Campbell M and Ballas D (2016) SimAlba: A Spatial Microsimulation Approach to the Analysis of Health Inequalities. Front. Public Health 4:230. doi: 10.3389/fpubh.2016.00230*

<sup>1</sup>The model is named *SimAlba* as Alba is the Scots Gaelic name for Scotland, and it is a Spatial Microsimulation model of Scotland.

This paper aims to further demonstrate how spatial microsimulation can be used to estimate previously unavailable data and then to show how these data can be analyzed and visualized, using geographic information systems (GISs), to illuminate both the social and the spatial patterns in health-related behavior and outcomes in Glasgow, Scotland (see **Figure 1**). This paper puts forward a new small area perspective on health-related variables in Scotland, showing how Scottish Health Survey (SHS) and Census data for Scotland can be combined to create a powerful policy modeling and visualization framework.

The paper is organized as follows: it begins by describing the health landscape of the study area, then introduces the microsimulation literature and explains in simple terms how spatial microsimulation can be operationalized. Some outputs of the *SimAlba* model are then presented and explored, focusing particularly on the health-related variables created. A discussion of the relevance of the simulated results follows, and the paper concludes with directions for future research and the policy implications of the analysis presented.

# A BACKGROUND TO THE HEALTH LANDSCAPE IN SCOTLAND

The recent past has been marked by a series of deteriorations in Scottish health relative to the rest of Europe, which has led to Scotland being labeled as "the sick man of Europe." This label has been applied to Scottish health more recently, signifying the noticeable divergence from the 1950s onward in terms of health compared with the rest of Europe. Glasgow, in particular, exhibited the highest levels of self-reported bad or very bad general health and psychological distress for both men and women compared across 32 other European metropolitan areas (9). The "Scottish Effect" (10) or the "Glasgow Effect" (11) describes the excess mortality in Scotland, and Glasgow in particular, even after accounting for socioeconomic circumstances. This suggests that Scotland is peculiar with regard to population health, and that this effect may be even stronger in Glasgow; hence the focus in this paper on the urban area of this city. In other words, after taking account of deprivation, there is still an excess of mortality in Scotland compared to England and Wales (12). This issue is well studied. For example, a report on Scottish health (13) identified "risk factors" in Scotland as tobacco, alcohol, low fruit and vegetable intake, physical activity levels, and obesity. More broadly, within the UK, there has long been ample evidence of health inequalities, especially since the highly influential Black Report (14), which highlighted health inequalities by both place and socioeconomic status that continue to exist and persist over time (15). Furthermore, when compared to the rest of Great Britain (GB) or the UK (16, 17) or its western European neighbors (18), Scotland does not do well. There have been many studies examining these broader country level differences over time between Scotland and the rest of GB [for a recent example comparing mortality patterns, see Ref. (12)].

Looking in more depth at Glasgow, the evidence of a specific "Glasgow Effect" as discussed above is a particular concern for this paper. A specific cause of concern is that premature mortality is 30% higher in Glasgow compared to similarly deprived UK cities (11). This paper adds to the understanding of why this may be the case by estimating previously unknown data. For example, discussion around the importance of alcohol consumption or drug use as contributing to half of the excess observed (19), with much of the deprivation potentially unmeasured, points to the usefulness of small area estimates to fill this gap. The specific spatial patterning of deprivation in Glasgow has been examined as a possible cause of the "Glasgow Effect"; evidence suggests that there is a strong impact of deprivation of surrounding areas on health outcomes (20) but not quite as originally hypothesized by McCartney et al. (21) as a concentrated monoculture. As McCartney et al. (21) explain, there are 17 possible explanations for the unique situation in Glasgow; they conclude that understanding the Scottish mortality patterning requires, as well as a clear focus on behaviors, an understanding of the most "upstream" determinants of health, to which spatial microsimulation can add some important value. Previous analysis of poverty and benefit take-up shows that there are some geographical patterns, but only at unitary authority level (22), noting that the "worst" areas are concentrated around Glasgow combined with relative affluence nearby. Other work examining the geography of disadvantage in Glasgow (23) notes the persistence of disadvantage in areas in the east end (Shettleston, Easterhouse) as well as to the northwest (Drumchapel) and to the south (Castlemilk) and southwest of the center (Pollok) in the 1970s, 1980s, and 1990s. Of particular note are that Glasgow performs worse on all the deprivation-related variables compared to the Scottish average and that disadvantage persists in particular small areas of Glasgow. This pattern of higher deprivation in Glasgow continues, linking it with mortality rates, showing a strong bivariate relationship across Scotland; in other words, spatial proximity to deprivation is important for mortality outcomes (24). Qualitative evidence from Glasgow also points to the importance of area on health behaviors: poorly resourced, stressful environments with strong community norms may foster smoking as well as undermine attempts to increase cessation rates (25). Moreover, perceptions of neighborhoods in Glasgow, as well as health outcomes within them, have a social gradient, as outlined by Sooman and Macintyre (26), such that perceptions of an area can influence health outcomes. Overall, we can see the pattern of evidence pointing to the importance of area influence on health outcomes in Glasgow.

Research on the role of smoking, alcohol consumption, diet, and physical activity in explaining socioeconomic differentials in mortality in the west of Scotland noted the importance of these behaviors for longer-term outcomes (27). Thus, having estimates of such behaviors at small area level can help increase understanding of the broader forces of health inequality associated with health behaviors. A Scotland-specific issue is the role that alcohol plays in contributing to poor health outcomes, linked to the minimum pricing of alcohol as a policy response (28). Scotland has among the highest levels of alcohol-related deaths in Western Europe (29), although these have been falling since the 1990s. Scotland also embarked on a smoke-free policy, designed to reduce exposure to secondhand smoke. Evidence has shown that it has been a success (30), with none of the hypothesized negative outcomes, such as more smoking in the home or economic impacts on businesses. Of particular relevance is the debate around the independence question for Scotland. Although the outcome was a "no," there is still significant potential for further departure with respect to health policy compared to the rest of the UK (31).

Therefore, we can see that Glasgow has been the subject of much research into health inequalities as well as economic and social inequality. We add estimated health variables to this body of work at a small area level to further enhance knowledge and to highlight relevant social and spatial patterns and inequalities.

### A BRIEF BACKGROUND TO SPATIAL MICROSIMULATION MODELING IN HEALTH

Spatial microsimulation is an established methodology in the social sciences with a long successful history in Economics since the late 1950s and with more recent significant developments in other disciplines, including geography in the last three decades (1, 2). In particular, there have been significant advances in spatial microsimulation models, in other words, adding geography to models (32). This adds to the potential uses of microsimulation, for example, by allowing assessment of area-based policies relating to social and health policy (3, 7, 33). Additionally, the geographic distribution of health-related variables can be simulated (3–6, 34), not just the socioeconomic or demographic patterns aspatially. This allows previously unknown small area spatial patterns to be investigated, and the spatial effects to be considered in concert with the socioeconomic and demographic factors. Building on these efforts, *SimAlba* has been developed in order to answer a variety of "what-if" policy questions pertaining to health policy in Scotland, with geography included as a key element. The *SimAlba* model has previously been used to estimate and model in the economic sphere (35, 36). We add to this literature by focusing on health.

### DATA AND METHODS: SIMALBA – A SPATIAL MICROSIMULATION MODEL

The *SimAlba* model was developed with the use of data from the Census of Population 2001 and the SHS 2003. The Census of Population is carried out decennially, while the SHS 2003 was the third survey of Scottish health (after 1995 and 1998) and included all ages. Each SHS samples a new set of addresses and has both an adult and child component with a total of 8,148 adults and 3,324 children interviewed on a variety of health conditions and behaviors as well as socioeconomic and demographic information. The health variables include: smoking and alcohol consumption, physical activity, dental health, general health, and many others.

It is important to point out that the time periods of data collection (2001 and 2003) do not match precisely, but in the absence of any other temporally consistent health data for Scotland, this is a pragmatic compromise. Spatial microsimulation uses the data contained in the SHS and "upscales" it to reflect the populations of census areas as closely as possible. This can be achieved using a process called deterministic reweighting (3, 8, 37). Deterministic reweighting has become an established method for estimating health variables in multiple contexts such as area smoking prevalence (4, 6) or obesity prevalence (38). Spatial microsimulation works by using a series of constraints to construct the model; these constraints must be present in both datasets, which limits the constraint options available. A constraint variable is chosen either from the literature or by a more formal regression approach that identifies which variables in the datasets are most correlated with the variable to be predicted. Therefore, the choice of the constraints, though informed by the literature and other empirical research, must be pragmatic. Constraints are key to the model setup (39) and, therefore, an important part of the spatial microsimulation modeling process.

*SimAlba* uses age, sex, marital status, illness, qualifications, economic activity, tenure, and an employment classification (National Statistics Socioeconomic Classification, NSSEC) as constraints. Note that the deterministic reweighting process is not explained in depth in this paper for reasons of brevity [for more details, see Ref. (36)]. The method is deterministic as it produces the same output for the same input data, which was an important consideration for policy end users. The stylized formula that can be applied to create microdata is $NW_i = W_i \times CEN_{ij} / SHS_{ij}$.

The equation is constructed as follows: a new weight ($NW_i$) for individual $i$ is calculated by multiplying the weight ($W_i$) for individual $i$ by element $ij$ of the Census table divided by element $ij$ of the SHS table. This process is completed iteratively until a suitable level of convergence is reached, and $NW_i$ is the number of times a particular individual is represented in a specific small area in Scotland. The process was followed to adjust the weights of individuals in the SHS to match the populations of census output areas (OAs), which have a minimum population of around 40 households or 100 individuals. The end result is a spatially simulated dataset, which previously did not exist and which can now be used as the basis for further analysis.
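The reweighting step described above can be illustrated with a minimal Python sketch for a single small area. It is not the *SimAlba* implementation: the function name `reweight`, the two toy constraints, the convergence tolerance, and all counts are assumptions made for illustration, and the SHS table element is taken here to be the current weighted survey total in the category to which individual $i$ belongs.

```python
def reweight(individuals, weights, census, max_iter=50, tol=1e-6):
    """Iteratively apply NW_i = W_i * CEN_ij / SHS_ij for one small area.

    individuals: list of dicts mapping constraint name -> category
    weights:     initial weight W_i for each individual
    census:      {constraint: {category: census count for the area}}
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for constraint, targets in census.items():
            # SHS_ij: current weighted survey total in each category.
            shs = {category: 0.0 for category in targets}
            for person, w_i in zip(individuals, w):
                shs[person[constraint]] += w_i
            # NW_i = W_i * CEN_ij / SHS_ij for every individual.
            for i, person in enumerate(individuals):
                category = person[constraint]
                if shs[category] == 0:
                    continue  # nothing to scale in this category
                new_w = w[i] * targets[category] / shs[category]
                largest_change = max(largest_change, abs(new_w - w[i]))
                w[i] = new_w
        if largest_change < tol:
            break  # weights have stopped changing: converged
    return w


# Hypothetical survey respondents and census counts (not SHS or Census data).
survey = [
    {"sex": "F", "age": "16-44"},
    {"sex": "F", "age": "45+"},
    {"sex": "M", "age": "16-44"},
    {"sex": "M", "age": "45+"},
]
area_census = {
    "sex": {"F": 60, "M": 40},
    "age": {"16-44": 30, "45+": 70},
}
new_weights = reweight(survey, [1.0] * len(survey), area_census)
```

Because the same inputs always yield the same weights, the sketch shares the deterministic property noted above. Repeating the call with each area's census counts would give one set of weights per small area.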

Microsimulation has been used to estimate many different types of data in multiple contexts as discussed above. One of the key points of concern in the literature pertains to the reliability and accuracy of the microsimulated data. There is now a growing body of evidence showing that the technique provides robust estimates of health-related variables in particular (6, 38, 40). *SimAlba* has been internally and externally validated (see **Figure 8**) and has demonstrated that it provides robust data (35, 36). From **Figure 8**, it can be seen that the model produces estimates within 10% error, with most of the data falling close to the 45° line, signifying an exact match.
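One simple way to express such a validation check is the absolute percentage error of each simulated area count against the corresponding census count. The sketch below assumes that metric; the paper does not specify the exact measure behind **Figure 8**, and the counts and names used are hypothetical.

```python
def percent_errors(simulated, observed):
    """Absolute percentage error of each simulated count against its census count."""
    return [abs(s - o) / o * 100 for s, o in zip(simulated, observed) if o > 0]


# Hypothetical counts for five small areas (not SimAlba output).
census_counts = [120, 95, 210, 60, 150]
simulated_counts = [118, 101, 205, 63, 149]

errors = percent_errors(simulated_counts, census_counts)
# Share of areas whose simulated count is within 10% of the census count.
share_within_10 = sum(e <= 10 for e in errors) / len(errors)
```

A point lying exactly on the 45° line corresponds to an error of zero in this measure.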

### SPATIAL MICROSIMULATION MODEL OUTPUTS: ESTIMATING HEALTH BEHAVIORS AND OUTCOMES

This section shows some of the microsimulated data tabulated and mapped so as to give a small snapshot of the type of data that can be produced by *SimAlba* and its policy relevance. Several of the variables simulated are now visualized using a quintile distribution, which can help us to better highlight the extremes of the spatially simulated data. Q1 refers to the highest values, Q5 the lowest in the distribution of variables. Only a small fraction of the data that could be mapped is shown, as any variable in the SHS can potentially be simulated using the *SimAlba* algorithms.

In this paper, we demonstrate the relevance of the outputs of models like *SimAlba* to policy debates briefly discussed above by focusing on smoking prevalence, subjective well-being, alcohol consumption, and obesity. We therefore pose five policy relevant research questions that are readily applicable to spatially microsimulated data. Specifically, we demonstrate how models like *SimAlba* can be used to address research questions such as:


General health questionnaire (GHQ) scores are a measure of subjective well-being based on a series of questions resulting in a single number summary of mental health, where a higher score denotes increased mental distress. First, the simulated spatial pattern of subjective well-being is visualized as shown in **Figure 2**. There is a notable series of clusters in the east end of Glasgow. The areas with the lower percentages of individuals (lighter colors) appear to be spread around the west end and to the northern edges of Glasgow, which is what is likely to be expected *a priori* from the socioeconomic geography of Glasgow. In other words, the most deprived areas have worse mental health outcomes. Elsewhere, the pattern of mental well-being appears sporadic in Glasgow with smaller scattered clusters toward Drumchapel for example.

Second, the geography of Glasgow in terms of BMI is examined briefly. In **Figure 3**, the areas colored darkest (Q5), with large numbers of obese people, are in the east of Glasgow, in Easterhouse and Shettleston. Areas with higher proportions of obesity are also concentrated in the Castlemilk area of Glasgow to the south east. There are similar small enclaves in the areas bordering the river Clyde to the western edge on the south side of Glasgow city. The pattern would appear to follow an explanation of poor socioeconomic conditions correlating with obesity in the Glasgow area.

Third, the focus moves to the spatial patterns of alcohol consumption in Greater Glasgow. Overall, the summary is that there is little in the way of a clear pattern (**Figure 4**). The pattern of the east end doing "poorly" is not as apparent for this variable. The message overall is that there are few "pockets" of problem drinking, so it is more difficult to conclude that this is linked to the area.

Fourth, the geography of smoking in Glasgow in **Figure 5** shows smokers using over 20 cigarettes a day. Focusing on the spatial pattern, areas toward Castlemilk in the south east, the east end around Easterhouse, and the parts of the central areas bordering the river Clyde have the highest proportions of heavy smokers.

The spatial patterns demonstrated in each of the estimated health outcomes and behaviors, to a greater or lesser extent, mimic the aforementioned patterns of deprivation. The particular social geography within the Greater Glasgow area is therefore important context to the estimates produced here.

## A STYLISED POLICY SCENARIO: IDENTIFYING AREAS OF HIGH NEED

This section explores the power of spatial microsimulation in more depth by again demonstrating some of its considerable advantages over more "traditional approaches." Imagine a policy scenario where the aim is to identify the most "unhealthy" persons and the areas in which they reside. This can be achieved in spatial microsimulation modeling. Data can be combined such that the people who smoke 20 or more cigarettes a day, drink more alcohol than the guidelines suggest, have low subjective well-being, and are also obese are selected simultaneously, then mapped. This combination of factors could be considered "unhealthy," so finding the areas in which these people live may be a priority so that health policy can target concentrations of "poor" health outcomes. The map in **Figure 6** shows the "high risk" areas in terms of health for Greater Glasgow. The spatial pattern in Glasgow shows that some areas stand out visually. There are areas of clustering in places that are expected to feature in the "poor" health end of the distribution, such as areas in the east end of Glasgow, around Easterhouse, and Castlemilk. Other areas, such as Drumchapel, have pockets of "high risk" health features. On balance, the pattern is concentrated more within the city boundary than outside it, punctuated by smaller clusters spread across the city with notable "gaps" (i.e., white space) in the more affluent areas of the city, such as the west end. The pattern does show elements of the other health maps, which is to be expected as it is a combination of all four of the previous health maps of Glasgow. The concentration of "high risk" areas could have important health implications: additional effects on health that smaller isolated clusters may not exhibit could be much greater where combinations of "high risk" health features are concentrated. In other words, the combination of high alcohol consumption, smoking, obesity, and poor mental health may well have longer-term effects as well as compounding effects on individual and area-level health. It could be argued that area-based policies, i.e., targeting a specific neighborhood, would work by targeting these "high risk" areas, and this may well have an impact at the national or city level in terms of an improvement to health outcomes more generally.
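The selection described in this scenario can be sketched in a few lines of Python operating on simulated microdata. This is an illustration under stated assumptions rather than the *SimAlba* code: the record fields, the area codes, the GHQ cutoff of 4, and the pre-computed alcohol guideline flag are all hypothetical, while the threshold of 20 cigarettes a day comes from the text and a BMI of 30 or more is the standard definition of obesity.

```python
from collections import defaultdict


def high_risk_share(microdata, ghq_cutoff=4):
    """Weighted share of each area's simulated population with all four risk factors."""
    total = defaultdict(float)
    at_risk = defaultdict(float)
    for person in microdata:
        total[person["area"]] += person["weight"]
        if (person["cigs_per_day"] >= 20            # heavy smoker (threshold from the text)
                and person["over_alcohol_guideline"]  # assumed pre-computed flag
                and person["ghq12"] >= ghq_cutoff     # assumed cutoff for low well-being
                and person["bmi"] >= 30):             # obese
            at_risk[person["area"]] += person["weight"]
    return {area: at_risk[area] / total[area] for area in total}


# Hypothetical simulated individuals with their area weights (not SimAlba output).
people = [
    {"area": "OA1", "weight": 10.0, "cigs_per_day": 25,
     "over_alcohol_guideline": True, "ghq12": 5, "bmi": 32.0},
    {"area": "OA1", "weight": 30.0, "cigs_per_day": 0,
     "over_alcohol_guideline": False, "ghq12": 1, "bmi": 24.0},
    {"area": "OA2", "weight": 20.0, "cigs_per_day": 30,
     "over_alcohol_guideline": True, "ghq12": 2, "bmi": 31.0},
]
shares = high_risk_share(people)
```

The resulting share per area is the kind of quantity that could then be joined to area boundaries in a GIS and mapped.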

A further example of the power of spatial microsimulation is to combine and cross tabulate socioeconomic and health variables geographically. In **Figure 7**, the map shows the areas with the highest proportions of people who have low income and are smokers. The map shows that those areas with the darkest reds (Q5) contain between 78 and 96% of people in that category as a proportion of all people in each area. In other words, almost all of the people in some areas of Glasgow are low-income smokers. Knowing which of those areas are worth focusing resources on is an advantage when planning stop-smoking services. Areas to the south, such as Shettleston and areas to the East, such as Easterhouse, are highlighted with respect to smoking behaviors and low income.

### DISCUSSION

In 2006, Scotland introduced a nationwide ban on smoking in public places and plans to end tobacco displays in shops as well as to ban sales from vending machines. Scottish studies (41) report that reductions in exposure to secondhand smoke of the order observed in Scotland may generate immediate health gains in the Scottish population as well as longer-term reductions in morbidity and mortality related to secondhand smoke due to the smoking ban. Haw and Gruer (41) argue that quitting smoking is probably the most effective way of reducing secondhand smoke exposure in the home; and that smoking cessation services must continue to be promoted. Additional evidence (30) again supports the thesis that smoke-free legislation has been a success. An option would be to model smokers to better target this group of the population if desired. The use of microsimulation to model smoking rates is not new, as the geography of smoking in Leeds (4) has previously been estimated. The microsimulation of smoking rates in *SimAlba* builds on this type of work and brings it to a Scottish context, which does not appear to have been modeled before. There are also arguments about broader macroeconomic forces, such as income inequality (42), being the cause of a plethora of health and social ills. The debates around greater income inequality leading to higher rates of not just smoking but also poorer mental health outcomes and higher rates of obesity are well rehearsed in the literature.

Another aspect of health that is relevant in Scotland is mental health outcomes. Scotland has high rates of suicide (43) compared to England and Wales. Spatial microsimulation could be used to specifically target "at risk" groups, geographically. Previous modeling has been completed in England (44) showing the spatial patterning of small-area prevalence of psychological distress and alcohol consumption. Also, there have been attempts to estimate happiness in Scotland with the use of spatial microsimulation (34) by combining the British Household Panel Survey (BHPS) with census data. What the analysis in this paper adds is a more complete picture of other health variables, also using a health-specific survey data set (SHS instead of the BHPS), and building on the existing work from elsewhere in the UK.

Alcohol policy is also of particular policy relevance due to the debates on the introduction of a minimum price per unit of alcohol (45). The Scottish government previously introduced an alcohol bill to try and begin the process of legislating for the changes needed, such as the minimum price per unit of alcohol. In the background of alcohol consumption debates is the framework of the recommended daily limits for alcohol consumption of no more than 3 or 4 units of alcohol per day for men and 2 or 3 units per day for women. The analysis presented here shows the estimated geographic location and the characteristics of people who drink over the guideline limits, adding extra depth to the existing data. As noted by Katikireddi and McLean (28), there is a lack of empirical evidence in this regard which, it could be argued, can be addressed by spatial microsimulation models (e.g., *SimAlba*).

Obesity is a growing problem worldwide. It is also a costly problem with between 0.7 and 2.8% of a country's total healthcare expenditures being spent on this health issue (46). There are complex pathways and dynamics behind the determinants of obesity (47) that explain the doubling of the rate worldwide since 1980, to around 20% in most developed economies, such as the context explored here (48). More concerning is that patterns among children and adolescents continue to show growth in rates of obesity (49). Interestingly, when looking at the relationship between play areas and deprivation and subsequent links to childhood obesity (50), it was found that more deprived areas are better provided for, but the quality has not been accounted for, nor has the lack of private green space relative to more affluent areas, so causal pathways in some instances are unclear. Moreover, in Glasgow, there is evidence to suggest that more deprived neighborhoods are no more likely to be exposed to energy dense out-of-home eating outlets (51). So, simple explanations relating to providing more play areas and reducing exposure to out-of-home eating outlets are not a sufficient explanation for increasing obesity rates. The *SimAlba* model adds to a literature on simulated obesity rates for small areas seen elsewhere in the UK (38). More recent literature (52) has continued in a similar vein, emphasizing the importance of designing policies that are targeted at the small area level but that also account for population group differences simultaneously.

### CONCLUSION

A comprehensive dataset, such as that generated by *SimAlba*, that provides data on health-related behaviors for individuals and small areas in Scotland has previously not been available. Although the simulated data are now dated, they provide an important addition to understanding health behaviors at small area geographies. The missing piece of the puzzle has always been that reliable small area data on all these types of behaviors and conditions are not collected, except very broadly by the Census, for example, for self-reported health. What spatial microsimulation adds is the lower level, small area geography and the ability to examine both composition and context simultaneously.

Nevertheless, it should be noted that one concern with spatial microsimulation is the issue of validation (how accurate simulated data are) and how to assess the quality of outputs. This concern has been addressed or discussed in papers looking at deterministic reweighting models (6), and there are ongoing debates (53) on this specific issue. Therefore, the main limitation of microsimulation is that it is difficult to verify the outputs against what the real population data may be. The paradox of this approach is that the reason the data are simulated in the first instance is that they are difficult or too expensive to collect. On balance, the *SimAlba* model appears to produce reasonably accurate microsimulated data where validation or use of a proxy variable to test results has been possible, as demonstrated elsewhere (35, 36) and as seen in **Figure 8**.

The analysis presented provides policy makers with an indication of those areas where individuals with a variety of health outcomes (smoking, alcohol consumption, obesity, and mental well-being) are potentially living within Glasgow, and this information could potentially be used to target smaller area interventions compared to a universal intervention. Subjective well-being (measured by GHQ 12 score) has also been examined, and there does not appear to be any other study that has estimated GHQ scores for such small areas in Scotland. Alcohol consumption was also modeled using the *SimAlba* framework. The simulation of data of this nature could be considered valuable to policy makers in showing the differing spatial concentrations of problem drinkers. Furthermore, obesity and various weight categories were simulated using *SimAlba*. The analysis provides an original dataset to explore health outcomes and behaviors in Scotland at either the individual-level or small area-level geography. The estimation of health-related variables (smoking, alcohol, happiness, and obesity) at small area level geography is a step forward in understanding what the patterns of health behaviors or health indicators are likely to be. There is still significant potential to use the microdataset created for future research in a variety of fields. The *SimAlba* model is also able to estimate other variables, which are present in the SHS (e.g., regular exercise), but this would require a modified spatial microsimulation model. The model presented here could also be used as a basis for future modeling work or as the basis of a framework for other survey data sources, for example, to look at spatial and social patterns of tobacco cessation, condom use for disease prevention, seat belt use, or breastfeeding.

### AUTHOR CONTRIBUTIONS

MC collected and analyzed data and wrote the first draft; DB made suggestions regarding the analysis and interpretation and also co-authored and edited the manuscript.

### FUNDING

This work was funded by a grant from the ESRC and the Scottish Government.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Campbell and Ballas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Physical Activity and Survival among Long-term Cancer Survivor and Non-Cancer Cohorts

*Anthony S. Gunnell1,2, Sarah Joyce3, Stephania Tomlin3, Dennis R. Taaffe1,4,5, Prue Cormie6, Robert U. Newton1,4,7, David Joseph1,8,9, Nigel Spry1,9,10, Kristjana Einarsdóttir11 and Daniel A. Galvão1,4\**

*1Exercise Medicine Research Institute, Edith Cowan University, Joondalup, WA, Australia, 2School of Population Health, University of Western Australia, Nedlands, WA, Australia, 3Public Health and Clinical Services Division, Western Australian Department of Health, East Perth, WA, Australia, 4School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia, 5School of Human Movement and Nutrition Sciences, The University of Queensland, St. Lucia, QLD, Australia, 6Institute for Health and Ageing, Australian Catholic University, Melbourne, VIC, Australia, 7University of Queensland Centre for Clinical Research, The University of Queensland, Brisbane, QLD, Australia, 8Department of Radiation Oncology, Sir Charles Gairdner Hospital, Nedlands, WA, Australia, 9Faculty of Medicine, University of Western Australia, Nedlands, WA, Australia, 10Genesis Cancer Care, Joondalup, WA, Australia, 11Centre of Public Health Sciences and Unit for Nutrition Research, School of Health Sciences, University of Iceland, Reykjavik, Iceland*

### *Edited by:*

*Hugh J. S. Dawkins, Department of Health of Western Australia, Australia*

### *Reviewed by:*

*Nonka Georgieva Mateva, Plovdiv Medical University, Bulgaria Anu Mary Oommen, Christian Medical College, Vellore, India*

*\*Correspondence: Daniel A. Galvão d.galvao@ecu.edu.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 14 November 2016 Accepted: 30 January 2017 Published: 14 February 2017*

### *Citation:*

*Gunnell AS, Joyce S, Tomlin S, Taaffe DR, Cormie P, Newton RU, Joseph D, Spry N, Einarsdóttir K and Galvão DA (2017) Physical Activity and Survival among Long-term Cancer Survivor and Non-Cancer Cohorts. Front. Public Health 5:19. doi: 10.3389/fpubh.2017.00019*

Evidence suggests physical activity improves prognosis following cancer diagnosis; however, evidence regarding prognosis in long-term survivors of cancer is scarce. We assessed physical activity in 1,589 cancer survivors at an average 8.8 years following their initial diagnosis and calculated their future mortality risk following physical activity assessment. For comparison, we also selected a cohort of 3,145 cancer-free individuals, group-matched on age, sex, and survey year, from the same source population. Risks for cancer-specific mortality and all-cause mortality in relation to physical activity levels were estimated using Cox proportional hazards regression analyses within the cancer and non-cancer cohorts. Physical activity levels of 360+ min per week were inversely associated with cancer-specific mortality in long-term cancer survivors [hazard ratio (HR) = 0.30 (95% confidence interval (CI) 0.13–0.70)] and participants without prior cancer [HR = 0.16 (95% CI 0.05–0.56)] compared with no reported physical activity. Physical activity levels of 150–359 and 360+ min were inversely associated with all-cause mortality in long-term cancer survivors [150–359 min; HR = 0.55 (95% CI 0.31–0.97), 360+ min; HR = 0.41 (95% CI 0.21–0.79)] and those without prior cancer [150–359 min; HR = 0.52 (95% CI 0.32–0.86), 360+ min; HR = 0.50 (95% CI 0.29–0.88)]. These results suggest that meeting exercise guidelines of 150 min of physical activity per week was associated with reduced all-cause mortality in both the long-term cancer survivor and cancer-free cohorts. Exceeding exercise oncology guidelines (360+ min per week) may provide additional protection in terms of cancer-specific death.

Keywords: physical activity, cancer, survival, longitudinal, cohort study

# INTRODUCTION

Identification and management of lifestyle risk factors affecting prognosis in cancer survivors is becoming increasingly important as cancer screening and treatments continue to improve. In Australia, the number of cancers diagnosed almost doubled between 1991 and 2009, with a corresponding increase in age-standardized incidence of 12% (1). During a similar period of time, 5-year relative survival following any cancer diagnosis increased from 47% in 1982–1987 to 66% in 2006–2010 (1).

Assessment of physical activity levels and their effects in those who survived cancer has been undertaken by a number of researchers (2, 3); however, the usual period of assessment has been within a relatively short period following cancer diagnosis (almost exclusively less than 2 years). These studies have generally shown positive associations between increasing levels of physical activity and improved quality of life, cancer-specific mortality, and all-cause mortality for survivors of certain cancer types, particularly breast (4), colorectal (5–7), and prostate (8) cancer. Additionally, it has been suggested that cancer recurrence might be positively impacted by physical activity levels postdiagnosis, although evidence appears contradictory (9–12). To date, however, assessment of physical activity levels in terms of their effects on mortality in long-term survivors has been almost non-existent, although a recent study by Inoue-Choi et al. (13) suggested improved survival benefits for cancer survivors may be associated with adherence to the recommended physical activity guidelines. With improved long-term survival in those diagnosed with cancer, it is important to understand whether the survival benefits associated with physical activity extend beyond the immediate rehabilitation stage associated with cancer (and treatment) recovery. Moreover, it is important to assess whether physical activity behaviors contribute similar benefits in these long-term cancer survivors, compared to cancer-naive individuals. By linking the Western Australian Cancer Registry dataset with the Western Australian Health and Wellbeing Surveillance System (HWSS) dataset, we were able to obtain self-reported leisure-time physical activity (LTPA) levels at time points between 2 and 28 years following first recorded cancer diagnosis. 
This allowed us to investigate whether long-term survivors of "any cancer" benefited from increased physical activity, in terms of future cancer-specific and all-cause mortality risk.

# MATERIALS AND METHODS

### Ethics Statement

The study was approved by the Human Research Ethics Committees of Edith Cowan University and the Western Australian Department of Health and has therefore been performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.

### Study Design

This population-based cohort study utilized self-reported lifestyle survey information (from the HWSS) individually linked with cancer registry data (both held by the Western Australian Department of Health). The HWSS is a comprehensive monthly survey commissioned by the Western Australian Department of Health to provide information on a wide range of issues pertaining to the Western Australian population's physical and mental well-being. It utilizes computer-assisted telephone interviewing to assess approximately 6,000 Western Australians each year who are selected from the WA White Pages® telephone directory using a stratified random process with over-sampling representative to the population in rural and remote areas. Each year since its inception, more than 75% of those contacted completed the survey (14) and a majority (77% in 2010) of participants provided their name, address, and date of birth for the purpose of linkage with administrative health data. Only those HWSS participants who provided consent for their information to be used in this manner were linked to other registries for this study. The probabilistic matching procedures used to link individuals are based on full name and address, phonetic compression algorithms, and other identifiers, and they have been estimated to be 99.89% accurate (15). This linkage allowed identification of incident cancer diagnoses prior to an initial survey date and provided information on behavioral factors and demographics at the time of survey. Mortality data were obtained through linking to the Western Australian Mortality Registry for the entire study period (2004–2011).

### Study Population

Between May 1, 2004 and January 1, 2011, some 44,317 surveys for which consent was provided for data linkage were completed as part of the HWSS. Where participants had been surveyed more than once (1,616 people) during the study period, their last survey was included for analysis. Upon further restriction to those aged 40+ years, to target adult cancers only, 25,433 participants remained. Those with a diagnosed cancer (other than non-melanoma skin cancer) after 1982 and more than 2 years prior to their survey date were identified (1,813). Their first cancer within this period was considered their incident cancer. After further exclusion of cancer survivors with multiple cancers diagnosed prior to the survey, a cohort of 1,667 cancer survivors was selected. Exclusion of a further 78 individuals with missing information for body mass index (BMI), alcohol consumption, and/or SF-8 questions (**Table 1**) left a cancer survivor cohort of 1,589. These participants were surveyed on average 8.8 years (quartile 1 = 4 years, quartile 3 = 12 years) following their first recorded cancer diagnosis.

A non-cancer cohort (NCC) was selected from the same source population of 44,317 individuals. Stratified random sampling without replacement at a ratio of 2:1 was performed using the age (10-year age blocks from 40 years onward), sex, and survey year frequency distributions of the 1,667 cancer survivors identified previously. This was performed using the "proc surveyselect" function within SAS Inc. software (version 9.3). The resulting 3,334 individuals had no cancer diagnosis prior to their survey (**Table 2**). After exclusion of 189 individuals with missing data for BMI and SF-8 questions, a final 3,145 cancer-free individuals were included in the final analyses.
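The frequency-matching step can be illustrated with a minimal Python sketch. It is not the SAS `proc surveyselect` call used in the study: the record fields (`age`, `sex`, `survey_year`), the function names, and the fallback when a stratum has too few candidates are all assumptions made for illustration. For each stratum defined by 10-year age block, sex, and survey year, it draws twice as many cancer-free individuals as there are cancer survivors, without replacement (so 1,667 survivors would yield 2 × 1,667 = 3,334 comparators).

```python
import random
from collections import Counter, defaultdict

def age_block(age):
    """10-year age block, e.g. 47 -> 40 and 63 -> 60."""
    return (age // 10) * 10

def stratum(person):
    """Matching stratum: (age block, sex, survey year)."""
    return (age_block(person["age"]), person["sex"], person["survey_year"])

def frequency_match(cases, pool, ratio=2, seed=0):
    """Sample `ratio` cancer-free people per case within each stratum."""
    rng = random.Random(seed)
    needed = Counter(stratum(p) for p in cases)

    by_stratum = defaultdict(list)
    for p in pool:
        by_stratum[stratum(p)].append(p)

    sample = []
    for key, n_cases in needed.items():
        candidates = by_stratum[key]
        # Assumption: if a stratum is short of candidates, take all of them.
        k = min(ratio * n_cases, len(candidates))
        sample.extend(rng.sample(candidates, k))  # without replacement
    return sample
```

Because matching is on the frequency distribution rather than on individuals, the comparison cohort reproduces the age, sex, and survey-year profile of the survivors without pairing any specific person to a specific case.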




### Study Variables

Participants were asked to estimate the number of sessions and minutes per session of LTPA during the past week, in terms of walking 10 or more minutes consecutively, performing moderate physical activity (e.g., golf, gentle swimming, and lawn bowls), or vigorous physical activity (e.g., tennis, jogging, and cycling). Using recommendations of the Australian Institute of Health and Welfare (16), total LTPA was calculated using the formula [total LTPA = walk-time + moderate-time + (2 × vigorous-time)]. LTPA level was categorized as follows: no LTPA, <150 min LTPA, 150–359 min LTPA, or 360+ min LTPA per week. While Australian physical activity guidelines (16) collapse the upper two categories (150–359 min and 360+ min), given the approximately equal number of participants in the two upper categories, stratification was preferable in this instance. For those aged 65+ years, no weighting was applied for vigorous physical activity in order to improve comparability between years, because the question was not asked for that age group prior to 2008. Few respondents aged 65+ years reported being vigorously active from 2008 onward. For those whose total LTPA per week exceeded 1,680 min, their summed value was re-coded to 1,680 min for analysis as recommended in the Australian Institute of Health and Welfare guidelines (16).
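The LTPA scoring rule can be sketched in a few lines of Python. The function names and category labels are illustrative only, and the handling of the 65+ age group is an assumption: "no weighting" is read here as counting vigorous minutes once rather than twice.

```python
def total_ltpa(walk_min, moderate_min, vigorous_min, age, cap=1680):
    """Weekly LTPA in minutes: walk + moderate + 2 x vigorous, capped at 1,680."""
    # Assumption: for respondents aged 65+, vigorous minutes are not doubled.
    weight = 1 if age >= 65 else 2
    total = walk_min + moderate_min + weight * vigorous_min
    return min(total, cap)

def ltpa_category(total_min):
    """Map weekly LTPA minutes to the four analysis categories."""
    if total_min == 0:
        return "none"
    if total_min < 150:
        return "<150"
    if total_min < 360:
        return "150-359"
    return "360+"
```

For example, a 50-year-old reporting 60 min of walking, 30 min of moderate activity, and 40 min of vigorous activity scores 60 + 30 + 2 × 40 = 170 min and falls in the 150–359 min category; under the assumption above, the same report from a 70-year-old scores 130 min and falls in the <150 min category.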

Confounding variables included in the adjusted Cox regression analyses were sex, age at survey, previous cancer type (none, breast, prostate, colorectal, melanoma, other), smoking status (never more than 100 cigarettes, ex-smoker, current-smoker), fruit and/or vegetable intake (<4, 4–5, 6, 7+ servings daily; based on quintile distribution), BMI (<25, 25–29, 30+ kg/m²; adapted from World Health Organization classifications), long-term risky alcohol intake (none, ≤2 standard drinks, 3+ standard drinks on a drinking day; based on National Health and Medical Research Council guidelines), SF-8 physical health component score (<50, 50+; based on median values), SF-8 mental health component score (<54, 54+; based on median values), year of survey, and self-reported diabetes status. The SF-8 Health Survey component of the HWSS is an eight-item version of the SF-36®. Higher SF-8 scores correspond to better functioning.

Two outcomes were investigated following the participants' surveys, namely cancer-specific mortality and all-cause mortality. Cancer death was identified using the International Classification of Diseases version 10 codes C00-D48, present as either principal or other cause of death.

### Statistical Analyses

Differences in baseline characteristics in relation to level of LTPA were assessed using the Kruskal–Wallis (for ordinal variables) and chi-square (for nominal variables) tests.

Person-time was calculated from the date of survey until death, or end of follow-up (January 1, 2011), whichever occurred first. Cox proportional hazards regression models were used to estimate hazard ratios (HR) with 95% confidence intervals (CI) for future mortality. Separate analyses were performed for the two outcome types (cancer-specific and all-cause mortality), with non-cancer mortality being censored in cancer-related analyses.
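A minimal sketch of the person-time and censoring rules described above follows. The function and argument names are hypothetical, and converting days to years by dividing by 365.25 is an assumption, since the source does not state the time unit used in the models.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2011, 1, 1)

def follow_up(survey_date, death_date=None, cancer_death=False,
              end=END_OF_FOLLOW_UP):
    """Return (years at risk, all-cause event flag, cancer-specific event flag)."""
    # A death counts as an event only if it falls within the follow-up window.
    died_in_window = death_date is not None and death_date <= end
    exit_date = death_date if died_in_window else end
    years = (exit_date - survey_date).days / 365.25

    all_cause_event = int(died_in_window)
    # Non-cancer deaths are censored (flag 0) in the cancer-specific analysis.
    cancer_event = int(died_in_window and cancer_death)
    return years, all_cause_event, cancer_event
```

A participant who dies of a non-cancer cause therefore contributes person-time up to the date of death in both analyses, but counts as an event only in the all-cause model.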

The final adjusted Cox models, which incorporated a stratum of "time from cancer diagnosis until survey" for the cancer survivor cohort, included LTPA, previous cancer type, age, sex, smoking status, BMI, daily fruit and vegetable intake, survey year, long-term risky alcohol use, SF-8 physical and mental health component scores, and self-reported diabetes. Socioeconomic index for areas and region (metro or regional residence) were not included in the adjusted model as they were not significantly associated with the outcome in crude Cox models.

Differences in survival as a function of prior cancer status (yes/no) were assessed by combining the cancer survivor cohort (CC) and the NCC, and introducing a variable denoting prior cancer status to the adjusted Cox regression model. All aforementioned covariates were also included in this model.

Trends for the effects of LTPA on the two outcomes were estimated by excluding the categorical LTPA variable from the class statement in the adjusted Cox regression model. To test for interaction between LTPA and prior cancer status, the two cohorts were combined, and interaction terms [LTPA × prior cancer status (yes/no)] were added to the models.

The proportional hazards assumption that the ratio of mortality rates according to the exposure variable remained constant for the adjusted models was tested by inclusion of an interaction term between the LTPA variable and log(survival time). No violation of the proportional hazards was observed.
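In symbols, the check can be written as follows. The notation is assumed here rather than taken from the source: $s$ indexes the stratum (time from diagnosis until survey, for the cancer survivor cohort), $\mathbf{x}$ is the covariate vector, and $x_{\mathrm{LTPA}}$ stands for the LTPA term; with LTPA entered as a categorical variable there would be one $\gamma$ per non-reference category.

```latex
% Stratified Cox proportional hazards model
h_s(t \mid \mathbf{x}) = h_{0s}(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right)

% Proportional hazards check: add an LTPA-by-log-time interaction
h_s(t \mid \mathbf{x}) = h_{0s}(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}
    + \gamma\, x_{\mathrm{LTPA}} \log t\right),
\qquad H_0 : \gamma = 0
```

If $\gamma = 0$ cannot be rejected, the hazard ratio for LTPA does not vary detectably with follow-up time, which is consistent with the finding of no violation reported here.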

For all analyses, a two-sided *p* < 0.05 was considered statistically significant. The statistical software SAS 9.3 was used to perform all analyses.

# RESULTS

### Baseline Characteristics

At time of survey, the median age for both the CC (*n* = 1,589) and the NCC (*n* = 3,145) was 68 years [interquartile range (IQR): 60–76 years]. Median BMI was similar between cohorts (CC = 26.6 kg/m², IQR = 23.8–29.9; NCC = 26.0 kg/m², IQR = 23.4–29.2), as was median fruit/vegetable servings per day (CC = 5.0 servings, IQR = 4.0–6.0; NCC = 5.0 servings, IQR = 4.0–6.0), median SF-8 physical health component score (CC = 49.6, IQR = 39.9–54.4; NCC = 51.1, IQR = 42.3–55.6), median SF-8 mental health component score (CC = 54.1, IQR = 48.1–57.7; NCC = 54.8, IQR = 49.3–57.7), and percentage with self-reported diabetes (CC = 11.8%; NCC = 11.5%). Percentage of current smokers (CC = 9.5%; NCC = 10.5%) and long-term risky drinking (>2 standard drinks on any given day) (CC = 19.8%; NCC = 18.0%) varied slightly between cohorts.

### LTPA-Stratified Baseline Characteristics in the CC

Compared with those who reported no LTPA per week, those reporting increased levels of LTPA tended to be younger, have lower BMI and greater fruit/vegetable and alcohol intake, and were less likely to be current smokers (**Table 1**). Mental health component scores from SF-8 questions were appreciably higher in the 360+ min of LTPA group, and an apparent dose–response between increasing levels of LTPA and percentage of those with SF-8 physical health component scores above the median was observed (**Table 1**). All of the aforementioned factors were significantly associated with LTPA aside from gender and smoking status. In terms of time from first cancer diagnosis until survey, a slightly higher mean number of years was observed in the "no LTPA" group compared to the other LTPA levels. However, no significant association between LTPA and "years from cancer diagnosis until survey" was observed.

### LTPA-Stratified Baseline Characteristics in the NCC

The relationships between LTPA-stratified baseline characteristics for the NCC (**Table 2**) appeared to mirror those observed in the CC; however, significant associations between LTPA and gender, and LTPA and smoking status were observed in the NCC that were not present within the CC. This may relate in part to the greater number of individuals present in the NCC.

### Cohort Follow-up

The median duration of follow-up after survey was 2.6 years (out to 7.6 years) for the cancer survivor cohort, during which time 83 cancer-specific deaths and 135 all-cause deaths occurred. In comparison, a median duration of 2.8 years (out to 7.6 years) for the NCC was observed, during which time 52 cancer-specific deaths and 152 all-cause deaths occurred.

### Survival in Relation to Prior Cancer Status

Prior cancer was associated with a threefold increased risk [HR = 3.05 (95% CI 2.15–4.33)] for cancer-specific mortality, compared to those without prior cancer after adjustment for LTPA, age, sex, smoking status, BMI, daily fruit and vegetable intake, survey year, long-term risky alcohol use, SF-8 physical and mental health component scores, and self-reported diabetes. Adjusted estimates of risk for all-cause mortality were 72% higher [HR = 1.72 (95% CI 1.36–2.17)] in those with a prior reported cancer, compared to those with no prior cancer reported.

### LTPA and Cancer-Specific Mortality

Risks for cancer-specific mortality in participants with prior cancer [HR = 0.30 (95% CI 0.13–0.70)] and without prior cancer [HR = 0.16 (95% CI 0.05–0.56)] were significantly reduced in those reporting 360 min or more of LTPA per week, compared to those reporting none (**Table 3**). In both the CC and the NCC there appeared to be an inverse dose–response relationship between level of LTPA and risk of cancer-specific mortality (CC *p* for trend = 0.0024; NCC *p* for trend = 0.0016). However, no significant interaction was observed between prior cancer status and LTPA in relation to cancer-specific mortality risk (*p* for interaction = 0.8341).

### LTPA and All-Cause Mortality

All-cause mortality during the follow-up period was significantly reduced by 45–59% (**Table 3**) for those reporting 150–359 or 360+ min per week of LTPA, regardless of prior cancer status. While not significant, results also suggested some reduction in risk for those performing less than 150 min LTPA per week. Significant trends were observed in terms of effects from increasing levels of LTPA on reduction in all-cause mortality, for the cancer (*p* = 0.0178) and non-cancer (*p* = 0.0215) cohorts. No significant interaction was observed between prior cancer status and LTPA in relation to all-cause mortality risk (*p* for interaction = 0.9932).
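The percentage reductions quoted in this article follow directly from the hazard ratios, as (1 − HR) × 100. The short sketch below applies that conversion to the all-cause HRs reported in the abstract; the helper name and dictionary keys are illustrative only.

```python
def percent_reduction(hr):
    """Relative reduction in hazard implied by a hazard ratio below 1."""
    return round((1 - hr) * 100)

# All-cause mortality hazard ratios versus no LTPA, as reported in the abstract.
all_cause_hr = {
    ("CC", "150-359"): 0.55,
    ("CC", "360+"): 0.41,
    ("NCC", "150-359"): 0.52,
    ("NCC", "360+"): 0.50,
}

reductions = {key: percent_reduction(hr) for key, hr in all_cause_hr.items()}
```

These four values (45, 59, 48, and 50%) span the 45–59% range given above. They describe relative reductions in the hazard of death compared with the no-LTPA group, not absolute reductions in risk.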

# DISCUSSION

This observational study aimed to investigate the relationship between LTPA and cancer-specific mortality, and between LTPA and all-cause mortality, in two cohorts: a long-term cancer survivor cohort and a cohort without prior recorded cancer who were frequency matched on age, gender, and survey year to those in the CC. Results confirmed an association between 150 min or more LTPA and reduced all-cause mortality in both cohorts. In relation to cancer-specific mortality, physical activity exceeding 360 min per week was associated with survival benefits regardless of a person's prior cancer status. Lower levels of physical activity (<150, 150–359 min per week) were not significantly associated with reductions in cancer-specific mortality in those with or without a prior cancer.

Physical activity has previously been shown to provide immediate beneficial effects for cancer survivors, including improvements in physiological markers, body composition, physical function, fatigue, and psychological outcomes (2, 17–20). Although evidence related to long-term cancer survivors is sparse, a recent study by Inoue-Choi et al. (13) suggested adherence to recommended levels of physical activity in long-term cancer survivors may improve all-cause, cardiovascular disease (CVD)-specific, and cancer-specific mortality. While the study variables and design used by Inoue-Choi et al. differed from those in our study in that we included men and women, a comparative non-cancer group, and different doses of physical activity (e.g., <150, 150–359, and 360+ min per week), it was of interest to note the clear protective (adjusted) effects of physical activity that existed for both cancer-specific and all-cause mortality outcomes in their study. Similarly, a recent prospective cohort study (21) of 830 long-term prostate cancer survivors, which assessed physical activity at 2.5, 4.7, and 6.8 years post-diagnosis (comparable to the 8.8-year assessment time point in our study), showed a protective effect of increased physical activity against prostate cancer mortality. In our study, the likelihood of cancer-related death for cancer survivors appeared to decrease for increasing levels of physical activity, culminating in a significant 70% reduced risk for those performing 360+ min of LTPA per week. In the same exposure group (360+ min), all-cause mortality in cancer survivors was reduced by almost 60%.

TABLE 3 | Risk for cancer-specific death and all-cause death by weekly leisure-time physical activity (LTPA) levels, in cohorts with and without previously reported cancer diagnosis.

*a Cox model includes age at survey, sex, smoking category, long-term risky drinking category, body mass index category, daily fruit and vegetable intake, survey year, self-reported diabetes, SF-8 mental health component score, SF-8 physical health component score, and previous cancer type (for CC).*

In our cohort containing individuals with and without prior cancer, we observed a 72% increased all-cause mortality risk over the follow-up period in cancer survivors compared to those without a prior cancer. We also observed a threefold increased risk for cancer-specific mortality in those with prior cancer compared to cancer-naive participants. Higher risk of non-cancer-related mortality may in part be explained by an overrepresentation of CVD risk factors being observed in cancer survivors (22). Although it would have been interesting to test for CVD-specific mortality, we did not investigate this due to the relatively low numbers of cancer survivors having a CVD-specific death recorded and the likelihood of competing risks between the CVD and cancer mortality outcomes.

For long-term cancer survivors, previous evidence suggests an increased risk for both cancer and non-cancer-related mortality compared to that of the general population (23, 24). A number of studies have highlighted the above average incidence of second primary or recurrent cancers in those surviving an initial cancer (25). This increased risk for subsequent cancer appears to depend upon cancer type and/or cancer treatment, and other individual-specific risk profiles (e.g., lifestyle, genetics, and other exposures) (25). In addition, increases in all-cause and non-cancer-related mortality in cancer survivors have been reported (13). There are a number of mechanisms by which physical activity may further improve cancer-specific survival in long-term cancer survivors as well as those without prior cancer. Since both cohorts (those with and those without prior cancer) showed apparent cancer-specific survival benefits from physical activity, one explanation might be that physical activity reduces the likelihood of cancer incidence, resulting in fewer cancer deaths. However, in some unpublished analyses from this study we observed no significant relationship between physical activity and cancer incidence (or second primary cancer incidence in the prior-CC). An alternative explanation is that physical activity may play a role in improving prognosis of those who develop an incident or second primary cancer. Evidence supporting this has been reported by a number of researchers (26). Although the biological mechanisms through which this is achieved are still unclear, there are a number of promising avenues. The influence of exercise on host factors such as metabolic hormones, inflammation/cytokines, and immune surveillance has been suggested, as have exercise's effects on certain tumor-related factors such as p27, CTNNB1, CACNA2D3, and L3MBTL1 (26).

By selecting an NCC from the same source cohort who possessed similar age and gender distributions as the cancer survivors, we investigated whether differential physical activity-related effects on our two outcomes (cancer mortality and all-cause mortality) might exist. Certainly, physical activity has been associated with reduced all-cause mortality rates (27) and cardiovascular-related disease (28–30) or cardiovascular-death (31) in population-based cohorts. Results from our study of a positive association between physical activity and decreasing cancer-specific and all-cause mortality in our NCC reflected previous findings. Moreover, our results suggested the benefits of moderate to high levels of physical activity in decreasing cancer-specific and all-cause mortality were comparable between those with or without a prior cancer. From a health promotion and management perspective, this provides a degree of confirmation that physical activity recommendations for the general public are applicable and beneficial for cancer survivors. Moreover, these benefits were observed in an aggregate cancer survivor group and, while cancer type was accounted for in the analyses, it is likely these benefits of physical activity would apply broadly to survivors of most cancer types.

There are a number of strengths attributable to this study. Foremost was our ability to gain all recorded retrospective and post-survey cancer and mortality records for those who participated in the survey. This allowed an assessment of physical activity levels an average 8.8 years following cancer survivors' initial cancer diagnoses. Second, the in-depth survey of participants allowed adjustment of numerous confounding variables associated with lifestyle, physical, and mental health, which provides a greater reliability in estimating the association between physical activity and the outcomes of interest. In addition, access to non-cancer survey participants (at time of survey) enabled comparison of physical activity influences on the outcomes in participants with and without a prior cancer. It is unlikely there would be many instances of misclassification of cancer outcomes as these were mostly derived from pathology laboratories or radiation oncologists (32). For similar reasons, mortality records are unlikely to be a basis for misclassification bias.

Some limitations exist with this study. Given the self-reported nature of the physical activity measurements, some misclassification of the exposure might exist. Any associated bias is likely reduced by the use of quite broad categories for physical activity (0, <150, 150–359, 360+ min per week) and is unlikely to relate to the outcomes due to the prospective nature of the study. While it made sense to classify physical activity based on recommendations and health promotion messages, our aggregation of low, moderate, and vigorous physical activity in calculating amount of LTPA per week precluded us from measuring differences in effectiveness between low, moderate, and vigorous levels of physical activity. Moreover, although we adjusted for a number of potential confounders in our analyses, possible confounding may still exist and contribute to the identified associations. In addition, loss to follow-up (between survey and outcome) would likely have been minimal since the individuals were followed up for cancer and/or death through the Western Australian Health registries. This means that virtually all deaths and cancers reported throughout the follow-up period would be included; however, those occurring outside of Western Australia would presumably be lost to follow-up. Finally, the observational nature of the study does not permit us to infer cause and effect.

In summary, this study suggests physical activity is associated with improved cancer-specific and all-cause survival in long-term survivors of cancer. These associations were comparable in magnitude to those seen in an NCC of similar age and gender, selected from the same source population. Evidence also suggested further benefits in survival may be achieved by those exceeding 360 min of LTPA per week, regardless of an individual's prior cancer status.

### AUTHOR CONTRIBUTIONS

AG, SJ, ST, DT, PC, RN, DJ, NS, KE, and DG: substantial contributions to the conception or design of the work or the acquisition, analysis, or interpretation of data for the work; drafting the work or revising it critically for important intellectual content; final approval of the version to be published; agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

### ACKNOWLEDGMENTS

We wish to thank the Western Australian Department of Health for providing the data used for this investigation. We are furthermore grateful to the Data Linkage Branch of the WA Department of Health for linking the data and facilitating the extractions. DG is funded by a Movember New Directions Development Award obtained through Prostate Cancer Foundation of Australia's Research Program and by the Cancer Council Western Australia Research Fellowship.

### FUNDING

This study was supported by Edith Cowan University internal research.

### REFERENCES

with stage III colon cancer: findings from CALGB 89803. *J Clin Oncol* (2006) 24(22):3535–41. doi:10.1200/JCO.2006.06.0863


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with several of the authors (SJ and ST) and states that the process nevertheless met the standards of a fair and objective review.

*Copyright © 2017 Gunnell, Joyce, Tomlin, Taaffe, Cormie, Newton, Joseph, Spry, Einarsdóttir and Galvão. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Increasing Incidence of Colorectal Cancer in Adolescents and Young Adults Aged 15–39 Years in Western Australia 1982–2007: Examination of Colonoscopy History

*Lakkhina Troeung1\*, Nita Sodhi-Berry1,2, Angelita Martini1, Eva Malacova1,3, Hooi Ee4, Peter O'Leary5,6, Iris Lansdorp-Vogelaar7 and David B. Preen1*

*1Centre for Health Services Research, School of Population Health, The University of Western Australia, Perth, WA, Australia, 2Occupational Respiratory Epidemiology, School of Population Health, The University of Western Australia, Perth, WA, Australia, 3Department of Health, Safety and Environment, School of Public Health, Curtin University, Perth, WA, Australia, 4Department of Gastroenterology, Sir Charles Gairdner Hospital, Queen Elizabeth II Medical Centre, Nedlands, WA, Australia, 5Health Policy and Management, Faculty of Health Sciences, School of Public Health, Curtin University, Perth, WA, Australia, 6School of Women's and Infants' Health, The University of Western Australia, Perth, WA, Australia, 7Department of Public Health, Erasmus MC University Medical Centre, Rotterdam, Netherlands*

### *Edited by:*

*Jason Scott Turner, University of Cincinnati College of Medicine, United States*

### *Reviewed by:*

*Peter D. Baade, Cancer Council Queensland, Australia Peter John Somerford, Government of Western Australia Department of Health, Australia*

*\*Correspondence:*

*Lakkhina Troeung lakkhina.troeung@uwa.edu.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

> *Received: 01 December 2016 Accepted: 03 July 2017 Published: 24 July 2017*

### *Citation:*

*Troeung L, Sodhi-Berry N, Martini A, Malacova E, Ee H, O'Leary P, Lansdorp-Vogelaar I and Preen DB (2017) Increasing Incidence of Colorectal Cancer in Adolescents and Young Adults Aged 15–39 Years in Western Australia 1982–2007: Examination of Colonoscopy History. Front. Public Health 5:179. doi: 10.3389/fpubh.2017.00179*

Aims: To examine trends in colorectal cancer (CRC) incidence and colonoscopy history in adolescents and young adults (AYAs) aged 15–39 years in Western Australia (WA) from 1982 to 2007.

Design: Descriptive cohort study using population-based linked hospital and cancer registry data.

Method: Five-year age-standardized and age-specific incidence rates of CRC were calculated for all AYAs and by sex. Temporal trends in CRC incidence were investigated using Joinpoint regression analysis. The annual percentage change (APC) in CRC incidence was calculated to identify significant time trends. Colonoscopy history relative to incident CRC diagnosis was examined and age and tumor grade at diagnosis compared for AYAs with and without pre-diagnosis colonoscopy. CRC-related mortality within 5 and 10 years of incident diagnosis were compared for AYAs with and without pre-diagnosis colonoscopy using mortality rate ratios (MRRs) derived from negative binomial regression.

Results: Age-standardized CRC incidence among AYAs significantly increased in WA between 1982 and 2007, APC = 3.0 (95% CI 0.7–5.5). Pre-diagnosis colonoscopy was uncommon among AYAs (6.8%, 33/483) and 71% of AYAs were diagnosed at index (first ever) colonoscopy. AYAs with pre-diagnosis colonoscopy were older at CRC diagnosis (mean 36.7 ± 0.7 years) compared to those with no prior colonoscopy (32.6 ± 0.2 years), *p* < 0.001. At CRC diagnosis, a significantly greater proportion of AYAs with pre-diagnosis colonoscopy had well-differentiated tumors (21.2%) compared to those without (5.6%), *p* = 0.001. CRC-related mortality was significantly lower for AYAs with pre-diagnosis colonoscopy compared to those without, for both 5-year [MRR = 0.44 (95% CI 0.27–0.75), *p* = 0.045] and 10-year mortality [MRR = 0.43 (95% CI 0.24–0.83), *p* = 0.043].

Conclusion: CRC incidence among AYAs in WA has significantly increased over the 25-year study period. Pre-diagnosis colonoscopy is associated with lower tumor grade at CRC diagnosis as well as significant reduction in both 5- and 10-year CRC-related mortality rates. These findings warrant further research into the balance in benefits and harms of targeted screening for AYA at highest risk.

Keywords: colorectal cancer screening, young adults, colonoscopy, colorectal cancer, incidence trends

### INTRODUCTION

Australia and New Zealand have the highest rates of colorectal cancer (CRC) internationally (1). The average age at incident CRC diagnosis is 70 years with sharp increases in incidence from 50 years of age (2). Accordingly, current Australian guidelines recommend biennial CRC screening through fecal occult blood tests commencing from 50 years of age for all asymptomatic average-risk persons (3). In the United States (US), CRC incidence and mortality in persons over 50 years have declined over the past decade owing in part to screening initiatives (4). In particular, increased uptake of screening colonoscopy is suggested to be the main driver of declining CRC rates in this age group (5), with early detection and removal of premalignant lesions yielding significant reductions in CRC morbidity and mortality (6–10).

In direct contrast to trends in those over 50 years of age, an increasing incidence of CRC among adolescents and young adults (AYAs) has been reported internationally (11–15) as well as in Australia (16, 17) over the past two decades. A recent report showed that from 1990 to 2010, CRC incidence increased by between 85 and 100% in Australians aged 20–29 years and by 35% in those aged 30–39 years (17). The mechanisms underlying the rising incidence of CRC among AYAs are currently not well understood (15, 18); however, this increasing trend is a population health concern (18). Given the observed benefits of screening colonoscopy in the older population (5), questions have been raised in relation to current CRC screening practices in younger populations and whether average-risk CRC screening should be initiated at an earlier age (18–20). However, there is currently a lack of empirical data on the impact of screening in age groups <50 years to inform decision-making.

We examined trends in CRC incidence and colonoscopy history among AYAs aged 15–39 years in Western Australia (WA) from 1982 to 2007, before implementation of the National Bowel Cancer Screening Program (NBCSP), using whole-population linked hospital and Cancer Registry data. While the AYA age group is currently exempt from the NBCSP framework, it is possible that raised awareness of bowel cancer through the NBCSP may have impacted screening behaviors in the younger population. We therefore selected 2007 as our endpoint to examine pre-NBCSP colonoscopy history in AYAs. Specifically, we sought to (1) examine temporal trends in age-standardized and age-specific CRC incidence rates, (2) examine colonoscopy history in AYAs, and (3) compare age at diagnosis, tumor grade, and 10-year CRC-related mortality for AYAs with and without a record of pre-diagnosis colonoscopy.

### MATERIALS AND METHODS

### Data Sources

Data were obtained on all persons aged 15–39 years with an incident diagnosis of malignant neoplasm in WA between 1st January 1982 and 31st December 2007, as registered with the WA Cancer Registry (WACR). The age range of 15–39 years for AYA classification is based on that used previously (16, 21). Since 1981, notifications of all malignancies within 6 months of diagnosis have been a statutory requirement in WA, with 86% of cases confirmed histologically (22). Extracted WACR records included information on sociodemographic (age, sex, Indigenous status, and area of residence) and tumor characteristics (diagnosis date, tumor site, morphology, behavior, and grade). Hospital records from 1982 to 2007 for the cohort were obtained through probabilistic matching of WACR records to the WA Hospital Morbidity Data System (HMDS) through the WA Data Linkage System (23). The HMDS is a statutory data collection which captures data for all public and private hospitalizations in WA. All colonoscopies in WA are hospital-based procedures and thus captured in the HMDS. Death records for the cohort were also obtained through linkage with the WA Mortality Registry (1982–2007).

### Trends in CRC Incidence

Incident primary cases of CRC were ascertained from WACR records using the International Classification of Diseases (ICD) version 9 with Clinical Modifications (ICD-9-CM) codes (153-154) and ICD version 10 with Australian Modification (ICD-10-AM) codes (C18-C21). Incidence rates were calculated by including only the first-ever primary CRC diagnosis for each person (i.e., subsequent CRC diagnoses, even if at different sites, were not counted). Persons registered in the WACR with another type of malignancy prior to CRC diagnosis were included, with date of first-ever CRC used for the analysis.

Five-year age-specific and age-standardized incidence rates of CRC were calculated for all AYAs and by sex using the number of incident CRC cases for each age group in each period as the numerator and the corresponding WA population for each age group in each period as the denominator. Denominators were obtained from population estimates provided by the Australian Bureau of Statistics (24). Age-standardized rates were adjusted by direct standardization against the 5-year age distribution of the Australian population in the 2001 Census.
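The direct standardization described above can be illustrated with a minimal sketch. The function name and all counts below are hypothetical and are not study data; the sketch simply weights each age-specific rate by that age group's share of a standard population.

```python
def age_standardized_rate(cases, population, standard_population, per=100_000):
    """Directly standardized rate per `per` persons.

    Each age-specific rate (cases / population) is weighted by the share of
    the standard population that falls in the same age group.
    """
    if not (len(cases) == len(population) == len(standard_population)):
        raise ValueError("all inputs need one entry per age group")
    total_standard = sum(standard_population)
    weighted = 0.0
    for c, n, s in zip(cases, population, standard_population):
        weighted += (c / n) * (s / total_standard)
    return weighted * per


# Hypothetical counts for five 5-year age groups (15-19 ... 35-39).
cases = [1, 2, 4, 8, 15]
population = [100_000] * 5
standard = [20_000] * 5  # hypothetical standard population, equal weights
asr = age_standardized_rate(cases, population, standard)
```

With equal weights the standardized rate is the plain average of the age-specific rates; with a real standard population the weights would differ by age group.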

Temporal trends in CRC incidence over the study period were investigated using Joinpoint regression analyses (25). Joinpoint analysis uses an algorithm to define segments where statistically significant changes in temporal trends occur. The annual percentage change (APC) in each Joinpoint segment represents the rate of change in cancer incidence per year in a given time period and is calculated using generalized linear models assuming a Poisson distribution (26). Changes in rates include shifts in magnitude or direction where a positive APC indicates an increase in cancer incidence for a given segment while a negative APC indicates decreasing incidence. Joinpoint regression analyses were performed using the Joinpoint Regression Program 4.3.1 from the US National Cancer Institute (25).
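As a rough illustration of how an APC relates to a log-linear trend, the sketch below fits ln(rate) = a + b × year and reports 100 × (exp(b) − 1). It uses ordinary least squares as a simplifying assumption rather than the Poisson-based fitting performed by the Joinpoint program, and the rates are synthetic.

```python
import math


def annual_percentage_change(years, rates):
    """APC from a least-squares fit of ln(rate) against calendar year."""
    n = len(years)
    if n != len(rates) or n < 2:
        raise ValueError("need at least two (year, rate) pairs")
    log_rates = [math.log(r) for r in rates]  # rates must be positive
    mean_x = sum(years) / n
    mean_y = sum(log_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_rates))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic series growing by exactly 3% per year (not study data).
years = list(range(1982, 1992))
rates = [2.0 * 1.03 ** (y - 1982) for y in years]
apc = annual_percentage_change(years, rates)
```

A positive value indicates rising incidence over the segment and a negative value indicates falling incidence, matching the interpretation given in the text.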

### Colonoscopy History

Hospital admissions for colonoscopy were ascertained from any of the 11 procedure fields in HMDS records using ICD-9-CM codes (45.21, 45.22, 45.23, 45.24, 45.25, 45.42, 48.24) for admissions between January 1982 and June 1999 and ICD-10-AM codes (32090-00, 56549-01, 32090-02, 32090-01, 90308-00, 90959-00, 90315-00, 32093-00, 32023-00, 32023-03, 32093-00, 32023-02, 32023-01, 32023-05, 32023-04, 32023-01, 92097-02, 32090-00, 32084-00, 32084-02, 32084-01, 90308-00, 90959-00, 90315-00, 32087-00, 30375-23, 56549-01, 32075-00, 32075-01, 32078-00, 32081-00) for hospitalizations from July 1999 onward. We incorporated a 1-year clearance period which excluded 18 AYAs diagnosed with CRC in 1982. A further 16 cases were excluded as they had no hospital records prior to or during the period of cancer diagnosis from which colonoscopy history could be ascertained.

To describe the cohort's colonoscopy history, we divided all colonoscopies into three categories based on the timing of colonoscopy relative to incident CRC diagnosis. "Pre-diagnosis" colonoscopies were defined as any recorded colonoscopy greater than 6 months preceding the date of incident CRC diagnosis as registered with the WACR. "Diagnostic" colonoscopies were defined as any colonoscopies performed which resulted in a diagnosis of CRC within 6 months. "Post-diagnosis" colonoscopies were defined as any colonoscopy admission occurring after date of incident CRC diagnosis. Due to the limitations of administrative data and ICD coding standards, we were unable to determine whether pre-diagnosis colonoscopies were screening/ surveillance (i.e., asymptomatic) or diagnostic colonoscopies (i.e., symptomatic colonoscopy).
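The three timing categories can be sketched as a simple rule. This is an illustrative reading of the definitions above, not the authors' code: the 183-day approximation of 6 months and the treatment of a same-day colonoscopy as diagnostic are assumptions.

```python
from datetime import date, timedelta

SIX_MONTHS = timedelta(days=183)  # assumption: 6 months approximated as 183 days


def classify_colonoscopy(procedure_date, diagnosis_date):
    """Label a colonoscopy by its timing relative to incident CRC diagnosis."""
    if procedure_date > diagnosis_date:
        return "post-diagnosis"
    if diagnosis_date - procedure_date > SIX_MONTHS:
        return "pre-diagnosis"
    # Within 6 months before (or on) the diagnosis date; assumed diagnostic.
    return "diagnostic"


# Hypothetical dates for illustration only.
labels = [
    classify_colonoscopy(date(2000, 1, 1), date(2001, 1, 1)),
    classify_colonoscopy(date(2000, 10, 1), date(2001, 1, 1)),
    classify_colonoscopy(date(2001, 2, 1), date(2001, 1, 1)),
]
```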

Age and tumor grade at incident CRC diagnosis were compared between AYAs with and without a record of pre-diagnosis colonoscopy using *t*-tests and chi-square tests. Tumor grade was examined because data on cancer stage are not documented in the WACR.

### CRC Mortality

Deaths within 5 and 10 years of incident CRC diagnosis were identified using WA Death Registry records. CRC-related deaths were ascertained from the underlying cause of death field in death records using the following codes: ICD-9-CM 153-154 and ICD-10-AM C18-C21. CRC-related mortality rate ratios (MRRs) were compared for AYAs with and without pre-diagnosis colonoscopy using negative binomial regression to account for overdispersion of death data in Stata 14.0. Analyses were adjusted for sex and age at incident CRC diagnosis and Charlson comorbidity index. We restricted our analysis to only individuals who had 5 (i.e., diagnosed 1982–2002; *n* = 251) and 10 years (i.e., diagnosed 1982–1997; *n* = 234) of follow-up time, respectively. Differential person-years of risk for each person were accounted for by including time at risk as an offset variable in negative binomial models. Analysis of mortality rates was selected over survival rates to minimize the effect of lead-time bias commonly observed in cancer screening studies.
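The role of the offset can be made explicit with a generic count-regression formulation. The notation below is introduced here for illustration and is not taken from the original analysis: $D_i$ is the CRC-related death count, $T_i$ the time at risk, $x_i$ an indicator for pre-diagnosis colonoscopy, and $\mathbf{z}_i$ the adjustment covariates (sex, age at diagnosis, and Charlson comorbidity index).

$$
\log \mathbb{E}[D_i] = \beta_0 + \beta_1 x_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i + \log T_i,
\qquad
\mathrm{MRR} = \exp(\beta_1)
$$

Because $\log T_i$ enters with a coefficient fixed at 1, the model describes the death rate per unit of time at risk rather than the raw count. Under this formulation, an MRR of $r < 1$ corresponds to a mortality rate that is $(1 - r) \times 100\%$ lower in the group with pre-diagnosis colonoscopy.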

### Ethics Statement

Ethics approval for this study was obtained from the University of Western Australia Research Ethics Committees (reference number: RA/4/1/2228).

# RESULTS

A total of 517 incident cases of CRC among AYAs aged 15–39 years were registered with the WACR between 1982 and 2007. There were 256 females (49.6%) and 261 males (50.4%). Mean age at incident CRC diagnosis was 33.7 ± 5.3 years (range 15.2–39.9 years). CRC accounted for 4.2% of all cancers diagnosed in AYAs between 1982 and 2007 in WA.

### CRC Incidence and Trends

Five-year age-standardized and age-specific incidence rates for CRC in AYAs are presented in **Table 1** alongside Joinpoint regression results using annual incidence data. An increasing trend in age-standardized incidence rates for CRC in AYAs was observed over the study period (**Figure 1**). The overall age-standardized incidence of CRC significantly increased from 2.1 to 4.8 per 100,000 AYA population between 1982 and 2007, APC = 3.0 (95% CI 0.7–5.5), *p* = 0.024 (**Table 1**). The age-standardized incidence of CRC among female AYAs also significantly increased over the study period, APC = 3.4 (95% CI 1.1–5.7), *p* = 0.014. While an increasing trend in CRC incidence was observed for male AYAs, this was not statistically significant, APC = 2.6 (95% CI −1.0 to 5.2), *p* = 0.06.

An upward trend in CRC incidence was observed in all age groups but the 15–19 years category, for both males and females (**Figure 1**). However, none of the trends were statistically significant for males. For female AYAs, significant increases in CRC incidence were observed across all age groups except in the 15- to 19-year group. The greatest APC was observed for younger female AYAs, particularly those aged 20–24 years, APC = 10.1 (3.3–17.5), *p* = 0.014, and 25–29 years, APC = 4.9 (1.8–14.3), *p* = 0.050.

### Colonoscopy History

Colonoscopies were recorded for 77.8% (376/483) of the AYA CRC cohort, with 1,377 total hospital admissions for colonoscopy between 1982 and 2007. Almost a quarter of the cohort had no recorded colonoscopy over the study period (22.2%, 107/483). For these individuals, CRC was diagnosed during surgical procedure with no follow-up colonoscopies recorded over the study period.

TABLE 1 | Five-year age-specific and age-standardized and Joinpoint analysis of annual colorectal cancer incidence rates per 100,000 in adolescents and young adults aged 15–39 years in Western Australia during 1982–2007.

*APC, annual percentage change.*

*\*APC is statistically significant at a 0.05 level.*

*a The model with 0 Joinpoints (i.e., 1982–2007) was most optimal in all analyses.*

The majority of colonoscopies (70.5%, 971/1,377) were performed post-CRC diagnosis for surveillance purposes to prevent metachronous cancer (**Figure 2**). Colonoscopy was uncommon among AYAs prior to CRC diagnosis, with only 6.8% (33/483) of the cohort with any record of pre-diagnosis colonoscopy. Mean age at index colonoscopy for the cohort was 34.3 ± 5.7 years (range: 16–52 years). For the majority of AYAs, the index colonoscopy was performed during the hospital admission where CRC diagnosis was made (70.5%, 265/376). Only 8.8% of AYAs (33/376) had their index colonoscopy in the pre-diagnosis period, while 20.7% (78/376) had their index colonoscopy during treatment follow-up.

### Age and Tumor Grade at Diagnosis

Adolescents and young adults with a recorded pre-diagnosis colonoscopy were significantly younger at index colonoscopy (29.7 ± 6.8 years) compared to those with index colonoscopy at CRC diagnosis (34.8 ± 5.4 years), *p* < 0.001 (**Table 2**). AYAs with pre-diagnosis colonoscopy were also significantly older at time of incident CRC diagnosis (36.7 ± 0.7 years) compared to those with no pre-diagnosis colonoscopy (32.6 ± 0.2 years), *p* < 0.001. At CRC diagnosis, a significantly greater proportion of AYAs with pre-diagnosis colonoscopy had low grade (well-differentiated) tumors (21.2%) compared to those with no pre-diagnosis colonoscopy (5.6%), *p* = 0.001. A greater proportion of AYAs with no pre-diagnosis colonoscopy had high grade (poorly differentiated) tumors (34.1%) compared to AYAs with pre-diagnosis colonoscopy (24.2%), *p* = 0.001.

### Five- and Ten-Year Mortality

A total of 146 and 117 AYAs died within 5 and 10 years of incident CRC diagnosis, respectively (**Table 3**). There was no significant difference in all-cause 5- or 10-year mortality rates for AYAs with and without a pre-diagnosis colonoscopy. CRC-related 5-year mortality was 56% lower in the group with pre-diagnosis colonoscopy than those without, MRR = 0.44 (95% CI 0.27–0.75), *p* = 0.045. Similarly, CRC-related 10-year mortality was 57% lower for those with pre-diagnosis colonoscopy compared to those without, MRR = 0.43 (95% CI 0.24–0.83), *p* = 0.043.

### DISCUSSION

While the overall age-standardized incidence of CRC among AYAs in WA remains low (4.8 per 100,000) relative to the overall incidence in all age groups [62 per 100,000 in 2012 (27)], our results show a clear and significant upward trend in CRC incidence in this younger age group. Between 1982 and 2007, a 3.0% annual increase in CRC incidence was observed among AYAs in WA. In particular, CRC incidence in female AYAs rose significantly in all age groups with the exception of those aged 15–19 years. Increasing trends in CRC incidence were also observed for male AYAs, although trends were not statistically significant.

Our results are consistent with a growing number of studies demonstrating a significant increase in CRC incidence in those aged under 50 internationally. In the US, Bailey et al. (28) recently showed that at the present rate, the incidence of CRC among young adults will almost double by 2030 while simultaneously declining by more than 30% in adults over 50 years of age. The reasons underlying the rise in CRC in the younger population are currently not well understood (15, 18). However, modern Westernized lifestyle and behaviors have been implicated as potential contributors, including high consumption of takeaway and processed food and red meat in addition to obesity and low physical activity, which are known risk factors for CRC (29–31) and prevalent in contemporary Australian society (29, 32). Although smoking rates among Australian AYAs have reduced drastically over the past two decades (33), excessive alcohol consumption among AYAs has substantially increased (34) and may also partially account for the rising incidence of CRC in this population (35, 36).

TABLE 2 | Comparison of age and tumor grade at colorectal cancer (CRC) diagnosis between adolescents and young adults (AYAs) with and without pre-diagnosis colonoscopy (*n* = 483).

*\*p* < *0.05, significantly different from AYAs with no pre-diagnosis colonoscopy.*

TABLE 3 | Five- and ten-year colorectal cancer (CRC)-related mortality for adolescents and young adults with and without pre-diagnosis colonoscopy.

*\*p* < *0.05.*

*MRR, mortality rate ratio based on negative binomial regression adjusted for sex and age at incident CRC diagnosis and Charlson comorbidity index; CI, confidence interval.*

Pre-diagnosis colonoscopy was uncommon among AYAs in our cohort, with only 6.8% having a recorded pre-diagnosis colonoscopy and 71% being diagnosed with CRC at index colonoscopy. In Australia, national guidelines recommending routine CRC screening in adults over 50 years were introduced in 1999 with the NBCSP subsequently launched in 2006 (37). An Australian report on adults aged 45 years and above showed that screening colonoscopy was associated with a 50% reduction in risk of subsequent CRC diagnosis compared to no screening (38). In the US, successful implementation of CRC screening programs in the older population has been credited as the main driver of declining CRC rates in those aged above 50 years (5, 39). Austin et al. (5) demonstrated a significant inverse correlation between state-level APC of CRC incidence and colonoscopy rates in the US between 1998 and 2009 in adults aged 50 years and over. Specifically, states with greater reduction in CRC incidence rates over the study period tended to have higher rates of screening colonoscopy. A significant inverse correlation between CRC mortality rates and CRC screening rates between 1990–1994 and 2003–2007 has also been demonstrated in the older US population (39).

Interestingly, a number of studies have found that AYAs with CRC exhibit more advanced disease at diagnosis compared to older adults and receive more aggressive cancer treatment (15, 40–42). While it is currently unclear why this phenomenon occurs, some researchers have suggested that young-onset CRC may represent a different, more aggressive underlying disease process compared to later-onset CRC (43), although robust evidence of a more rapid adenoma-carcinoma sequence in younger adults is yet to be established. Others have implicated the absence of routine screening in this age group. As younger persons are currently omitted from routine CRC screening, CRC is typically detected in younger patients only when it becomes symptomatic or emergent and generally at more advanced stage of disease (15, 18, 20, 42). Thus, more aggressive treatment is required due to delayed diagnosis (42). Consistent with this hypothesis, we found that just under a quarter of our AYA cohort were likely emergency presentations with no admission for colonoscopy prior to CRC diagnosis and incident diagnosis made during a surgical procedure. Over 60% of our cohort had moderately or poorly differentiated tumors at CRC diagnosis. The opportunity for cancer prevention through detection and removal of premalignant lesions is also not available to young Australians. A recent study forecasted that due to late detection and accelerated progression of disease, CRC patients younger than 50 years will have the worst outcomes of any age group (20).

While colonoscopy prior to CRC diagnosis was uncommon among AYAs in our cohort, our results highlight some potential benefits of pre-diagnosis colonoscopy for younger adults, which may warrant further investigation. On average, AYAs with a pre-diagnosis colonoscopy were diagnosed with CRC at an older age relative to those with no pre-diagnosis colonoscopy history. Over 20% of AYAs with pre-diagnosis colonoscopy had well-differentiated tumors at presentation compared to only 5% of those without. Moreover, both 5- and 10-year CRC-related mortality rates were reduced by over 50% for AYAs with pre-diagnosis colonoscopy compared to AYAs without any colonoscopy history prior to diagnosis. These findings likely highlight the opportunity for early detection and removal of any premalignant adenoma through pre-diagnosis colonoscopy which could both delay CRC onset and enhance survival.

Our current findings add to an emerging body of research calling for action to address the rising incidence of CRC in the younger population (17, 18, 42). While the simplest suggestion may be to initiate average-risk CRC screening at an earlier age given the demonstrated benefits of screening colonoscopy in the older population (44), the costs and risks of widespread application of colonoscopy screening need to be carefully balanced with potential benefits (18, 44). CRC screening in average-risk persons younger than 50 years is unlikely to be cost-effective given that young-onset CRC comprises less than 7% of all CRC cases (19).

Risk-stratified screening for CRC in the average-risk population is a growing area of interest and may offer the best solution (45–47). Current CRC screening models assume equal risk of CRC in the average-risk population with undifferentiated screening approaches for adults aged 50 years and above. However, research suggests that the population presently considered at "average risk" is not homogeneous in terms of CRC risk and could be further stratified into distinct risk groups with tailored screening approaches and intervals for each risk level (46, 48, 49). Tailored screening for AYAs with higher than average risk for CRC likely offers a more cost-effective method of CRC screening for this group. A number of risk stratification models for advanced neoplasia and CRC have been developed in recent years; however, most are developed for the older population and their current predictive power is suboptimal (48). To better target population level screening interventions for CRC, future risk models need to simultaneously consider the average-risk population under 50 years given the demonstrated rising incidence of CRC in this age group. The challenge for researchers and policymakers remains how to best identify persons, including AYAs, at risk of CRC and for whom early screening would be beneficial (42).

### Limitations and Directions for Future Research

Our findings show an increasing trend in CRC incidence in WA over 25 years; however, trends over the most recent decade could not be explored due to lack of post-2007 data as our analysis was based on an existing data source with end date of 2008. However, our results are consistent with other Australian and international research (11–17) showing a rising incidence of CRC in the AYA population over recent years. To date, trends in CRC incidence among Australian AYAs have only been explored to 2010 (17), with very limited other research examining colonoscopy use and costs and benefits in the younger population. Future research examining CRC incidence trends and colonoscopy uptake in Australian AYAs over the most recent decade will provide valuable insight into whether extending average risk screening into the younger population is warranted. Other limitations are that we were unable to quantify the number of Lynch syndrome cases and investigate trends in hereditary vs. sporadic CRC cases over time, as the WACR does not document these data, and that we were unable to examine cancer stage at presentation in our analyses, as these data are not collected by the WACR.

# CONCLUSION

In summary, our study found a significant increase in CRC incidence in AYAs in WA. Pre-diagnosis colonoscopy was rare in AYAs but where performed it was associated with later age and lower tumor grade at diagnosis and a greater than 50% reduction in CRC-related mortality within 10 years of incident diagnosis. Future research identifying strategies for early CRC detection in the AYA population is warranted.


# AUTHOR CONTRIBUTIONS

LT designed the study, designed and performed the statistical analysis, and drafted and revised the manuscript. NS-B, AM, EM, and IL-V revised the draft manuscript. HE and PO obtained funding and revised the draft manuscript. DP obtained funding, revised the draft manuscript, and provided study supervision.

# FUNDING

This study was supported by a Cancer Council Western Australia Capacity Building and Collaboration Grant.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Troeung, Sodhi-Berry, Martini, Malacova, Ee, O'Leary, Lansdorp-Vogelaar and Preen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Variation in Population Vulnerability to Heat Wave in Western Australia

*Jianguo Xiao1 \*, Tony Spicer1 , Le Jian1,2, Grace Yajuan Yun1 , Changying Shao1 , John Nairn3 , Robert J. B. Fawcett4 , Andrew Robertson1 and Tarun Stephen Weeramanthri1*

*1Public Health Division, Department of Health, Government of Western Australia, Perth, WA, Australia, 2School of Public Health, Curtin University, Perth, WA, Australia, 3Australian Bureau of Meteorology, Adelaide, SA, Australia, 4Australian Bureau of Meteorology, Melbourne, VIC, Australia*

Heat waves (HWs) have killed more people in Australia than all other natural hazards combined. Climate change is expected to increase the frequency, duration, and intensity of HWs and lead to a doubling of heat-related deaths over the next 40 years. Despite being a significant public health issue, HWs do not attract the same level of attention from researchers, policy makers, and emergency management agencies compared to other natural hazards. The purpose of the study was to identify risk factors that might lead to population vulnerability to HW in Western Australia (WA). HW vulnerability and resilience among the population of the state of WA were investigated by using time series analysis. The health impacts of HWs were assessed by comparing the associations between hospital emergency department (ED) presentations, hospital admissions and mortality data, and intensities of HW. Risk factors including age, gender, socioeconomic status (SES), remoteness, and geographical locations were examined to determine whether certain population groups were more at risk of adverse health impacts due to extreme heat. We found that hospital admissions due to heat-related conditions and kidney diseases, and overall ED attendances, were sensitive indicators of HW. Children aged 14 years or less and those aged 60 years or over were identified as the populations most vulnerable to HWs as shown in ED attendance data. Females had more ED attendances and hospital admissions due to kidney diseases, while males had more heat-related hospital admissions than females. There were significant dose–response relationships between HW intensity and SES, remoteness, and health service usage. The more disadvantaged and remotely located the population, the higher the health service usage during HWs. Our study also found that some population groups and locations were resilient to extreme heat. We produced a mapping tool, which indicated geographic areas throughout WA with various vulnerability and resilience levels to HW. The findings from this study will allow local government, community service organizations, and agencies in health, housing, and education to better identify and understand the degree of vulnerability to HW throughout the state, better target preparatory strategies, and allocate limited resources to those most in need.

Keywords: heat wave, vulnerability, socioeconomic status, geographical variation, morbidity, mortality, Western Australia

### *Edited by:*

*Edward Broughton, University Research Co., USA*

### *Reviewed by:*

*Woohyun Yoo, Incheon National University, South Korea (Republic of) Maria Anastasova Semerdjieva, Plovdiv Medical University, Bulgaria*

> *\*Correspondence: Jianguo Xiao jianguo.xiao@health.wa.gov.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 16 December 2016 Accepted: 15 March 2017 Published: 03 April 2017*

### *Citation:*

*Xiao J, Spicer T, Jian L, Yun GY, Shao C, Nairn J, Fawcett RJB, Robertson A and Weeramanthri TS (2017) Variation in Population Vulnerability to Heat Wave in Western Australia. Front. Public Health 5:64. doi: 10.3389/fpubh.2017.00064*

### INTRODUCTION

A heat wave (HW) is a prolonged period of excessively hot weather. Heat waves have caused more deaths in Australia since European settlement than all other natural hazards combined, and are predicted to increase in frequency, duration, and intensity, with a doubling of the number of HW-related deaths in the next 40 years (1, 2). Currently, there is no standardized definition for HW internationally or among different jurisdictions in Australia. A recent study conducted in Western Australia (WA) found that the excess heat factor (EHF) metric was the best HW indicator among the three metrics examined to predict greatest health service demand (3). That study's outcomes were based only on Perth's metropolitan population, so further studies are required to test the validity of EHF for the whole of WA.

Heat waves typically affect large geographical areas over the course of three or more days. Many jurisdictions, including WA, have created extreme heat emergency management plans to respond to HW events. With large populations and limited resources, many jurisdictions lack the precision to target the most at risk populations with appropriate public health interventions, and many HW plans are based on assumptions and research from other states and countries. There has been little verification of whether a particular population group is at higher risk or even resilient to HWs, although acclimatization, individual susceptibility, and community and geographical characteristics all affect heat-related effects on mortality and morbidity (2, 4–7). Past epidemiological studies have established consistently identifiable vulnerable groups to extreme heat. Young children and the elderly are commonly identified at high risk of morbidity and mortality during the period of HWs (8–11), whereas people with renal, respiratory, and cardiovascular conditions are susceptible to heat due to hyperthermia and dehydration (12–14). However, there have only been a limited number of studies examining the geographical variation and effects of socioeconomic status (SES) on people's response to HW.

Western Australia is the largest state in Australia with varying geographic features and climates that range from temperate areas in the south to tropical areas in the north. Seventy-eight percent of the population is based in the Perth metropolitan area with the remaining 22% scattered throughout regional and remote areas. To improve preparedness and response arrangements for HWs, there is a need to determine which populations are at higher risk of heat exposure and which risk factors are related to it. Our study aims to characterize the relationship between HW intensity and health service demand of different population groups in different regions of WA. By identifying vulnerable populations, agencies, service providers, and local government authorities can better target their limited resources to those populations most at risk.

### MATERIALS AND METHODS

Western Australia is Australia's largest state with an area of more than 2,500,000 km² and over 12,500 km of coastline. It has a population of approximately 2.6 million people (15). The southwest corner of the state, where about 85% of the WA population lives, has a Mediterranean climate (i.e., hot dry summers and cooler wet winters). The central four-fifths of the State are semiarid or desert and are lightly inhabited. An exception to this is the northern tropical region that has an extremely hot monsoonal climate.

### Derive HW Intensity, Adjust for Delayed Effects of HW, and Identify Significant Health Service Usage Measures

Heat waves were measured using HW intensity in each geographical area, represented by statistical local area (SLA), in WA. HW days were identified using the EHF, calculated as the exceedance of the previous 3-day mean daily temperature (DT) above the 95th percentile threshold, multiplied by the difference between the 3-day mean DT and the mean DT of the prior 30 days. Nairn and Fawcett (16) provide the full equation. The EHF was then normalized and expressed as a heat wave severity index (HWSI) by dividing the EHF by the long-term 85th percentile of positive EHF at each location. The HWSI data at SLA level were sourced from the Australian Bureau of Meteorology (BoM). The BoM identifies a HW as severe when the HWSI is greater than 1 and as extreme when the HWSI is greater than 3. In our analysis, severe and extreme HW days were combined into severe/extreme HW days, as the counts of extreme HW days were very small and not suitable for separate statistical analysis. Low-intensity HW days were defined as days with a HWSI between 0 and 1, and non-heat wave days as days with a HWSI less than or equal to 0. The EHF was calculated over a 3-day period (16), and we applied the EHF value to the first day in an attempt to identify any possible delayed effects of HW.
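
The intensity categories described above can be illustrated with a short sketch in Python. It assumes that the EHF and the long-term 85th percentile of positive EHF for a location are already available; the function names `hwsi` and `classify_hw_day` are hypothetical, and treating a HWSI of exactly 1 as low intensity is an assumption, since the boundary case is not specified in the text.

```python
def hwsi(ehf: float, ehf85: float) -> float:
    """Normalize the EHF by the long-term 85th percentile of positive EHF."""
    if ehf85 <= 0:
        raise ValueError("ehf85 must be positive")
    return ehf / ehf85


def classify_hw_day(index: float) -> str:
    """Map a HWSI value to the three intensity categories used in the analysis."""
    if index <= 0:
        return "non-HW"
    if index <= 1:  # assumption: a HWSI of exactly 1 is counted as low intensity
        return "low"
    # Severe (HWSI > 1) and extreme (HWSI > 3) days are combined in the study.
    return "severe/extreme"
```

For example, an EHF of 5 at a location whose 85th percentile of positive EHF is 10 gives a HWSI of 0.5, which falls in the low-intensity category.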

A time series design was used for the study. Daily health service usage data from 1 November 2006 to 30 April 2015 for warm months (November to April of the following year) for the whole of WA were obtained from the following three datasets: (1) hospital admission data from the WA hospital morbidity data system (HMDS), including overall (all admissions), cardiovascular diseases (defined as having a principal diagnosis of ICD-10-AM Seventh Edition codes between I00–I99 plus G45), respiratory diseases (J00–J99), kidney diseases (N10–N19), and heat-related diseases (having a principal or any additional diagnosis of L55, L74.0, T67, X30, or X32); (2) overall emergency department (ED) attendance data from the WA ED data collection; and (3) death data from the WA registry of births, deaths and marriages. The chosen HMDS conditions were based on existing literature in which conditions were identified as having a strong association with HWs (10, 17). The health service utilization rates at different HW intensities were compared with those during non-HW days for all aforementioned conditions.

Estimated resident populations (ERPs) by age group, gender, and SLA were sourced from the Australian Bureau of Statistics (ABS). The monthly populations were computed using a linear interpolation method, based on mid-year ERPs. Each monthly population was then applied to all days of that month in the study period. Daily health service usage rates were calculated by age group, gender, and SLA. The total population covered in the study period (i.e., warm months) from 1 November 2006 to 30 April 2015 was 11,698,702 person-years. To assess the delayed effects of HW on health service usage, service usage rates were first derived by dividing daily service usage counts by daily populations on the same day as the HW day, or by dividing 2- to 30-day cumulative counts by the corresponding cumulative populations.
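
The population interpolation and the cumulative rates described above can be sketched as follows. This is a minimal Python illustration, not the original SAS implementation: the function names and the convention of counting months from the mid-year ERP date are assumptions made for the example.

```python
from typing import Sequence


def monthly_population(erp_start: float, erp_end: float, months_after_midyear: int) -> float:
    """Linearly interpolate between two consecutive mid-year ERPs."""
    if not 0 <= months_after_midyear <= 12:
        raise ValueError("months_after_midyear must be between 0 and 12")
    return erp_start + (erp_end - erp_start) * months_after_midyear / 12


def cumulative_rate(counts: Sequence[float], populations: Sequence[float],
                    start: int, window: int) -> float:
    """Summed counts divided by summed daily populations over `window` days from `start`."""
    c = counts[start:start + window]
    p = populations[start:start + window]
    if window < 1 or len(c) < window or len(p) < window:
        raise ValueError("window does not fit inside the series")
    return sum(c) / sum(p)
```

With `window=1` the function returns the same-day rate; windows of 2 to 30 days give the cumulative rates used to look for delayed effects.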

A Pearson correlation analysis was then conducted between the EHF for a day (i.e., first day of the 3-day period) and its corresponding health service usage rate of that day. The rates or cumulative rates with the highest positive correlation with a significance value of 0.05 or less were then selected and used in the bivariate and multivariate statistical analyses to assess the potential risk factors of high service usage during HW exposure. Only health service usage measures with significant correlation with EHF were included in the further analyses below.
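
The selection of the rate with the highest positive correlation can be sketched in Python as below. This is an illustrative simplification: the names `pearson_r` and `best_window` are hypothetical, and the significance test (p of 0.05 or less) applied in the study is omitted here, so the sketch only ranks candidate windows by their correlation coefficient.

```python
from math import sqrt
from typing import Dict, Optional, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def best_window(ehf: Sequence[float],
                rates_by_window: Dict[int, Sequence[float]]) -> Optional[Tuple[int, float]]:
    """Return (window, r) for the highest positive correlation, or None if none is positive."""
    best = None
    for window, rates in rates_by_window.items():
        r = pearson_r(ehf, rates)
        if r > 0 and (best is None or r > best[1]):
            best = (window, r)
    return best
```

Here each key of `rates_by_window` is a cumulative window length in days and each value is the series of rates aligned with the daily EHF values.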

The sensitivity of each dataset to HW was examined, and only results with a significant association with EHF are reported in this paper.

### Determine Risk Factors and Interactions between Risk Factors and HW Intensity

A literature review was conducted to identify potential risk factors of HW. Age, gender, HW intensity, SES [measured by the socioeconomic index for areas (SEIFA)], and service accessibility [measured by the accessibility/remoteness index of Australia (ARIA)] were identified as key risk factors in the WA context.

Health service usage measures with significant associations with EHF among different population groups during HW days were compared with those during non-heat wave days. *Via* bivariate analyses, the interactive effects of HW and each risk factor were examined to identify vulnerable groups. Risk factors included age group (0–14, 15–59, and 60+ years), gender, SEIFA, ARIA for 2011, and geographical areas [local government areas (LGAs)] sourced from the ABS.

Poisson regression modeling was then used to evaluate the potential association between HW intensity and the number of presentations to EDs and inpatient admissions for heat-related causes. In the models, daily health service usage counts by age group, gender, and SLA were used as a dependent variable and regressed on all potential risk factors described above. The offset variable was daily populations by age group, gender, and SLA. Where an excess of 0 count was identified, zero-inflated Poisson regression was used. Where there was an over-dispersion of counts of health service usage, negative binomial regression was applied.

The interactions between each risk factor and HW intensity were assessed in the regression models. Variables such as public holidays and weekend days were also included in the model to adjust for their confounding effects when assessing the vulnerability of populations to HW.
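
The regression structure described above can be summarized as a log-linear model with a population offset. The notation below is illustrative and is not taken from the original analysis: $Y_{d,g}$ denotes the count of health service usage on day $d$ in stratum $g$ (age group, gender, and SLA), $N_{d,g}$ the corresponding daily population, $x_{k}$ the covariates (HW intensity, risk factors, and confounders such as public holidays and weekend days), and $z_{m}$ the interaction terms between risk factors and HW intensity.

$$\log \mathrm{E}\left[Y_{d,g}\right] = \log N_{d,g} + \beta_{0} + \sum_{k} \beta_{k}\, x_{k,d,g} + \sum_{m} \gamma_{m}\, z_{m,d,g}$$

Under this form, $\exp(\beta_{k})$ is interpreted as the rate ratio associated with covariate $x_{k}$. The zero-inflated Poisson and negative binomial variants keep the same linear predictor and differ only in how excess zeros or over-dispersion in the counts are modeled.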

### Determine Geographical Variations Using Composite Rankings

To compare the health service utilization rates among different geographical regions, both crude rates and age standardized rates (ASRs) were calculated. The 2001 Australian standard population was used for standardization.
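
An age standardized rate can be computed as in the following Python sketch. It assumes direct standardization, which the text does not state explicitly, and the function name and the rate multiplier `per` are hypothetical; the standard population passed in would be the 2001 Australian standard population by age group.

```python
from typing import Sequence


def age_standardized_rate(counts: Sequence[float], populations: Sequence[float],
                          standard_population: Sequence[float],
                          per: float = 100_000) -> float:
    """Directly standardized rate: age-specific rates weighted by a standard population."""
    if not (len(counts) == len(populations) == len(standard_population)):
        raise ValueError("all inputs must have one entry per age group")
    total_standard = sum(standard_population)
    if total_standard <= 0:
        raise ValueError("standard population must be positive")
    # Weight each age-specific rate by that age group's standard population.
    weighted = sum((c / p) * s
                   for c, p, s in zip(counts, populations, standard_population))
    return per * weighted / total_standard
```

The crude rate, by contrast, is simply the total count divided by the total population, so the two measures differ whenever the age structure of an area departs from that of the standard population.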

Where a health service usage indicator was identified as having a significant association with HW, it was further examined by LGA. To summarize the overall impact (combined effect) of HW on the three significant health service usage indicators (i.e., overall ED attendances, hospitalizations due to heat-related episodes, and hospitalizations due to chronic kidney disease) in different LGAs, the following formula was used to derive a composite score for each LGA.

$$\text{Composite score} = \sum_{j=1}^{3} \text{DASR}_{j} \times \frac{\text{RR}_{j}}{\sum_{i=1}^{3} \text{RR}_{i}}$$

where DASR<sub>*j*</sub> is the difference in ASRs for a particular health service usage indicator between HW days and non-HW days, and RR<sub>*j*</sub> is the relative risk between HW and non-HW days for that health service usage indicator. Finally, the composite scores were divided into quintiles representing the least, small, medium, high, and highest impact of HW for a particular LGA, with the highest impact areas being defined as hotspots.
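
The composite score formula above amounts to a weighted sum of the ASR differences, with each indicator weighted by its share of the total relative risk. A minimal Python sketch follows; the function name `composite_score` is hypothetical and the input values in the usage note are invented for illustration only.

```python
from typing import Sequence


def composite_score(dasr: Sequence[float], rr: Sequence[float]) -> float:
    """Sum of DASR_j weighted by RR_j divided by the sum of all RR_i."""
    if len(dasr) != len(rr) or not rr:
        raise ValueError("dasr and rr must be non-empty and of equal length")
    total_rr = sum(rr)
    if total_rr <= 0:
        raise ValueError("the relative risks must sum to a positive value")
    return sum(d * r / total_rr for d, r in zip(dasr, rr))
```

For instance, with made-up ASR differences of 10, 2, and 4 and relative risks of 2.0, 1.0, and 1.0, the weights are 0.5, 0.25, and 0.25, so the indicator with the largest relative risk contributes most to the score for that LGA.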

A significant difference was defined as having a *p*-value <0.05. SAS Enterprise version 5.1 (SAS Institute Inc.) was used for statistical analysis.

Ethics approval for this project was obtained from the WA Department of Health Human Research Ethics Committee. Health service utilization and mortality data are routinely collected by the Department of Health WA. This study was given approval to access and analyze de-identified data to ensure that patient confidentiality was maintained.

### RESULTS

Only results related to ED attendances and hospitalizations due to heat and kidney diseases are presented here, as measures in other datasets were not identified as having significant association with the EHF.

### Association between Risk Factors and Health Usage Measures

**Table 1** shows crude health service usage rates by each of the main risk factors and by HW intensity, without adjusting for other risk factors. Only those measures that had a significant association with HW intensity were included. A dose–response relationship between measured health service usage rates and HW intensity was apparent regardless of age group, gender, SEIFA, and ARIA. The more intense the HW, the higher the health service usage rates. For hospitalization, there was also a strong dose–response relationship between age and rates under each HW intensity category. The older the population group, the higher the health service usage rates. However, the youngest age group (0–14 years) was more vulnerable to heat than the other two age groups in terms of ED attendance.

Males had higher heat-related hospitalization and ED attendance rates than females. However, females had higher rates of hospitalization due to kidney diseases.

There was an apparent dose–response relationship between health service usage rates and SEIFA categories. Overall, the more socially advantaged the population, the lower the rate.


TABLE 1 | Crude health service usage rates and 95% CIs by risk factors and HW intensity, November 2006–April 2015, Western Australia.

*a Reference category for HW intensity; HW, heat wave.*

*Bold numbers denote a higher rate during low or severe/extreme HW intensity days compared to non-HW days.*

The rates during low intensity or severe/extreme HW days were significantly higher than those during non-HW days.

There was also an apparent dose–response relationship between service accessibility and heat-related hospitalization rate. The less remote a population, the lower the health service usage rate.

### Identify Vulnerable Populations through Adjusting for Risk Factors *via* Regression Analyses

**Table 2** presents the final regression analysis results showing risk factors and their interaction with HW intensity when examining effects of HW on health service usage measures. Only risk factors with significant interaction with HW intensity were included in the final results.

First, we observed that there was an apparent dose–response relationship between the HW intensity and health service usage rates. The more intense the HW, the higher the health service usage rates. Those aged 15–59 and 60 years and over were more at risk of heat- or kidney disease-related hospital admissions than those aged 0–14 years. Meanwhile, those aged 0–14, and 60 years and over, were more likely to attend ED than those aged 15–59 years.

Males had nearly two times higher heat-related hospitalization rates than females. However, females had higher kidney disease-related hospitalization rates and ED attendance rates than males with HW exposure.


TABLE 2 | Adjusted rate ratios and 95% CIs of risk and confounding factors for health service usage measures, November 2006 to April 2015, Western Australia.

*a Under these headings, any cells without RRs and 95% confidence intervals (CIs) are reference categories, and in the brackets are 95% CIs; HW, heat wave.*

There was an apparent dose–response relationship between SEIFA and all three health service usage measures. The more disadvantaged the population, the higher the rate of health service usage.

There was also an apparent dose–response relationship between ARIA and heat-related hospitalization, ED attendance, and kidney disease-related hospitalization rates overall. The less accessible services were, the higher the health impact.

Emergency department attendance rates were higher during public holidays and weekends. However, there were no statistically significant differences in admission rates for heat- and kidney disease-related hospital admissions between these two periods. Other variables, such as year and month, were also used to adjust for the possible impact of these confounding factors on the three health service utilization rates. Interactions of age and gender with HW effect were also examined. For details of their impact, refer to **Table 2**.

### Identify Geographical Variation in Population Response to HW

**Figure 1** shows the composite ranking of the effects of HW by LGA in WA. Only three significant health service usage measures were included in the calculation of the composite ranking. In the populous Perth metropolitan area (as shown in the inset in the map), the overall impacts of HW ranged from small to high. In the majority of the southern areas, there was a higher impact from HW than in the northern areas. However, the highest impact areas were all located in regional and remote areas.

### DISCUSSION

### Sensitivity of Data Sources/Conditions

Outcomes from this study indicated that heat-related hospitalizations and overall ED presentations were the two most sensitive measures for assessing the impact of HW on health services. Hospital admissions due to kidney diseases were also sensitive. However, overall hospital admissions, hospital admissions due to cardiovascular and respiratory diseases, and all-cause deaths were not sensitive to HW. Similar findings were observed in other studies between HW and ED attendances and hospital admissions due to kidney diseases (18, 19). We also found that the effects of HW on the health service indicators examined were not usually immediate, and that different data sources and conditions showed different delayed effects of HW. Overall ED attendances and heat-related hospitalizations showed an early effect of HW within 3 and 5 days of a HW event, respectively. This is consistent with some previous studies where lag effects of HW were apparent with a short lag effect for ED attendance data (14, 20, 21). However, kidney disease-related hospital admissions reached their peak rate 25 days after a HW event.

The different lag effect in different data sources is most likely due to varying patient case-mix and structure of the general population involved. In heat-related hospitalizations, only records with heat-related conditions were included. These data sources may potentially fail to identify patients affected by HW but who present to hospital due to exacerbation of pre-existing comorbidities. Although we have excluded elective patients from hospitalization data, in an attempt to identify hospitalizations potentially related to heat exposure, we could not identify an apparent association between all-cause hospitalizations and HW intensity. In ED attendance data, however, all patients were included and potentially heat-related conditions would be included.

Indicators such as heat-related hospitalizations and overall ED attendances can provide responding agencies with insight into the impact of HW on health services. ED datasets are rapidly accessible and could be used for syndromic surveillance. However, hospitalization and mortality data are usually not available for up to 6 months or even longer, which renders them unsuited for timely identification of HW-related vulnerable populations and activation of emergency response strategies. Findings from this study reinforce the response strategy of using rapidly accessible ED data to monitor heat-related health impacts during HW events. Therefore, the design of HW service provision must take into account the sensitivity and timeliness of data sources.

### Resilience and Vulnerability to HW

Our study confirmed that age was an important risk factor for HW: people aged 60 years and over were more vulnerable to HW than other age groups and attended health services more frequently, and children aged 14 years or less were more vulnerable to HW in terms of ED attendance. Gabriel and Endlicher (10) and Tong et al. (11) indicated that the elderly may suffer more due to poor thermoregulation and hormonal changes. Older people with chronic diseases such as cardiovascular and respiratory diseases were particularly vulnerable (22, 23). Previous studies also found that children were at high risk of morbidity and mortality during HWs (8, 9, 14, 24) and children's inability to lose heat through sweating could cause convulsions and disorientation (17).

Anderson et al. (25) found that there was no significant difference between the vulnerability of males and females during HWs in relation to respiratory hospitalizations. However, our study did identify a significant difference between males and females in heat- and kidney disease-related hospitalizations. Our study observed a higher impact on males than females in heat-related hospitalizations while the study from Rainham and Smoyer-Tomic (26) observed that females had a higher relative risk of mortality than males. This may be due to more men working outside when there is a HW; however, the exact reason warrants further exploration.

Previous studies found that chronic diseases are also risk factors for increased health service utilization among people in extremely hot weather (12–14). People with chronic kidney and cardiovascular conditions are among the most susceptible to heat due to hyperthermia and dehydration. Although our study did not identify a strong association between the rate of hospitalization due to cardiovascular conditions and heat, we did find a strong association between the rate of hospitalization due to chronic kidney diseases and HW intensity, consistent with findings of Nitschke et al. (18) and Williams et al. (19).

Our findings on the main risk factors for HW morbidity were consistent with those identified by Reid et al. (27), which included SES (as indicated by education and poverty), social isolation, and proportion of elderly. The importance of SES in the evaluation of effects of HW was highlighted in several studies of vulnerability to HW (11, 28). Overall, populations with lower SES, poor accessibility to services, and older or younger age groups have higher vulnerability to HWs. The more risk factors in a population, the higher its vulnerability due to the additive feature of the regression models we applied. That means, the contribution of each risk factor would be added up to create a greater health utilization rate. Such vulnerable groups should be the main focus in the development and implementation of HW-related health promotion programs by relevant government and non-government agencies.

The possible joint effects of HW and age or gender were examined in this study and the regression modeling results are shown in **Table 2**. The associations of risk factors and HW intensity were more complicated than expected. For example, the interaction between those aged over 60 years and the intensity of HW on heat-related hospitalization showed a clear dose–response relationship. However, the interaction between the two did not show an apparent dose–response relationship in age group 15–59 years. Instead, the heat-related hospitalization rate increased significantly in age group 15–59 years during low intensity HW exposure. Similar patterns were observed in the interaction analysis between HW and males for heat-related hospitalization and kidney disease-related hospitalization. Whether such a pattern was due to the impact from other unexamined risk factors warrants further exploration.

### Regional Differences

This study was able to reinforce some assumptions on HW vulnerability and resilience in regional areas. Depending upon the data sources and conditions, regional responses to HW varied. For example, residents living in far north LGA regions (those with blue colors in **Figure 1**) with hot dry summer/cool or cold winter climate were least impacted by HWs. However, those living in LGAs with hot dry summer/mild winter climate were more vulnerable to HWs. Physiological acclimatization is likely to be an important factor limiting heat-related health service usage in hot humid or hot dry environments (4), and our study partially confirmed such an observation. However, the sensitivity of data should also be considered when selecting the most suitable health service indicators to explore the effect of HW on health service utilization.

All these regional differences in the study are most likely related to residents' acclimatization, the region's SES, accessibility to services, and age/gender and ethnicity distribution of the population as described in other studies (29–31). Which of these factors have played the most important role, and how they interact with each other, warrants further study.

The use of weighted ranking of difference in ASRs between HW and non-HW days by LGA allows us to combine the effects of HW on three data sources/conditions that showed a strong association with effects of HW. Local governments are the main government agencies who would implement the HW strategies. The hotspots identified *via* composite scores will be more reliable than a single health service usage measure in assisting local government agencies to allocate limited resources to those in most need.

### Policy Implications for Emergency Management

As Michelozzi et al. (32) indicated, in the context of global climate change the impacts of heat on health will assume greater public health significance in the future. The results from this study have significant policy implications for emergency management of HW. This study identified at-risk population groups and provided a visual display mapping tool of HW vulnerability and resilience to assist local governments and emergency management regions.

By demonstrating the areas of greatest vulnerability, this study enables responding agencies to better target prevention and preparedness programs to those most in need. The findings from this study can also be used by local government authorities to better target, engage, and represent the needs of identified at-risk groups within their boundaries. The geographical breakdown of HW risk factors will allow responding agencies to better understand and contextualize areas of vulnerability to HWs within their community and appropriately tailor community awareness programs, risk communication, and HW response plans to the needs of the community.

It is also worth noting that, in the design of the health promotion programs to tackle HWs, identified risk factors should be considered together, so that the programs can be implemented effectively and in an integrated fashion.

### Limitations and Future Directions

The study did not include factors such as air quality measures and their potential interaction with HW intensity measures, as indicated in several studies (33, 34). This study did not include aboriginality as a risk factor, although this is a population group that experiences high rates of chronic kidney and cardiovascular disease (35), and social disadvantage (36). In addition, we did not adjust for the effect of green space on the health outcomes due to unavailability of such data in a vast state such as WA.

Limited diagnostic information in ED data prevented further examination of HW effects on populations with different causes of ill health. Improvements in ED data collection, particularly of diagnostic information, should be considered so that health education messages can effectively target higher risk groups.

Further studies are needed to explore the effects of HW on various disease conditions and on possible mechanisms that explain why populations living in different geographic locations have varied responses to HW. It is also important to conduct evaluation studies to assess the effectiveness of current preventative programs in relation to HW.

### AUTHOR CONTRIBUTIONS

JX, TS, LJ, GY, and CS designed the study and contributed to drafting and revision. JX, LJ, GY, and CS analyzed this work. JN and RF compiled the meteorological data used in the study. JN, RF, AR, and TW contributed to drafts and revisions. All authors approved the final version.

### ACKNOWLEDGMENTS

The authors thank Peter Somerford and Shannon Carter of Epidemiology Branch, Department of Health WA for their support to the study; Dimpal Patel, master student of Curtin University of Technology for identifying relevant literature; and National Extreme Heat Warnings project members from University of Adelaide, Departments of Health New South Wales and South Australia for fruitful discussion on the methodology and issues in relation to the study.

### REFERENCES

determinants of health in Australia, Canada and New Zealand from 1981–2006. *BMC Public Health* (2014) 14:201. doi:10.1186/1471-2458-14-201

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Xiao, Spicer, Jian, Yun, Shao, Nairn, Fawcett, Robertson and Weeramanthri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Improving the Estimation of Risk-Adjusted Grouped Hospital Standardized Mortality Ratios Using Cross-Jurisdictional Linked Administrative Data: A Retrospective Cohort Study

*Katrina Spilsbury<sup>1</sup>\*, Diana Rosman<sup>1,2</sup>, Janine Alan<sup>1</sup>, Anna M. Ferrante<sup>3</sup>, James H. Boyd<sup>3</sup> and James B. Semmens<sup>1</sup>*

*<sup>1</sup>Centre for Population Health Research, Curtin University, Perth, WA, Australia, <sup>2</sup>Data Linkage, Department of Health WA, Perth, WA, Australia, <sup>3</sup>PHRN Centre for Data Linkage, Centre for Population Health Research, Curtin University, Perth, WA, Australia*

### *Edited by:*

*Simone Rauscher Singh, University of Michigan, USA*

### *Reviewed by:*

*Harry Staines, Sigma Statistical Services, UK Afolaranmi Olumide Tolulope, Jos University Teaching Hospital, Nigeria*

*\*Correspondence:*

*Katrina Spilsbury katrina.spilsbury@curtin.edu.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 14 November 2016 Accepted: 23 January 2017 Published: 08 February 2017*

### *Citation:*

*Spilsbury K, Rosman D, Alan J, Ferrante AM, Boyd JH and Semmens JB (2017) Improving the Estimation of Risk-Adjusted Grouped Hospital Standardized Mortality Ratios Using Cross-Jurisdictional Linked Administrative Data: A Retrospective Cohort Study. Front. Public Health 5:13. doi: 10.3389/fpubh.2017.00013*

Background: Hospitals and death registries in Australia are operated under individual state government jurisdictions. Some state borders are located in heavily populated areas or near major capital cities. Mortality indicators for hospitals located near state borders may not be estimated accurately if patients are lost as they cross state borders. The aim of this study was to evaluate how cross-jurisdictional linkage of state hospital and death records across state borders may improve estimation of the hospital standardized mortality ratio (HSMR), a tool used in Australia as a hospital performance indicator.

Method: Retrospective cohort study of 7.7 million hospital patients from July 2004 to June 2009. Inhospital deaths and deaths within 30 days of hospital discharge from four state jurisdictions were used to estimate the standardized mortality ratio of hospital groups defined by geography and type of hospital (grouped HSMR) under three record linkage scenarios, as follows: (1) cross-jurisdictional person-level linkage, (2) within-jurisdictional (state-based) person-level linkage, and (3) unlinked records. All public and private hospitals in New South Wales, Queensland, Western Australia, and public hospitals in South Australia were included in this study. Death registrations from all four states were obtained from state-based registries of births, deaths, and marriages.

Results: Cross-jurisdictional linkage identified 11,116 cross-border hospital transfers of which 170 resulted in a cross-border inhospital death. An additional 496 cross-border deaths occurred within 30 days of hospital discharge. The inclusion of cross-jurisdictional person-level links to unlinked hospital records reduced the coefficient of variation among the grouped HSMRs from 0.19 to 0.15; the inclusion of 30-day deaths reduced the coefficient of variation further to 0.11. There were minor changes in grouped HSMRs between cross-jurisdictional and within-jurisdictional linkages, although the impact of cross-jurisdictional linkage increased when restricted to regions with high cross-border hospital use.

Conclusion: Cross-jurisdictional linkage modified estimates of grouped HSMRs in hospital groups likely to receive a high proportion of cross-border users. Hospital identifiers will be required to confirm whether individual hospital performance indicators change.

Keywords: cross-jurisdictional record linkage, hospital standardized mortality ratios, risk adjustment, epidemiology, cohort studies

## INTRODUCTION

Advances in information technology are changing the research environment in public health with increasing access to affordable, large, and complex administrative and surveillance health datasets. The potential of such data to improve population health outcomes is undisputed as whole populations can be followed more precisely in time and space. It has been proposed that precision public health could have particular benefit in preventative health with earlier detection and more precise risk estimates (1). However, the ethical and legal responsibility of protecting individual confidentiality must be balanced against the health benefits as these large amounts of data are brought together.

Following a \$20 million government investment strategy, the Population Health Research Network (PHRN) was established to develop an accurate, reliable, and sustainable national capability for data linkage in Australia. In 2009, the Centre for Data Linkage (CDL) was established within Curtin University and it provides the secure data linkage infrastructure necessary for cross-jurisdictional linkage of health-related data in Australia (2). The PHRN commissioned several proof of concept projects to demonstrate the feasibility and benefit of linking large datasets from across the country; the findings presented are from the first of these projects with the aim of demonstrating how estimation of the hospital standardized mortality ratio (HSMR) can be improved through cross-jurisdictional linkage.

Deaths in hospitals have long been of interest as an indicator of the quality of hospital care. The HSMR is an attempt to measure whether a hospital has a higher (or lower) number of hospital-related deaths relative to the overall mortality experience. HSMR is calculated by dividing the observed number of deaths by the expected number of deaths in that hospital. The expected number of deaths is estimated as the average of all deaths in all hospitals after accounting for case-mix variation by a range of possible risk-adjustment methodologies.

Hospital standardized mortality ratios as a measure of hospital quality of care have been the subject of considerable debate as to their value and how they should be used. It has been argued that HSMRs are a poor indicator of quality of care for several reasons. First, risk adjustment usually relies on variables collected from administrative data and not all may have been identified and reported accurately (3); second, a non-constant association of case-mix variables with death across hospitals could result in biases referred to as the constant risk fallacy (4); third, smaller hospitals are statistically more likely to appear at the top and bottom of league tables (5); fourth, the fact that most hospital deaths are not avoidable means there is a low signal-to-noise ratio in trying to assess the rarer preventable deaths (6); fifth, concerns have been raised that hospitals may modify their coding practices or policies, such as refusing to accept very ill patients, in an attempt to modify their HSMR (7); and finally, there is very little consistent or reliable evidence that hospitals with higher HSMRs actually provide poorer quality of care (8, 9).

Proponents of the HSMR argue that they should be used as a screening tool that alerts institutions to a possible problem rather than being a definitive measure of quality of care (10). Moreover, they counter that HSMRs are computed from data already existing in hospital databases and therefore are practical and cost-efficient to estimate (11), the constant risk fallacy is unlikely to be an issue for most hospitals (12), they are only used as a small part of an overall system for monitoring quality of care (10) and they can be used to monitor hospital changes over time (11).

In Australia, the Australian Commission on Safety and Quality in Health Care (ACSQHC) developed a toolkit that contains a set of risk-adjusted coefficients constructed from national inhospital mortality data (13). This enables hospitals to compare their HSMR against the Australian average. While practical to implement, a limitation of the current Australian approach for estimating HSMRs is that they are based on unlinked hospital records. This means that (i) multiple hospital records belonging to the same individual may not be brought together even if they are part of the same hospital admission, so patient pathways are not described accurately and patient transfer policies are not accounted for, (ii) any deaths that occur soon after hospital discharge are not captured and therefore the HSMR is subject to discharge biases, and (iii) important historical or longitudinal patient characteristics are not available for use in the case-mix risk-adjustment process.

In the absence of a unique person identifier in Australia, some of these limitations can be overcome by using person-level linkage methods. Until recently, person-level linkage of administrative hospital and death records has been limited to only two standalone state-based data linkage centers: the Western Australia (WA) Data Linkage System and the Centre for Health Record Linkage in New South Wales (NSW). A constraint of state-based or within-jurisdictional person-linkage is that it cannot follow patients if they cross state borders to attend hospital, a problematic issue when major urban areas such as Brisbane (QLD) are located close to a heavily populated region across a state border (NSW). Cross-jurisdictional linkage can overcome this limitation.

Cross-jurisdictional linkages of hospital and death records from NSW, WA, SA, and Queensland (QLD) were generated by the CDL, the first study in Australia to combine hospital and death data from multiple jurisdictions at the person level (14). This allowed an understanding of the patterns of cross-border hospital use not previously attempted (15). It further enabled assessment of the impact of cross-jurisdictional person-level linkage on the estimation of HSMRs. Due to hospital confidentiality concerns, identification of individual hospitals was not possible for this proof of concept study; therefore, estimated standardized mortality ratios were limited to groups of hospitals based on peer group and geographical location instead, that is, a grouped hospital SMR (GHSMR).

## MATERIALS AND METHODS

### Study Design

A retrospective cohort of all persons who were discharged (separated) from a NSW, WA, SA, or QLD participating hospital during the period 1st July 2004 to 30th June 2009 was identified. An additional 5 years of prior hospital separation records back to 1st July 1999 (where available) were used to identify past history of inpatient hospital use and preexisting comorbid medical conditions.

The main outcome measure was hospital-related deaths: both inhospital deaths and deaths that occurred within 30 days of separating from the last hospital stay. These deaths were used to estimate SMRs under three different record linkage scenarios, as follows: (1) cross-jurisdictional linkage, (2) jurisdictional (state-based) linkage, and (3) unlinked records. Ethical approval for this study was obtained from Human Research Ethics Committees in WA Health, QLD Health, SA Health Departments, the Cancer Institute NSW, and Curtin University (WA).

A detailed description of the hospital and death records used in this study has been published elsewhere (15). Briefly, inpatient records from public, psychiatric, and private hospitals, and private day surgery centers were available from NSW, WA, and QLD. SA provided public hospital inpatient records only. Death registration data were obtained from state-based registries of births, deaths, and marriages. The CDL created a set of person-level national linkage keys that linked all the hospital and death registration records across the four jurisdictions. These keys allowed the data custodians from each jurisdiction to provide relevant de-identified extractions of clinical and death data for analysis. The details of the cross-jurisdictional linkage process involved in this study are presented elsewhere (16).

### Data Cleaning and Standardization

Hospital records from the four jurisdictions underwent extensive cleaning and standardization to maximize analytical comparability. A standard set of exclusions included hospital boarders, organ procurements, aged care residents, funding hospital (duplicate) cases, canceled procedure admissions, unqualified newborns, and healthy qualified newborns. Records with missing age, sex, principal diagnosis, or mode of separation were also excluded. Consensus categorical variables were constructed from the variables of whichever jurisdiction provided the fewest categories.

A number of jurisdictional coding differences were observed. For example, admissions for chemotherapy (ICD-10-AM code Z51.1) in public hospitals in NSW are mostly coded as outpatient events and were not included in the data, whereas they were coded as inpatient events and included in the data from the other three jurisdictions. Jurisdictional variations were identified by systematic cross-checking and with reference to the published metadata and local expertise.

### Variable Definitions

Eligible hospital stays had (i) acute care or, for multiple episodes of care, the first episode of care was acute care, (ii) a final discharge date that fell from 1st July 2004 to 30th June 2009, (iii) a total length of stay less than 1 year, and (iv) an Australian postcode of residence.

For this study, a hospital transfer was defined as a compilation of hospital records that indicated either a subsequent transfer to another acute hospital or a statistical discharge within the same hospital had occurred. A maximum of 48 h was allowed for a patient to transfer from one acute hospital to another.
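The 48 h rule can be illustrated with a minimal sketch that joins consecutive episodes of care for the same person into a single hospital stay. The record layout, the field names, and the function `merge_into_stays` are hypothetical and chosen only for illustration. The sketch also simplifies the definition above: it looks only at the time gap between episodes and ignores the separation-mode information that would distinguish a transfer or statistical discharge from an unrelated readmission.

```python
from datetime import datetime, timedelta

# Maximum gap allowed between one separation and the next admission
# for the two episodes to count as one stay (48 h, as in the text).
MAX_GAP = timedelta(hours=48)

def merge_into_stays(episodes):
    """Combine (person_key, admission, separation) tuples into stays.

    Hypothetical helper: episodes are sorted by person and admission time,
    and an episode is appended to the previous stay when it belongs to the
    same person and starts within MAX_GAP of that stay's last separation.
    """
    stays = []
    for person, admission, separation in sorted(episodes):
        last = stays[-1] if stays else None
        if (last is not None and last["person"] == person
                and admission - last["separation"] <= MAX_GAP):
            # Same person, short gap: treat as a transfer within one stay.
            last["separation"] = max(last["separation"], separation)
            last["episodes"] += 1
        else:
            stays.append({"person": person, "admission": admission,
                          "separation": separation, "episodes": 1})
    return stays
```

With person-level linkage the `person_key` would be the linkage key; without it, episodes in different hospitals or jurisdictions cannot be joined in this way.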

The principal reasons for admission to hospital (principal diagnosis codes) were aggregated into broader diagnostic groups by recoding the ICD-10-AM code into one of 256 Clinical Classification System (CCS) groups (17). These 256 CCS groups were further aggregated into 150 CCS group classifications similar to that reported by Campbell et al. (18) when constructing the summary hospital mortality index (SHMI) with some modification. For example, there were sufficient numbers of hospital stays to create a separate category for melanoma and non-melanoma skin cancers.

The Quan ICD-10 coding algorithm for the Deyo/Charlson index was used to create a Charlson comorbidity score (19) with a 5-year look back period for person-level-linked records and no look back period for unlinked hospital records. An average depth of coding weighting was estimated to account for the extent to which preexisting medical conditions were coded in each calendar year and within each hospital group. Variation in the comprehensiveness of hospital coding practice has been shown to impact estimation of HSMRs (20).

### Risk Adjustment and GHSMR Estimation

Estimation of GHSMRs was restricted to (A) principal referral and specialist women's and children's hospitals, (B) large hospitals, (C) medium hospitals, and (D) small acute hospitals peer groups as defined by the Australian Institute of Health and Welfare (21). Hospital groups were created by splitting the four peer groups A, B, C, and D into smaller categories defined by geographical location and state jurisdiction; this created 43 different hospital groupings. Hospital geographic classifications were major city, inner regional, outer regional, and remote as assigned by the providing jurisdiction. Hospital-related deaths were attributed to the hospital associated with the first episode of care in a multicare episode hospital stay involving transfers.

The method for risk adjustment was based on that reported for the SHMI (18) with modification. The probability of a hospital-related death was estimated by fitting separate logistic regression models for each of the 48 most frequent CCS diagnostic groups that accounted for 80% of hospital-related deaths for each of the three different linkage scenarios. The dependent variables in these models were either (a) all hospital-related deaths (inhospital and 30-day deaths) or (b) inhospital deaths only. The independent variables used in these models were those factors likely to be associated with patient mortality outcomes and included patient age as a quadratic term, gender, year, average depth of ICD coding weighting, length of stay, raw Charlson comorbidity score (5-year look back period), urgency of the hospital admission, accessibility to services (ARIA+), socioeconomic status (Index of Relative Social Disadvantage), marital status, aboriginality, number of times hospitalized in previous 5 years, whether the hospital stay involved intensive care or a ventilator, and whether the hospital stay involved a hospital transfer. Hospitalization history was excluded from the unlinked regression models. The discriminatory ability of each of these regression models to correctly classify hospital-related deaths was quantified using the area under the curve (c-statistic) from receiver-operating characteristic (ROC) analysis.
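To make the c-statistic concrete, the following sketch computes it directly from its pairwise definition: the proportion of (death, survivor) pairs in which the model assigned the higher predicted probability to the patient who died, with ties counted as one half. The function name `c_statistic` and the inputs are illustrative assumptions, not the software used in the study, and the quadratic-time loop is meant for small examples only.

```python
def c_statistic(probs, died):
    """Area under the ROC curve from predicted risks and observed outcomes.

    probs: predicted probabilities of hospital-related death.
    died:  matching truthy/falsy outcome flags.
    Illustrative helper; compares every death with every survivor.
    """
    deaths = [p for p, d in zip(probs, died) if d]
    survivors = [p for p, d in zip(probs, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    # A concordant pair scores 1, a tied pair 0.5, a discordant pair 0.
    score = sum((pd > ps) + 0.5 * (pd == ps)
                for pd in deaths for ps in survivors)
    return score / (len(deaths) * len(survivors))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of deaths from survivors.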

The expected number of hospital-related deaths was calculated by summing the probability of a hospital-related death for each hospital stay over each of the 43 different hospital groups. The GHSMRs were calculated as the ratio of actual observed number of hospital-related deaths in a hospital grouping to the expected number of deaths in that hospital grouping × 100. The 95% confidence intervals for the GHSMR estimates were calculated using Byar's approximation to the exact results based on the Poisson distribution (22). To increase the sensitivity of detecting differences in GHSMRs between those estimated using cross-jurisdictional links and those estimated using jurisdictional links in the absence of unique hospital identifiers, a subset analysis was performed. This involved conducting the risk adjustment and GHSMR estimation on the subset of patients who lived in statistical local areas (SLAs) where more than 1,200 patients crossed a state border to attend hospital over the 5-year study, an effective sample size of 302,191 (2.7%) hospital stays. GHSMRs are presented only for the hospital groups with more than 10 observed deaths within this population subset.
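The calculation described above can be sketched as follows. The function names and the example inputs are hypothetical; the interval uses the standard form of Byar's approximation to the exact Poisson limits for an observed count, which is assumed here to match the method cited in (22).

```python
import math

def expected_deaths(predicted_probs):
    """Expected deaths for a hospital group: sum of per-stay predicted risks."""
    return sum(predicted_probs)

def ghsmr_with_byar_ci(observed, expected, z=1.96):
    """Return (GHSMR, lower, upper) on the x100 scale.

    Byar's approximation gives limits for the observed count O:
      lower = O       * (1 - 1/(9O)     - z / (3*sqrt(O)))   ** 3
      upper = (O + 1) * (1 - 1/(9(O+1)) + z / (3*sqrt(O+1))) ** 3
    which are then divided by the expected count.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    o = observed
    lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
    upper = (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * math.sqrt(o + 1))) ** 3
    return tuple(100 * count / expected for count in (o, lower, upper))

# Illustrative use with made-up predicted risks for three stays.
example_expected = expected_deaths([0.02, 0.10, 0.35])
```

Because the interval width depends on the observed count, hospital groups with few deaths have wide intervals even when their point estimate is far from 100.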

## RESULTS

There were 19.7 million hospital records from July 2004 to June 2009 that met the inclusion criteria. After applying jurisdictional person-level linkages that allowed multiple records pertaining to the same individual and admission to be brought together into a single hospital stay, the total number of records fell by 4% to 18.9 million hospital stays, which represented 7.8 million unique individuals (**Table 1**).

The further addition of cross-jurisdictional linkages brought together both episodes of care that involved hospital transfers across a state border (*n* = 11,116) into a single hospital stay and allowed patients who had hospital stays in more than one jurisdiction to be merged into a single patient. Cross-jurisdictional linkage reduced the number of unique hospital stays by 0.6% and reduced the total number of individual patients by a further 1.4% compared with jurisdictional linkages.



*a Percentage represents proportion of deaths in individuals who had a hospital stay in the 5-year period.*

*b Percentage represents proportion of 30-day deaths in individuals who were discharged alive from their last hospital stay (i.e., excluded individuals who died in hospital).*

*c SA data included public hospitals only.*

*d NSW inpatient data include deaths in emergency departments.*

*e Proportion of hospital stays by non-state residents (cross-border users) relative to all hospital stays in the jurisdiction.*

The number and proportions of hospital-related deaths also varied depending on the data linkage scenario used (**Table 1**). When cross-jurisdictional linkage was used, there were 207,000 inhospital deaths identified, of which 48,380 (23%) occurred during hospital stays involving multiple episodes of care (transfers). Around 22,000 of these inhospital deaths were identified only in the person-linked data scenarios compared with unlinked data because the primary acute care episode of care in a hospital stay involving a transfer was linked to a subsequent non-acute episode of care in which the death occurred.

A further 170 inhospital deaths were identified using cross-jurisdictional linkage compared with jurisdictional links because it detected patients who had a hospital transfer across a state border to receive non-acute care and who then died in hospital. Additionally, 496 patients died within 30 days of discharge and had their death registered in a different jurisdiction: 443 deaths in a different jurisdiction and 53 patients who had dual death registrations (all were dual registered in QLD and NSW).

The logistic regression models used to estimate the probability of hospital-related death in each of the 48 most frequent diagnostic groups had areas under the ROC curve (c-statistics) that ranged from 0.95 for cardiac arrest and ventricular fibrillation to 0.70 for non-hypertensive congestive heart failure; a consistent finding for both the cross-jurisdictional and single-jurisdictional linked data. The ability of the logistic regression models to correctly classify inhospital deaths in the unlinked separation-level data varied from the person-linked hospital data with a maximum c-statistic of 0.95 for biliary tract disease and a lower 0.82 for cardiac arrest and ventricular fibrillation. The average c-statistic for the unlinked separation-level data for inhospital deaths was 0.84, slightly less than the average for person-linked data models at 0.85.

Grouped hospital SMRs estimated using inhospital deaths only were compared for cross-jurisdictional and unlinked hospital records (**Figure 1A**). The addition of the person-level links allowed episodes of care for an individual to be brought together into a single admission and resulted in a change of GHSMR toward the group average GHSMR of 100 in most cases.

For example, Hospital Group 1 with an SMR of 118 (95% CI: 116–119) using unlinked data dropped to 109 (95% CI: 108–110) with person-level cross-jurisdictional linked records. For some hospital groups with relatively low numbers of observed deaths, the observed changes in GHSMR were not always statistically significant. For example, Hospital Group 7 with around 50 observed deaths had an unlinked GHSMR of 118 (95% CI: 83–162) that increased to 136 (95% CI: 101–180) with person-level cross-jurisdictional linkage.

The inclusion of deaths within 30 days of hospital discharge into the GHSMR estimates for the cross-jurisdictional linkage scenario resulted in GHSMR changes more consistently toward the group average (**Figure 1B**). In some cases, the addition of 30-day deaths reversed the change in GHSMR observed when person-level cross-jurisdictional links were first added to unlinked data (see Hospital Group 7 in **Figures 1A,B** for example). Overall, the inclusion of cross-jurisdictional person-level links to unlinked separation data reduced the coefficient of variation among the hospital groups from 0.19 to 0.15; the inclusion of 30-day deaths reduced the coefficient of variation further to 0.11.
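The coefficient of variation used here to summarize the spread of the grouped estimates is the standard deviation divided by the mean. A minimal sketch follows; the GHSMR values are invented for illustration, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation of the values divided by their mean.

    Uses the population standard deviation (an assumption for this sketch).
    """
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical grouped estimates before and after a change in method.
wide_spread = [70, 85, 100, 115, 130]
narrow_spread = [90, 95, 100, 105, 110]
cv_wide = coefficient_of_variation(wide_spread)
cv_narrow = coefficient_of_variation(narrow_spread)
```

A smaller coefficient of variation, as in the second list, indicates that the group estimates sit closer to their common average.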

There were only minor changes to the GHSMR estimates when cross-jurisdictional linkages were compared to jurisdictional linkages (**Figure 2A**). Hospital groups in remote areas tended to show the greatest difference as a result of cross-jurisdictional linkage. To increase the sensitivity of this comparison due to the limitation of not having individual hospital identifiers, the GHSMR estimation was restricted to the subset of patients residing in SLAs with high proportions of cross-border hospital users (**Figure 2B**). This restriction demonstrated increased variation in GHSMRs estimated using cross-jurisdictional links compared with jurisdictional links for several of the 11 hospital groups that had more than 10 observed deaths.

## DISCUSSION

We have demonstrated that using cross-jurisdictional linked hospital and death records can modify estimates of SMRs based on broad hospital groupings compared with both unlinked and jurisdictional linked records. For this study, the largest changes in GHSMRs for inhospital deaths were between unlinked records and person-level linked data. Person-level data allowed multiple episodes of care to be brought together into a single hospital stay. This allowed more accurate estimation of the number of patients and their care pathways, and improved the identification of hospital-related deaths during non-acute care that were linked to an acute care admission. Additionally, the more complete ascertainment of patient comorbidity and hospital stay history improved the GHSMR estimation.

Including all 30-day deaths into the GHSMR estimation reduced the overall spread of GHSMRs and tended to bring outlying hospital groups toward the group average. This is consistent with previous work for NSW hospital data that showed that including 30-day deaths reduced the variation in HSMRs (23). It is likely that this overall reduction in variation occurs because including 30-day deaths into GHSMR estimation reduces the hospital-related death variation associated with early-discharge practices and varying hospital transfer processes.

Estimation of GHSMRs for hospital-related deaths using cross-jurisdictional links compared with jurisdictional links included additional deaths associated with the 11,116 cross-jurisdictional hospital transfers and the 496 cross-border hospital deaths. These additional deaths made only minor changes to the GHSMRs in this study because of the reduced sensitivity of using hospital groups rather than individual hospitals. In this study, individual hospital identifiers were not available and SMR estimation was restricted to broad geographical and peer group categories. It is expected that significant changes in mortality rates could result for hospitals located close to jurisdictional borders when cross-jurisdictional linkages are included at an individual hospital level. This hypothesis is supported by the larger effect observed for cross-jurisdictional linked GHSMRs when restricted to patients living in high cross-border hospital use regions.

The risk-adjustment method used in this report was designed to make full use of the linked data available and thus differs from the method presented in the toolkit developed by the ACSQHC for hospitals to estimate the HSMR as one of the core hospital-based outcome indicators (13). While the regression models used to estimate the expected number of hospital-related deaths had high c-statistics, the approach used here would be impractical to implement on a real-time basis for monitoring hospital performance unless timely access to death registration data to identify deaths within 30 days of discharge can be arranged.

A condition of data release for this study prevented identification of individual hospitals, which was a major limitation. This restriction was primarily the result of privacy concerns and prevented the comparison of individual hospitals with similar characteristics. As a result, the GHSMRs reported here cannot be, and are not meant to be, interpreted in any clinically meaningful way. This limitation highlights that there are still ethical, legal, and social barriers to overcome before cross-jurisdictional linkage is implemented regularly in Australia. Ensuring public confidence in the technology of data linkage to maintain individual confidentiality, advocating for changes to outdated legislation, and providing a strong ethical base to research training undertaken by organizations such as the PHRN and the Centre for Big Data Research in Health will contribute to positive change. Other innovations such as secure remote-access computer environments and the development and use of privacy-preserving record linkage techniques will continue to play a role in the future of data linkage.

## CONCLUSION

We have shown that linking individuals and their hospital stays across jurisdictional borders can modify estimates of standardized mortality ratios. Hospital identifiers will be required to confirm these findings. Improving the precision of the HSMR as a hospital performance indicator is particularly relevant for hospitals that are located close to borders or that have relatively high numbers of interstate travelers.

# AUTHOR CONTRIBUTIONS

KS carried out the data manipulation, data analysis, and drafted the manuscript. DR conceived the design of the study, negotiated data acquisition, and contributed to manuscript preparation. JA conceived the design of the study, negotiated data acquisition, and contributed to manuscript preparation. AF contributed to data acquisition, data analysis, and manuscript preparation. JB contributed to data acquisition, data analysis, and manuscript preparation. JS contributed to overall study concept and critically reviewed the manuscript.

# ACKNOWLEDGMENTS

This study would not have been possible without the collaboration, assistance, and expertise provided by a large number of people including Lee Taylor, Kim Lim, Baohui Yang, and Zoran Bolevich from the NSW Ministry of Health; Almond Sparrow and Stacy Vasquez from SA-NT DataLink; Paul Basso, Tina Hardin, and Tomi Adejoro at SA Department of Health and Ageing; Helen Paues from SA Registry of Births Deaths and Marriages; Darren Shaw at Promadis; Jessica Lee, Alexandra Godfrey, Carol Garfield, and Paul Stevens from WA Department of Health; WA Registry of Births, Deaths and Marriages; the Health Statistics Branch, QLD Department of Health; Julie Hall and Erica Finlay at QLD Registry of Births Deaths and Marriages; Sean Randall and Jacqui Bauer at the Centre for Data Linkage; Merran Smith, Felicity Flack, Angela Rate, Natalie Wray, Emma Fuller, and Tony Woollacott at the PHRN Program Office; James E. Harrison, Flinders University; Bruce Armstrong, University of Sydney; and Neville Board, Australian Commission on Safety and Quality in Health. Part of this manuscript was presented at the International Population Data Linkage Conference 2016, Swansea, UK.

### FUNDING

This work was funded by the PHRN, an initiative of the Australian Government being conducted as part of the National Collaborative Research Infrastructure Strategy.


**Conflict of Interest Statement:** KS, AF, and JB were funded by the PHRN to conduct the study reported here. The other authors declare no conflict of interest.

*Copyright © 2017 Spilsbury, Rosman, Alan, Ferrante, Boyd and Semmens. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Where Sepsis and Antimicrobial Resistance Countermeasures Converge

### *Timothy J. J. Inglis<sup>1,2</sup>\* and Nadia Urosevic<sup>1,2</sup>*

*<sup>1</sup> The Marshall Centre for Infectious Diseases Training and Research, School of Biomedical Sciences, University of Western Australia, Perth, WA, Australia, <sup>2</sup> Department of Microbiology, PathWest Laboratory Medicine WA, Queen Elizabeth II Medical Centre, Nedlands, WA, Australia*

The United Nations General Assembly debate on antimicrobial resistance (AMR) recognizes the global significance of AMR. Much work needs to be done on technology capability and capacity to convert the strategic intent of the debate into operational plans and tangible outcomes. Enhancement of the biomedical science–clinician interface requires better exploitation of systems biology tools for in-laboratory and point of care methods that detect sepsis and characterize AMR. These need to link sepsis and AMR data with responsive, real-time surveillance. We propose an AMR sepsis register, similar in concept to a cancer registry, to aid coordination of AMR countermeasures.

### *Edited by:*

*Matthew Bellgard, Murdoch University, Australia*

### *Reviewed by:*

*Arnold Bosman, Transmissible, Netherlands Gregory Dore, University of New South Wales, Australia*

> *\*Correspondence: Timothy J. J. Inglis tim.inglis@uwa.edu.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 14 November 2016 Accepted: 17 January 2017 Published: 06 February 2017*

### *Citation:*

*Inglis TJJ and Urosevic N (2017) Where Sepsis and Antimicrobial Resistance Countermeasures Converge. Front. Public Health 5:6. doi: 10.3389/fpubh.2017.00006*

Keywords: antimicrobial resistance, sepsis, integrated systems biology, biocomplexity, microbial forensics, infection control

### INTRODUCTION

The United Nations high-level meeting on antimicrobial resistance (AMR) was calculated to thrust the issue of AMR into public view (1) and represents the latest milestone in a global awareness-raising campaign by public health authorities. At first glance, this appears to be the antithesis of precision public health, which places an emphasis on targeted multidisciplinary application of emerging biotechnology to the specific health needs of individuals (2). However, this onslaught against a leading global health challenge is built on a foundation of laboratory AMR surveillance and powered by similar multidisciplinary application of emerging high-throughput biotechnologies (3). The big data outputs obtained in such a way are attractive to public health precisely because they are amenable to mathematical modeling of the ecological and evolutionary processes that lead to AMR (4). These dynamic aspects of infection are complex and have led to a widening comprehension gap. Consequently, the growing public recognition of AMR has yet to acquire a more sophisticated understanding of its personal implications (5, 6). Health professionals who share our concern about escalating AMR support the translation of global policy into action at local, national, and international levels (7). A global campaign to contain and control AMR needs translation from strategic policy into day-to-day health-care practice. Strategy, the practice of the art of war by the *strategos* or general, includes the broader considerations of game theory, complexity, business, and management strategy (8). Biocomplexity provides an attractive framework for placing the cell and molecular biology or biomedical end of the AMR scale in a broader context that includes the clinical pathology of tissues and organs, and ultimately population health including all professional, social, and government regulation (9).
So, to understand the mechanistic workings of an emerging public health phenomenon such as the rise in AMR infections, it is necessary to descend the scale of biological organization from population health to the molecular and cellular mechanisms of multiple-drug resistance in different bacterial species (10). A robust assessment of the broad consequences of AMR requires the converse; an ascent from a specific AMR phenotype to multinational surveillance review (11, 12). An unavoidable feature of AMR is its capacity for unpredictable double transmission: the ability to not only enhance case-clusters of transmissible disease, but also for transmission between resistant and previously sensitive bacteria contributing to novel disease case-clusters, as seen in the dissemination and proliferation of multiple mechanisms of carbapenem resistance (13). Both specific mechanisms and means of AMR transmission need consideration, since both the AMR mechanism and its transmission will impact on the ecology and epidemiology of AMR infection and have implications for the measures needed to control AMR (14). New analytical systems biology tools provide scope for evidence-based design of AMR surveillance and control (15). The complex picture that emerges can be used to develop an AMR narrative that covers the wide range of AMR molecular signatures, multiple bacterial species, and AMR mechanism combinations across the broad scale of biological organization (3). However, other emerging systems biology methods such as proteomics, metabolomics, and bacterial cytomics have yet to be integrated in a holistic AMR analysis that forms a more compelling argument for a specific causal effect (16). Practical use of this approach to attribution of causality has been explored in the field of microbial forensics and has wider application in linking the different tiers of analysis up to a strategic level (17). 
The O'Neill Review identified critical vulnerabilities that could be exploited in control of the global AMR problem and made a series of recommendations (18):


# THE CRITICAL DECISION CONTINUUM

The O'Neill Review recognizes that no single measure will solve the problem of AMR and only seeks to lay out a broad agenda. The review's introduction emphasizes the inability of current diagnostic procedures to provide rapid and comprehensive answers, noting that it is

…incredible that doctors must still prescribe antibiotics based only on their immediate assessment of a patient's symptoms, just like they used to when antibiotics first entered common use in the 1950s.

Antibiotic prescribers face three major obstacles: (a) AMR is an abstract concept for all but its victims and their physicians; (b) detection of specific forms of AMR does not conclusively determine the best choice of anti-infective therapy; and (c) in severe infections, the wait for laboratory evidence on which to base a choice of antibiotic can have fatal consequences. This last consideration remains a key promoter of emerging AMR and could be described as poorly targeted personal medicine, the antithesis of precision public health. Half a millennium ago, Machiavelli observed that the increase in diagnostic certainty with the passage of time is accompanied by reduced treatment success (19). This makes the physician reluctant to wait for the definitive culture results and subsequent antimicrobial susceptibility before commencing treatment. The clinical laboratory still relies on culture-based methods (20), despite continued interest in sepsis biomarkers and other culture-independent technologies. The definition of sepsis has been a point of debate, since it rests on a range of non-specific clinical features and laboratory indicators. The most recent consensus statement on sepsis recognizes only two clinical categories (sepsis and septic shock) and recommends preliminary patient assessment with an easily applied clinical scoring method (qSOFA) (21). The three critical decision steps in the early stages of clinical management of sepsis occur before, at, and immediately after hospital admission, which approximate to determination of illness severity, its etiology, and the choice of definitive therapy (**Figure 1**). From a precision public health perspective, these correspond to pre-hospital point of care tests that distinguish viral from bacterial infection; rapid hospital biomarker tests for sepsis or culture-independent tests for severe viral infection and bacteremia; and rapid determination of antimicrobial susceptibility.
The greatest benefit is most likely to be a pre-hospital, rule-out test that distinguishes possible bacterial from viral infection (22). Improved speed and accuracy of bacterial detection and antimicrobial susceptibility testing has thus become a priority in managing the subsequent stages of sepsis and demands a culture-independent approach (23).

# ANTIMICROBIAL SUSCEPTIBILITY TESTS

The mechanisms of AMR are numerous, increasing in variety, prevalence, and geographic distribution (24), but the ecological inevitability of AMR should not have caught us by surprise. Many antimicrobial agents are derivatives of naturally occurring compounds, whose corresponding AMR has its origins in the environment in which the antimicrobial compound evolved (25). However, the global success of a small number of multiresistant species such as *Klebsiella pneumoniae* (26) happened faster than predicted. The invisible, abstract nature of this public health threat is one of the more difficult aspects of the challenge we now face. It is unfortunate that the clinical laboratory markers of AMR do not translate into specific infectious diseases like septicemia, pneumonia, or meningitis. The bacterial species names that appear on public health notification lists are not by themselves notifiable diseases. Despite its limitations, the international standard method of antimicrobial susceptibility testing, broth microdilution minimum inhibitory concentration (MIC), converts the susceptibility of a particular bacterial isolate into a comprehensible measurement (27). The widely performed disc diffusion susceptibility test converts antimicrobial susceptibility into a visible and qualitative approximation to clinical outcome: sensitive or resistant. Disc diffusion and MIC tests, therefore, generate measurable and clinically valuable indicators of the antimicrobial effect against named bacteria, whereas resistance mechanism detection by nucleic acid amplification, gene sequencing, or other molecular means is not a reliable quantitative measure of antimicrobial sensitivity. The guidance these susceptibility tests give the prescriber in their choice of antimicrobial agent relies on a second growth step, which adds a further delay to the clinical laboratory process. Many prescribers are not interested in the specific identity of AMR mechanisms, particularly if the overall AMR phenotype is a combination of multiple molecular mechanisms, with varied *in vivo* expression and an unpredictable impact on clinical outcome. A carbapenem-resistant *K. pneumoniae* septicemia cannot be treated with a carbapenem, whether the mechanism of resistance is NDM-1, OXA-48, VIM, or IMP. The antimicrobial susceptibility phenotype is, therefore, a critical decider in the sepsis management continuum, even if the laboratory result comes 24–48 h after the initial choice of presumptive antimicrobial therapy. The susceptibility phenotype currently determines definitive therapy and ultimately informs the wider public health community.
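As a rough illustration of how a broth microdilution reading becomes a "comprehensible measurement", the sketch below reads an MIC from a two-fold dilution series and maps it onto a susceptibility category. The function names, the well readings, and the breakpoint values are all hypothetical and chosen only for illustration; real breakpoints are organism- and drug-specific and are published by standards bodies, and the simple "lowest well without growth" rule ignores complications such as skipped wells.

```python
from typing import Dict, Optional


def read_mic(growth_by_conc: Dict[float, bool]) -> Optional[float]:
    """Return the lowest tested concentration (mg/L) showing no visible growth.

    Returns None when the isolate grows at every tested concentration,
    meaning the MIC lies above the tested range.
    """
    inhibitory = [conc for conc, grew in growth_by_conc.items() if not grew]
    return min(inhibitory) if inhibitory else None


def categorize(mic: Optional[float], s_breakpoint: float, r_breakpoint: float) -> str:
    """Map an MIC onto S / I / R using illustrative breakpoints.

    Assumed convention: S if MIC <= s_breakpoint, R if MIC > r_breakpoint,
    otherwise I. An MIC above the tested range is reported as R.
    """
    if mic is None or mic > r_breakpoint:
        return "R"
    if mic <= s_breakpoint:
        return "S"
    return "I"


# Hypothetical two-fold dilution series; True means visible growth in that well.
wells = {0.25: True, 0.5: True, 1.0: True, 2.0: False, 4.0: False, 8.0: False}
mic = read_mic(wells)
category = categorize(mic, s_breakpoint=2.0, r_breakpoint=8.0)
print(mic, category)
```

The point of the sketch is that the category depends only on the measured phenotype, not on which resistance mechanism produced it, which mirrors the argument made above for phenotypic testing.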
At present, surveillance data on antimicrobial susceptibility vary with laboratory capability, capacity, and locally determined public health priorities. These are all under-resourced, particularly in remote regional settings and in low-income countries (28). Nevertheless, multinational networks such as EARSS and CAESAR collect regional AMR data and interest is growing in standardizing the susceptibility tests on which surveillance relies (29–31). The monitoring task is easier when centers that combine a longstanding interest in sepsis and AMR collect prospective data from invasive infections (32).

### EMERGING LABORATORY APPROACHES TO AMR

Rapid, culture-independent phenotypic tests are needed that improve precision in antimicrobial prescribing (17, 18). In particular, tests are needed that measure antimicrobial susceptibility, indicate effective treatment choices and deliver their results closer to the point of care. The wide diversity of molecular mechanisms of AMR limits the value of nucleic acid amplification (PCR assays) as a guide to antibiotic selection in acute clinical settings, particularly for carbapenem-resistant Gram-negative bacteria, which require supplementary tests to improve test sensitivity and overall coverage (33). Much effort has been devoted to detection of AMR mechanisms by rapid whole bacterial genome sequencing (3). Though this approach is not yet feasible as a routine service in the clinical laboratory, bacterial genome sequencing has clear application to public health investigations of AMR infection (3, 11, 26, 34), where decision triggers and task selection procedures can be applied to avoid overloading reference laboratory capacity. Clinical microbiologists who have to cope with the practical scientific challenge of detecting AMR while patients are still under treatment concentrate their effort on standardizing accurate measurement of the AMR phenotype (29). Faster methods of antimicrobial susceptibility testing are now a high priority, as noted in one of the O'Neill Review's technical reports (35). It is here that systems biology applications are beginning to bear fruit (36). However, careful validation is necessary before emerging technologies can be used in the clinical laboratory. This requires test verification and harmonization to maximize analytical value and avoid poorly coordinated proliferation (29, 30). Systematic validation of new antimicrobial susceptibility test methods against agreed reference standards is a necessary step to delivering sufficient confidence in emerging laboratory methods before they can be used for surveillance and control purposes. High profile incentives such as the UK Longitude Prize are being used to attract new candidate tests for this lengthy development process (37).

# A BLEND OF COUNTERMEASURES

Countermeasures need purpose, intent, direction, and evidence for their efficacy. An understanding of the complex intersection of laboratory, clinical, and public health insights will improve their beneficial effect (16). AMR-specific countermeasures, therefore, operate at three levels (**Figure 2**) beginning with faster and more accurate phenotypic laboratory assays that use agreed international standards (29, 30, 36). The O'Neill Review expects new laboratory technology to enable recognition of sepsis, its etiology and antimicrobial susceptibility faster than current culture-dependent methods (35). At the clinical level, prescribing physicians need incentives such as faster confirmation of the etiology of infection and its antimicrobial susceptibility to use the evidence-based antimicrobial therapy advocated in the O'Neill Review (18). In addition to the recommended clinical sepsis score (21), prescribing physicians need a bacterial infection rule-out test to support their initial sepsis triage (22) and innovative methods of rapid antimicrobial susceptibility testing to support their decision-making at the point of care. However, a clearer picture of the global burden of AMR and the measures to control it will not emerge until variations in regional AMR notification have been harmonized through introduction of a sepsis/AMR registry (**Figure 2**). Other fields of medicine, such as oncology, use case registries to develop and refine their disease-specific countermeasures (38, 39). A sepsis registry could be used in similar manner as a precision public health tool to stratify sepsis by syndrome, etiology, AMR phenotype, and resistance mechanism, and, therefore, to coordinate AMR countermeasures. The recent consensus definition of sepsis is a helpful starting point for discussion of a sepsis registry (21), but requires a stronger laboratory-based emphasis on bacterial etiology and AMR. 
Precision is measurable, particularly when supported by archival material in bacterial culture collections and registered clinical biobanks. Claims for the increased accuracy of new methods should thus be verifiable and linked with the clinical laboratory, where the precision of antimicrobial susceptibility tests is already monitored against reference standards and verified by regulatory agencies (29, 30).

# CONCLUSION

Antimicrobial resistance has become a global tragedy of the commons, driven by a complex bacterial survival trade-off at a cellular level (40). Now that AMR is recognized as a global priority, it is time to learn to use additional systems biology tools to improve the speed and accuracy of antimicrobial prescribing at an individual patient level and simultaneously increase the precision of AMR sepsis surveillance. Improved confidence in the recognition of early sepsis, faster determination of its etiology, and antimicrobial susceptibility phenotype, and real time surveillance through an AMR sepsis registry will lead to more effective coordination of clinical, laboratory and public health AMR countermeasures. Given the speed with which antimicrobial agents have been compromised by AMR, there is no time to lose introducing these laboratory and surveillance tools into wider use.

# AUTHOR CONTRIBUTIONS

The authors are working together on culture-independent pathology test development. TI prepared the initial draft. NU reviewed, edited, and supplemented the first draft with an emphasis on sepsis. Subsequent versions of the manuscript were exchanged between the authors who both approved the final version.

### FUNDING

The authors' work on AMR and sepsis countermeasures is supported by translational research project grants from the Department of Health, Government of Western Australia, a Grand Challenges Award from the Bill and Melinda Gates Foundation (OPP 1150984), the NATO SPS Programme (project grant 984835), philanthropic donations from Rotary Clubs and Lab Without Walls Inc., and in-kind contributions from Thermo Fisher Scientific and Biomerieux Australia. This research is conducted in accordance with the Government of Western Australia's governance requirements and supervised by the Department of Health's Research Development Unit.

### REFERENCES


40. MacLean RC. The tragedy of the commons in microbial populations: insights from theoretical, comparative and experimental studies. *Heredity (Edinb)* (2008) 100(5):471–7. doi:10.1038/sj.hdy.6801073

**Conflict of Interest Statement:** The authors are supported by a Grand Challenges award from the Bill and Melinda Gates Foundation, as stated in the acknowledgments above. This and research translation grants from the Government of Western Australia are being used to develop culture-independent pathology tests for sepsis and AMR countermeasures. Thermo Fisher Scientific and Biomerieux have provided in-kind support to the authors' research group, under supervision of the WA Health Department's Research Development Unit. Neither author has received funding from these companies for any purpose. No supporting organization or its members had any role in the preparation of this manuscript, which is the opinion of the two authors.

*Copyright © 2017 Inglis and Urosevic. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Ensuring Privacy When Integrating Patient-Based Datasets: New Methods and Developments in Record Linkage

*Adrian P. Brown, Anna M. Ferrante\*, Sean M. Randall, James H. Boyd and James B. Semmens*

*Centre for Population Health Research, Curtin University, Bentley, WA, Australia*

In an era where the volume of structured and unstructured digital data has exploded, there has been an enormous growth in the creation of data about individuals that can be used for understanding and treating disease. Joining these records together at an individual level provides a complete picture of a patient's interaction with health services and allows better assessment of patient outcomes and effectiveness of treatment and services. Record linkage techniques provide an efficient and cost-effective method to bring individual records together as patient profiles. These linkage procedures bring their own challenges, especially relating to the protection of privacy. The development and implementation of record linkage systems that do not require the release of personal information can reduce the risks associated with record linkage and overcome legal barriers to data sharing. Current conceptual and experimental privacy-preserving record linkage (PPRL) models show promise in addressing data integration challenges. Enhancing and operationalizing PPRL protocols can help address the dilemma faced by some custodians between using data to improve quality of life and dealing with the ethical, legal, and administrative issues associated with protecting an individual's privacy. These methods can reduce the risk to privacy, as they do not require personally identifying information to be shared. PPRL methods can improve the delivery of record linkage services to the health and broader research community.

Keywords: record linkage, data integration, privacy, encryption, data quality, linkage quality

# INTRODUCTION

Unabating growth in the creation of data, coupled with advances in information technology and Internet connectivity, provides tremendous potential for data-driven breakthroughs in the understanding, treatment, and prevention of disease. These health research innovations are being complemented by data from non-traditional sources (i.e., from sources other than administrative health and survey records). Opportunities include the use of mobile phone records (1) and Google search histories (2) for disease surveillance, patient collected data from wearable devices (3), and manual journaling through mobile phone applications (4). Data from the private health sector and government administrative datasets that lie outside the health sector (5) are also of interest, as is spatial information that has direct application for understanding exposures and inequalities (6).

### *Edited by:*

*Paul Michael Kelly, ACT Health, Australia*

### *Reviewed by:*

*Arnold Bosman, Transmissible, Netherlands; Ronan Foley, Maynooth University, Ireland*

> *\*Correspondence: Anna M. Ferrante a.ferrante@curtin.edu.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 30 November 2016 Accepted: 15 February 2017 Published: 02 March 2017*

### *Citation:*

*Brown AP, Ferrante AM, Randall SM, Boyd JH and Semmens JB (2017) Ensuring Privacy When Integrating Patient-Based Datasets: New Methods and Developments in Record Linkage. Front. Public Health 5:34. doi: 10.3389/fpubh.2017.00034*

Genetic information unavailable a generation ago is already used in clinical decision making (7), and its importance is only likely to increase. The key to unlocking these data is in relating details at an individual patient level to provide an understanding of risk factors and appropriate interventions (8).

A key methodology that has supported health research is record linkage, a process of accurately bringing together records from multiple datasets that belong to the same person. Through record linkage, it has been possible to construct and analyze population-wide datasets comprising "linked" administrative records pertaining to each individual. Health-based record linkage frameworks have been established, which routinely integrate data from hospital admissions, emergency departments, primary care facilities, birth, death, and disease registries (1, 2), creating a rich analytic resource to support evidence-based decision making (9–11).

Present models of record linkage use trusted third parties (TTPs) or data linkage units (DLUs) to accurately match records using personal identifiers (12). Incorporating information from new and diverse data sources into these linkage frameworks is likely to have significant benefits to research; however, the operational and administrative overheads are substantial. Technical performance (i.e., scalability, efficiency) and linkage quality (accuracy) will also be affected and need to be assessed.

Sharing of public and private datasets also presents privacy and confidentiality challenges. Protecting the privacy of individuals is paramount in the record linkage process and essential to maintain community support and trust. There are serious ethical implications in combining information on individuals (generally without direct consent) from government and other sources; essentially a form of surveillance of an entire population. For some privacy advocates, this is a bridge too far, conjuring up images of an Orwellian dystopia or the excesses of totalitarian regimes (13, 14). Health researchers argue that privacy risks can be minimized and that the public benefit of utilizing these rich datasets outweighs the risk to privacy; that is, there is an ethical imperative to conduct record linkage for research (15). The public's view on this issue is not always clear; numerous surveys have been conducted in Australia, which sometimes return contradictory results regarding Australian views on the use of personal health information [see Ref. (16) for a review]. Similar contradictions have been observed in results from Canadian surveys (14).

While a number of existing processes and techniques are used to maintain patient privacy during record linkage (17), the development of new and improved linkage methods may provide an opportunity for alternative approaches that further reduce privacy risks without compromising on linkage quality.

This article discusses the emergence and potential benefit of record linkage techniques that limit the release of personal identifiers for linkage. These methods, collectively referred to as privacy-preserving record linkage (PPRL), operate in such a way that they do *not* require the release of personally identifying information by data custodians. PPRL methods work on information that has been permanently encoded, encrypted, or transformed before releasing the data for linkage. Through PPRL methods, the benefits of linkage can be realized without the risks associated with disclosure of personal information.

# EXISTING RECORD LINKAGE FRAMEWORKS

There is a long history in Australia of record linkage supporting both jurisdictional level and national research and health decision making (10, 12, 18). Record linkage capabilities in all jurisdictions (19–21) have recently been strengthened, and in many cases expanded, through strategic national investment: through the National Collaborative Research Infrastructure Strategy in Australia; the Canadian Institutes of Health Research in Canada; and through the Farr Institute initiative in the United Kingdom (22).

The record linkage framework adopted by most of these jurisdictions is a TTP model, whereby dedicated linkage units undertake record linkage to service and support research. Administrative data collections (such as hospital discharges, emergency presentations, mortality, and cancer registers) have typically formed the backbone of enduring record linkage systems (18, 23). Such collections are highly confidential, containing sensitive personal information that is protected by law.

### RECORD LINKAGE AND PRIVACY

Linkage of person-level records through the use of personally identifying information, and generally without consent, has significant ethical and legal implications that have been at the forefront of issues confronted and addressed by DLUs (12, 24).

The extent to which data can be used in record linkage depends on the applicable legislation in each jurisdiction. Some administrative collections are bound by specific laws which either prohibit or severely curtail the release of personal information from these systems.<sup>1</sup> It has been claimed that more than 500 secrecy and privacy provisions exist in Australian Commonwealth laws, imposing considerable limits on the availability and use of identifiable data (25). At Commonwealth level, privacy laws permit some level of disclosure of personal information by authorities for human research (*Commonwealth Privacy Act 1988 s 95*). The release of personal data for linkage can be authorized if public benefit outweighs the privacy of individuals (26).

Working within these legal frameworks, data custodians, DLUs, and the research community in Australia have developed secure data access and usage models that provide important safeguards to privacy. DLUs have also implemented best practice data governance policies and practices to minimize further the privacy risks posed by their operations (12, 18, 19, 27–29).

This includes utilizing the "separation principle" (30), a simple method for restricting the type of data received by each organization in the linkage process. Under this principle, the DLU receives only the personally identifying information required for linkage, but not the content data. The researcher, on the other hand, receives only the content but not personal identifying information. Only the data custodian has access to both personal identifying information and clinical content data.

<sup>1</sup> In Western Australia, for example, both the WA Children's Court Act 1988 and the Young Offenders Act 1994 curtail the release of information for research in relation to juvenile offenders. In South Australia, state-based regulations restrict the release of information from the SA Perinatal Statistics Collection (SA Health Care Variation Regulations 2010, Reg 4). Similar legal barriers exist in other jurisdictions, both locally and internationally.

The use of the separation principle greatly enhances privacy. However, in many instances, the risk to privacy can still be large. For instance, knowledge that a particular individual has a record within a data collection is itself revealing, especially for specific data collections such as mental health inpatient datasets or cancer registries. This information will still be provided to the linkage unit under the separation principle.

The release of personally identifying information always carries some additional risk, as more individuals have access to this information. While rare, attempts to determine whether a person of interest is contained within a dataset do occur; for instance, US intelligence agents have used their surveillance capabilities to spy on romantic interests (31), as have Australian telecommunications workers (32).

Some custodians remain averse to the release of personal information for reasons that extend beyond privacy risks, such as discrimination, reputational damage and/or embarrassment, criminal misuse of the data, and commercial harm (25).

Legislative barriers and risk aversion by data custodians are currently being challenged by open data policies and a growing need by and for government to work with private industry to more effectively service community needs. A recent Productivity Commission Inquiry into the benefits and costs of increasing the availability and use of public and private sector data recognizes the barriers and risks associated with working with named data (25). The Inquiry outlines a framework for data sharing underpinned by legislative change, governance structures (to remove blocks and increase data access), and the development of "systems and processes […] to identify, assess, manage and mitigate risks related not just to data release and sharing, but also data collection and storage" [(25), p.9].

The issues being encountered in Australia are shared internationally. DLUs in the United States, Canada, and Europe face similar legal and risk-related hurdles (e.g., the United States: *Health Insurance Portability and Accountability Act 1996*, Canada: *Personal Health Information Protection Act 2004*, and Europe: *Data Protection Directive 95/46/EC*). German laws in relation to the disclosure of personal information are particularly restrictive (*Bundesdatenschutzgesetz—Federal Data Protection Act of Germany*) and, in some cases, only a single data item can be used for anonymous linkage (33).

### PRIVACY-PRESERVING SOLUTIONS

Privacy-preserving record linkage protocols utilize algorithms and techniques to conduct linkage on encrypted or masked information; these methods do not require data custodians to release personal identifiers to third parties. This reduces the risks associated with the release of personal data. Three important attributes characterize all PPRL protocols: accuracy, efficiency, and privacy.

Different classes of privacy-preserving linkage methods provide differing levels of privacy protection. These range from techniques such as the statistical linkage key that simply amalgamates parts of a person's identifiers into a single variable (34) to methods that encrypt or encode the data so that those with access cannot learn any information directly from the encrypted values. The exact level of privacy required will always depend on context, but all things being equal, a protocol with higher privacy is preferred.
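The two ends of this range can be sketched in a few lines. The first function below amalgamates parts of a person's identifiers into a single variable, in the general style of a statistical linkage key; the second applies a keyed one-way hash so that the value cannot be read directly. The letter positions, the padding character, the sex codes, and the function names are assumptions made for illustration and are not taken from this article or from reference (34).

```python
import hashlib
import hmac
from datetime import date


def pick_letters(name: str, positions: tuple) -> str:
    """Take 1-based letter positions from a name, padding missing letters with '2'."""
    letters = [c for c in name.upper() if c.isalpha()]
    return "".join(letters[p - 1] if p <= len(letters) else "2" for p in positions)


def linkage_key(surname: str, given_name: str, dob: date, sex_code: str) -> str:
    """Amalgamate parts of the identifiers into one variable (illustrative layout)."""
    return (
        pick_letters(surname, (2, 3, 5))
        + pick_letters(given_name, (2, 3))
        + dob.strftime("%d%m%Y")
        + sex_code
    )


def encode_key(key: str, secret: bytes) -> str:
    """Keyed one-way encoding: without the secret, the value reveals nothing directly."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha256).hexdigest()


# Entirely fictitious person and secret.
plain = linkage_key("Smith", "Jane", date(1980, 3, 7), "2")
hidden = encode_key(plain, secret=b"shared-between-custodians")
print(plain)   # MIHAN070319802
print(hidden)  # 64 hexadecimal characters
```

The plain key still exposes the date of birth and fragments of the name, which is why it sits at the low-privacy end of the range. The hashed form hides these, but two records agree only when every component is identical.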

An important difference in PPRL protocols is the method of matching which impacts on linkage quality (accuracy). Protocols may perform matching on a particular set of identifiers, using either exact or similarity comparisons. Similarity matching enables records with slight differences to come together, which is vital for obtaining high-quality linkage results (accuracy). For this reason, PPRL protocols that utilize approximate matching are favored.
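The difference between exact and similarity comparison can be seen in a small sketch. Here similarity is measured with the Dice coefficient over padded character bigrams, which is one common choice and not necessarily the comparison used by any particular protocol discussed here; the 0.75 threshold and the function names are hypothetical.

```python
def bigrams(value: str) -> set:
    """Split a string into overlapping two-character pieces, padded at both ends."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient: twice the shared bigrams over the total bigram count."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def is_match(a: str, b: str, threshold: float = 0.75) -> bool:
    """Treat two values as agreeing when their similarity reaches the threshold."""
    return dice(a, b) >= threshold


# A one-letter spelling variation defeats exact comparison but not similarity.
print("catherine" == "katherine")          # False
print(dice("catherine", "katherine"))      # 0.8
print(is_match("catherine", "katherine"))  # True
```

The two spellings share eight of their ten bigrams each, giving a similarity of 0.8, so the records can still come together even though an exact comparison rejects them.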

Efficiency is often a concern for record linkage and will continue to present challenges to DLUs as the volume of data grows. Although there are no established performance standards, record linkage is computationally intensive, and for any PPRL protocol to be practical, it must complete within a reasonable time frame.

The extent to which these protocols are used in practice varies. To date, most PPRL implementations use exact matching on particular attributes of a dataset (35), which are typically irreversibly encoded to ensure privacy (36). Though efficient, these methods have reduced linkage quality and, therefore, are operationally unsuitable in DLUs.

Of all PPRL methods, the Bloom filter method appears to be the most promising for operational use (37). An advantage of the Bloom method over other PPRL methods is that it utilizes approximate matching while providing similar or superior privacy protection. The method has been evaluated on large-scale, real world health datasets, with results returning equal linkage quality and similar efficiency to traditional linkage methods (which use personal identifiers in the matching process) (38). No record linkage method, privacy preserving or not, achieves perfect accuracy—to be able to achieve equal accuracy to the standard non-privacy-preserving approach is a considerable accomplishment. The security of the protocol has been rigorously investigated (39–41). Cryptographic attacks on the algorithm found ways to reveal some identifiers (40). However, modifications to the protocol have rendered these attacks fruitless (42); there are currently no known security vulnerabilities with the protocol.
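A minimal sketch of the general idea behind Bloom filter encoding follows: each bigram of an identifier is hashed to several positions in a fixed-length bit array, and two encodings are compared by the overlap of their set bits. The filter length, the number of hash functions, the use of HMAC-SHA256 with double hashing, and all names are assumptions for illustration. The sketch omits the hardening modifications referred to above and should not be read as the evaluated protocol.

```python
import hashlib
import hmac


def bigrams(value: str) -> set:
    """Split a string into overlapping two-character pieces, padded at both ends."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 1000, num_hashes: int = 10) -> set:
    """Return the set of bit positions switched on for this value.

    Each bigram is mapped to num_hashes positions by double hashing:
    position_i = (h1 + i * h2) mod size, with h1 and h2 taken from one keyed digest.
    """
    bits = set()
    for gram in bigrams(value):
        digest = hmac.new(secret, gram.encode("utf-8"), hashlib.sha256).digest()
        h1 = int.from_bytes(digest[:16], "big")
        h2 = int.from_bytes(digest[16:], "big")
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def dice_bits(a: set, b: set) -> float:
    """Dice coefficient computed on encoded bit positions rather than on plain text."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


secret = b"known-only-to-the-custodians"  # hypothetical shared key
enc_a = bloom_encode("catherine", secret)
enc_b = bloom_encode("katherine", secret)
enc_c = bloom_encode("robert", secret)
print(round(dice_bits(enc_a, enc_b), 2), round(dice_bits(enc_a, enc_c), 2))
```

Because similar strings share most of their bigrams, their encodings share most of their bits, so approximate matching remains possible on values that are never exchanged in readable form.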

The introduction of the Bloom filter method brings new challenges (17). As well as operational requirements around designing optimal linkage strategies, new ways of validating record linkage results need to be developed. In traditional record linkage, linkage results are validated through clerical inspection (or "manual review") of personal identifiers; however, in a privacy-preserved context where all data are encoded, there is no way to manually review the data or correct possible data or linkage errors. New methods for validating linkage results under a privacy-preserved linkage model are emerging, however (43).

### PPRL: AN EXAMPLE

Consider the following (hypothetical) scenario: to attempt to reduce the rate of youth suicide, the government of the day has invested in a comprehensive mental health care package for those who have attempted suicide. The government wishes to see whether its program has worked in reducing the rate of suicide and attempted suicide.

To answer this question, two datasets will be required: a hospital admissions dataset and a mortality register. From the hospital admissions dataset, records will be required to be sent to the linkage unit for all those persons who have attempted suicide before and after the start of the health intervention; all records from the mortality register will be required by the linkage unit. The linkage unit will receive only the personal identifying information required for linkage (i.e., name, date of birth, gender, address). The linkage unit identifies which records from the supplied hospital dataset have associated mortality records. The linkage unit passes this information back to the data custodians, who then provide the content data (i.e., not personally identifying information) to the researcher for the hospital records, and any linked mortality records, along with a key that identifies which records belong to which individual. The researcher can then use this information to determine whether the intervention reduced suicide and attempted suicide rates.

The privacy risk in the aforementioned scenario is the delivery to the linkage unit of personal identifying information from hospital records of those who have attempted suicide. This extremely sensitive information has been made available to a third party. The use of privacy-preserving linkage methods would remove this risk; instead, the linkage unit would receive encrypted personal identifiers; they would have no means of identifying any of these individuals, but would still have the ability to determine which records belong to the same individual between datasets.
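The separation of roles in this scenario can be sketched in a few lines. The sketch below is a simplification under stated assumptions: identifiers are encoded with a keyed hash and matched exactly (rather than with the approximate matching favored above), and the record layouts, the secret, and all function names are hypothetical.

```python
import hashlib
import hmac

SHARED_SECRET = b"custodian-only-secret"  # hypothetical; never given to the linkage unit


def encode_identifiers(record):
    """Custodian side: replace the personal identifiers with a keyed hash."""
    message = "|".join([record["name"].lower(), record["dob"], record["gender"]])
    return hmac.new(SHARED_SECRET, message.encode("utf-8"), hashlib.sha256).hexdigest()


def custodian_extract(records):
    """What a custodian sends to the linkage unit: record ids and codes only."""
    return [(r["id"], encode_identifiers(r)) for r in records]


def linkage_unit(hospital_codes, mortality_codes):
    """Linkage unit: assign the same linkage key to records with the same code."""
    key_for_code = {}
    links = {}
    for source, pairs in (("hospital", hospital_codes), ("mortality", mortality_codes)):
        for record_id, code in pairs:
            links[(source, record_id)] = key_for_code.setdefault(code, len(key_for_code) + 1)
    return links


def release_to_researcher(source, records, links):
    """Custodian side: content data plus linkage key, with no personal identifiers."""
    return [{"link_key": links[(source, r["id"])], **r["content"]} for r in records]


# Entirely synthetic records for illustration.
hospital = [
    {"id": "H1", "name": "Alex Smith", "dob": "1999-04-02", "gender": "M",
     "content": {"admission_year": 2015}},
    {"id": "H2", "name": "Jo Lee", "dob": "2001-11-23", "gender": "F",
     "content": {"admission_year": 2016}},
]
mortality = [
    {"id": "M7", "name": "alex smith", "dob": "1999-04-02", "gender": "M",
     "content": {"death_year": 2016}},
]

links = linkage_unit(custodian_extract(hospital), custodian_extract(mortality))
research_rows = (release_to_researcher("hospital", hospital, links)
                 + release_to_researcher("mortality", mortality, links))
print(research_rows)
```

In this arrangement the linkage unit handles only record ids and codes, and the researcher receives only content data and linkage keys.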

### GROWING INTERNATIONAL INTEREST IN PPRL

With a growing demand for linked data from government and the university sector, interest in PPRL, particularly the Bloom filter method, is flourishing. Interest stems from two principal sources: at a technical level, by computer scientists and cryptographers with interests in information and data security, and at an operational level, by groups with interest in and responsibility for delivering record linkage services.

Several groups are actively developing and refining PPRL methods at the scientific level including the German Record Linkage Center (University of Duisburg-Essen) (44, 45), the Research School of Computer Science (Australian National University) (46–48), and the Health Information Privacy Laboratory (Vanderbilt University) (39, 49). Researchers from these groups and others recently participated in a 2016 Data Linkage and Anonymisation programme at the Isaac Newton Institute for Mathematical Sciences (Cambridge University, supported by EPSRC grant no EP/K032208/1)<sup>2</sup>; this 6-month international programme included seminars and workshops on linkage and privacy protection to share and advance knowledge in the mathematical sciences and related disciplines. A key goal of the forum was to "enhance opportunities for the analysis of data, especially obtained through linkage, whilst protecting privacy and taking account of related practical constraints."

At an operational level, PPRL featured prominently in the 2016 International Population Data Linkage Network Conference (Swansea University), with several presentations on the topic including a keynote session that described a collaboration between international research institutions in Canada, Australia, and Wales (44, 46, 50–53).

### OPPORTUNITY AND CHANGE MANAGEMENT

In addition to reducing the privacy risks associated with record linkage, the advent of PPRL protocols potentially heralds a new era of population-focused research using linked data, bridging gaps, and opening up opportunities for new and different forms of linkage-based research. PPRL methods may provide an avenue to access previously "hard to get" datasets (i.e., those with significant legal or regulatory constraints). PPRL methods may also provide a mechanism for accessing and integrating data from new and emerging sources. As well as data from new technologies (e.g., wearable devices, smartphone apps), these new sources may include the private health sector that has, to date, had limited exposure to, and engagement with, data linkage frameworks (54, 55).

New methods may require new or adjusted models of operation. Some custodians have expressed a desire to have flexibility in record linkage models to accommodate the features of different data collections (50). However, different or altered data linkage operating models can have significant implications for end-user timeframes, operational efficiency, and linkage quality (50), and these need to be carefully managed and monitored. It is important that the strengths and limitations of the PPRL methods are understood. This will require conversations with stakeholders (i.e., data custodians, linkage units, researchers, and the community) around the risk–benefit of these new models and the expected realization of public benefit.

### CONCLUSION

The implementation of PPRL methods that do not require the release of personal information but protect privacy through other mechanisms (e.g., encryption methods) represents a breakthrough in record linkage, substantially reducing privacy risks without negatively impacting on linkage quality. By utilizing methods that do not require the release of personally identifying information, concerns regarding personal surveillance and government overreach can be allayed. Supplementing traditional linkage methods with PPRL methods will increase the number and type of datasets that can be included in record linkage studies.

The advent of PPRL methods to protect patient privacy expands the toolkit of techniques that are available to DLUs. Used in conjunction with traditional linkage methods, PPRL widens the net of record linkage without compromising privacy or linkage quality. These methods will hopefully allow more diverse, patient-centered data sources to be utilized for health research, bringing enormous opportunities to increase our understanding of disease and to tailor interventions and treatment to each individual.

<sup>2</sup>https://www.newton.ac.uk/event/dla.

### AUTHOR CONTRIBUTIONS

AB and AF accept immediate responsibility for the manuscript. AF, AB, SR, JB, and JS each contributed to the conception and design of the paper. AF and AB drafted the first version of the article, with SR, JB, and JS providing important additional input and intellectual content. All authors were involved in revising the manuscript and approving its final form.

### FUNDING

This work was discussed at the Isaac Newton Institute for Mathematical Sciences, Cambridge, supported by EPSRC grant no EP/K032208/1.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Brown, Ferrante, Randall, Boyd and Semmens. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap

*Richard P. Mann<sup>1</sup>, Faisal Mushtaq<sup>2</sup>\*, Alan D. White<sup>2,3</sup>, Gabriel Mata-Cervantes<sup>2</sup>, Tom Pike<sup>2</sup>, Dalton Coker<sup>3</sup>, Stuart Murdoch<sup>4</sup>, Tim Hiles<sup>4</sup>, Clare Smith<sup>4</sup>, David Berridge<sup>4</sup>, Suzanne Hinchliffe<sup>4</sup>, Geoff Hall<sup>2,4</sup>, Stephen Smye<sup>2,4</sup>, Richard M. Wilkie<sup>2</sup>, J. Peter A. Lodge<sup>2,4</sup> and Mark Mon-Williams<sup>2</sup>*

*<sup>1</sup>School of Mathematics, University of Leeds, Leeds, UK, <sup>2</sup>Faculty of Medicine and Health, University of Leeds, Leeds, UK, <sup>3</sup>Big Data and Analytics Unit, Institute of Global Health Innovation, Imperial College London, London, UK, <sup>4</sup>Leeds Teaching Hospitals NHS Trust, Leeds, UK*

### *Edited by:*

*Tarun Stephen Weeramanthri, Government of Western Australia, Australia*

### *Reviewed by:*

*Gregory Dore, University of New South Wales, Australia; Sarah Smith Lunsford, EnCompass LLC, USA*

*\*Correspondence:*

*Faisal Mushtaq f.mushtaq@leeds.ac.uk*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 19 July 2016 Accepted: 24 October 2016 Published: 01 December 2016*

### *Citation:*

*Mann RP, Mushtaq F, White AD, Mata-Cervantes G, Pike T, Coker D, Murdoch S, Hiles T, Smith C, Berridge D, Hinchliffe S, Hall G, Smye S, Wilkie RM, Lodge JPA and Mon-Williams M (2016) The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap. Front. Public Health 4:248. doi: 10.3389/fpubh.2016.00248*

Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public's perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating health-care delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit "big data."

### Keywords: big data, small data, surgery, health economics, length of stay

The "big data" revolution is central to the long-term vision of health services across the globe (1). For example, big data are central to the UK Department of Health's plans to save £5bn by 2020 through improved operational productivity (2). However, there is a mismatch between the political and scientific optimism surrounding big data and the public's perception of its benefit (3). In this regard, big data constitute a deceptively difficult area of health-care policy. If the big data recommendations are to be realized, the research community needs to persuade a skeptical public that personal health data should be made available for analysis (4, 5); see, for example, the controversial NHS England "care.data" program. Our concern is that the lack of demonstrable *benefits* from data analytics in the short term may reinforce skepticism and erode government enthusiasm (and support) for big data projects. The UK, where national policy on big data is currently under review (6), might serve as a useful test-bed for other countries. We propose that one of the many problems facing big data could be addressed by demonstrating the benefits of data analytics using smaller, readily available data.

There are already many examples of local and regional routinely collected data sets being used to improve health-care services (4). In fact, the idea of health service research providing useful information to hospital management is, of course, far from new. These cases suggest that analyzing existing, routinely available health data ("small data") might be a good starting point for altering public perception, given the difficult strategy of exploiting larger datasets. However, progress in these domains often proceeds in an *ad hoc* manner and success is self-contained. We suggest a systematic and concerted emphasis on developing *models* from these data could illustrate how data science can produce tangible benefits. In order to demonstrate the value of a small data model-based approach, we produced a proof-of-concept model predicting hospital length of stay (LOS).

We chose LOS because the average cost of an excess bed is approximately £273 per day, and the average cost of an elective inpatient stay is £3,366 (7). A model that could predict LOS with some accuracy would mean that fewer operations would be canceled at short notice because of a lack of bed space, thus saving staff and equipment costs and, crucially, providing an improved service for patients.

The current system of bed planning stands as a testament to the remarkable abilities of staff within a hospital – individuals who use extensive insight and knowledge to juggle beds in an environment where both acute and emergency operations can change the requirements on a moment-to-moment basis. The complex, dynamical nature of the hospital is analogous to a weather system and shows similar characteristics (for example, "chaotic" features such as a sensitive dependence on initial conditions). The difference between the weather forecaster and the hospital bed planner lies in the quality of the models they can run to simulate the system of interest. Cognitive science has shown that humans are poor at making decisions under conditions of high uncertainty (8) and tend to prioritize immediate problems over longer-term planning (9, 10), whereas mathematical models can assist in optimizing decision making (11). We therefore studied whether we could utilize existing NHS data to build a simple predictive model as a precursor to one that could help forecast the need for beds following elective surgery (as the scheduling of elective operations in a very large acute NHS Trust, Leeds Teaching Hospitals, has an element of flexibility that offers a degree of control to those running hospitals). We used available data that had been routinely collected by clinicians, health service practitioners, and administrators on an internal system on a daily basis.

For illustrative purposes, we created a model (see Datasheet S2 in Supplementary Material) that could use predictors known *a priori*, and *post hoc* knowledge (e.g., operation time) to provide estimations of LOS for patients undergoing laparoscopic cholecystectomy (LC). We focused on this procedure because it is estimated that between 10 and 15% of the adult western population have gallstones (12) – the most common and costly digestive disease (13) – and LC is the preferred treatment option for symptomatic gallstones (14). Due to its prevalence, we reasoned that a predictive model might complement individual intuition and help hospitals plan elective procedures and associated beds in a more efficient manner. This could be beneficial as the costs associated with discharging patients too early can be greater than the initial investment of bed stay and a day-case surgery policy is not suitable across all specialties and procedures, despite demonstrable success in some areas (15). Previous research indicates that modeling LOS is technically feasible (16, 17), yet these approaches are rarely used in practice. This is particularly surprising given the costs associated with sub-optimal bed allocation and the nature of current approaches to scheduling: even the most rudimentary model should provide information of value that could ultimately translate to economic benefits in the long run.

Our analysis revealed that month, weekday, year, patient age, and operation time were all predictive of LOS using data from 2004 to 2012. **Figure 1** shows how each predictor influences LOS, if all other predictors are held constant.

Since patient age and operation time were the strongest predictors of stay duration (**Figures 1C,D**), we extracted a two-dimensional plot of their partial effect in combination (see **Figure 2**). Note that operation time is a variable only known post-surgery [though surgeon expertise and difficulty are correlates (18)]. The pseudo-*R*<sup>2</sup> for this model was 0.29. Based on 1,000 permutations where stay durations were permuted and the model refitted, this model reached statistical significance at a threshold of *p* < 0.001. For younger patients (<55 years), we found a relatively weak relationship between operation length and LOS up to the 3-h mark, but for older patients (>55 years), stay duration increased strongly with surgery length.
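The permutation procedure described above (shuffle the stay durations, refit, and compare the fit with the observed one) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a single predictor, a straight-line least-squares fit, and ordinary *R*<sup>2</sup> as a stand-in for the model and pseudo-*R*<sup>2</sup> reported here, and it runs on synthetic data rather than the hospital dataset. All names are hypothetical.

```python
import random


def r_squared(x, y):
    """R^2 of a one-predictor least-squares line (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def permutation_test(x, y, n_permutations=1000, seed=0):
    """Shuffle the outcome, refit, and count fits at least as good as the observed one."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between predictor and outcome
        if r_squared(x, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction: the observed ordering counts as one of the permutations.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic data (assumption): stay duration rises with age, plus random noise.
data_rng = random.Random(42)
ages = [data_rng.uniform(20, 85) for _ in range(200)]
stays = [1.0 + 0.05 * age + data_rng.gauss(0, 1.0) for age in ages]

observed_r2, p_value = permutation_test(ages, stays)
print(f"R^2 = {observed_r2:.2f}, permutation p = {p_value:.4f}")
```

In this sketch, 1,000 permutations give a smallest attainable *p* value of 1/1,001, which is why that number of shuffles is enough to support a threshold of *p* < 0.001.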

There is, of course, substantial between-patient variation unaccounted for by our model (given data limitations), and there is considerable room for further predictive improvement. Intrinsic (but unrecorded) differences between patients are always likely to make prediction difficult. In reality, decision makers face scheduling problems on different time horizons; here, we included operation length as a predictor, but this is clearly not known until the operation is complete. Without this predictor, the best-fit model selects month, year, and patient age as factors, with a pseudo-*R*<sup>2</sup> value of 0.17. It is remarkable that a simple five-factor model can account for this amount of variance in the data given the complexity of the system.

The results of this exercise demonstrate that existing small data sets could be used to create models that allow a reasonable prediction of hospital LOS after surgery. It is notable that the data we used had considerable limitations. For instance, we focused on one procedure with data from one (albeit large) NHS Trust, and we did not identify the incidence of complications. These issues could be addressed and may have a substantial impact. If this process is replicated across multiple procedures and hospitals, we could be in a better position to plan for 23-h and 5.5-day facilities instead of full in-patient facilities. This information could ultimately influence how hospitals plan and flex their bed-base.

In summary, we have demonstrated a "proof of concept" that a proportion of the variance associated with patient LOS can be predicted from a limited number of factors. Many applications of medical statistics, such as tests for the efficacy of drugs, require careful experimental design to determine causal effects of putative interventions. In contrast, scheduling problems only require accurate prediction given the observable traits of the patient, since no intervention is proposed. This is where predictive modeling from existing small data sets has the lowest barrier to entry. Systematically recording and utilizing more of these data would allow them to inform the best computational model and allow schedulers to use the model ahead of time, when it can be most efficacious. Crucially, these models could be rapidly developed and deployed from existing datasets. Achieving, for example, fewer cancelations of elective operations through the effective implementation of a small data LOS model would provide a tangible example of the benefits of data analytics to the public. We suggest that this could provide *one* solution to the reticence of a public who are skeptical about the benefits of their data being collected, particularly if existing datasets can be utilized in novel and clinically beneficial ways.

**Figure 1** | … and the gray region represents the 95% quantile). (A) These data show a steady downwards trend in stay duration over the nine relevant years; (B) the duration of stay is longer for operations on Saturdays – most likely due to weekday discharge; (C) for patients above 55 years of age, stay duration rapidly increases with age; and finally, (D) stay duration also increases with operation time – presumably an indicator of complications in surgery or intrinsically more difficult cases. Interestingly, stay duration reaches a plateau for operations over ~3 h, though there are relatively few data points for surgeries of this length – and as such, this relationship should be treated with caution.

Finally, while our example is from the UK NHS – an organization that is the largest health-care provider and one of the largest global producers of health data – the resulting predictive model could be used across other health-care systems. Moreover, a demonstration of the usefulness of data analytics in any country can help change the public's (and clinicians') perception of the value of big data. The UK NHS Hospital Trusts data systems provide an opportune vehicle by which the big data implementation gap can be addressed and, if successful, could serve as a model for others to follow. We therefore propose that greater attention (and funding) needs to be directed toward the utilization of existing data resources, in parallel with current efforts to create and exploit "big data" sets. It is probable that smaller analytical projects yielding efficiency in the short term ("small data") will persuade society of the longer-term merits of exploiting data, as well as identify the challenges and opportunities in analytics in a more tractable fashion than is afforded by still-to-be-created big data repositories.

### AUTHOR CONTRIBUTIONS

ADW collected the data. RPM analyzed the data. FM, RPM, GM-C, TP, DC, SM, TH, CS, DB, GH, SH, SS, RMW, JPAL, and MM-W wrote the paper.

### FUNDING

This work was supported by funding from the Leeds Teaching Hospitals Charitable Foundation to ADW, RMW, JPAL and MM-W and an MRC Grant (R17427) to MM-W.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fpubh.2016.00248/full#supplementary-material.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Mann, Mushtaq, White, Mata-Cervantes, Pike, Coker, Murdoch, Hiles, Smith, Berridge, Hinchliffe, Hall, Smye, Wilkie, Lodge and Mon-Williams. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Comprehending the Health Informatics Spectrum: Grappling with System Entropy and Advancing Quality Clinical Research

*Matthew I. Bellgard<sup>1</sup>\*, Nigel Chartres<sup>2</sup>, Gerald F. Watts<sup>3,4</sup>, Steve Wilton<sup>1,5,6</sup>, Sue Fletcher<sup>1,5,6</sup>, Adam Hunter<sup>1</sup> and Tom Snelling<sup>7,8,9</sup>*

*<sup>1</sup>Centre for Comparative Genomics, Murdoch University, Murdoch, WA, Australia, <sup>2</sup>Health Informatics Society of Australia, North Melbourne, VIC, Australia, <sup>3</sup>School of Medicine, University of Western Australia, Perth, WA, Australia, <sup>4</sup>Lipid Disorders Clinic, Cardiometabolic Service, Royal Perth Hospital, Perth, WA, Australia, <sup>5</sup>Centre for Neuromuscular and Neurological Disorders, University of Western Australia, Nedlands, WA, Australia, <sup>6</sup>Perron Institute for Neurological and Translational Science, Nedlands, WA, Australia, <sup>7</sup>Princess Margaret Hospital for Children, Perth, WA, Australia, <sup>8</sup>Wesfarmers Centre of Vaccines and Infectious Diseases, Telethon Kids Institute, University of Western Australia, Perth, WA, Australia, <sup>9</sup>Menzies School of Health Research and Charles Darwin University, Darwin, NT, Australia*

### *Edited by:*

*Rumen Stefanov, Plovdiv Medical University, Bulgaria*

### *Reviewed by:*

*Danice Brown Greer, University of Texas at Tyler, United States; Valeria Manera, University of Nice Sophia Antipolis, France*

*\*Correspondence:*

*Matthew I. Bellgard mbellgard@ccg.murdoch.edu.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 27 March 2017 Accepted: 08 August 2017 Published: 14 September 2017*

### *Citation:*

*Bellgard MI, Chartres N, Watts GF, Wilton S, Fletcher S, Hunter A and Snelling T (2017) Comprehending the Health Informatics Spectrum: Grappling with System Entropy and Advancing Quality Clinical Research. Front. Public Health 5:224. doi: 10.3389/fpubh.2017.00224*

Keywords: health, informatics, clinical research, information communication technology, clinical practice

Clinical research is complex. The knowledge base is information and data rich where value and success depend upon focused, well designed connectivity of systems achieved through stakeholder collaboration. Quality data, information, and knowledge must be utilized in an effective, efficient, and timely manner to affect important clinical decisions and communicate health prevention strategies. In recent decades, it has become apparent that information communication technology (ICT) solutions potentially offer multidimensional opportunities for transforming health care and clinical research. However, it is also recognized that successful utilization of ICT in improving patient care and health outcomes depends on a number of factors such as the effective integration of diverse sources of health data; how and by whom quality data are captured; reproducible methods on how data are interrogated and reanalyzed; robust policies and procedures for data privacy, security and access; usable consumer and clinical user interfaces; effective diverse stakeholder engagement; and navigating the numerous eclectic and non-interoperable legacy proprietary health ICT solutions in hospital and clinic environments (1, 2). This is broadly termed health informatics (HI).

We outline three scenarios from across the health spectrum where these issues are exemplified: (i) for a given clinical trial methodology and study design, how quality data are captured, by whom, and how they are aggregated, reused, and repurposed is just as critical as the data content itself. This becomes particularly important with the desire to simultaneously evaluate and optimize the effective and cost-effective use of new medications (3); (ii) in a systems biology context, clever strategies to combine disparate datasets at the gene, gene expression, protein, and protein–protein interaction levels are essential to unlock underlying molecular mechanisms that affect routine clinical decisions (4); and (iii) in evidence-based medicine, encoding expert clinical knowledge into decision support systems and data standards for collecting diverse physiological measurements from patients are critical to ensure effective cross-jurisdictional data sharing for diseases (5).

These three examples highlight the potential broad spectrum of the role of ICT in health. Simply stated, at one end of the spectrum, health ICT systems are critical for the routine day-to-day running of hospitals and clinics. These systems are used by various health stakeholders for a diverse range of clinical services and administrative procedures. More recently, there is an increasing demand to reuse and repurpose health data contained within these ICT systems for clinical research and reporting such as compliance, efficiency metrics, funding of health programs, epidemiological studies, and health promotion. On the other end of the spectrum, clinical research embeds ICT and its application involving bioinformaticians, biostatisticians, and analytic workflow environments within research projects. There is a growing demand to embed outputs of this research as evidence to inform health-care policy and improve clinical practice.

The significant challenge is how we bridge these two ends of the spectrum. While the overall driver of improved patient outcomes is shared, the demands placed on available ICT systems for data capture, access, and analysis are usually beyond what they were originally designed for. We contend that the field of HI is the important bridge that delivers the promise of spanning the ICT spectrum in both health care and clinical research. We now explore the challenges in HI that need to be overcome.

# KEY HI CHALLENGES WITHIN THE CURRENT ENVIRONMENT

### Key Challenge 1: Defining HI

There are numerous broad definitions of HI. One such definition is that HI is "an evolving scientific discipline that deals with the collection, storage, retrieval, communication and optimal use of health and related data, information and knowledge" (6). The discipline draws on computational and information science methodologies and technologies to support clinical decision-making to improve health care. Such a broad definition has both advantages and disadvantages. On the one hand, this definition is a "catch all" for the spectrum of ICT in health care and clinical research. On the other, such a broad definition impacts a diverse range of health-related stakeholders, including researchers, clinicians, nurses, the public, allied health and other health professionals, government departments, administrators, and software engineers. This presents a significant challenge of ensuring effective communication and uptake of robust HI.

### Key Challenge 2: Current Health ICT Ecosystems

In reality, health ICT ecosystems are largely fragmented (7, 8). For example, typically within a hospital ICT system environment, there are stand-alone systems, meaning that important health data are also siloed. Depending on the nature of these systems (some of which are as simple as spreadsheets), they are highly likely to contain significant data entry errors, duplications, inconsistencies, and incompleteness. The key challenge here is that fragmented ICT systems impede the ability to monitor chronic diseases, effectively follow up patients after hospital discharge, prevent avoidable complications (for example, hospital readmissions), or enable longitudinal epidemiological studies. This has a flow-on cost burden and can inhibit efficiency gains within the health system. In Australia, numerous health-care business units (such as radiology, pharmacy, pathology, and radio oncology) typically have their own ICT systems that do not interface with each other, and most hospital systems do not interact with external systems, such as general practice clinics or private clinic rooms. Therefore, ownership and management of data become an important barrier between health-care business units and affect the quality of patient care. Furthermore, when proprietary systems are deployed and hosted by third parties, the ability of the client to exercise their ownership rights over their data requires clarification at the outset of the hosting arrangements.

### Key Challenge 3: Underlying Causes of Issues with Current ICT Ecosystems

Many papers and conferences addressing significant issues inherent in the challenges of introducing successful ICT ecosystems into the health sector continue to identify some key underlying causes for system failures and continuing difficulties in achieving meaningful connectivity within the health-care system, for example, see Ref. (9). These issues generally fall under the following 10 headings.

### Leadership and Governance

Currently, the degree of alignment of shared leadership and appropriate governance arrangements across the many areas of responsibility, which is needed for system synergies, is limited. Program management is as important as project management to ensure shared learning of technical and interpersonal expertise.

### Policy and Funding Models

Although health reform agendas mean to streamline policymaking and funding models, many stakeholders consider that very little has really been achieved that delivers any significant improvements into the way health systems operate. In this context, there are significant funding and resourcing pressures on any given state/national health system. The nature of these pressures unfortunately means that the focus reverts back to a business-as-usual paradigm within health systems. Furthermore, the current budgetary and operational pressures on the health sector restrict the ability of leadership within the sector to respond to contemporary challenges.

### Regulatory Impediments

Existing and complex regulatory environments are viewed as a major issue where very little practical and beneficial change has been able to be introduced.

### Productivity and Performance

It is recognized that significant progress has been made in reporting/compliance arrangements and systems that are focusing on transparency and accountability of health-care service providers. Given the current widespread lack of active use of data standards utilized within the fragmented health ICT ecosystem, it is difficult to harness the big data opportunities inherently available in health-care performance metrics (10). As such it is neither feasible nor practical to be able to use performance metrics to assess productivity in meaningful depth that could introduce transformative efficiencies into service delivery models.

### Standards

Globally, there is much valuable work on developing open standards in health, for example, Ref. (11, 12). However, there remain many challenges in their widespread adoption related to limited funding, limited leadership capacity, lack of widespread agreement, and limited workforce skills and resources. A particular issue concerns the focus on data collection and data entry rather than what we refer to as a more holistic approach to data management including the *purposeful application of collected data* to improve health outcomes.

### Business Models and Processes—The Illusion of Risk Free Procurement

A significant barrier to the successful deployment of new systems is managing the transition from legacy ICT systems and data management processes in the delivery of health services. This has further exacerbated the disparity between implemented ICT solutions and the business models and processes that they purport to support. For instance, the procurement processes of health ICT solutions should be continually reviewed and iteratively refined along the dimensions of digital disruption, accountability, risk assessment, risk mitigation, and risk-averse strategies.

Evidence of potential suboptimal processes is highlighted by the patient journey through the health system, which invariably spans organizational and operational boundaries whose systems are typically not seamlessly connected to support the overall delivery of health care (9). In the case of rare disease diagnosis, a patient's navigation through the health system is referred to more as an odyssey than a journey (3).

In addition, business model and process reform, which is required systemically throughout the health system and much of which depends upon regulatory reform, is considered one of the most significant barriers to any beneficial transformation of the health system.

### Sociotechnical Complexities

Sociotechnical complexities (complexities that span societal and technical boundaries) are inextricably linked to many aspects of business models and their associated business processes. Many of these complexities are inherently cultural in nature, insofar as many health workforce participants operate within long-standing, conventionally designed systems. So while some progress continues to be achieved in specific situations, the big breakthroughs can only be achieved through large-scale business model and process reform as driven by regulatory change. If these complexities are not addressed, then emerging trends such as patient empowerment *via* the measured self (13), the Internet of things (5), and personalized medicine will only exacerbate them (14).

Another key aspect of this concerns a real focus on business models, business processes, and systems, which collectively enable much more community engagement at all levels in consultation on matters such as prevention, patient care, diagnosis, treatment, management, privacy, and consent.

### Infrastructure Component Connectivity

Technical and communication infrastructure is no longer viewed as the major issue as it was in recent times. It is clear that more effort needs to be made to connect existing infrastructure components to enable better communication between health-care service providers and so achieve more coordination of services.

### Workforce

A barrier to success exists in the form of limited staff capacity across a range of administrative, clinical, research and technology disciplines to overcome the significant business-as-usual pressures of national health systems. This must be addressed to implement transformative change. ICT systems inherently can track performance, which can give rise to fear of inappropriate exposure for suboptimal clinical decision-making.

### Clinical Research

There are limited virtual spaces where the health sector can interface with the research sector. Health departments do not have the infrastructure to provide analytic environments for their big data, whereas academic environments, despite possessing the analytic capabilities, are typically not structured to handle health data.

# CASE STUDY: DEMENTIA

The Organisation for Economic Co-operation and Development (OECD) has recognized that there is clear potential to improve science and innovation systems through big data and open science for the prevention and care of dementia. In 2010, 35 million people worldwide were diagnosed with dementia, with annual health costs estimated at USD 604 billion and the number of people diagnosed expected to exceed 115 million by 2050. The multifactorial nature of the condition requires the collection, storage, and processing of increasingly large and very heterogeneous datasets (behavioral, genetic, *-omics*, environmental, epigenetic, clinical data, brain imaging, and so forth) (10).

To successfully apply informatics systems to big data, current barriers, issues, and challenges need to be recognized and addressed along with implementing key critical success factors. For example, the OECD identified data sharing as the most significant barrier in managing dementia (15). The root cause of this significant barrier arises from current cultural, technical, administrative, regulatory, infrastructure, and financial obstacles that need to be overcome. In addition, data standards, data sharing, new analytic approaches, security and protecting privacy, along with approaches for engaging stakeholders and the public are critical factors for effectively and successfully harnessing big data. Hence, the future opportunity for big data in improving health-care systems requires carefully crafted strategies at both policy and ICT implementation levels across a broad range of HI challenges. In particular, regard needs to be paid to the established discipline of data governance, which is particularly important for providing a solid structural basis for managing human resources, processes, and technologies (1, 2).

# THE FUTURE CONTRIBUTION OF HI TO IMPROVING HEALTH OUTCOMES

A learning health-care system requires a number of critical ingredients that can improve care of patients. These entail definition of clinical context, accurate collection of patient characteristics and outcome data, availability of decision support systems, utilization and application of real world data, and effective engagement of all stakeholders.

## Introducing a Guiding Model for the Role of HI to Span the Spectrum of ICT in Clinical Research

Owing to the current complexities and issues inherent in making substantial progress in improving health outcomes through the deployment of ICT enabling systems, it is clear that there needs to be a better understanding of the role that HI plays.

**Figure 1** provides an overview of a proposed guiding model highlighting the ideal role that HI plays in health care. Within this model, for example, clinical research will generate and analyze data such as a personalized genome sequence, to obtain clinical validity of candidate pathogenic mutations (16). The identified pathogenic mutation data are captured as one of a myriad of patient phenotypes and patient reported outcomes to ascertain clinical utility, such as in a disease registry, e.g., Ref. (17, 18), as part of clinical services and practice, in a personalized medicine context (14). On the third axis, pathogenic mutation data can be aggregated in a de-identified manner across geographical locations to inform policy and community awareness (19) and to support important population health research.

# CONCLUSION AND FUTURE PERSPECTIVE: STRATEGIES FOR SHAPING EFFECTIVE AND SUSTAINABLE SYSTEMS

From our experience, there are three key linked and iterative strategies for shaping and delivering successful systems. These are to:


These, necessarily, need to be very skilfully planned, managed, and executed, which requires professional systems thinking HI practitioners who also have a very pragmatic working knowledge of the health system. This topic will be the subject of further commentary.

Information and communication technology solutions must be discussed in an open and willing environment where risk is understood and carefully managed to facilitate strategic planning. These solutions must be designed to apply open data standards and open system principles that promote interoperability, service oriented architectures, application programming interfaces, and appropriate assessment of legacy ICT systems (12, 20).

# AUTHOR CONTRIBUTIONS

All the authors have contributed to this work.

# FUNDING

The authors gratefully acknowledge the combined support and in-part funding for this work. This includes the RD-Connect-European Union Seventh Framework Programme (FP7/2007–2013 program HEALTH.2012.2.1.1-1-C) under grant agreement number 305444: RD Connect: an integrated platform connecting databases, registries, biobanks, and clinical bioinformatics for rare disease research, the financial support of Australian National Health and Medical Research Council (APP1055319) under the NHMRC–European Union Collaborative Research Grants scheme, the Wellcome Trust [REF 104746], and the Australian Bioinformatics Facility funded through Bioplatforms Australia Pty. Ltd., an Australian National Collaborative Research Infrastructure Strategy initiative.

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Bellgard, Chartres, Watts, Wilton, Fletcher, Hunter and Snelling. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Policy Making in Newborn Screening Needs a Structured and Transparent Approach

### *Marleen E. Jansen1,2\*, Karla J. Lister3, Henk J. van Kranen2,4 and Martina C. Cornel1*

*1Section Community Genetics, Department of Clinical Genetics, Amsterdam Public Health Research Institute, Amsterdam, Netherlands, 2Institute for Public Health Genomics, School for Oncology and Developmental Biology (GROW), Faculty of Health, Medicine, and Life Sciences, Maastricht University, Maastricht, Netherlands, 3Screening Policy Section, Office of Population Health Genomics, Department of Health, Government of Western Australia, Perth, WA, Australia, 4Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, Bilthoven, Netherlands*

### *Edited by:*

*Gareth Baynam, Genetic Services of Western Australia; The University of Western Australia; Murdoch University, Australia*

### *Reviewed by:*

*Shanti Balasubramaniam, Children's Hospital at Westmead, Australia; Paul Lacaze, Monash University, Australia*

> *\*Correspondence: Marleen E. Jansen m.jansen5@vumc.nl*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 15 December 2016 Accepted: 01 March 2017 Published: 21 March 2017*

### *Citation:*

*Jansen ME, Lister KJ, van Kranen HJ and Cornel MC (2017) Policy Making in Newborn Screening Needs a Structured and Transparent Approach. Front. Public Health 5:53. doi: 10.3389/fpubh.2017.00053*

Purpose: Newborn bloodspot screening (NBS) programs have expanded significantly in the past years and are expected to expand further with the emergence of genetic technologies. Historically, NBS expansion has often occurred following *ad hoc* consideration of conditions, instead of a structured and transparent approach. In this review, we explore issues pertinent to NBS policy making, through the lens of the policy cycle: (a) agenda setting, (b) policy advice, (c) policy decision, (d) implementation, and (e) evaluation.

Methods: A literature search was conducted to gather information on the elements specific to NBS and its policy making process.

Results: The review highlighted two approaches to nominate a condition: a structured approach through horizon scanning; and an *ad hoc* process. For assessment of a condition, there was unanimous support for a robust process based on criteria. While the need to assess harms and benefits was a repeated theme in the articles, there is no agreed-upon threshold for benefit in decision-making. Furthermore, the literature was consistent in its recommendation for an overarching, independent, multidisciplinary group providing recommendations to government. An implementation plan focusing on the different levels on which NBS operates and the information needed on each level is essential for successful implementation. Continuously monitoring and improving a program is vital, particularly following the implementation of screening for a new condition. An advisory committee could advise on implementation, development, review, modification, and cessation of (parts of) NBS.

Conclusion: The results highlight that there is a wave of issues facing NBS programs that policy makers must take into account when developing policy processes. What conditions to screen, and the technologies used in NBS, are both up for debate.

Keywords: decision making, newborn screening, public health policy, genomic screening, genetic testing

# INTRODUCTION

Newborn bloodspot screening (NBS) is the longest running and most successful population screening program worldwide (1). NBS tests newborns within the first days of life for multiple serious conditions (2). The traditional aim of NBS is to prevent serious consequences for the newborn by enabling timely diagnoses and treatment for early onset childhood conditions (3). In recent years, technological developments, changes in understanding of conditions, and new treatments have fueled the expansion of NBS (4, 5). The aim of this study is to explore issues influencing each phase of the policy cycle. In doing so, this study provides policy makers with insight into the pressures facing NBS to inform them on approaches to successfully guide programs into the future.

An archetypal example of the policy pressures facing NBS was the advent of tandem mass spectrometry (MS/MS), and the resultant impact it had upon programs worldwide. This technology emerged in the 1990s and made it possible to test for several conditions at once in a cost-effective manner (4, 5). Correspondingly, several programs adopted the technology and significantly increased the number of conditions screened. Programs using these technologies regularly screen for 9 to over 50 conditions (1, 6–9). Even more than MS/MS, genetic technologies may enable screening for a larger number of conditions (3, 10, 11). Debate abounds in the academic literature on the appropriateness of expanding NBS (4, 10, 12): some authors advocate for targeted approaches (13, 14), while elsewhere next generation sequencing is being applied in the research setting to study its potential for NBS (15, 16).

Previous expansions, and the divergent programs that have evolved, suggest that the emergence of genetic technologies is likely to be a significant turning point for NBS. Given the reach of NBS, the fact that NBS tests our most vulnerable population, and the potential to increasingly expand programs, it is essential that decisions on what to screen are carefully considered. Thus, policy approaches are needed that can successfully navigate the changing environment (14). It can be expected that an expansion occurring in some countries will become an example of what is possible for other countries (17). At a minimum, further debate will emerge on the pros and cons of expanded screening, and experts, consumers, and advocacy groups are likely to increase calls for screening for specific conditions (17). The emergence of debates on further expanding NBS presents decision makers worldwide with the challenge of weighing the benefits and harms of screening in the changing landscape of NBS. Policy frameworks, which are developed in light of the range of policy issues, will be essential for policy makers to ensure their programs can effectively respond to the pressures facing the program, now and in the future.

Given the pressures on NBS, the current study aims to identify what the scientific literature outlines as the key policy considerations currently facing NBS. This is achieved by exploring issues pertinent to NBS policy making, through the lens of the policy cycle (10): (a) agenda setting, (b) policy advice, (c) policy decision, (d) implementation, and (e) evaluation. Without detailing current developments in genomic technologies, we aim to explore issues influencing each phase of the policy cycle. In doing so, this study will enable policy makers responsible for existing or emerging NBS programs to consider the best policy structure to respond to the changing environment in which NBS operates.

# METHODS

We explored academic literature to summarize relevant factors pertinent to policy making for NBS. International policy making processes have been recently reviewed elsewhere (1, 12). The current study builds upon what is known about the tangible policy making process, by highlighting issues facing NBS identified in academic literature. This then provides policy makers with an outline of the issues that should be considered in the development of policies. To be included in the review, an article needed to discuss one or more of the following topics: nomination of a condition (agenda setting); consideration of a condition (policy advice); deciding on a condition (policy decision); addition of a condition (implementation); or quality assurance and improvement (evaluation). Articles were also included if they discussed all elements of the policy cycle, such as mention of a comprehensive policy framework for NBS policy making.

We searched PubMed for articles regarding *newborn screening* and *policy*. We combined the two key search terms with the following search terms: 1. *program development*, 2. *decision-making*, 3. *governance*, 4. *management*, 5. *perspective*, 6. *future*, and 7. *disease* or *condition*. Only English publications on dried bloodspot screening were included. Articles concerning other types of newborn screening (e.g., hearing, hip dysplasia) were excluded, because we wanted to focus on the complex policy making specific to bloodspot screening.
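The combination of search terms described above can be sketched in code. The following is a minimal illustration only: the text does not report the exact query syntax, field tags, or filters that were used, so the quoting and the `AND`/`OR` operators below are assumptions, and the names `KEY_TERMS`, `ADDITIONAL_TERMS`, and `build_queries` are hypothetical.

```python
from __future__ import annotations

# Hypothetical reconstruction of the search strings. The exact PubMed
# syntax used for the review is not given in the text.
KEY_TERMS = ["newborn screening", "policy"]

# The seven additional terms listed in the Methods. Item 7 is an
# either/or pair, so it is modeled as a group of two alternatives.
ADDITIONAL_TERMS = [
    ["program development"],
    ["decision-making"],
    ["governance"],
    ["management"],
    ["perspective"],
    ["future"],
    ["disease", "condition"],
]


def quote(term: str) -> str:
    """Wrap a term in double quotes so it is treated as a phrase."""
    return f'"{term}"'


def build_queries() -> list[str]:
    """Combine both key terms (AND) with each additional term group."""
    base = " AND ".join(quote(term) for term in KEY_TERMS)
    queries = []
    for group in ADDITIONAL_TERMS:
        alternatives = " OR ".join(quote(term) for term in group)
        if len(group) > 1:
            # Parenthesize alternatives so OR binds before AND.
            alternatives = f"({alternatives})"
        queries.append(f"{base} AND {alternatives}")
    return queries


if __name__ == "__main__":
    for query in build_queries():
        print(query)
```

Under these assumptions the sketch yields seven query strings, one per additional term. The language and bloodspot-only restrictions described above are eligibility criteria and are not encoded here.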

# RESULTS

The initial literature search identified 59 articles. Twenty-seven articles discussed one or more of the elements of the policy cycle (Table S1 in Supplementary Material). Most literature originated from western societies, predominantly the USA (13 of 27 articles, Table S1 in Supplementary Material) and was initiated from a clinical background rather than a public health background (Figure S1 in Supplementary Material). The main technology discussed in the articles shows a shift through time from MS/MS toward discussing genetic technologies as the challenge for NBS: 3 of 13 articles until 2008 discuss mainly genetic technologies and 6 of 14 published since 2009 (Table S1 in Supplementary Material). Recent articles discussing MS/MS report on results from current screening programs or previous decision-making processes (18). The following outlines the results, stepping through the policy cycle.
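The counts reported in this paragraph can be turned into proportions to make the shift over time easier to see. The short sketch below uses only the figures stated in the text; the labels and the helper `share` are illustrative names, not part of the original analysis.

```python
# Counts as reported in the Results text (Table S1 is not reproduced here).
counts = {
    "USA articles, all years": (13, 27),
    "genetic-technology focus, until 2008": (3, 13),
    "genetic-technology focus, since 2009": (6, 14),
}


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


shares = {label: share(part, whole) for label, (part, whole) in counts.items()}

# The two publication periods should add up to the 27 included articles.
assert 13 + 14 == 27

for label, value in shares.items():
    print(f"{label}: {value:.0f}%")
```

On these figures, the share of articles mainly discussing genetic technologies rises from roughly 23% (3 of 13) to roughly 43% (6 of 14), and articles from the USA make up roughly 48% of the sample.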

### Agenda Setting: Nominating a Condition

The review highlighted two approaches to nominating a condition. The first is a structured approach focused on horizon scanning; the second approach is much more *ad hoc* and influenced by external drivers, such as advocacy (18–20).

The structured, horizon scanning approach generally includes an independent body that undertakes horizon scanning to identify a range of relevant conditions to evaluate for NBS and support expansion through an evidence-based process (10, 21, 22). Such an approach has been successfully used by several countries in agenda setting based on an objective threshold of criteria (18, 22). In horizon scanning, potential conditions are identified and recommended for further in-depth review, through initial assessment of criteria. Although this organized approach exists, the majority of the literature focused on an *ad hoc* approach. In this approach, conditions became the focus of an assessment for NBS in response to new technologies, broader disease definition, insight into pathophysiology, and advocacy (5, 23–26). In the past, NBS policy direction and program expansion have been strongly influenced by technological drivers, often evaluated *ad hoc* (18–20).

Advocacy is a key driver for change within NBS. Pressure by consumers, clinicians, and scientists to screen for a condition dates back to the very first condition screened in NBS, phenylketonuria (PKU). NBS for PKU was advocated by Dr Guthrie, whose son was born with an intellectual disability and whose niece experienced intellectual disability due to undiagnosed and unmanaged PKU (5). Dr Guthrie developed a test to identify the condition and advocated for mass screening of PKU through community support groups (5). Recent examples of advocate pressure leading to the introduction of a condition include X-linked adrenoleukodystrophy and Krabbe disease in the USA (5, 10, 27). However, the benefits of screening for Krabbe disease are disputed in the literature, and such screening has been referred to as "dangerous and expensive" (27).

### Policy Advice: Assessment of a Condition

Within the literature reviewed, there was unanimous support for a robust assessment process based on criteria. However, a key issue relating to the assessment of nominated conditions centered on the appropriateness of using criteria originating from the Wilson and Jungner principles (10, 28, 29). Critics note that the Wilson and Jungner principles were developed to evaluate individual conditions, while modern day technology pushes toward the possibility, and sometimes the need, to evaluate groups of conditions at once (30, 31). Furthermore, no objective tool has been developed from the Wilson and Jungner principles, which leaves them open to being interpreted into different criteria between programs (31).

While the need to assess harms and benefits was a repeated theme in the articles, there is no agreed-upon threshold for benefit (10, 22, 29). This is essential to effectively explore the benefits and harms of screening, to ensure that the former outweigh the latter. The benefit of screening is intrinsically related to the primary aim of screening, which is predominantly to avoid preventable harm in newborns. The aim and beneficiary of screening should both be specified in policies, as they are open to interpretation (9, 32, 33). That is, to support assessing the appropriateness of a condition, there needs to be a clear understanding of who will benefit from screening, what the perceived benefit is, and how it should be weighed in decisions (10, 22). In general, three groups were mentioned as potential beneficiaries of NBS: the child, the family, and/or society (34). In the recent report from the Health Council of the Netherlands, the beneficiary of screening was specifically defined as the child. Consequently, this led to conditions without clear clinical benefits to the child being assessed as inappropriate for inclusion within NBS (12).

### Policy Decision: Deciding on a Condition

The literature focused on two key areas when deciding whether to screen a condition. The first focused on who makes the decision, and the second focused on the evidence on which the decision is made. The literature was consistent in its recommendation for an overarching, independent, multidisciplinary group providing recommendations to government (18, 24, 28).

In terms of evidence, authors of the reviewed literature identified that decisions in NBS often need to be made based on incomplete information (22). A main concern identified in several articles is the lack of data to support evidence-based decisions. There is a need for interoperable databases to collect sufficient data on the diseases considered and included in NBS (10, 23, 35). Alternatively, authors suggested that innovations in NBS should be implemented in a research paradigm, to facilitate data collection for policy decisions, and gain informed consent from parents participating with their child(ren) in the study (22, 30, 33, 35). Pilot studies are vital to the development of a strong evidence base to support decision-making regarding the addition of new conditions. As shown in Denmark, the Faroe Islands, and Greenland, a pilot program of 7 years eventually provided information for evaluation and the subsequent decision to not include 11 conditions in the routine screening program in 2009 (36).

### Implementation: Addition of a Condition

Once a condition is approved for implementation in an NBS program, an implementation plan focusing on the different levels on which NBS operates and the information needed at each level should be developed (21, 37, 38). Issues across programs are similar when looking at implementation and relate to five fields: education, finances, logistics, politics, and culture (5, 24). These fields extend beyond "public health" to also include follow-up in the clinical setting. Key issues include the need for work flow across these fields to be coordinated, and ensuring professionals have the relevant skills and knowledge for the new condition(s) (24, 39). Ensuring skills and knowledge is particularly challenging where conditions are being identified in the pre-symptomatic phase, and there may be a lack of evidence or consensus in clinical guidelines. This will be further challenged if the preferred technology moves toward genome-based technologies, which can identify genetic variations that have implications for family members. These technologies will make issues relating to privacy and confidentiality, residual specimen storage and usage policies, and educational material even more pressing (40).

### Evaluation: Quality Assurance and Improvement

Continuously monitoring and improving a program is vital, particularly following the implementation of screening for a new condition (30). It is possible that a condition assessed as being appropriate to screen will not meet the parameters upon which this decision was based. For example, the false positives or negatives recorded for a test within a trial period might not align with those that occur in the real world setting. Thus, quality assurance (QA) is required to monitor the program's performance against defined targets to ensure it aligns with the anticipated outcomes (22, 29, 34).
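Monitoring performance against defined targets can be illustrated with a small sketch. Everything below is an assumption made for illustration: the class `ScreeningCounts`, the function `meets_targets`, the target values, and all counts are invented and do not come from any screening program or from the reviewed literature.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Outcome counts for one condition over one monitoring period."""
    true_positive: int
    false_positive: int
    true_negative: int
    false_negative: int

    def false_positive_rate(self) -> float:
        """Share of unaffected newborns who screened positive."""
        return self.false_positive / (self.false_positive + self.true_negative)

    def sensitivity(self) -> float:
        """Share of affected newborns who screened positive."""
        return self.true_positive / (self.true_positive + self.false_negative)

    def positive_predictive_value(self) -> float:
        """Share of positive screens that were truly affected."""
        return self.true_positive / (self.true_positive + self.false_positive)


def meets_targets(counts: ScreeningCounts, max_fpr: float, min_sensitivity: float) -> bool:
    """Check observed performance against predefined QA targets."""
    return (
        counts.false_positive_rate() <= max_fpr
        and counts.sensitivity() >= min_sensitivity
    )


# Invented numbers for illustration only.
pilot = ScreeningCounts(true_positive=9, false_positive=40,
                        true_negative=99_950, false_negative=1)
routine = ScreeningCounts(true_positive=8, false_positive=150,
                          true_negative=99_840, false_negative=2)

# Hypothetical targets; a real program would define its own.
MAX_FPR = 0.001
MIN_SENSITIVITY = 0.85

for name, counts in [("pilot", pilot), ("routine", routine)]:
    print(name, meets_targets(counts, MAX_FPR, MIN_SENSITIVITY))
```

In this invented scenario the pilot period meets both targets while the routine period does not, which is the kind of divergence between trial and real world performance that QA is meant to detect.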

QA provides essential ongoing assessment of feasibility, cost, and equitable delivery of testing (10, 32, 37). Some authors suggest principles for QA, such as clear guidelines on responsibilities throughout the chain of NBS; standards on aspects regarding confidentiality; and protocols for storage of blood spot specimens (21). Importantly, QA should be complemented by quality improvement (QI). QI builds upon QA, to drive improvements and achieve success. Issues for QI within NBS include managing the improvement process across the NBS system: from healthcare professionals to laboratory experts. Ways to overcome fragmentation in providing information on the key indicators, while gaining data on them from all parts of NBS, include providing training, developing written educational materials for parents and health-care professionals, and redesigning laboratory slips for blood collection (21).

### Policy Cycle in General

Authors advocated a transparent, structured, and evidence-based process (22, 25, 41). Policy making for NBS can be governed both locally and nationally (38, 42). Governance is a process that focuses on balancing competing influences and demands (43). The need for harmonization of national policies is often referred to in the literature: to ensure a national balance in competing interests and equity in access to early interventions (29, 31). A central body like a national or federal government should play a core role in overseeing NBS. In addition, consultation and engagement is a key theme, which some authors highlight should be managed through a multidisciplinary advisory committee providing advice through the policy cycle (28). An advisory committee could advise this central body on implementation, development, review, modification, and cessation of (parts of) NBS (21, 29, 38).

# DISCUSSION

Stepping through the policy cycle illustrated that NBS is on the precipice of great change. Programs are facing a wave of pressures, including in response to new treatments and new technologies. The history of NBS suggests the programs are flexible in responding to a continually changing environment. However, the historical *ad hoc* approach to adding conditions to NBS is recognized as potentially problematic in the light of future developments. Current NBS programs might contain conditions that have not been robustly evaluated through an agreed policy advice process. Future developments and challenges highlight that policy makers need to take stock of the issues facing the programs, and develop policies that will ensure safe and appropriate growth of programs (22, 25, 41).

The growing number of conditions that could be screened is a key issue for NBS programs, particularly in the face of pressures from next generation sequencing (13, 44). Moreover, the potentially growing number of conditions extends beyond what is technologically possible, to challenge the fundamental purpose of the programs. Internationally, there are increasing calls to move further beyond the traditional aim of NBS, and screen for "untreatable" conditions (33, 34). For untreatable conditions, the treatment benefit is not always certain, or treatment is not urgently needed in the newborn period. For untreatable conditions to be implemented in NBS, some argue that the aim has to shift from clinical benefits solely for the child, to include family benefits (24, 29). Such a shift in the focus of beneficiary beyond the newborn will lead to a vast increase in the number and type of conditions eligible for screening (3, 22): a great number of conditions may have family benefit through information on relevant reproductive options, compared to a limited number of conditions that have direct clinical benefit for the newborn (34).

The above issues overwhelmingly outline the need for robust and considered policy making for NBS. However, it is unclear whether such policy making can fully combat the pressures facing the program. Many nomination processes can still be considered passive, in that a nomination is awaited, rather than involving active horizon scanning for relevant conditions. Further, should there be a shift in focus or a push for more conditions to be screened, it is recommended that this be accompanied by consideration as to whether NBS is the right place to screen for such conditions. Specifically, in order to protect the programs and ensure they stay true to their aim, consideration should be given to preconception screening, prenatal screening, or screening during early childhood.

Our study has several limitations. Data from the USA were overrepresented (13 of the 27 articles), and a sample of 27 articles might not be representative of international policy making in NBS. Further, the policy cycle is theoretical. As such, the recommendations from academic literature are prone to interpretational disparity between theory and practice. Nonetheless, this review shows relevant aspects in policy making and addresses gaps in the current processes. Our results suggest that there is a need for a structured and timely approach that responds to the changing environment.

Through a systematic, continuous policy process, NBS programs will be able to anticipate developments, as opposed to being reactive and heavily influenced by external drivers. A policy process that is developed in light of the issues raised here will help the programs to anticipate challenges and progress in a safe and effective way. A framework to facilitate this approach should be pursued. Only by making careful and considered decisions can we ensure that the NBS of the future is as successful as the existing programs we know today.

### AUTHOR CONTRIBUTIONS

MJ contributed to the conception and design of the work, analysis and interpretation of data, drafted the work, and revised it. KL contributed to the interpretation of data and critically revised the work. HK contributed to the design of the work and critically revised it. MC contributed to the conception and design of the work, interpretation of data, and critically revised the work. All authors approved the version to be published, and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

### REFERENCES


### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fpubh.2017.00053/full#supplementary-material.

FIGURE S1 | Proportion of the affiliations of the first authors of the articles included in the review.

TABLE S1 | Articles included in the review (*n* = 26), with a short summary of each article and the main stakeholder groups discussed. Articles are sorted by publication date to illustrate the shift in the technologies discussed.

screening: lessons for Australia. *Front Public Health* (2015) 3:214. doi:10.3389/fpubh.2015.00214


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Jansen, Lister, van Kranen and Cornel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Outcomes of an International Workshop on Preconception Expanded Carrier Screening: Some Considerations for Governments

*Caron M. Molster1\*†, Karla Lister1†, Selina Metternick-Jones2, Gareth Baynam1,3,4,5,6,7,8, Angus John Clarke9, Volker Straub10, Hugh J. S. Dawkins1,11,12,13 and Nigel Laing14,15*

*1Office of Population Health Genomics, Public Health Division, Department of Health Western Australia, Perth, WA, Australia, 2Sir Charles Gairdner Hospital, Perth, WA, Australia, 3Genetic Services WA, Perth, WA, Australia, 4School of Paediatrics and Child Health, University of Western Australia, Perth, WA, Australia, 5Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA, Australia, 6Telethon Kids Institute, University of Western Australia, Perth, WA, Australia, 7Western Australian Register of Developmental Anomalies, Perth, WA, Australia, 8Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia, 9Division of Cancer and Genetics, School of Medicine, Cardiff University, Cardiff, UK, 10Institute of Human Genetics, University of Newcastle upon Tyne, Newcastle upon Tyne, UK, 11Centre for Comparative Genomics, Murdoch University, Perth, WA, Australia, 12Centre for Population Health Research, Curtin University, Perth, WA, Australia, 13School of Pathology and Laboratory Medicine, University of Western Australia, Perth, WA, Australia, 14Centre for Medical Research, Harry Perkins Institute of Medical Research, University of Western Australia, Perth, WA, Australia, 15Neurogenetics Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Department of Health Western Australia, Perth, WA, Australia*

### *Edited by:*

*Rumen Stefanov, Institute for Rare Diseases, Bulgaria*

### *Reviewed by:*

*Arnold Bosman, Transmissible, Netherlands; M. Rashad Massoud, University Research Co., USA*

### *\*Correspondence:*

*Caron M. Molster caron.molster@health.wa.gov.au*

*† These authors have contributed equally to this work.*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 29 September 2016; Accepted: 09 February 2017; Published: 24 February 2017*

### *Citation:*

*Molster CM, Lister K, Metternick-Jones S, Baynam G, Clarke AJ, Straub V, Dawkins HJS and Laing N (2017) Outcomes of an International Workshop on Preconception Expanded Carrier Screening: Some Considerations for Governments. Front. Public Health 5:25. doi: 10.3389/fpubh.2017.00025*

Background: Consideration of expanded carrier screening has become an emerging issue for governments. However, traditional criteria for decision-making regarding screening programs do not incorporate all the issues relevant to expanded carrier screening. Further, there is a lack of consistent guidance in the literature regarding the development of appropriate criteria for government assessment of expanded carrier screening. Given this, a workshop was held to identify key public policy issues related to preconception expanded carrier screening, which governments should consider when deciding whether to publicly fund such programs.

Methods: In June 2015, a satellite workshop was held at the European Society of Human Genetics Conference. It was structured around two design features: (1) the provision of information from a range of perspectives and (2) small group deliberations on the key issues that governments need to consider and the benefits, risks, and challenges of implementing publicly funded whole-population preconception carrier screening.

Results: Forty-one international experts attended the workshop. The deliberations centered primarily on the conditions to be tested and the elements of the screening program itself. Participants expected only severe conditions to be screened but were concerned about the lack of a consensus definition of "severe." Issues raised regarding the screening program included the purpose, benefits, harms, target population, program acceptability, components of a program, and economic evaluation. Participants also made arguments for consideration of the accuracy of screening tests.

Conclusion: A wide range of issues require careful consideration by governments that want to assess expanded carrier screening. Traditional criteria for government decision-making regarding screening programs are not a "best fit" for expanded carrier screening and new models of decision-making with appropriate criteria are required. There is a need to define what a "severe" condition is, to build evidence regarding the reliability and accuracy of screening tests, to consider the equitable availability and downstream effects on and costs of follow-up interventions for those identified as carriers, and to explore the ways in which the components of a screening program would be impacted by unique features of expanded carrier screening.

Keywords: carrier screening, expanded carrier screening, genetic carrier screening, government, public policy

### INTRODUCTION

Population-based screening programs are a public health approach implemented by many governments, which usually focus on a specific subpopulation defined by age, sex, and sometimes by ethnicity. Examples include newborn screening, prenatal screening, and screening for breast, bowel, and cervical cancer. In most countries, programs for screening the population to identify carriers of genetic diseases have not yet been adopted by governments. However, the possibility of offering such programs has become more salient in recent years in the wake of technology drivers such as the availability of relatively low cost massively parallel sequencing. Given this, a workshop was held with experts from a number of countries to identify key public policy issues related to preconception expanded carrier screening, which governments should consider before deciding whether to publicly fund such programs.

Carrier screening is a form of genetic testing that is used to determine a couple's risk of having a child with a recessive genetic disorder, when there is no *a priori* risk based on personal or family history (1). The process involves analyzing a sample of blood, or other biological material, for evidence of genetic mutations associated with autosomal-recessive and X-linked conditions. Carriers of autosomal-recessive conditions are people who have one copy of a gene mutation that can cause a condition in their offspring. If two carriers of a mutation in the same gene have children, their offspring have a one in four chance of having the condition. For women who carry a gene mutation associated with an X-linked condition, their children have a 50% chance of inheriting that gene mutation. Male children of these women are usually affected by the condition since they inherit their only copy of the X chromosome from their mother, while female children are usually protected from the condition by the inheritance of a second X chromosome from the father.
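The inheritance figures quoted above can be checked by enumerating the equally likely allele combinations. The following is a minimal illustrative sketch, assuming simple Mendelian inheritance with full penetrance and no new mutations; the function names and allele labels (`A`/`a`, `X`/`x`, `Y`) are hypothetical and are not drawn from the workshop or the cited literature.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Enumerate the equally likely offspring genotypes (one allele from each parent)."""
    return [tuple(sorted(pair)) for pair in product(parent1, parent2)]


def probability(genotypes, predicate):
    """Fraction of equally likely genotypes that satisfy the predicate."""
    matches = sum(1 for g in genotypes if predicate(g))
    return Fraction(matches, len(genotypes))


# Autosomal recessive: both parents are carriers ("A" = typical allele, "a" = mutation).
autosomal = offspring_genotypes("Aa", "Aa")
p_affected = probability(autosomal, lambda g: g == ("a", "a"))
p_carrier = probability(autosomal, lambda g: g == ("A", "a"))

# X-linked: carrier mother ("X" typical, "x" mutation) and unaffected father ("X", "Y").
x_linked = offspring_genotypes("Xx", "XY")
p_inherits_mutation = probability(x_linked, lambda g: "x" in g)
sons = [g for g in x_linked if "Y" in g]
p_son_affected = probability(sons, lambda g: "x" in g)

print(p_affected, p_carrier, p_inherits_mutation, p_son_affected)
```

Under these assumptions, the enumeration reproduces the one in four chance of an affected child for two carriers of an autosomal-recessive mutation and the 50% chance that a child of a female X-linked carrier inherits the mutation. Real conditions can depart from these idealized figures, for example through variable penetrance.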

The members of a couple may undertake carrier screening simultaneously or sequentially, and screening can be performed in the preconception period or during pregnancy. If offered in the preconception period, carrier screening provides an opportunity to identify individuals who are at risk of having a child affected with a condition, before they become pregnant. Partners can then make informed reproductive choices including: not having a child at all; adoption; preimplantation genetic diagnosis (PGD) or *in vitro* fertilization (IVF) to avoid having an affected child; or having a child naturally, with prior knowledge of their risk of having a child with a specific condition. Screening during the preconception period can be considered more favorable than during pregnancy, as it avoids expectant parents being faced with a prenatal diagnosis and possibly a decision on whether to selectively terminate an affected pregnancy as the only way of avoiding the birth of an affected child.

Until recently, carrier screening was generally available for one or a very small number of conditions within ethnic subgroups of the population that have a relatively high prevalence of those conditions. Examples include carrier screening offered to the Ashkenazi Jewish population for Tay–Sachs disease and to Mediterranean populations for beta-thalassemia (2–4). In more recent years, the possibility of expanded carrier screening has emerged. This involves simultaneously screening for carrier status for multiple diseases, which can be offered to all members of a pan-ethnic population, regardless of family history or ancestry (5–7). This has been made feasible through advances in genotyping and genetic sequencing technologies, which enable the concurrent evaluation of genetic mutations for large numbers of recessive diseases, for relatively low additional cost (5). Commercial companies have already developed expanded carrier screening tests that can screen for more than 100 recessive diseases at one time, and these are being offered direct to consumers at a cost (8, 9). However, only consumers who are willing and able to pay for these screening tests can undertake them.

For carrier screening to be truly universal, it requires a publicly funded approach to ensure equity of access. To warrant public funding, there needs to be an evidence-based assessment of the appropriateness of expanded carrier screening against a range of predetermined criteria (10). This is because, like all population-based screening programs, carrier screening has the potential to result in harm as well as benefits (11). Therefore, there must be a rigorous assessment before implementing a publicly funded program, to ensure that the benefits outweigh the harms. The "gold standard" criteria for evaluating population-based screening were developed for the World Health Organization over 40 years ago by Wilson and Jungner (12, 13). These screening pioneers suggested assessing evidence against 10 principles that explore four themes: (1) the condition being screened, (2) the test, (3) the treatment, and (4) the screening program.

While the Wilson and Jungner principles are the benchmark for government decision-making in screening, the ways in which they have been applied in practice vary across the globe. This is highlighted through a recent review of the criteria for deciding whether to introduce screening programs in Australia, Canada, Denmark, Finland, France, Germany, Italy, the Netherlands, New Zealand, Sweden, the UK, and the USA (14). Across these countries, Seedat et al. (14) identified 46 unique criteria that were associated with screening in general, most of which related to the screening program (27 criteria) as opposed to the condition (7 criteria), the test (6 criteria), and the treatment (6 criteria). Generally, the reason for expansion beyond the original Wilson and Jungner principles and variation in government decision-making criteria is to ensure processes sufficiently explore the issues most pertinent to each local setting (15, 16).

Despite their continued application to the assessment of screening programs worldwide, the Wilson and Jungner principles do not incorporate the full range of considerations for expanded carrier screening. A key limitation is that the criteria were developed without specific examination of the unique benefits, risks and harms that accompany genetic screening (10). These unique features include that most conditions screened for will be rare and that a genetic test is required that in most cases will produce personal information for both the individual having the test and their genetic relatives. Further, carrier screening does not screen for the presence of a condition, but rather for the presence of gene mutations that might cause a condition in offspring, and the "treatment" following on from carrier screening is thus not an intervention in line with the classic definition. This latter point means that, should an individual be identified as a carrier, there is no treatment required since carriers are generally not affected by the condition for which they are a carrier. Instead, carriers are provided with information that will inform their reproductive choices.

While it is recognized that the Wilson and Jungner principles need further consideration in the context of expanded carrier screening, the Netherlands is so far the only country to have developed criteria specifically for assessing genetic screening including carrier screening (17). For other governments looking to develop relevant criteria, there is a lack of clear, consistent guidance in the literature. At the time our workshop was held, there were two statements of recommendations from professional bodies in the USA regarding expanded carrier screening along with a report by the UK Human Genetics Commission, and recommendations have subsequently been published by the European Society of Human Genetics (ESHG) (1, 7, 18, 19). However, the content of these documents varies considerably, highlighting the current lack of consensus. There is literature that has identified lessons learned and factors for the successful implementation of existing, usually ethnicity-based, carrier screening programs (20–23). Yet, to our knowledge, there has been no systematic evaluation of the extent to which these factors can inform decision-making criteria for assessing expanded carrier screening.

Given the lack of clear, consistent policy and academic guidance on the relevant criteria to assess expanded carrier screening, we believe there is a need for more research to inform governments of the issues they need to consider before implementing expanded carrier screening. In the first instance, best practice public policy development suggests that there is a need to understand the values, expectations, preferences, and concerns of key stakeholders (10). In line with this, we held an international workshop to gain an understanding of which issues experts considered were the most salient for governments to consider. We chose to focus on screening in the preconception period since this is considered to be the best timing for carrier screening to optimize reproductive choice (1, 24).

### METHOD

In June 2015, we held a satellite workshop at the ESHG Conference. To reach experts who might want to attend the workshop, a call for expressions of interest was posted on the ESHG Conference website. Invitations were also issued to known experts in fields related to expanded carrier screening, with a request to forward the invitation to other experts who might also be interested in the workshop. These communications included information on the objectives of the workshop. One of the objectives listed was "to contribute to the academic literature on expanded carrier screening," which would be achieved by publishing the outcomes of the workshop in a peer-reviewed journal. The intention to contribute to the academic literature was reiterated in material sent to those who expressed an interest in attending the workshop, as well as at the beginning of the workshop while "setting the scene" and at the end of the workshop in relation to "next steps." We assumed that, by choosing to participate in the workshop, which was a public event, participants were giving implied consent to the workshop outcomes being collated, analyzed, and published in academic literature. We considered this a sufficient level of consent, since no identifying information about the participants would be published, and the information obtained from the workshop was neither personal nor private and could not be linked to an individual; rather, it comprised the outcomes of group discussions in a public setting.

The aim of the workshop was to identify expert opinions on the issues that governments should consider when deciding whether or not to implement preconception expanded carrier screening. To achieve this, the workshop was structured around two design features, these being information provision and small group deliberations. The morning session of the workshop involved a series of presentations from nine experts in different fields relevant to expanded carrier screening. These presentations were designed to expose workshop participants to information from outside their field of expertise and to a range of different perspectives on the key issues that government policymakers might face in relation to preconception expanded carrier screening (see **Table 1**). Providing a range of perspectives was considered important as it was recognized that these presentations would likely frame the subsequent deliberations of the small groups.

Following the presentations, participants worked in small groups of between six and eight people to discuss and develop answers to three questions, namely:

1. What are the key factors/issues that governments need to consider when deciding whether or not to implement publicly funded, whole-population preconception carrier screening programs?

### TABLE 1 | Range of perspectives covered in workshop presentations.


The outcomes for each small group were written down by a scribe on feedback sheets and then orally reported back to the large group. The information recorded by the scribes was subsequently analyzed to identify common themes across all groups, and a summary of these findings is included in this paper.

### RESULTS

Forty-one people attended the workshop, representing a range of disciplines including human genetics, clinical genetics, medical genetics, genetic counseling, primary care, pediatrics, laboratory science, bioethics, population health policy, medical sociology, humanities, health economics, and public health genomics. Participants were largely employed in academia, public health systems, and commercial companies. The outcomes of the small group discussions are presented below, gathered together in line with the Wilson and Jungner themes of the condition, test, treatment, and screening program.

### Condition

There was general agreement that only "severe" or "serious" conditions should be included in preconception expanded carrier screening. However, there was concern about the lack of a consensus definition of "severe" and "serious," and about where the line between "mild" and "severe" should be drawn and why. Some participants suggested "severe" disorders should be defined as early onset conditions where the child dies in the newborn or early childhood periods. There was a belief that screening individuals in line with this definition (1) is less ethically contentious than screening for conditions that do not result in early mortality; (2) avoids perceptions of eugenics; (3) has fewer implications for people living with diseases; and (4) is less vulnerable to the disability rights critique that carrier screening removes normal human diversity.

The lack of a definition of severity was perceived to create confusion regarding which conditions should be screened and the potential for competition between laboratories to offer more and more tests. There was a belief that commercial pressures and technology-led development of expanded carrier screening have the potential to result in a "slippery slope" of offering tests just because they are possible. Thus, participants perceived that a definition of severity, which could be used to determine which conditions to screen, may safeguard against the inappropriate extension of preconception expanded carrier screening programs to include more and more conditions, including "mild" conditions. Another question raised at the workshop was whether participants in a preconception expanded carrier screening program should be able to choose which conditions to be tested for, or whether they could be offered a panel with no option for selecting specific conditions.

### Test

Participants argued that there is a need for robust up-to-date evidence about tests used in preconception expanded carrier screening. Specifically, evidence is needed on the following: the reliability of the tests (especially the negative predictive value) and their appropriateness for the population; the confidence with which the pathogenicity of the gene mutations has been established; clinical and analytical validity; and residual risk and explanations for variants of unknown significance. It was thought that the tests should have clinical value/utility and public acceptability, and economic factors including the cost of tests need to be considered in deciding which conditions to test for.
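To illustrate how negative predictive value and residual risk relate to one another, the sketch below applies Bayes' theorem to a single condition. It is a hedged example only: the carrier frequency and detection rate are hypothetical figures chosen for illustration, the function name is invented, and the test is assumed to produce no false positives.

```python
def residual_carrier_risk(carrier_frequency, detection_rate):
    """Probability of being a carrier despite a negative result (Bayes' theorem).

    Assumes perfect specificity, i.e. every non-carrier tests negative.
    """
    missed_carriers = carrier_frequency * (1 - detection_rate)
    true_negatives = 1 - carrier_frequency
    return missed_carriers / (missed_carriers + true_negatives)


# Hypothetical inputs, not taken from the workshop or from any specific test.
prior = 1 / 25       # carrier frequency before testing
sensitivity = 0.90   # share of carriers that the test detects

residual = residual_carrier_risk(prior, sensitivity)
negative_predictive_value = 1 - residual

print(f"Residual carrier risk after a negative result: 1 in {1 / residual:.0f}")
print(f"Negative predictive value: {negative_predictive_value:.4f}")
```

The calculation shows why a negative result lowers, but does not eliminate, the chance of being a carrier: the lower the detection rate for a given population, the higher the residual risk that would need to be explained to participants.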

### Treatment

There were no issues raised in relation to "the treatment" of participants identified as carriers. This undoubtedly relates to the fact that carriers of autosomal-recessive diseases are unaffected by those diseases, and as such they do not need treatment. Carriers of X-linked diseases may sometimes be or become affected, depending upon multiple factors including the pattern of X chromosome inactivation.

### Screening Program

Participants saw many uncertainties around preconception expanded carrier screening and took the view that offering a program would be "uncharted waters" in the rapidly changing, dynamic field of genomics. A number of issues were raised regarding aspects of a screening program including the purpose, benefits and harms, target population, program acceptability, components of a program, and economic evaluation.

### Clarity of Purpose and Expected Benefits

Participants asserted that it would be important for governments to consider why preconception expanded carrier screening might be implemented as a publicly funded program, and what would be the objectives, motives, rationale, and goals of such a program. In their view, the purpose should be "well framed" with appropriate evidence of benefits and harms and ethical principles (e.g., autonomy and individual rights to informed healthcare choices and to make decisions with as much relevant information as possible). Discussions around the purpose and possible benefits of preconception expanded carrier screening focused on outcomes that might eventuate as a result of program participation. There were two clear perspectives on the overarching purpose or benefits, namely:


There were tensions between some participants, who held differing views on whether a reduction in disease burden should be a primary goal or a secondary benefit of preconception expanded carrier screening.

Other potential benefits identified by participants included:


a screening program, including the screening test as well as information provided prior to screening. It was suggested that a government funded program for the whole population would mean less likelihood of a user-pays system. This would enable lower socioeconomic and vulnerable populations to access the program, thereby minimizing health disparities and inequities in access. However, there was some doubt expressed as to whether a preconception expanded carrier screening program would really have the capacity to deliver equitable access.


### Potential Harms

There was a view that preconception expanded carrier screening may increase stigma and discrimination for those identified as carriers, those who opt not to undergo screening and those born with the conditions screened. It was also thought that people living with the conditions screened may be disadvantaged if a reduced incidence of these conditions reduces the incentives to develop treatments and therapies. According to participants, the rights of those who choose not to undergo preconception expanded carrier screening need to be respected. Further, participants felt that there would need to be adequate support for people regardless of the reproductive choice they make following preconception expanded carrier screening.

Participants argued that being identified as a carrier might have financial implications, for example, on insurance premiums, and psychological impacts, such as increased anxiety or false expectations or reassurance that they have been "promised" a healthy baby. Additionally, it was suggested that a government-sponsored preconception expanded carrier screening program might foster the perception that genetic testing is "routine" and that screening is mandated by the government, and thus not voluntary. People may feel social pressure, coercion, or obligation to participate. It might raise questions of government-sponsored eugenics and "where will it end?" A challenge was seen to be providing information and counseling that is "neutral," particularly if there is a "strong incentive" to increase uptake to justify providing the program.

### Target Population

The key issues explored by participants included defining the target population, deciding at what age to offer screening, and determining whether screening would be offered to both members of a couple at the same time or to one member of a couple first and only to the other if the first one is a carrier. There was also a perceived need for governments to understand what would motivate or drive decision-making around participation in a preconception expanded carrier screening program. Some attendees reflected that the uptake rate for the program has implications for cost-effectiveness, and the extent to which the program could result in benefits such as reduced burden of disease and increased reproductive choice.

### Acceptability

Whether preconception expanded carrier screening was "acceptable" was raised by a number of participants. This included whether the general public and target population actually want government funded access to preconception expanded carrier screening, and whether clinicians and politicians would support such a program.

### Components of a Program

Participants thought that, if a preconception expanded carrier screening program was to be offered by governments, there should be sufficient resources to invest in an "end to end service." That is, the participants thought a program is not just about the screening test itself. Other components of a program that participants thought important for governments to consider included:


reproductive choices that carriers might want to pursue (e.g., access to IVF or PGD). It was proposed that consideration would need to be given to how the preconception expanded carrier screening program would connect with these other services and what the implications of the program would be for other parts of the health system. According to participants, a preconception expanded carrier screening program needs to be integrated with other programs so that there are no mixed messages and quality is not compromised.

• Collection of data on program participants and program operations. There were concerns around participant privacy and data ownership, protection, confidentiality, sharing, and access.

Questions were also raised by participants around workforce capacity and the impact that a program may have on healthcare providers, how best to start the program, and whether a pilot study would be appropriate.

### Economic Evaluation

The resources required to establish a high-quality "end to end" preconception expanded carrier screening program would likely be significant. Participants acknowledged that healthcare systems are experiencing both growing demand and funding ceilings. Consequently, they argued there is a need for governments to prioritize spending and consider the opportunity costs of offering a preconception expanded carrier screening program, as opposed to any other program. It was thought that the establishment of a preconception expanded carrier screening program should not take resources away from providing adequate treatment of people who are living with the conditions screened.

There was a perceived need for governments to consider sustainability, cost–benefits, and cost-effectiveness, including direct, indirect, and intangible costs such as anxiety and other psychological harms. However, in making these suggestions, questions arose around how best to do this, which costs and savings should be considered, and how these can be measured. In particular, participants questioned the best way to consider savings from reduced births of affected children and reduced long-term support for people living with severe disabilities. Government inertia and the difficulty of estimating costs were seen as inhibitors to investment in a preconception expanded carrier screening program.

### DISCUSSION

Workshop findings highlight that there is a wide range of issues that require careful examination by governments that are assessing preconception carrier screening, to ensure that the benefits outweigh the harms. Overlaying feedback from the workshop against the original Wilson and Jungner principles demonstrates that these are not a "best fit" for governments to assess preconception expanded carrier screening. Given that only Israel has implemented a national program of genetic carrier screening (25) and only the Netherlands has developed tailored decision-making criteria for genetic screening, governments across the globe have further work ahead of them to develop criteria that could inform whether to introduce preconception expanded carrier screening. The workshop findings provide a starting point for governments to begin addressing this policy gap. Specifically, a range of issues have been identified in relation to the conditions to be screened, the tests to be used, and the components that should be incorporated into a preconception carrier screening program.

When considering "the condition," workshop participants agreed that screening should only ever be offered for conditions that are "severe" or "serious." This aligns with the Wilson and Jungner concept of an "important health problem" (12). However, participants recognized that there is no clear, consensus definition of what constitutes "severe," with different suggestions existing in the literature (26, 27). Without a clear definition, it is difficult to determine the scope of conditions that should be considered for inclusion in an expanded screening program. Indeed there is marked disparity in the composition of currently available laboratory panels of conditions for expanded carrier screening (28). From a program perspective, a definition is essential because the number and type of conditions screened has follow-on implications for how the program is implemented. Specifically, it will impact upon components of the program such as information and consent requirements, as well as counseling requirements and treatment or follow-up options. Further, as outlined by workshop participants, the definition of "severe" and thus the conditions screened are likely to impact upon public and clinical perceptions of the program. If a clear definition is not developed, and parameters and safeguards not set, there is the potential for trust in a preconception expanded carrier screening program to be undermined. Therefore, a body of work is needed to consider the definition of "severe." The definition offered by workshop participants is a valid starting point: "early onset conditions where the child dies in the newborn or early childhood period." In excluding conditions that do not result in early mortality, this definition was perceived to be less ethically contentious and to have less of an impact on disability rights.

The workshop discussions around the Wilson and Jungner criteria for "the test" were aligned with the literature reviewed, in terms of the need for the test to be accurate (see **Table 2**). This was in relation to both sensitivity (low false-negatives) and specificity (low false-positives) and also the ability to determine meaningful residual risk for individuals who test negative. The issue of the cost-effectiveness of the tests was also raised by workshop participants, and this would likely be a key consideration for governments within the context of the overall cost-effectiveness of a program.

Workshop participants did not raise any considerations for government in relation to Wilson and Jungner's theme of "the treatment." This demonstrates a lack of salience of this issue for the participants. This could be because care and follow-up for carrier screening does not meet the traditional definition of treatment, since such screening does not result in the identification of people who have conditions. When coupled with the fact that much of the workshop discussion focused on elements of the screening program, the absence of discussion on treatment could also reflect heightened interest in the issue of "how to screen" as opposed to "whether to screen." Nonetheless, the relevance of the Wilson and Jungner criteria associated with "the treatment" may be queried in relation to preconception expanded carrier screening. The question then becomes whether there are more relevant dimensions that should replace the treatment criterion. For example, instead of the need for treatment being available, should the criterion be to recommend that "interventions are available"? Should the issue of interventions be framed within the context of the reasons for participation in preconception expanded carrier screening, such as "a decision should need to be taken by the person screened" (20) or that "screening should potentially influence the reproductive choices made by at-risk participants" (19)?

In our view, it is essential that governments consider the availability of interventions for preconception expanded carrier screening, and the downstream effects on and costs of providing such interventions. In order for a screening program to be effective and cost-effective, there must be an intervention that can lead to better health outcomes for an individual. Further, the intervention must be effective, available, easily accessible, and acceptable to individuals within the target population (44). Importantly, governments should consider the fact that interventions for individuals identified as carriers are not currently always equitably accessible. For example, IVF and PGD are provided in the private sector within Australia, meaning there can be significant costs to individuals, which may limit access for citizens in lower socioeconomic groups (45, 46). This means that Australia, and other countries where these healthcare services are not equitably accessible, would need to carefully consider their capacity to provide the follow-up interventions required for a population-based approach to preconception expanded carrier screening. Further in relation to equity, consideration would also need to be given to the quality of PGD and IVF services, particularly given concerns regarding false-positive screening results (47, 48), and the fact that the genetics workforce is not keeping pace with the demand for these services (49).

As with the review by Seedat et al. (14) of population-based screening criteria adopted across a number of countries, the findings of the workshop were more likely to focus on considerations relating to "the screening program" as opposed to the condition, test, and treatment. The issues identified in relation to the screening program were largely those that would be relevant to all screening programs, not only preconception expanded carrier screening. These issues included the need for a program that is not just about the test, but rather includes components such as the provision of information and education, informed consent processes, genetic counseling, clear care pathways, data collection, and economic evaluation. There is a need for further exploration of these issues to determine in what ways, if any, these program components would be impacted by the unique features of expanded carrier screening. For example, there was recognition by workshop participants that consent should be informed, but what would be the impact on the ability to obtain informed consent, when expanded carrier screening would test for multiple conditions simultaneously? How would informed consent be defined in this context? Related to this issue, further investigation should examine the impact of preconception expanded carrier screening on the complexity, volume, and financial implications of pretest and posttest counseling (28).

TABLE 2 | Coverage of issues referred to in literature.


In addition to the work that is needed by governments to develop robust decision-making models for assessing preconception expanded carrier screening, researchers should begin to explore a number of issues raised by the workshop participants to inform and complement work in the public policy space. Within local contexts, "societal readiness" for preconception expanded carrier screening could be investigated. While several potential benefits of expanded carrier screening were identified at the workshop, a number of potential harms were also discussed, including concerns around discrimination, eugenics, and people refusing to participate in a program, which could undermine the cost-effectiveness of program delivery. While the UK Human Genetics Commission (18) concluded "there are no specific ethical, legal or social principles that would make preconception genetic testing within the framework of a population screening program unacceptable" (p. 1), this needs to be explored by experts in other local contexts, including the contention expressed by workshop participants around the primary purpose of this screening being reproductive choice and/or reduced burden of disease (50). Further to this, consultation and engagement methodologies could be developed and implemented to assess stakeholder acceptability of preconception expanded carrier screening, including the public, target population, disease associations, clinicians, and laboratory staff. For the target population, this should also include investigation of likely uptake and postscreening decisions around reproductive choices. A recent study in the Netherlands has made initial contributions in the area of citizens/user perceptions of expanded carrier screening (51), while a qualitative study in Sweden has examined healthcare professionals' views on preconception carrier screening (52). This line of work must be extended to further local contexts.

This paper has several limitations. Workshop participants were self-selected and may not be a representative sample of experts relevant to preconception expanded carrier screening. This may impact on the generalizability of the workshop findings. It is also important to note that, in relation to the literature on existing carrier screening programs and recommendations by professional bodies regarding expanded carrier screening, not all of the issues raised in the literature as key success factors or recommendations for implementation were addressed by the participants (see **Table 2**). During the workshop, participants were exposed to a range of perspectives related to preconception expanded carrier screening, which framed the subsequent discussions, and not all perspectives were covered by the workshop presentations. Participant exposure to other perspectives may have resulted in different workshop outcomes. Finally, as noted above, the workshop findings were reasonably high-level and did not drill down to deeper levels of analysis regarding the key issues. Therefore, while the findings provide useful guidance, a more precise exploration of each issue may be required to develop a comprehensive view of the factors governments need to consider when deciding whether to implement preconception expanded carrier screening.

### CONCLUSION


The international workshop was an important opportunity for expert stakeholders in the field of preconception expanded carrier screening to come together to share their values, experiences and knowledge. The workshop outcomes identified benefits, harms, and other key issues that governments should consider when assessing whether to publicly fund preconception expanded carrier screening programs. This is particularly useful since most countries globally do not have decision-making frameworks related to emerging genetic screening options and are at the formative stage of making assessments about preconception expanded carrier screening.

### AUTHOR CONTRIBUTIONS

All the authors contributed substantially to the conception and design of the workshop and to the acquisition and interpretation of data, have given final approval for the manuscript to be published, and have agreed to be accountable for all aspects of the work. CM and KL undertook the data analysis and drafted the manuscript. GB, SM-J, AC, VS, HD, and NL critically revised the manuscript for important intellectual content.

### ACKNOWLEDGMENTS

The authors would like to acknowledge the workshop participants for their contributions to the workshop reported in this paper.

### FUNDING

This work was financially supported by the Office of Population Health Genomics, Public Health Division, Department of Health Western Australia; Harry Perkins Institute; Life Letters; the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 305444 (RD-Connect) and 305121 (Neuromics); Australian National Health and Medical Research Council (NHMRC) APP1055319 under the NHMRC–European Union Collaborative Research Grant; and NHMRC Principal Research Fellowship APP1117510.



**Conflict of Interest Statement:** The authors acknowledge the financial contribution of Life Letters to the implementation of the workshop reported in this paper.

*Copyright © 2017 Molster, Lister, Metternick-Jones, Baynam, Clarke, Straub, Dawkins and Laing. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Precision in Setting Cancer Prevention Priorities: Synthesis of Data, Literature, and Expert Opinion

*Jennifer Girschik <sup>1</sup> \*, Laura Jean Miller <sup>1</sup> , Tony Addiscott <sup>2</sup> , Mike Daube <sup>3</sup> , Paul Katris <sup>4</sup> , David Ransom <sup>5</sup> , Terry Slevin <sup>6</sup> , Tim Threlfall <sup>7</sup> and Tarun Stephen Weeramanthri <sup>1</sup>*

*<sup>1</sup>Public Health Division, Department of Health Western Australia, Perth, WA, Australia, <sup>2</sup>Health Consumers Council, Perth, WA, Australia, <sup>3</sup>Faculty of Health Sciences, Curtin University, Perth, WA, Australia, <sup>4</sup>Western Australian Clinical Oncology Group, Perth, WA, Australia, <sup>5</sup>Cancer and Palliative Care Network, Department of Health Western Australia, Perth, WA, Australia, <sup>6</sup>Cancer Council Western Australia, Perth, WA, Australia, <sup>7</sup>Western Australian Cancer Registry, Department of Health Western Australia, Perth, WA, Australia*

Cancer will continue to be a leading cause of ill health and death unless we can capitalize on the potential for 30–40% of these cancers to be prevented. In this light, cancer prevention represents an enormous opportunity for public health, potentially saving much of the pain, anguish, and cost associated with treating cancer. However, there is a challenge for governments, and the wider community, in prioritizing cancer prevention activities, especially given increasing financial constraints. This paper describes a method for identifying cancer prevention priorities. This method synthesizes detailed cancer statistics, expert opinion, and the published literature for the priority setting process. The process contains four steps: assessing the impact of cancer types; identifying cancers with the greatest impact; considering opportunities for prevention; and combining information on impact and preventability. The strength of our approach is that it is straightforward, transparent and reproducible for other settings. Applying this method in Western Australia produced a priority list of seven adult cancers which were identified as having not only the biggest impact on the community but also the best opportunities for prevention. Work conducted in an additional project phase went on to present data on these priority cancers to a public consultation and develop an agenda for action in cancer prevention.

### *Edited by:*

*Ross Bailie, University of Sydney, Australia*

### *Reviewed by:*

*Gerjo Kok, Maastricht University, Netherlands Edward Broughton, University Research Co., United States*

*\*Correspondence: Jennifer Girschik jennifer.girschik@health.wa.gov.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 11 October 2016 Accepted: 15 May 2017 Published: 06 June 2017*

### *Citation:*

*Girschik J, Miller LJ, Addiscott T, Daube M, Katris P, Ransom D, Slevin T, Threlfall T and Weeramanthri TS (2017) Precision in Setting Cancer Prevention Priorities: Synthesis of Data, Literature, and Expert Opinion. Front. Public Health 5:125. doi: 10.3389/fpubh.2017.00125*

Keywords: cancer prevention, cancer control, preventability, prioritization, policy, public health

# INTRODUCTION

Cancer is a major cause of ill health and death in Western Australia (WA). Almost 12,000 Western Australians are diagnosed with cancer and around 4,000 lose their lives to the disease every year (1). In addition, approximately 85,000 non-melanoma skin cancers are treated each year in WA (2). Consequently, cancer is also a major source of government expenditure. The most recent national report estimates that total health system expenditure on cancer (excluding population screening programs) was \$4.5 billion in 2009 (3). Recent studies have estimated that 30–40% of cancers could be prevented (4, 5), which, if achieved, would save much of the pain, anguish, and cost associated with treating cancer.

The Western Australian State Health Department is responsible for establishing policies for cancer prevention in the state. As in most government departments, this responsibility is being enacted in an environment that demands a robust evidence base along with collaboration, transparency, and accountability in decision-making, all within increasing fiscal constraints. This can be a difficult undertaking in any area, but can be uniquely complicated for cancer prevention.

Specifically, the evidence base for prioritizing between cancer prevention targets is not always obvious. Determining the relative impact of the different types of cancer would, in theory, highlight the areas where resources are most needed. However, cancer impact can be measured across a range of domains including incidence, morbidity, duration, mortality, years of potential life lost and cost, and the choice of domain can impact significantly on the relative priority of different cancers. In addition, impact alone is not sufficient to drive public health expenditure if scientifically proven prevention strategies are not available (6). Additionally, cancer prevention involves a range of organizations, services, and expertise, all of which have their own perspectives, priorities, and funding constraints, and may not easily sit together. Moreover, public preferences and community values play a role in the allocation of public resources, for both practical reasons around policy effectiveness and ethical reasons around civic participation and democratic legitimacy (7). In some instances, public opinion and behavior can be at odds with the available evidence on cancer prevention (8–11). For example, screening programs regularly report participation rates well below the target rate, despite the programs being publicly funded and having well-documented population benefits (12–14). Above all, there is significant pressure to ensure the best return for investment.

In this context, how do policy makers appropriately appraise and balance these complex and sometimes competing demands to ensure policy development that will maximize public good and ensure the best return for the limited cancer prevention dollar?

This was a challenge faced in the Western Australian Department of Health (WA DoH); the solution was to develop a method for, and undertake a process of, priority setting through a project called *Priorities and Preferences for Cancer Control in Western Australia*. The project was conducted by the Public Health Division under the auspices of the Chief Health Officer. The ultimate aim of the project was to develop a list of cancer prevention priority areas in WA informed by evidence, expert opinion, and public priorities and preferences. This list could then serve as the basis for more technical discussions around appropriate prevention/health promotion strategies, including issues of cost-effectiveness, delivery, and evaluation. To achieve this aim, the project was conducted in two phases. The first phase identified a priority list of cancers with both the biggest impact on the WA community and the best opportunities for prevention. The second phase involved using evidence of the impact and preventability of our priority cancers to engage the public in a discussion around their preferences for cancer control in the state.

This paper describes the first phase of the project, the process of identifying and prioritizing cancers. It is hoped that our perspective paper will inform others who may also be challenged by the process of priority setting in cancer prevention.

### ESTABLISHING AN EXPERT ADVISORY GROUP

The initial step in this project was the establishment of an expert advisory group to provide oversight and review of the analysis and interpretation undertaken by the project staff. The expert advisory group consisted of eight local cancer experts representing a range of relevant disciplines, including prevention, health promotion, clinical oncology, cancer registration, epidemiology, consumer advocacy, and policy development. The advisory group met regularly and reviewed all aspects of the project from analysis through community consultation and finally to dissemination of the project findings.

# ACCESSING AND ANALYZING DATA

The second task involved intensive data review and analysis in an attempt to describe the impact of cancer on the WA community. WA, like most Australian states, has a dedicated population-based cancer registry whose legislative mandate is to collect and collate information on diagnoses and deaths from cancer in WA (1). De-identified data provided by the WA Cancer Registry (WACR) was central to much of the analysis describing the impact of cancer in WA. Approval for the project and access to the de-identified data were granted by the WACR Data Custodian, in accordance with our institutional ethics committee terms of reference (15).

Statistics for 55 different cancer types recorded in the WACR were generated by the project team including age-standardized incidence and mortality rates, person years of life lost, and hospital length of stay for 2012 (data extracted December 2013). In addition, relative 5-year survival was calculated for three 5-year time periods: 1986–1990, 1996–2000, and 2006–2010. These statistics were calculated for males and females combined and separately. In addition, trends over time were calculated for incidence, mortality, and survival.
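The paragraph above does not say which standardization method or standard population was used, so the following is only a minimal sketch of direct age standardization, one common way to obtain an age-standardized rate. The function name, the age bands, and the weights are hypothetical and chosen for illustration; they are not WACR values.

```python
from typing import Sequence


def direct_age_standardized_rate(
    cases: Sequence[float],
    person_years: Sequence[float],
    standard_weights: Sequence[float],
    per: float = 100_000,
) -> float:
    """Weight each age-specific rate by a standard population share.

    cases[i] / person_years[i] is the crude rate in age band i;
    standard_weights[i] is that band's share of the (assumed) standard population.
    """
    if not (len(cases) == len(person_years) == len(standard_weights)):
        raise ValueError("all inputs must cover the same age bands")
    total_weight = sum(standard_weights)
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive value")
    weighted = sum(
        (c / py) * w for c, py, w in zip(cases, person_years, standard_weights)
    )
    return per * weighted / total_weight


# Hypothetical two-band example (illustrative numbers only).
rate = direct_age_standardized_rate(
    cases=[10, 30],
    person_years=[10_000, 10_000],
    standard_weights=[0.5, 0.5],
)
```

Because the weights come from a fixed standard population, rates computed this way can be compared across years or regions with different age structures.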

Data from outside the scope of the WACR was also sought, including Burden of Disease and costing information. Disability-adjusted life years for 2012 for WA were available as projections from the 2006 WA Burden of Disease study (16). Burden of Disease data were extracted for men and women combined and separately. Estimated lifetime treatment cost per case and total health system cost were also extracted from a national report produced by the Australian Institute of Health and Welfare for males and females combined (17). However, there were limitations to both Burden of Disease and cost data as information was not available for all the cancer types recorded in WACR and some classifications of cancer types were different (17, 18).

International data were also sought to place the WA results in a global context. For this purpose, data from the International Agency for Research on Cancer project Globocan 2012 was used, as it is the global source with the most reliable data (19). This World Health Organization initiative produces estimates with the most recent data available at the time. Using the Globocan 2012 project's age-standardized incidence and mortality rates for Organisation for Economic Cooperation and Development countries, of which Australia is a member, the complement of the mortality to incidence ratio (MIR) was calculated. The MIR is the best available method for international comparisons of cancer survival. Again, however, Globocan did not have information available for all the cancer types recorded in the WACR (19).
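The complement of the mortality to incidence ratio described above is simply one minus mortality divided by incidence. A minimal sketch, assuming both inputs are age-standardized rates on the same scale; the function name and the example rates are hypothetical and are not Globocan figures.

```python
def mir_complement(incidence_asr: float, mortality_asr: float) -> float:
    """Return 1 - (mortality / incidence), used as a rough survival proxy."""
    if incidence_asr <= 0:
        raise ValueError("incidence rate must be positive")
    if mortality_asr < 0:
        raise ValueError("mortality rate cannot be negative")
    return 1.0 - (mortality_asr / incidence_asr)


# Hypothetical rates per 100,000 (illustrative only).
proxy = mir_complement(incidence_asr=40.0, mortality_asr=10.0)
```

A value close to 1 indicates few deaths relative to new cases, and a value close to 0 indicates that deaths approach the number of new cases.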

While an extensive number of data analyses were conducted, they all used standard methods and therefore can easily be replicated. The more difficult phase was determining how to compare and prioritize the varied analyses to define the cancers that have the biggest effect on the community. Should we prioritize a cancer with a high number of new cases but a low death rate above or below a cancer with a low number of cases but a high death rate? Should we prioritize a cancer with a small number of cases but a high treatment cost above or below a cancer with a short hospital length of stay but a large number of cases? And who should be making this decision: an epidemiologist, a policy maker, or the community?

### ASSESSING CANCER IMPACT

As a first step in assessing the impact of individual cancers, the expert advisory group reviewed the cancer types recorded in the WACR. From this review, the classification of esophageal and stomach cancer was changed. Whereas the WACR classified esophageal and stomach cancer separately (ICD-O codes C15 and C16, respectively), the advisory group argued for their amalgamation on the grounds that in Western societies they most commonly occur at the gastroesophageal junction, making the classification of "esophageal" or "gastric" somewhat arbitrary. Statistics were then recalculated for esophageal and stomach cancer combined.

As the second step in assessing the impact of individual cancers, the expert advisory group reviewed the available data to decide which variables might best reflect the overall impact of each cancer on the general population. There was consensus among the advisory group that, in general, incidence and mortality were the strongest indicators of the impact of cancer on people in the community, which is consistent with the conclusions of earlier studies that had similar aims (6, 20). In addition, the incidence and mortality datasets were derived from the WACR and we were confident in the accuracy and completeness of these datasets for the full scope of cancers. Furthermore, it was felt that incidence and mortality would be the easiest to interpret for a general audience. This was important as the second phase of this project was to present the data to the general public and facilitate a discussion around setting priorities and preferences for cancer prevention in the state. Although Burden of Disease data are considered a very good tool for priority setting, in this case, the age of the source data and the reliance on projections made them less than ideal, and there was also some concern that they would not be easily understood by a general audience.

### IDENTIFYING THE CANCERS WITH THE GREATEST IMPACT

Using incidence and mortality data, the list of 55 cancers was refined based on meeting at least two of four possible criteria (**Figure 1**):


For non-sex-specific cancers (excluding breast cancer), the cancer had to meet the minimum criteria in both males and females to be considered for inclusion.

Applying these criteria reduced the list of 55 cancers to 12 cancers considered to have a large impact on the WA community: breast, cervical, ovarian, uterine, prostate, colorectal, leukemia, lung, lymphoma, melanoma, pancreatic, and esophageal/stomach cancer. The unknown primary site category also met the criteria, but this category was excluded from the list due to a lack of specificity regarding these cancers (21).
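The selection rule described in this section (at least two of four criteria, met in both sexes for non-sex-specific cancers) can be sketched as follows. The four criteria themselves are given in Figure 1 and are not reproduced here, so the sketch takes them as opaque true/false flags; all function and parameter names are hypothetical.

```python
from typing import Mapping, Sequence


def meets_minimum(criteria_met: Sequence[bool], minimum: int = 2) -> bool:
    """True when at least `minimum` of the supplied criteria flags are met."""
    return sum(bool(flag) for flag in criteria_met) >= minimum


def is_high_impact(
    flags_by_sex: Mapping[str, Sequence[bool]],
    require_both_sexes: bool,
) -> bool:
    """Apply the two-of-four rule.

    flags_by_sex maps "male" and/or "female" to that sex's four criteria flags.
    require_both_sexes should be True for non-sex-specific cancers (excluding
    breast cancer) and False for sex-specific cancers and breast cancer.
    """
    passed = {sex: meets_minimum(flags) for sex, flags in flags_by_sex.items()}
    if require_both_sexes:
        return passed.get("male", False) and passed.get("female", False)
    return any(passed.values())


# Hypothetical flags, for illustration only.
both_pass = is_high_impact(
    {"male": [True, True, False, False], "female": [True, False, True, False]},
    require_both_sexes=True,
)
one_fails = is_high_impact(
    {"male": [True, True, False, False], "female": [True, False, False, False]},
    require_both_sexes=True,
)
```

Keeping the criteria as plain flags makes the rule reusable in other settings where the thresholds behind each criterion differ.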

# CONSIDERING OPPORTUNITIES FOR PREVENTION

The project then considered the cancers in terms of their potential preventability. However, operationalizing the concept of preventability also proved challenging. What was meant by a "preventable" cancer? Did this mean completely preventable, mostly preventable or just partially preventable? Should it be preventable in the whole population or only in specific high-risk groups? How would population-based screening and early detection as forms of secondary prevention fit into any "preventability" definition? And importantly, at what level of "preventability" should public health action be triggered?

The scientific literature was reviewed for paradigms of preventability. Although a body of literature is available, only one paper was located that attempted to operationalize the concept (8, 22–24). Smith et al. (8) defined three categories of preventability: "*all or mostly preventable*" if 50% or more of cases were considered preventable, "*sometimes preventable*" for 20–49% and "*rarely preventable*" for less than 20%, although it is not clear what data were used to estimate the proportion of preventability. No papers were identified that provided a recommendation for the level of preventability that should trigger public health action. The resulting discussion among the expert advisory group and project staff led to the development of three categories:

• Category 1: cancers with clearly established strategies for primary or secondary prevention.

• Category 2: cancers with potential for targeted primary or secondary prevention in population subgroups.

• Category 3: cancers with no clearly established strategies for primary or secondary prevention.
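For comparison, the percentage bands attributed to Smith et al. (8) can be written as a small function. This is an illustration only: the function name is hypothetical, and assigning values between 49% and 50% to the middle band is an assumption, since the bands as quoted do not cover that interval. It is not the classification adopted by this project.

```python
def smith_category(preventable_fraction: float) -> str:
    """Map a preventable fraction (0.0 to 1.0) to the quoted Smith et al. band."""
    if not 0.0 <= preventable_fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if preventable_fraction >= 0.50:
        return "all or mostly preventable"
    if preventable_fraction >= 0.20:
        # Assumption: values in [0.49, 0.50) fall here as well.
        return "sometimes preventable"
    return "rarely preventable"
```

For example, `smith_category(0.35)` returns `"sometimes preventable"`.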


### COMBINING INFORMATION ON IMPACT AND PREVENTABILITY

For the 12 high impact cancers, the literature was reviewed to identify: (i) strong evidence and/or scientific consensus regarding modifiable risk factors; (ii) data on population attributable fractions for the risk factors that might be relevant to the WA context; (iii) information on population-based screening programs that were currently operating or proposed for WA; and (iv) any evidence to support specific screening programs in particular populations and/or high-risk groups.

The International Agency for Research on Cancer was the source for much of the information on risk factors (25–28), while data for the population attributable fractions came primarily from one large study conducted in the UK (5). Population-based screening programs for bowel, breast, and cervical cancer are operating in WA (12–14). Evidence was found to suggest possible screening strategies for three cancers: lung (in heavy smokers), prostate, and esophageal/stomach (among certain Asian populations and/or groups with high consumption of salt-preserved foods) (29, 30).

This literature review process was used to classify the 12 cancers in terms of the three categories of preventability for the WA context. Cancers with clearly established strategies for primary or secondary prevention (category 1) included breast, cervical, colorectal, lung, and melanoma. Cancers with potential for targeted primary or secondary prevention in population subgroups (category 2) included prostate and esophageal/stomach. Cancers with no clearly established strategies for primary or secondary prevention (category 3) included uterine, pancreatic, ovarian, and leukemia. The seven cancers that were determined to have both a high impact and also some potential for prevention (category 1 and 2 cancers) became our priority cancers for phase two of the project.
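The final combination step can be sketched as a lookup followed by a filter. The category assignments below are copied from the paragraph above, which names eleven of the twelve high-impact cancers; lymphoma is not assigned there and is left out of the mapping rather than guessed. The variable and function names are hypothetical.

```python
from typing import Iterable, List, Mapping

# Preventability categories as stated in the text (1 = clearly established
# strategies, 2 = targeted strategies in subgroups, 3 = none clearly established).
PREVENTABILITY_CATEGORY = {
    "breast": 1,
    "cervical": 1,
    "colorectal": 1,
    "lung": 1,
    "melanoma": 1,
    "prostate": 2,
    "esophageal/stomach": 2,
    "uterine": 3,
    "pancreatic": 3,
    "ovarian": 3,
    "leukemia": 3,
}


def priority_cancers(
    categories: Mapping[str, int],
    priority_categories: Iterable[int] = (1, 2),
) -> List[str]:
    """Return high-impact cancers whose category indicates some preventability."""
    wanted = set(priority_categories)
    return sorted(name for name, cat in categories.items() if cat in wanted)


priorities = priority_cancers(PREVENTABILITY_CATEGORY)
```

Passing a different `priority_categories` value, such as `(1,)`, would restrict the list to cancers with clearly established prevention strategies.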

The source data behind identifying the cancers with the greatest impact and combining impact and preventability are summarized in two publicly available reports called "Choosing Cancers for Your Say on Cancer in WA" and "The data behind Your Say on Cancer in WA" which are both available from: http://www.healthywa.wa.gov.au/yoursayoncancerwa.

# FURTHER WORK

The second phase of the project went on to present data on our seven priority cancers to a public consultation on preferences for cancer control. The results of this consultation and the agenda for action that arose out of it were published as the Chief Health Officer's Report, "*Priorities and Preferences for Cancer Control in Western Australia*" (31) and are available online at: http://www.healthywa.wa.gov.au/yoursayoncancerwa.

# DISCUSSION

The value of this paper lies in describing a method for priority setting that attempts to identify and unravel some of the factors that contribute to cancer prevention policy-making, but which are rarely made explicit. For example, this project examined a wide range of statistics that could be used to determine the relative impact of different cancers and attempted to explicitly justify why certain statistics were chosen over others. In addition, the project sought to explicitly define what might constitute a "preventable" cancer from a policy point of view to explain why some cancers were prioritized over others.

The result is a list of 12 high impact cancers, seven of which have been identified as having at least some potential for prevention (categories 1 and 2) in the Western Australian context. The collaborative nature of our project has ensured that the selection of these seven cancers as priority cancers has widespread acceptance among the cancer prevention community in WA. The second phase of this project involved community consultation around the seven potentially preventable cancers to get a sense of the public's knowledge of, and their preferences and priorities for, public health action addressing these specific cancers/risk factors. This process found that respondents were generally surprised by the preventability of cancer overall, and in particular, the preventability of bowel cancer and cervical cancer. Red and processed meat intake, alcohol consumption, and salt were also identified as clear areas for increased community education in the immediate future (31). We have already engaged with our health promotion partners in reviewing the evidence-based messages for these areas. Work to determine the most efficient and cost-effective prevention strategies in these areas is an important next step and the process of developing, delivering, and evaluating programs will be ongoing.

The strength of this project is that the methods can be readily replicated by others who seek to identify cancer prevention priorities in different settings. In addition, the involvement of an expert advisory group at every stage was beneficial in providing context and perspective to cancer-related data, as well as building relationships across the relevant disciplines. Access to high-quality data that were specific to the WA community for a large range of cancer types was a strength of our assessment of cancer impact.

Limitations included the following. First, certain types of information were lacking, in particular up-to-date Burden of Disease information and costings for some of the less common cancers. Second, the cutoff of including only the top 12 cancers for incidence and mortality rate was arbitrary; however, the ranking of the top 25 cancers (mortality and incidence, male and female separately) was documented for transparency and is available from http://www.healthywa.wa.gov.au/yoursayoncancerwa. Third, there is no clear definition of preventability in cancer prevention. The use of a 50% cutoff for known modifiable risk factors in defining a preventable cancer was based on limited evidence, and the heavy reliance on UK data sources for the population attributable fractions was suboptimal. The subsequent publication of a study estimating Australian population attributable fractions supports some, but not all, of the UK study's findings. Most notably, the Australian study estimated that only 63% of melanomas were due to exposure to UV radiation, as opposed to 86% of melanomas in the UK (4, 5). However, even the lower population attributable fraction of 63% would not have changed the classification of melanoma in this study as a category 1 "preventable" cancer. Differences also occurred in population attributable fractions for ovarian cancer and leukemia, but these were of a smaller magnitude, and again would not have changed the classification of the cancers in this study (4, 5).
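The melanoma robustness check can be made concrete with a short sketch. It assumes, as a simplification, that the 50% cutoff is applied directly to a single population attributable fraction per cancer and that a value equal to the cutoff counts as meeting it; the function name and data layout are illustrative and are not part of the project's method. Only the two melanoma figures and the cutoff come from the text.

```python
# Illustrative sketch only: names and layout are hypothetical.
# Question: does replacing the UK estimate with the Australian one
# move melanoma across the 50% cutoff?

CUTOFF = 0.50  # cutoff for known modifiable risk factors, as stated in the text

# Population attributable fractions for UV exposure and melanoma (4, 5).
melanoma_paf = {"UK": 0.86, "Australia": 0.63}


def meets_cutoff(paf: float, cutoff: float = CUTOFF) -> bool:
    """Return True when the attributable fraction reaches the cutoff.

    Treating equality as "meets" is an assumption made for this sketch.
    """
    return paf >= cutoff


results = {source: meets_cutoff(paf) for source, paf in melanoma_paf.items()}

# The classification is robust if both data sources give the same answer.
classification_unchanged = len(set(results.values())) == 1

print(results)
print("classification unchanged:", classification_unchanged)
```

The same check could be repeated for any cancer where two sets of population attributable fractions are available, which is one way to document how sensitive a classification is to the choice of data source.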

An important gap in this research topic more broadly is the lack of discussion around what level of "preventability" should trigger public health action for cancer. This is in contrast to other areas within public health, for example, infectious diseases management, where there are published guidelines for the level of public health threat which stimulates action (32, 33).

# CONCLUSION

Cancer will continue to be a leading cause of ill health and death, unless we can capitalize on the potential for 30–40% of these cancers to be prevented. In this light, cancer prevention represents an enormous opportunity for public health. However, there is a challenge for governments in prioritizing cancer prevention targets, especially in financially constrained environments. This paper describes a process for synthesizing information from detailed cancer statistics, expert opinion, and the published literature to help identify cancer prevention priorities. The strength of our approach is that it is transparent, reproducible, and applicable to other settings. The result is a list of 12 high impact cancers in adults, 7 of which have been identified as having at least some potential for prevention in the Western Australian context. This list has been used to justify and drive further work around cancer prevention opportunities and priorities in WA. Broader discussion around defining preventability and initiating prevention actions for cancer would contribute to international efforts to reduce the incidence and impact of these diseases globally.

# ETHICS STATEMENT

Approval for the project and access to the de-identified data were granted by the Western Australian Cancer Registry Data Custodian, in accordance with our institutional ethics committee terms of reference.

# AUTHOR CONTRIBUTIONS

JG and LM assisted with study design, conducted the study including data extraction and analysis, reviewed the literature, and wrote the manuscript. TW conceptualized the project, chaired the expert advisory group, and reviewed the manuscript. TA, MD, PK, DR, TS, and TT constituted the expert membership of the advisory group, provided advice and supervision on study design, analysis, and data interpretation, and reviewed the manuscript.

### ACKNOWLEDGMENTS

We greatly appreciate the staff who contributed to all aspects of the project throughout its duration, especially Bridget Egan, David Gibson, Andrew Jardine, Colleen Koh, Stacey-Mae Prokopyszyn, Peter Somerford, Wendy Sun, and Marcia Van Zeller. In addition, we acknowledge the contribution of Professors Lin Fritschi and Simone Pettigrew as members of the expert advisory group.

# FUNDING

This project was conducted within Western Australian Department of Health operational resources.


**Conflict of Interest Statement:** The authors declare that the project was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Girschik, Miller, Addiscott, Daube, Katris, Ransom, Slevin, Threlfall and Weeramanthri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Big Data's Role in Precision Public Health

### *Shawn Dolley\**

*Cloudera, Inc., Palo Alto, CA, United States*

Precision public health is an emerging practice to more granularly predict and understand public health risks and customize treatments for more specific and homogeneous subpopulations, often using new data, technologies, and methods. Big data is one element that has consistently helped to achieve these goals, through its ability to deliver to practitioners a volume and variety of structured or unstructured data not previously possible. Big data has enabled more widespread and specific research and trials of stratifying and segmenting populations at risk for a variety of health problems. Examples of success using big data are surveyed in surveillance and signal detection, predicting future risk, targeted interventions, and understanding disease. Using novel big data or big data approaches has risks that remain to be resolved. The continued growth in volume and variety of available data, decreased costs of data capture, and emerging computational methods mean big data success will likely be a required pillar of precision public health into the future. This review article aims to identify the precision public health use cases where big data has added value, identify classes of value that big data may bring, and outline the risks inherent in using big data in precision public health efforts.

Keywords: precision public health, big data, computational epidemiology, infectious disease surveillance, precision population health

# INTRODUCTION

This review article aims to identify the precision public health use cases where big data has added value, identify classes of value that big data may bring, and outline the risks inherent in using big data in precision public health efforts. This article focuses on surveying current practice, with a breadth of examples. The article does not include a critical review of the methods included in the big data and precision public health published research. It is hoped this article may pave the way for future researchers to measure the strengths and weaknesses, robustness, and validity of individual studies, interventions and outcomes. With the breadth of practice defined here, such follow-on in-depth critical review could identify precision public health best practices in design, methods, implementation, and analysis.

### METHODS

The terms "big data" and "precision public health"—two relatively new disciplines—often do not appear in the nomenclature of contemporary public health interventions and studies. Searching for the terms "big data" or "precision public health" returns a small fraction of the actual activity. Based on the lack of existing reviews and the complexity in identifying the intersection of precision public health and big data, the rationale of this narrative review article is to find examples of the use of big data in implementations of precision public health published in peer-reviewed academic journals.

### *Edited by:*

*Hugh J. S. Dawkins, Government of Western Australia Department of Health, Australia*

### *Reviewed by:*

*Gareth Baynam, Genetic Services of Western Australia, Australia; David Preen, University of Western Australia, Australia; Ori Gudes, University of New South Wales, Australia; Emmanuel D. Jadhav, Ferris State University, United States*

> *\*Correspondence: Shawn Dolley shawn.dolley@gmail.com*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 25 July 2017 Accepted: 20 February 2018 Published: 07 March 2018*

### *Citation:*

*Dolley S (2018) Big Data's Role in Precision Public Health. Front. Public Health 6:68. doi: 10.3389/fpubh.2018.00068*

The author (a) reviewed a large number of public health studies to look for precision and big data, as well as related and follow-on studies, (b) identified and searched for specific types of big data being applied to public health, and (c) searched for uses of data in precision public health to identify big vs. small data—always using the definition of these terms rather than relying on the presence of the terms "big data" or "precision public health."

Searches were performed using Google Scholar and Google. Examples of public health implementations—with and without big data—and precision public health implementations—with and without big data—only qualified for this article if they were published in peer-reviewed journals. In the presence of multiple qualifying examples, best attempts were made to limit examples to a single citation. In the presence of multiple examples, to reduce risk of bias and attempt to identify the most robust examples, the examples selected were those with the (a) most clearly identifiable public health use case, (b) clearest use of big data, (c) most "precision," (d) in journals with the highest impact factor, that were (e) the most recent—and in that order of priority. Searches were concluded by July 20, 2017.
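The ordered selection criteria (a) through (e) amount to a lexicographic ranking, which the following minimal sketch illustrates. The article does not say that numeric scores were assigned, so the 0 to 5 ratings, the `Candidate` class, and the study names are all hypothetical; in practice these judgments were made by the author.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """One candidate study. All ratings are hypothetical 0-5 scores."""
    citation: str
    use_case_clarity: int   # (a) clearly identifiable public health use case
    big_data_clarity: int   # (b) clearest use of big data
    precision: int          # (c) most "precision"
    impact_factor: float    # (d) journal impact factor
    year: int               # (e) recency


def priority_key(c: Candidate) -> tuple:
    # Tuples compare element by element, so an earlier criterion always
    # outranks a later one; later criteria only break ties.
    return (c.use_case_clarity, c.big_data_clarity, c.precision,
            c.impact_factor, c.year)


def select_example(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the single example to cite for a topic."""
    return max(candidates, key=priority_key)


candidates = [
    Candidate("Study A", 5, 4, 3, 2.1, 2016),
    Candidate("Study B", 5, 4, 3, 2.1, 2017),  # ties with A until recency
    Candidate("Study C", 4, 5, 5, 9.8, 2017),  # loses on criterion (a)
]
print(select_example(candidates).citation)
```

Note that under this ordering a study in a high-impact journal can still lose to one with a clearer public health use case, which matches the stated order of priority.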

Search terms used were as follows:


Google Scholar also provides lists of more recent studies which have cited the current study. These lists were reviewed to identify if more recent studies existed that provided better examples of pertinent characteristics.

This method has a number of limitations. Google Scholar relies on the end user to discriminate which of the returned studies are from peer-reviewed journals. No review protocol exists independent of this review article. No study selection or summary measures were collected, and no meta-analysis was performed. No study characteristics were collected. No assessment of the validity of included studies was performed beyond their inclusion in peer-reviewed academic journals. No assessment of cumulative risk of bias was performed. No additional analysis methods were used. The selection of studies included was not independently reviewed. The scope of this narrative review precludes enumerating additional limitations. Limitations aside, the result of these methods is a collection of studies or programs where big data and precision public health, as these terms are defined in this article, are being used together. Through implementing these methods, this review article is the first to identify the scope and scale of big data's role in precision public health, highlight classes of innovation, and identify the risks of using big data in this field.

# PRECISION PUBLIC HEALTH

"Precision public health is a new field driven by technological advances that enable more precise descriptions and analyzes of individuals and population groups, with a view to improving the overall health of populations" (1). The term was coined in Australia by Dr. Tarun Weeramanthri in 2013, and first found in print in 2014 (2). Dr. Muin Khoury and Dr. Sandro Galea describe precision public health as "improving the ability to prevent disease, promote health, and reduce health disparities in populations by applying emerging methods and technologies for measuring disease, pathogens, exposures, behaviors, and susceptibility in populations; and developing policies and targeted implementation programs to improve health" (3). Precision public health leverages big data and its enabling technologies to achieve a previously impossible level of targeting or speed (4). The Bill & Melinda Gates Foundation adds that precision public health "requires robust primary surveillance data, rapid application of sophisticated analytics to track the geographical distribution of disease, and the capacity to act on such information" (5). Precision public health works because "more-accurate methods for measuring disease, pathogens, exposures, behaviors, and susceptibility could allow better assessment of population health and development of policies and targeted programs for preventing disease" (4). Arnett & Claas add "Precision public health is characterized by discovering, validating, and optimizing care strategies for well-characterized population strata" (6). As for the size of the strata, Colijn et al. state "precision approaches must act at the right scale, which will often be intermediate—between "one size fits all" medicine and fully individualized therapies" (7).

The prominence of the term "precision" in the new practices of precision medicine and precision public health will inevitably raise questions about their similarity. While precision medicine requires genetic, lifestyle, and environmental data to meet goals of more customized and potentially individualized clinical treatments, precision public health is about increased accuracy and granularity in defining public cohorts and delivering targeted interventions of many types (4–6). Precision medicine and precision public health are independent.

### BIG DATA IN HEALTHCARE AND PUBLIC HEALTH

Big data has recently become a ubiquitous approach to driving insights, innovation and new interventions across economic sectors (8, 9). The United States National Institute of Standards and Technology defines big data as follows: "Big Data consists of extensive datasets—primarily in the characteristics of volume, variety, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis," (10). Decreases in costs of technology enabled the big data phenomenon to emerge (11). Data of "such a high volume, velocity and variety to require specific technology and analytical methods for its transformation into value" has a symbiotic relationship with the technology innovation on which it relies; the term big data often conflates the actual physical data with the unique technologies required to use it (12, 13).

In patient-specific healthcare, big data technology has helped enable greater scales of volume, variety and velocity (14, 15). Usable data *volume* has significantly increased in areas such as genomics (16, 17), molecular research (18, 19), medical image mining (20), and population health (21, 22). Integration of a *variety* of data, for a more complete view of patient or population, has occurred in areas including air quality (23, 24), wearables (25, 26), patient-generated content *via* the web (27), patient or physician movement (28, 29), medical studies (30), and critical care (31). Enabling increased *velocity* in healthcare was one of the earliest uses of big data, in areas such as clinical prediction (32, 33) and diagnostics (15, 33). Current examples of, and future visions for, the use of big data exist in multiple and varying pathologies, including cancer (34), cardiology (35), epilepsy (36), family medicine (37), gastroenterology (38), nursing (39), pediatric ophthalmology (40), psychiatry (41, 42), and women's health (43).

Barrett et al. state succinctly: "Big data can play a key role in both research and intervention activities and accelerate progress in disease prevention and population health" (44). Big data shows utility across the entire spectrum of public health disciplines. This capability ranges from "monitoring population health in real-time" to building "definitive extents and databases on the occurrence of many diseases" (45). Public health subject areas that include examples of the use of big data include community health (46), environmental health science (24, 47), epidemiology (48), infectious disease (45), maternal and child health (49), occupational health and safety (50), and nutrition (51). There is optimism and evidence for big data's value in public health, both in research and in intervention (52).

### BIG DATA IN PRECISION PUBLIC HEALTH

Today, use of big data has been shown to improve precision in select disciplines of public health. These areas include performing disease surveillance and signal detection (53, 54), predicting risk (55, 56), targeting interventions (6), and understanding disease (57). Research and proofs-of-concept with this data for these applications have been performed around the world. With the pace of technology innovation, and the speed at which precision health practitioners have embraced big data, more public health disciplines, practices, approaches, and interventions will likely be implemented in the future, beyond those within the scope of this article (58, 59).

# PERFORMING DISEASE SURVEILLANCE AND SIGNAL DETECTION

Disease surveillance and signal detection are among the most commonly cited and revolutionary of the big data use cases in precision public health (45, 60–62). Precision signal detection or disease surveillance using big data has shown efficacy in air pollution (23, 24), antibiotic resistance (63), cholera (64), dengue (65, 66), drowning (67), drug safety (68, 69), electromagnetic field exposure (70), Influenza A H1N1 (71), Lyme disease (72), monitoring food intake (73), and whooping cough (74).

Disease surveillance often includes tracking affected individuals, i.e., human carriers, patients, or victims (75). Stoddard et al. stated in 2009: "Human movement is a critical, understudied behavioral component underlying the transmission dynamics of many vector-borne pathogens" (76). In the effort to track disease spread by human vectors, a premium is placed on information that is more recent and granular (77, 78). Thus, access to huge volumes of streaming real-time data generated by humans seems at once an ideal signal repository for identifying and tracking affected individuals, and definitionally big data (78).

Indeed, big data supports alternate and in some ways superior methods to track affected individuals (45, 62). Because affected individuals move so quickly and over such a wide range, the real-time capabilities of big data and big data technology are now critical in this discipline (79, 80). Studies have shown efficacy using mobile phone data in tracking movement in cholera (81), dengue (82), Ebola (83), human immunodeficiency virus (HIV) (84), malaria (85), rubella (85), and schistosomiasis (86). Other mechanisms that have shown efficacy or promise in tracking movement of affected individuals include air travel data (87), GPS data-loggers (88), magnetometers (89), Twitter (71), and web searches (65).

### PREDICTING RISK

Effective signal detection often leads to attempts to predict future signals (90, 91). Predicting public health risk leads to a chance to implement preventive interventions (56, 92). Models predicting either disease spread or outcomes, using traditional or non-big data sources, have been developed across the spectrum of public health crises, including dengue (93), HIV (94), influenza (95), malaria (96), Rift Valley Fever (97), and tuberculosis (98).

One early example of using big data for public health prediction, Google Flu Trends, was a well-publicized failure (99). Since that episode, approaches to predicting risk using the internet and social media have taken special care to merge big data with non-social media data sources, to avoid overfitting models with relatively few cases, and to remain conscious of the risks of big data (56, 100).

Big data has been used for risk prediction of spread or outcomes in public health topics such as air pollution (101), antibiotic resistance (102), avian influenza A (103), blood lead levels (104), child abuse (49), diabetes (105), Ebola (106), HIV (107), malaria (108), gestational diabetes (109), smoking progression (110), West Nile (111), and Zika (86, 112, 113).

### TARGETING TREATMENT INTERVENTIONS

Applying treatment interventions to homogeneous cohorts within a larger heterogeneous population has been advocated since Lalonde's seminal report "A New Perspective on the Health of Canadians" in 1974 (114). Historical examples of adding precision to public health treatment populations include gonorrhea in the 1980s (115), HIV in the 1990s (116), breast cancer in the 2000s (117), and malaria in the 2010s (118). In 2010, the US Department of Health and Human Services said of those citizens with multiple chronic conditions: "Indeed, developing means for determining homogeneous subgroups among this heterogeneous population is viewed as an important step in the effort to improve the health status of the total population" (119).

Big data was leveraged in public health research identifying finer-grain treatment interventions in childhood asthma (120), childhood obesity (121), diarrhea (122), Hepatitis C (123), HIV (124), injectable drug use (125), malaria (126), opioid medication misuse (127), use of smokeless tobacco (128), and the Zika virus (129).

One clinical example at the intersection of identifying subpopulations for effective interventions and big data is personalized vaccinology or "vaccinomics" (130). Most vaccines today are applied in a one-size-fits-all model: the typical implementation assumes a homogeneous population, uses the same vaccine and dosages for all patients, ignores the replicated empirical reality of a heterogeneous population, and does not use the sophisticated genomic capabilities at hand (131, 132). While today's vaccines are applied homogeneously, the results are individual: "The response to a vaccine is the cumulative result of non-random interactions with host genes, epigenetic phenomena, metagenomics and the microbiome, gene dominance, complementarity, epistasis, coinfections, and other factors" (133). Vaccinomics would focus on homogeneous subpopulations treated with vaccines, dosages and approaches that would "hold the promise of moving away from one standard vaccine against all human populations…to one where vaccines can be relatively easily tailor-fitted to individual, community and population specificity" (134).

### UNDERSTANDING DISEASE

Data volume and variety in epidemiology have grown consistently over time well before the age of big data (135–137). Contemporary exponential increases in data sizes, and perhaps more importantly increases in variety of data sources, make big data a valuable addition to the epidemiologist's toolkit (64, 138). Glymour states "We recommend that social epidemiologists take advantage of recent revolutionary improvements in data availability and computing power to examine new hypotheses and expand our repertoire of study designs" (139). Big data may have added relevance in study designs that are patient-centric and precision-oriented (140).

"Person-oriented approaches, in contrast, focus on differences between individuals as characterized by configurations and patterns of variables. This is well in line with a precision-medicine approach to understanding disease risk, resilience, and treatment response in subpopulations of individuals" (140).

Big data is a component in studies that have shown new precision characteristics of such public health concerns as cholera (141), chikungunya (142), diabetes (143, 144), diarrhea (145), heatwave (146), influenza (147), opioid epidemic (148, 149), preterm birth (150), stunting (151), and Zika (152).

**Table 1** summarizes the public health crises cited previously for which peer-reviewed research exists in at least two of the four precision public health disciplines. While the precision health research in **Table 1** and in this article relies on peer-reviewed and exhaustive methods, there are some opportunity gaps that future research should consider and address. **Table 2** lists critical gaps that occasionally exist in the research, grouped by precision public health discipline.

# CONTRIBUTIONS OF BIG DATA

Big data offers special contributions to precision public health in enabling a wider view of health variables through linking disparate or novel data (44, 153, 154) and enabling large study populations with volumes of multiomic data to identify "molecular cohorts" (155).

### TABLE 1 | Precision public health research leveraging big data.


*Research studies (by citation) applying precision with the help of big data to a public health crisis. Public health crises are only included if big data in precision public health examples exist in more than one precision public health discipline.*

### TABLE 2 | Potential gaps in research methods in precision public health using big data.


*Critical features sometimes missing from precision public health studies leveraging big data, shown by public health discipline type.*

The technologies behind big data make it much easier to integrate a variety of data within a study (156). For example, because big data does not require investment in an *a priori* data schema, users can bring together a variety of different data and link it when the analytics are created (157). This enables researchers to link a mélange of unstructured disease and outcome data (158, 159). In 2017, Harry Hemingway, on the completion of 33 studies using linked data with a total population of two million patients, said "Our findings clearly show that research using one of the NHS greatest assets—its data—is vital to innovate improvements in disease prevention, to make earlier diagnoses and to give the best treatments" (160). The inclusion of data variety increases the number of independent variables; one novel variable, or a combination of as yet uncompared variables, could end up being significant in defining relevant precision subpopulations (161, 162).
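The idea of linking data without an *a priori* schema can be sketched in a few lines. This is a toy illustration under strong assumptions: the three sources, their field names, and the exact-match person identifier are all hypothetical, and real record linkage typically involves probabilistic matching and privacy safeguards that are omitted here.

```python
from collections import defaultdict

# Hypothetical sources with different shapes; no shared schema is defined up front.
claims = [
    {"person_id": "p1", "claim_code": "J45", "year": 2016},
    {"person_id": "p2", "claim_code": "E11", "year": 2016},
]
sensor_readings = [
    {"pid": "p1", "pm25": 38.5},
    {"pid": "p1", "pm25": 41.0},
]
survey_notes = [
    {"respondent": "p2", "text": "walks to work most days"},
]


def link_on_read(sources):
    """Group raw records by person, resolving each source's key field at read time."""
    linked = defaultdict(lambda: defaultdict(list))
    for name, (records, key_field) in sources.items():
        for record in records:
            # Keep every non-key field as-is; nothing is forced into fixed columns.
            payload = {k: v for k, v in record.items() if k != key_field}
            linked[record[key_field]][name].append(payload)
    return linked


# The mapping from source to key field is supplied when the analysis is written,
# not when the data are stored.
linked = link_on_read({
    "claims": (claims, "person_id"),
    "sensors": (sensor_readings, "pid"),
    "survey": (survey_notes, "respondent"),
})

for person, by_source in sorted(linked.items()):
    print(person, {name: len(rows) for name, rows in by_source.items()})
```

Because each record keeps whatever fields it arrived with, a new source can be added by supplying one more entry in the mapping, without redesigning the existing data.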

Examples of data that has been linked to help identify more precise cohorts of populations include: longitudinal health claims data (163, 164); secondary use anonymized electronic health records (159, 165); cohort studies, health surveys, and registries (166–168); environmental variables (104); molecular data such as from the genome, exposome, microbiome, or transcriptome (169–172); "mhealth" wearable and sensor data (173); mobile phone sensing data and self-reports (174); online patient generated content (175); and the semantic web (176).

The explosion of new volumes of genomic "big data" helped make possible the precision medicine movement (177). One of precision medicine's promises was to lead to development of new treatments for subpopulations defined by their similarities at the molecular level (178, 179). Currently, translational efforts in precision medicine often work by identifying cohorts of patients who have or lack specific genomic or molecular biomarkers (132, 180). Since today's precision medicine works at the granularity of disease subtypes and population strata and not at the "n of one" level, contemporary precision medicine really is—when applied to community crises—an example of precision public health (2).

Researchers agree that only by using very large sample sizes will genomic studies have the proper statistical power (181, 182). "These large case–control studies are essential for boosting the statistical power needed to detect the genetic variants responsible for rare diseases and can provide the necessary knowledge for use in the clinical setting," (183). Big data has been a necessary component in the scale-up of genomic sample sizes, enabled by the decrease in cost of gene sequencing (183). Future versions of sovereign genomics programs in over ten countries have the potential to create data sets with millions of samples (184–186). These databases should be ideal platforms for research such as genome wide association studies, which have been used with over ten thousand cases per study in public health diseases such as Alzheimer's disease (25,000+ cases), autism (16,000 cases), high blood pressure (200,000+ cases), posttraumatic stress disorder (10,000+ cases), and smoking (50,000+ cases) (187–191).
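A rough calculation shows why sample size matters so much at genome-wide scale. The sketch below uses a normal approximation to the power of a two-sided test comparing a variant's frequency between cases and controls. The frequencies (0.22 vs. 0.20) are hypothetical, observations are treated as independent, and the threshold of 5 × 10⁻⁸ is the conventional genome-wide significance level rather than a figure from this article.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_cases: float, p_controls: float, n_per_group: int,
                 alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    Uses a normal approximation with unpooled variance and ignores the
    negligible contribution from the opposite tail.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = sqrt(p_cases * (1 - p_cases) / n_per_group
              + p_controls * (1 - p_controls) / n_per_group)
    return std_normal.cdf(abs(p_cases - p_controls) / se - z_crit)


# Hypothetical variant frequencies: 22% in cases vs. 20% in controls.
for n in (1_000, 10_000, 100_000):
    print(n, round(approx_power(0.22, 0.20, n), 3))
```

Under these assumptions, power is close to zero with a thousand observations per group and close to one with a hundred thousand, which is consistent with the case counts cited for the genome-wide association studies above.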

The most sophisticated precision approaches to public health today at once include data from multiple omic disciplines, can make use of linked phenotype data, and leverage novel or recent types of computation (7, 132, 192, 193). In targeting interventions, *de novo* or improved computational methods like geospatial risk modeling, latent class modeling, social molecular pathological epidemiology, and agent-based modeling simulation all benefit from big data to better identify these "intermediate" subpopulations (49, 122, 126, 193–196).

### RISKS

More work needs to be done both enumerating and evaluating the risks and challenges of using big data in precision public health.


# CONCLUSION

Precision public health is exciting. Today's public health programs can achieve new levels of speed and accuracy not plausible a decade ago. Adding precision to many parts of public health engagement has led and will lead to tangible benefits. Precision can enable public health programs to maintain the same efficacy while decreasing costs, or hold costs constant while delivering better, smarter, faster, and different education, cures and interventions, saving lives.

Precision public health does not require big data. That said, the future of big data in precision public health is assured, based on its successes and acceleration of use to date. Big data and the methods created to make it useful allow precision public health practitioners to operate at the top of their license and can bring more insight to cohort membership, disease pathways and treatments. Big data enables lower costs and more precision to find, educate, track, and help each high-risk citizen. In the future, precision public health needs, imperatives, mandates and techniques will drive new capabilities into big data.

Using big data in precision public health has risks. A number of risks were identified here, and future study will expand these or identify more. Protecting the dignity, privacy, and security of citizens and patients, while finding truly meaningful and significant outcomes in a reasonable timeframe, will take effort on the part of each and every researcher in this space.

What are the calls to action? Investment has increased, but additional investment and research are needed in many areas. First, more experimentation is needed to understand how to best create and mobilize open data, open science, open source communities, and open collaboration platforms. For context, the Observational Health Data Sciences and Informatics collaborative is a thriving global open science community focused on large scale population health outcomes and prediction. If such a collaborative existed for precision public health, one imagines practitioners could leverage shared best practices, data, open software, and opportunities. Second, there are opportunity gaps in training precision public health workers in countries with a dearth of data scientists, on-premise data storage and computational assets, or access to big data. For example, communities suffering public health crises increasingly desire to "learn how to use the information and improve their ability to respond to future outbreaks in the region," rather than having their data removed for analysis by better funded nations (212). Third, follow-on research is needed in the area of big data in precision public health. Specifically, (a) best practices in performing data quality assessment along a broad range of attributes should be enumerated, (b) existing research should be scored along these attributes as well as those studies' compliance with statistical best practices specific to big data and high dimensionality, (c) each area of value delivery (disease surveillance, predicting risk, targeting intervention, and understanding disease) needs its own full treatment with regard to methods, data sources, data management, and more, (d) some critical framework ought to be created and proposed to systematically measure precision public health studies and programs, specific to and beyond big data, and (e) as precision public health becomes more mature, emerging trends should be noticed and evaluated. Fourth, more work is needed in areas of ethics, risk, and governance. The community should be watching for overreliance on big data-driven approaches that lead to decreases in radical whole-population solutions that increase baseline health norms. Fifth, the global economic opportunity of using big data prescriptively in public health has not been systematically measured, beyond specific country or disease successes. For context, organizations such as the United Nations, the World Bank, and the United States Agency for International Development have estimated economic impacts of individual epidemics. These or other institutions could convene a task force to estimate the economic benefit of applying precision to public health responses, as well as the relative contribution of big data. Sixth, precision public health centers of excellence in universities can help. Today, leaders in schools of public health are speaking and writing about precision public health; presumably academic courses, concentrations and centers will follow in stepwise progression. Seventh, new technical innovation must continue and needs investment. For example, this could include applying deep learning to precision public health use cases, or creating a novel free and open source data science software "pipeline" for geospatial event prediction.

Future precision public health will be transformative. It will include new applications, modifications, and uses of today's assets, including social media and communication platforms, unmanned aerial vehicles, mobile applications, mobile sequencing, self-screening, sensors, vaccine or drug internet-of-things inventions, and more. Tomorrow, we could be looking up, wondering if a high-resolution satellite is mapping our neighborhood to predict the path of an infectious disease, or if a drone is approaching with a targeted intervention. With future applications of precision public health and the speed of big data adoption, tomorrow's new public health students and young practitioners soon won't think of the discipline as precision public health. They will only think of it as public health.

### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.



**Conflict of Interest Statement:** The author is employed by Cloudera, Inc., a provider of big data technology.

*Copyright © 2018 Dolley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Applying Precision Public Health to Prevent Preterm Birth

*John P. Newnham1,2\*, Matthew W. Kemp1, Scott W. White1,2, Catherine A. Arrese1, Roger J. Hart1 and Jeffrey A. Keelan1*

*1School of Women's and Infants' Health, The University of Western Australia, Crawley, WA, Australia, 2Department of Maternal Fetal Medicine, King Edward Memorial Hospital, Subiaco, WA, Australia*

Preterm birth (PTB) is one of the major health-care challenges of our time. Being born too early is associated with major risks to the child with potential for serious consequences in terms of life-long disability and health-care costs. Discovering how to prevent PTB needs to be one of our greatest priorities. Recent advances have provided hope that a percentage of cases known to be related to risk factors may be amenable to prevention; but the majority of cases remain of unknown cause, and there is little chance of prevention. Applying the principle of precision public health may offer opportunities previously unavailable. Presented in this article are ideas that may improve our abilities in the fields of studying the effects of migration and of populations in transition, public health programs, tobacco control, routine measurement of length of the cervix in mid-pregnancy by ultrasound imaging, prevention of non-medically indicated late PTB, identification of pregnant women for whom treatment of vaginal infection may be of benefit, and screening by genetics and other "omics." Opening new research in these fields, and viewing these clinical problems through a prism of precision public health, may produce benefits that will affect the lives of large numbers of people.

### *Edited by:*

*Paul Russell Ward, Flinders University, Australia*

### *Reviewed by:*

*Annette Regan, Curtin University, Australia; Gareth Baynam, Western Australian Register of Developmental Anomalies; University of Western Australia and Murdoch University, Australia*

### *\*Correspondence:*

*John P. Newnham john.newnham@uwa.edu.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 16 December 2016; Accepted: 17 March 2017; Published: 04 April 2017*

### *Citation:*

*Newnham JP, Kemp MW, White SW, Arrese CA, Hart RJ and Keelan JA (2017) Applying Precision Public Health to Prevent Preterm Birth. Front. Public Health 5:66. doi: 10.3389/fpubh.2017.00066*

Keywords: preterm birth, cervix length, progesterone, smoking, genomics, proteomics, transcriptomics, screening

# INTRODUCTION

Preterm birth (PTB) is the single major cause of death in children up to 5 years of age in the developed world (1–3). In low-resource countries, early birth is the second greatest cause of death in young children, second only to pneumonia. Most children born too early will survive and go on to lead a normal and healthy life, but for many there will be life-long disability. Health issues may include neurodevelopmental delay, hearing and visual loss, cerebral palsy, and learning and behavioral problems. The potential impact on individuals, families, and society is considerable (4, 5).

Providing optimal care for very preterm infants in dedicated neonatal intensive care units is vital to minimize any potential for life-long harm, but such care comes at considerable financial cost. The potential costs to society in terms of lost productivity throughout the lifespan may be even greater (6). Considerable benefit has arisen from the discovery in 1972 that administration of corticosteroids to the mother at risk of early birth will halve the rate of death and respiratory distress syndrome in the preterm newborn, but the treatment does not in itself delay the age at birth (7, 8).

Preterm birth is defined as birth before 37 and after 20 completed weeks of gestation. There are many potential pathways to this complication of pregnancy (9). At all gestational ages, and especially the very early ages, inflammation in the pregnant uterus, either due to infectious or non-infectious causes, is commonly associated with preterm deliveries (10). For cases where the age is closer to term, a major concern is medical intervention which at times may not be medically indicated. In other cases, the delivery may be expedited to prevent stillbirth or maternal morbidity, such as for preeclampsia, fetal growth restriction, or diabetes. In half of all cases of PTB, labor commences spontaneously for no known reason or the membranes rupture unexpectedly leading later to the birth. These various phenotypes are diverse and poorly classified, reflecting our incomplete understanding of their various causal pathways and mechanisms.

The impetus to discover strategies by which PTB rates may be lowered has been driven at least in part by advances in neonatal care resulting in more babies surviving at lower gestational ages at birth, but often at high cost in human and financial terms. Starting in the 1960s, much attention was given to developing and refining tocolytic drugs, which are therapeutic agents aiming to inhibit uterine contractions and hence prevent early birth. These drugs may have some usefulness in delaying PTB by hours or a few days, but do not extend pregnancies to gestational ages that will lower the rate of PTB (11).

### SUCCESS SO FAR

There has been success in lowering PTB rates to some extent in some environments (5). In the USA, the rising rates of non-medically indicated late preterm and early-term birth, peaking in 2007, led to the launch of several quality assurance programs aiming to prevent unnecessary early births (6). These programs have succeeded in lowering the rates of late PTB in those hospitals and regions that were targeted. In part as a result of these programs, the rate of PTB in the USA fell for 8 years in a row, but was reported in late 2016 to have increased again from 9.57 to 9.63% (12).
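When reading rate changes such as the rise from 9.57 to 9.63%, it helps to separate the absolute change (in percentage points) from the relative change (as a proportion of the earlier rate). The sketch below is illustrative only; the function name `rate_change` is our own and is not taken from the cited report.

```python
def rate_change(old_pct, new_pct):
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute = new_pct - old_pct
    relative = 100.0 * absolute / old_pct
    return absolute, relative


# Rates quoted in the text for the USA
absolute, relative = rate_change(9.57, 9.63)
print(f"absolute: {absolute:.2f} percentage points, relative: {relative:.2f}%")
```

For the figures quoted, the increase is 0.06 percentage points, or about 0.63% of the earlier rate. Keeping the two measures apart matters when comparing this change with program effects that are reported in relative terms.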

The first whole-of-population and whole-of-geographic-region PTB prevention initiative was recently reported for the state of Western Australia (13). Six strategies were applied: administration of progesterone based on prior history of a PTB or the finding of a shortened cervix measured routinely in mid-pregnancy on ultrasound examination, appropriate use of cervical cerclage, avoidance of non-medically indicated induction of labor or Cesarean section, avoidance of exposure to cigarette smoke, judicious use of fertility treatments, and a dedicated PTB prevention clinic at the state's sole tertiary level perinatal center for referral of cases at highest risk. Implementation involved a state-wide outreach program aiming to ensure that all obstetricians, general practitioners, midwives, and ultrasound imaging specialists had training and expertise in the various aspects of the program. Women and their families were made aware of the strategies through print and social media. The program overall was badged as thewholeninemonths™. After the first full 12 months of implementation, the state-wide rate of PTB had reduced by 7.6% when compared with the years prior to initiation. Statistical modeling estimated that approximately 200 preterm births had been prevented, with avoidance of more than 40 in the <32-week gestational age group. Analysis by run charts indicated the rate of late PTB had decreased rapidly, suggesting an effect of educational programs aiming to discourage practitioners and women from unnecessary early intervention. A more delayed effect was observed in reducing births in the 28–31 week category, possibly reflecting use of cervix length screening, administration of progesterone, and surgical cerclage. The benefits extended across the gestational age spectrum from 28–31 weeks onward, although any effect at ages before that time was not statistically significant, possibly due to low numbers.
An even greater effect in reducing the overall PTB rate was observed within the tertiary level center itself, where awareness among practitioners and pregnant women may have been greater. Together, these results indicate that a comprehensive and multi-faceted geographic-based PTB prevention program in a relatively high resource setting can significantly reduce the rate of PTB using existing knowledge, with an effect of 7–8%.

The magnitude of this reduction in PTB rates recently observed in the Western Australian program is generally consistent with a previous estimate of the effect that could result from effective implementation of known strategies. In an analysis of the potential reduction in preterm births for countries with a very high human development index conducted by the Boston Consulting Group and published in 2013, it was estimated conservatively that the combined impact of implementing known strategies may be a reduction in rate of 5% (14).

We are now left with two great challenges. First is to explore how the array of known strategies can be applied effectively across other population groups. Second is to discover new strategies by which our modest success so far can be expanded. One possibility may be to apply the principles of Precision Public Health to the field of PTB prevention.

### PRECISION PUBLIC HEALTH

Precision public health has arisen from the emerging field of precision medicine (15, 16). Contemporary clinical practice is built upon practice guidelines that are developed and refined by the principles of evidence-based medicine. Typically, such guidelines are applied to individuals with specified symptoms or signs or abnormalities in laboratory or imaging tests. There is much greater potential benefit if the case selection could be refined by tests predicting that a given treatment is likely to be most effective for a given individual with a given disease. As an example, the drug crizotinib has been shown to be much more effective for the treatment of non-small cell lung cancer if the patient has a particular chromosomal translocation involving the gene encoding ALK that drives tumor growth (17).

But there is far more potential to the concept of precision medicine than just "drugs, genes, and disease." Rather than merely targeting individuals by identifying specific phenotypic or genotypic characteristics, a population approach is possibly the next logical step. Identifying population groups rather than just individuals may yield great benefits. With such an approach, development of precision public health may enable us to benefit the right population at the right time.

Using a population-level approach, the massive amount of data currently being generated from multiple sources could be harnessed for the purpose. Such data collection systems include genomic, transcriptomic and microbiome analysis, and lifestyle characteristics measurable by electronic machines worn or carried by individuals. We also need to expand our capabilities in harnessing data that describe the social and environmental determinants of our health. Clinicians have traditionally incorporated informal information on these determinants into their clinical decision-making, but never before have we had access to data that can provide an accurate description of such factors. We are no longer lacking in information. Our challenge is how to embrace these massive data sources and use the information to identify the right populations at the right time for the right treatment.

The purpose of this article is to explore ways in which the principles of precision public health could be applied to the field of PTB prevention. Different scenarios will be discussed covering some of the strategies that are currently being employed for this task, and we will explore some of the possible avenues by which the principles of precision public health may be incorporated. The concept and examples are illustrated diagrammatically in **Figure 1**.

# NATIONAL DIFFERENCES, POPULATIONS IN TRANSITION AND HEALTH INEQUALITIES

There are many lessons to be learnt from studying the rates of PTB in various nations, how these rates change with time, and the effects of migration.

The PTB rate varies markedly between different countries. In a study of national estimates of PTB rates in 2010 and published in 2013, rates ranged from 5.3 per 100 live births in Latvia to 14.7 per 100 live births in Cyprus (14). Northern European countries had very low rates ranging from 5.5 per 100 live births in Finland to 5.9 in Sweden and 6.0 in Norway. More southern European nations had slightly higher rates with 6.5 in Italy, 6.7 in France, and 8.0 in the Netherlands. In contrast, the rate in the USA was 12.0 per 100 live births.

Different migrant subgroups have different reproductive responses to migration (18). Black and Hispanic migrant women are less likely to deliver preterm than US-born Blacks and Hispanics, but the effect does not extend to White and Asian migrants. The duration of residence in the new country also contributes greatly. In a study of migrants to Canada, recent immigrants of less than 5 years had a lower risk of PTB than non-immigrants; but after 15 years or more, the protective effect was reversed and a higher risk of PTB than in non-immigrants was observed. No such effect was found on birth weight relative to gestational age (19). Chinese women living in Jiangsu Province were found to have a much lower rate of PTB than China-born women living in Hong Kong and Western Australia (20). In China-born women living in Western Australia, the ability to be fluent in English and no longer need a translator was associated with a doubling in risk of the pregnancy ending preterm, suggesting that environment was overriding genetics in terms of PTB risk.

FIGURE 1 | The application of precision population health principles to preterm birth (PTB) prevention. Primary population screening using a number of known epidemiological, social, and obstetric factors can identify women with increased risk of PTB but has poor predictive value. In the three examples illustrated, women identified with a specific risk factor are then further evaluated using precision screening, aided by the application of various "omic" technologies, to identify those at greater risk of PTB associated with specific etiologies. This precision risk profile is then used to target treatment and prevention strategies to those who are most likely to benefit. Women who are unlikely to benefit from treatment are also identified through this process. This approach will enhance efficacy, reduce unnecessary interventions, and make optimal use of clinical resources. Abbreviations: BMI, body mass index; BV, bacterial vaginosis; Cx, cervix; fFN, fetal fibronectin; PTB, preterm birth.
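The caption of Figure 1 notes that primary population screening has poor predictive value and that a second, more precise screening step can sharpen risk identification. A minimal sketch of why this happens follows, using Bayes' rule for predictive values. The prevalence, sensitivity, and specificity figures are hypothetical placeholders chosen for illustration, and the two tests are assumed to be conditionally independent; none of these numbers comes from the article.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Stage 1: broad risk-factor screen in the whole population (hypothetical inputs)
ppv_primary, _ = predictive_values(prevalence=0.09, sensitivity=0.5, specificity=0.8)

# Stage 2: a more specific test applied only to screen-positive women, so the
# "prevalence" entering this stage is the PPV of stage 1 (hypothetical inputs)
ppv_precision, _ = predictive_values(prevalence=ppv_primary, sensitivity=0.7, specificity=0.9)

print(f"PPV after primary screen:   {ppv_primary:.3f}")
print(f"PPV after precision screen: {ppv_precision:.3f}")
```

With these assumed inputs, only about one in five screen-positive women would go on to deliver preterm after the primary screen, whereas the second step raises that proportion substantially. The same calculation also identifies women who are unlikely to benefit from treatment, through the negative predictive value.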

The wide variation in PTB rates between countries, and the changes observed with migration, provide strong evidence for environmental contributors and clues to the magnitude of effect that may be amenable to change in any prevention strategy. But what factors associated with migration are operating to decrease and increase PTB rates, and how might this information be exploited in a precision public health approach? Large amounts of data are now available in various government agencies on people as they either reside in their country of origin or migrate. Details are also available on when they moved, at what age, and the circumstances under which they relocated. It seems likely that harnessing these databases would add greatly to our ability to identify women at risk of PTB and enable us to devise public health strategies that may mitigate this risk. Much of the information, of course, is collected for other purposes, and the challenge now is to bridge the gap with the many government and private agencies that generate and control such databases.

Geographic information systems (GIS) employ sophisticated software and hardware platforms to analyze and collate data on geospatial distribution of disease incidence, risks, and health outcomes (21–23). While the technology and its applications are in their infancy, studies have shown that GIS is a useful modality for guiding public health policy and gaining high precision data on population risks, comorbidities, changes in incidence rates, effectiveness of treatment programs, and socioeconomic factors associated with risk (24–26). As far as we are aware, this approach has not yet been explored for PTB prevention, but there are clearly opportunities to be exploited here for improved and innovative public health PTB prevention initiatives.

A variety of population-based PTB prevention programs are now underway, each targeting the needs of their own communities and aiming to overcome deficiencies that may be contributing to high rates of early birth. In the USA, several programs have been launched aiming to overcome health inequalities. Much of this work has resulted from awareness of the very different rates of PTB among the various racial groups. In 2014, the rates ranged from 13.2% in non-Hispanic Black women through to 9.4% in Hispanics and 8.9% in non-Hispanic Whites (27, 28).

In Kentucky, one such program named "Healthy Babies Are Worth The Wait" has focused on improving access to antenatal care by incorporating a range of modifiable factors including group care (6, 28). When compared with surrounding states, there has been evidence of improved rates of PTB.

Central to the high rates of PTB in women of poor socioeconomic and educational standards may be factors operating in their neighborhoods and lifestyles that produce chronic stress and a sense of alienation (6, 28). Living in environments of high crime rates, low wages, lack of employment opportunities, and sub-standard services, and feeling excluded from society, are associated with higher rates of PTB. Understanding the pathways by which these multiple factors lead to pregnancy complications and other adverse health outcomes will require a much broader view of an individual's life than we have previously appreciated. Tackling these considerable challenges may be assisted by access to large databases and an understanding of the importance of the entire life-course in health and disease, as well as gaining a better understanding of how inter-generational effects, such as a history of slavery and disempowerment, may leave enduring effects on people and their communities. Much work is left to be done, but applying the principles of precision public health may enable progress that has so far remained elusive.

# TOBACCO EXPOSURE AND PTB

Exposure to tobacco products, either directly or from environmental sources, is recognized as a significant threat to human health and a significant cause of PTB. The importance of controlling tobacco exposure during pregnancy is underscored by tobacco exposure being designated the single largest preventable risk factor for non-communicable human disease (29). Data published by the World Health Organization show that some 6% of all female and 12% of all male deaths are attributable to tobacco use (30). By 2020, the WHO projects that 7.5 million people will die from direct and indirect exposure to tobacco smoke (30). Data drawn from populations in the United States, Denmark, Sweden, and Canada (i.e., broadly high-income economies) suggest that fewer than 50% of women who smoke cease smoking during pregnancy; Swedish data from 2000 suggest that 13% of women in that country smoked during pregnancy, with smoking persistence more likely in those women with a lower level of educational attainment (31). Accordingly, tobacco control and PTB prevention constitute tightly intertwined and hugely important global public health challenges.

### Tobacco Use and Pregnancy

The greatly elevated risks of lung cancer, chronic respiratory disease, and cardiovascular disease [71%, 42%, and 10% of total incidence, respectively (30)] associated with smoking are well appreciated by both the medical community and the general public; however, the risk of adverse pregnancy outcomes associated with maternal tobacco exposure, and the benefits of smoking cessation prior to or early in pregnancy, are equally profound (32–34). Although protective for preeclampsia (35, 36), smoking during pregnancy is causally associated with fetal growth restriction, placental abruption (37), PTB (38, 39), and sudden infant death syndrome (40). At least one-third of all cases of fetal growth restriction in developed countries is attributable to the effects of maternal tobacco use (41). There are also data to suggest that fetal exposure to many chemicals in tobacco smoke is associated with a host of childhood developmental abnormalities, including subnormal weight gain (42) and neuro-behavioral disorders, such as attention deficit hyperactivity disorder and deficits in auditory and cognitive ability (43).

A 1990 report by the US Surgeon General concluded that "women who stop smoking before pregnancy or during the first 3–4 months of pregnancy reduce their risk of having a low birth weight baby to that of a woman who never smoked" (44). Perhaps the most important element of this report, in keeping with seminal studies into smoking-related mortality among UK doctors by Doll and colleagues, is that the adverse effects of tobacco exposure on pregnancy are not immutable and can be reduced or even entirely abrogated by timely smoking cessation (45). Numerous subsequent studies, both experimental (46) and epidemiological (47–49), have demonstrated both the significant potential harm to pregnancy and the developing fetus caused by maternal tobacco smoke exposure and, promisingly, the profound benefits to both that may be gained from effective tobacco control.

It is now well established that PTB is a complex syndrome, in many cases likely patient and/or population specific, and a syndrome for which very few preventative interventions exist (9). Of the limited armamentarium presently available to the medical and public health communities, tobacco control is among the most uniformly effective interventions. A meta-analysis of the effects of implementing smoke-free legislation in North America and Europe reported a statistically significant 10% reduction in PTB [1,366,862 cases; −10.4% (95% CI −18.8 to −2.0); *p* = 0.016] (48). Similar benefits have been reported in response to tobacco control measures adopted in Switzerland, with benefits to population pregnancy health correlating with the extent of tobacco control achieved in a particular canton (50). Recent modeling by Levy and colleagues identified that a \$2/pack cost increase in cigarette excise tax, combined with cessation programs, health warnings, and public smoking bans would deliver a 33.5% reduction in smoking prevalence among US women aged 15–49 by 2065; relative to maintaining the *status quo*, such policy measures would deliver 227,300 fewer (132,600–302,300) low birth weight infants and 351,000 (137,100–501,300) fewer preterm births over the same period (51).
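The meta-analysis result quoted above, a change of −10.4% with a 95% CI of −18.8 to −2.0, can be sanity-checked by recovering an approximate standard error and two-sided p-value from the interval. This is a minimal sketch that assumes the interval is symmetric and based on a normal approximation; the original analysis may have used a different model, so small discrepancies from the published p-value are to be expected.

```python
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, level=0.95):
    """Recover (standard error, z statistic, two-sided p) from a symmetric CI.

    Assumes a normal approximation, which may differ from the method
    actually used in the cited meta-analysis.
    """
    norm = NormalDist()
    z_level = norm.inv_cdf(0.5 + level / 2)  # roughly 1.96 for a 95% interval
    se = (upper - lower) / (2 * z_level)
    z = estimate / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return se, z, p


se, z, p = p_from_ci(-10.4, -18.8, -2.0)
print(f"SE = {se:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the recovered p-value is close to 0.015, broadly in line with the reported *p* = 0.016.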

### Controlling Tobacco Exposure to Prevent PTB *via* Precision Public Health: A Nuanced, Multilevel Approach

Reducing the adverse impact of tobacco exposure on pregnancy outcomes likely requires a combination of national-level initiatives, supported by complementary, culturally and contextually appropriate programs targeted at specific communities and individual patients. Given the substantial body of data linking tobacco exposure to human disease generally (29, 45), preventive programs based around national public education and legislative control (52) (high levels of tobacco excise tax, comprehensive advertising bans, prohibition of tobacco use in public places, public education campaigns) are clearly warranted and, when well-executed, demonstrate marked reductions in population-level tobacco use coincident with significant reductions in population PTB rates (47, 48, 50). Such programs, targeting both males and females, are important as there is very good evidence that both direct maternal tobacco use and environmental exposure in the home, workplace, or public transport convey sizeable risks to pregnancy health. In addition to the risks of second-hand smoke, data from the Generation R study in the Netherlands also suggest that, in a number of communities, partner smoking is associated with smoking during pregnancy (53).

Unfortunately, countries employing comprehensive, multifaceted tobacco control measures at a national level remain in the minority. Many of the countries with sub-optimal tobacco control measures are the same low- and middle-income countries that are home to 80% of the world's smokers (31) and report some of the highest rates of PTB in the world (2, 4, 54). Even in countries with effective national-level controls, there are marked differences in the effectiveness of national-level tobacco control measures between communities (53, 55), seemingly contingent on a host of socioeconomic factors. In addition to determining maternal tobacco use history and environmental exposure (especially in the home), a precision public health approach to preventing tobacco-associated PTB would likely require assessment of maternal and/or community factors, including (but not limited to) ethnicity, education, poverty, the ability to access social support networks, and the ability to access health-care services.

Ethnicity should be accounted for when designing pregnancy smoking risk assessments and interventions. An analysis of smoking habits in the Netherlands as part of the Generation R study showed significant differences in smoking rates between women of Turkish (43.7%), Dutch (24.1%), and Moroccan (7.0%) ethnicities; moreover, women of Turkish and Moroccan ethnicity were more likely to continue smoking during pregnancy (72.0 and 70.6%, respectively) than women of Dutch ethnicity (58.6%) (53). In South Australia, data collected during the 1990s revealed a marked difference in smoking rates between Aboriginal and non-Aboriginal women (57.8 vs. 24.0%) at their first antenatal visit (56). A similar disparity is evident in more recent data collected in Queensland between 2005 and 2006, with 54% of Indigenous and 19% of non-Indigenous women reporting smoking during pregnancy (55). In the study, adjusted pregnancy outcomes in non-smoking Indigenous and non-Indigenous women were almost equivalent, underscoring the profound impact of smoking on pregnancy and the potential benefits to be gained from effective cessation support (55).
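Prevalence and persistence of smoking jointly determine how many women in a group smoke during pregnancy, which is the quantity a targeted program would aim to reduce. The sketch below combines the Generation R percentages quoted above. It rests on one explicit assumption that the text does not confirm: the "continued smoking" percentages are read as the share of women who smoked before pregnancy and kept smoking during it.

```python
# Percentages as quoted in the text for the Generation R cohort (53).
# ASSUMPTION: "continued_pct" is the share of pre-pregnancy smokers who kept
# smoking during pregnancy; the text does not state the denominator.
reported = {
    "Turkish": {"smoking_pct": 43.7, "continued_pct": 72.0},
    "Dutch": {"smoking_pct": 24.1, "continued_pct": 58.6},
    "Moroccan": {"smoking_pct": 7.0, "continued_pct": 70.6},
}


def smoking_in_pregnancy_pct(smoking_pct, continued_pct):
    """Implied share of all women in a group who smoke during pregnancy (percent)."""
    return smoking_pct * continued_pct / 100.0


implied = {group: smoking_in_pregnancy_pct(**values) for group, values in reported.items()}

for group, pct in implied.items():
    print(f"{group}: {pct:.1f}% of women smoking during pregnancy")
```

Under that reading, the implied shares are roughly 31%, 14%, and 5% for women of Turkish, Dutch, and Moroccan ethnicity, respectively. The comparison illustrates that a group with low prevalence can still show high persistence, so cessation support and prevention messaging may need different emphases in different communities.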

A mother's level of educational attainment and financial situation are known to be important predictors of peri-conceptual smoking habits and the ability to quit smoking once pregnancy is established (31, 57). Hibbs and colleagues undertook a cross-sectional study of PTB rates among women residing in low-income urban areas in Illinois; the findings of this study showed that impoverished African-American women who smoked exhibited a pattern of "weathering" with advanced maternal age associated with an increased rate of PTB [25.2% among 30- to 35-year-old women vs. 17.9% for teenagers; RR = 1.5 (1.1–2.0)]. Interestingly, the authors reported that impoverished White mothers (irrespective of smoking status) and non-smoking African-American mothers did not exhibit a similar pattern of increased rates of PTB with increasing maternal age. The authors concluded that their findings underscore the "potential public health benefit of cigarette smoking cessation programs aimed at the most economically disadvantaged African-American women" (57).

The level of social support and community integration available to a mother is believed to play an important role in pregnancy outcomes and may impact behaviors, such as smoking, during pregnancy. Indeed, smoking has been reported as one means by which women cope with stress in adverse domestic situations, including poverty and disadvantage. In a Turkish study, Ergin et al. reported that young age (<20), low education level, and migrant status were associated with smoking during pregnancy (58). Similarly, in a German cohort, Elsenbruch and colleagues reported that a lack of social support in pregnancy is a key risk factor for adverse outcomes and that women categorized as having low social support during pregnancy were far more likely (34 vs. 17%) to self-report smoking during the first trimester of pregnancy than women categorized as having high levels of social support (59). These findings mirror earlier findings summarized by Dejin-Karlsson and colleagues, wherein women with strong social support networks are more successful at quitting smoking, and an absence of support networks is one reason why some pregnant women continue to smoke (60).

Given the relationships between PTB and inadequate antenatal care, economic disadvantage, low-educational attainment, and ethnicity, a number of investigators have recommended the use of community outreach programs or clinics involving multiple disciplines sensitive to the particular needs of the community in question (61). Interventions based on nuanced patient risk assessments that consider individual and community-level factors may thus benefit attempts to reduce the rate of tobacco-associated PTB, and should be considered in any precision public health approach to PTB prevention. Risk assessments could be used to design and deliver targeted, culturally and contextually appropriate interventions at community (i.e., comprehensive antismoking programs and cessation support through community, religious, sporting and cultural groups, supported by community leaders) and individual (psychotherapy and pharmacotherapy for smoking cessation, midwife outreach programs, access to social support and mental health services, assisted transport to access health-care services) levels.

### POPULATION-BASED CERVIX LENGTH SCREENING

While there are many pathways leading to PTB, some cases may be predicted by measurement of the length of the cervix in mid-pregnancy (62–64). This measurement may be done as a component of the standard mid-pregnancy morphology scan typically conducted between 18 and 24 weeks gestation.

If the cervix appears shortened on trans-abdominal scan, a transvaginal scan is then recommended. For pregnancies where the risk of PTB is perceived to be increased, the standard of practice is generally to proceed to transvaginal scan to measure the cervix length.

The finding of a shortened cervix in mid-pregnancy is predictive of PTB, and there are treatments that reduce that risk, at least for singleton pregnancies. Administration of natural progesterone vaginally as a single evening dose has been shown to nearly halve the risk of PTB in such circumstances. This benefit has been shown in large randomized controlled trials and also by meta-analysis of individual patient data (65–67).

We therefore now have strong evidence that the risk of PTB can be reduced dramatically, perhaps by about half, by identifying women with a short cervix in mid-pregnancy and prescribing natural vaginal progesterone, a relatively simple and safe treatment. Should the test and its subsequent treatment be applied to the entire population of pregnant women?

In the largest American randomized controlled trial, Hassan and colleagues screened 32,091 women by transvaginal scan to identify 733 with a cervix shortened and measuring between 10 and 20 mm (65). Two hundred thirty-six were randomized to the vaginal progesterone treatment group, 229 to placebo and 268 declined to participate further. The effect was to nearly halve the rate of early PTB in those found to be at risk; the number of births <33 weeks was 21 cases in the treatment group (8.9%) and 36 cases in the placebo group (16.1%). Thus, if the findings of this trial were to be replicated in general clinical practice with no women receiving a placebo and all women participating in the treatment, then screening 32,000 women would identify 2.3% of women requiring treatment and would result in the prevention of 47 cases of births <33 weeks (5).
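The headline figures of this trial can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it uses the counts and percentages quoted above from Hassan and colleagues (65), while the function name `screening_summary` and the derived quantities (absolute risk reduction, relative risk, number needed to treat) are additions made here for clarity and are not calculations reported by the trial authors.

```python
def screening_summary(screened, short_cervix, rate_treated, rate_placebo):
    """Summarise a screen-and-treat trial from its headline figures.

    rate_treated and rate_placebo are event proportions (0-1) for the
    outcome of interest, here birth <33 weeks.
    """
    # Share of screened women who were found to have a short cervix
    screen_positive_pct = 100.0 * short_cervix / screened
    # Absolute risk reduction: difference in event rates between arms
    arr = rate_placebo - rate_treated
    # Relative risk of the outcome with treatment versus placebo
    relative_risk = rate_treated / rate_placebo
    # Number of screen-positive women treated to prevent one early birth
    nnt = 1.0 / arr
    return {
        "screen_positive_pct": screen_positive_pct,
        "absolute_risk_reduction": arr,
        "relative_risk": relative_risk,
        "number_needed_to_treat": nnt,
    }


# Figures quoted in the text: 32,091 screened, 733 with cervix 10-20 mm,
# births <33 weeks in 8.9% (progesterone) vs. 16.1% (placebo)
summary = screening_summary(32091, 733, 0.089, 0.161)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

On these crude, unadjusted figures, the relative risk of about 0.55 corresponds to the "nearly halved" rate described above, and roughly 14 women with a short cervix would need to be treated to prevent one birth <33 weeks.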

More recently, the largest randomized controlled trial to have been conducted so far, based in the UK, observed a non-significant effect on three primary outcomes: fetal death or birth <34 weeks gestation, a composite of neonatal outcomes, and a standardized cognitive score at 2 years of age (68). There were multiple entry criteria and the treatment consisted of vaginal progesterone from 22–24 to 34 weeks. When the data from the various trials are included in an updated meta-analysis, the beneficial effect of vaginal progesterone treatment for prevention of PTB in women with a singleton pregnancy and shortened cervix in mid-pregnancy remains statistically significant (67).

A cohort study assessing the introduction of routine cervix length screening in mid-pregnancy at a single tertiary level center in Chicago, IL, USA, observed a reduction in preterm births that appeared to be spread across the preterm gestational age spectrum when compared with outcomes prior to introduction of the program (69). The reductions were from 6.7 to 6.0% for births <37 weeks, 1.9 to 1.7% for <34 weeks, and 1.1 to 1.0% for <32 weeks gestation. In this population, the frequency of cervix length less than 25 mm in mid-pregnancy was 0.89%.

But would it be cost-effective to introduce this protocol into clinical practice across the entire population? In a USA-based decision analysis model of a single cervix length measurement in mid-pregnancy compared with no such measurement, with treatment with vaginal progesterone if the cervix were found to be shortened, it was estimated the program would save \$12 million/year and gain 424 quality-adjusted life years (70).

A more recent USA-based analysis asked if risk-based screening would be more or less effective than universal screening of all pregnancies (71). Results of the decision analytic model indicated that both risk-based and universal screening would be more cost-effective than no screening. Of the two approaches, universal cervix length screening of all pregnancies was superior to risk-based screening and would result in a higher cost-effectiveness ratio.

Despite the evidence presented above, there remain as many unknowns as knowns. What is the prevalence of shortened cervix in mid-pregnancy in the various populations of the world? Even within the USA there is considerable variation, ranging from 0.89% <25 mm in the Chicago study (69) to 2.3% <20 mm in the multicentered trial reported by Hassan and colleagues (65). Regions such as northern Europe where the rate of PTB is much lower are likely to have even lower prevalence of shortened cervix in mid-pregnancy. Further, we have few data describing the phenotypic and genotypic variables that influence mid-pregnancy cervix length and the response to treatment, and we have little understanding how to monitor and manage cases undergoing treatment and the drivers of treatment success and failure. This is a key issue in the context of precision public health (**Figure 1**). Interestingly, a small USA study found that progesterone and cerclage therapies were only effective in women with biomarkers of inflammation in the amniotic fluid and that without inflammation the risk of PTB was actually increased (72). This preliminary evidence suggests that the inclusion of additional screening parameters, such as inflammation, might improve precision and response to targeted therapies.

To achieve improvements in precision, we need to progress beyond case studies and randomized controlled trials. In the first instance, we need sound and reliable population-based data. Most high resource nations have perinatal collection systems collecting basic information on demographics, the birth, and the newborn. To fully understand and maximize the benefit from mid-pregnancy cervix length measurement and treatment, such data collection systems will require much greater complexity and capability. The expansion will be complicated by the fact that many health-care systems are fragmented, with antenatal investigations and treatments often being dislocated from hospital-based birthing processes. Prevention of PTB is of great benefit to individuals and the community, and development of cost-effective models of health-care delivery using this intervention should be entirely feasible. At this time, it would seem reasonable to expect that the most effective solution will come from population-based screening, but followed by further precision analysis to best understand the most appropriate treatment for each case and the manner by which that treatment should then be monitored for effectiveness.

### AVOIDANCE OF NON-MEDICALLY INDICATED LATE PRETERM AND EARLY-TERM BIRTH

The rates of late preterm, and early-term, birth have been increasing over recent decades in many countries (73, 74). These trends have been underpinned by a general assumption held by many people that birth close to term will not be associated with any enduring compromise for the offspring. A wealth of data, from multiple societies, now suggest otherwise.

Being born late preterm, defined as birth between 34 and 36 weeks and 6 days gestation, places the infant at risk of neonatal and childhood consequences (75). For the neonate, complications may include the need to be admitted to a neonatal intensive or special care unit, and special support to maintain respiratory function, temperature control, prevention of infection, and maintenance of normal glucose levels, and much more (75, 76). For the child, there are increased risks of death, re-admission to hospital, cerebral palsy, developmental delay, and behavioral and learning problems at school age (77, 78). In recent years, the findings of potential compromise from prematurity for the child have been extended to birth in the early-term period, ranging from 37 weeks and 0 days to 38 weeks and 6 days.

For those cases in which early birth is required for maternal or fetal reasons, the benefits may outweigh any risks of prematurity. However, we now have strong evidence that in many cases no such benefit exists, and steps need to be taken to ensure that any elective early births can be fully justified.

Population-based study of the factors involved in rising rates of early births has shown clear demographic differences (2, 14, 79). In a USA study of early births between 1992 and 2002, the major increase in rate was observed in non-Hispanic White births (80). During the decade of study, rates of early births in Hispanic and Black women had remained relatively constant. The factors underpinning these observations are uncertain, but suggest that socioeconomic factors are involved and that both medical and patient contributions need to be considered.

Reducing the rate of early intervention is particularly challenging, but progress has been made by some health-care systems. In a study of 27 health-care facilities in the USA, each organization was invited to choose one of three protocols to reduce their rate of non-medically indicated late preterm and early-term birth (81). The options ranged from just education of staff through to complete prohibition of early birth. The gestational age below which non-medically indicated birth was to be discouraged was 39 completed weeks. Outcome data revealed that the most interventionist protocols produced the greatest benefit, with significant reductions in early birth rates and admissions to neonatal intensive care units. The stillbirth rate did not change during the time of the study. Education alone did not improve outcomes but, importantly, the program only involved the medical staff and did not include education of other health-care practitioners or the patients themselves.

At a national level, there has been considerable public advocacy across the USA led by the March of Dimes through a campaign called "Healthy Babies are Worth the Wait," coupled with quality improvement initiatives (6). These programs have been aimed primarily at health-care providers and pregnant women to discourage unnecessary early intervention. The results of these and other programs suggest that the rates of non-medically indicated late preterm and early-term birth can be reduced, but there are many confounding factors.

First, the effect is entirely dependent on the extent to which unnecessary early intervention is prevalent in a particular health-care environment. Further, it is clear that individual health-care practitioners and their pregnant patients actively choose to make such decisions. Understanding when preventative policies should be introduced requires detailed understanding of not just health-care outcomes but also the practices of individual practitioners and the attitudes of women and families who access such care.

Second are the health-care workforce implications of discouraging elective birth before 39 weeks gestation. Such a policy will inevitably increase the number of cases of spontaneous labor, and there are clear implications for the work–life balance of busy practitioners, especially those in solo or small group practices. In a secondary analysis of a randomized controlled trial in Denmark comparing planned elective Cesarean births at 38 or 39 weeks gestation, delaying the surgery by 1 week resulted in a 60% increase in unscheduled Cesarean sections and a 70% increase in births out-of-hours (82). Imposing health-care guidelines that potentially compromise the daily activities of practitioners and hospitals requires considerable justification and a thorough understanding of the issues involved in each health-care environment.

Third is the potential for stillbirth or fetal compromise by delaying birth. There is no convincing evidence at a population level that earlier delivery is associated with lower rates of stillbirth, but it is logical to assume that a background risk of fetal death must remain present by continuing the pregnancy, even in the absence of any known risk factors. Any such risk would need to be weighed against any potential risk of death or morbidity resulting from late preterm or early-term birth. Reassuringly, the American trial of three management options in 27 hospitals significantly reduced early birth rates and there was no evidence of any increase in risk to the child. In the Danish RCT of elective Cesarean delivery at 38 or 39 weeks gestation, there was a small reduction in the rate of admission to the NICU in the delayed delivery group, but no other signs of danger (82, 83).

It is clear, therefore, that the challenge to prevent non-medically indicated late preterm/early-term births is scientifically justified, but confounded by regional differences, possible benefits and risks to the child by continuing the pregnancy, and potential adverse effects on the practitioners and their health-care facilities. For each health-care environment, detailed and ongoing data availability and analysis are vital if preventative strategies are to be introduced and maintained. So far, in the USA and Western Australia, there has been success in lowering the rates of late preterm/early-term births, but continuing success and translation into other environments where the baseline PTB rates are already lower may be more challenging. The principles of precision public health may offer the solution to this major problem. By applying precision analysis using factors such as genetic predisposition, prior history, and lifestyle variables, we may better understand which pregnancies can safely be left until after 39 weeks gestation and which cases require earlier intervention.

### PREVENTION OF INFECTION-DRIVEN PTB

It is well established that microbial infection of the extra-placental membranes, amniotic cavity, and fetus is an important driver of PTB, particularly in the deliveries at the earliest gestational ages (9, 84, 85). Animal models and clinical studies demonstrate a causal relationship between infection and PTB, while the immune-pathophysiological pathways responsible for triggering preterm labor in response to infection have been studied, replicated and characterized in a variety of models (86–88). The most common pathway *via* which bacteria trigger PTB is the so-called ascending infection route: microorganisms residing in cervico-vaginal fluid in pregnancy ascend through the cervical barrier and colonize the fetal membranes, passing through in some cases to colonize the amniotic fluid and from there infect the fetus (86, 89). The severity of the infection and the ensuing inflammatory response determine, to a large part, the obstetric and neonatal outcomes, including the timing and onset of preterm labor, the effectiveness of tocolytic therapy, and the risk of serious neonatal morbidities.

While the majority of preterm deliveries occur in the 32- to 36-week period, the costs and risks of serious perinatal morbidities are highest in deliveries <32-week gestation, the majority of which are infection-associated (9, 84, 85). Hence, preventing PTB as a result of intrauterine infection has the largest potential gains in terms of reducing major morbidity and death and reducing perinatal and lifetime health-care costs (90, 91).

Unfortunately, identifying women at risk of infection-driven PTB and treating them to prevent the infection is challenging, both from an individual patient and public health perspective (92–94). Gestational tissues exposed at different times in pregnancy to different bacteria exhibit variable immune responses depending on dose, duration, distribution, maternal and fetal genetics, ethnicity, lifestyle, and anatomical factors (such as previous cervical surgery). This level of heterogeneity makes identification and risk prediction particularly difficult. There are several well-documented approaches which can identify women at increased risk of infection-related PTB, yet their prognostic value is generally too weak to alter clinical decision-making and treatment. Many of the trials of prophylactic antibiotic therapy given to pregnant women to prevent PTB have failed to employ robust inclusion criteria or delivered poorly effective antibiotic regimens (95, 96). Identifying women who will benefit from treatment is a key requirement for primary prevention: prescribing antibiotics to large numbers of women in pregnancy in order to prevent PTB in a small percentage of recipients is not justifiable in light of the emerging recognition of the potential developmental effects of disrupting the maternal and neonatal microbiomes with antibiotics in pregnancy (97–100).

To date, primary prevention research has focused on identifying and treating women with abnormal vaginal microbiota in early pregnancy prior to the onset of preterm labor (101, 102). This strategy is based on the assumption that women with vaginal dysbiosis have increased risk of ascending infection and that antibiotic treatment will eradicate the pathogens, prevent infection, and thus prevent PTB. The odds ratio of women delivering preterm with bacterial vaginosis (BV) or aerobic vaginitis (AV) is approximately 2–7, and even higher if diagnosed before 16-week gestation (103–105). The presence of *Ureaplasma* in the vagina is also associated with an approximately twofold increased risk of PTB (106–108). However, due to its high prevalence in pregnant women (~50%) (109), detection of *Ureaplasma* alone is not sufficiently diagnostic to warrant prophylactic treatment. The prognostic significance of the presence of specific *Ureaplasma* serovars is currently under investigation (109).

A large number of trials have employed standard microbiological or clinical approaches to identify women with vaginal dysbiosis and treated them with antibiotics (with or without probiotics) to prevent PTB (84, 110, 111). Many of these studies failed to significantly lower the rate of PTB (111–114), although a few studies employing clindamycin treatment before 22 weeks of pregnancy have shown significant maternal and neonatal benefits (95, 96). The negative findings are, in part, due to lack of effective antimicrobial interventions (95) and the failure to treat the dysbiosis or address recurrence (115–117). However, the major impediment to therapeutic progress is the poor prognostic precision of current identification methods. Most women with vaginal dysbiosis or urogenital tract pathogens do not deliver preterm, and our ability to identify those at sufficiently high risk to warrant treatment (OR > 10) is poor (102). Primary prevention studies have focused on defining clinical vaginal microbial disorders and recruiting patients based on these definitions, rather than identifying subgroups of women who are at particularly high risk of delivering preterm as a result of microbial profile and/or other risk factors and targeting them for appropriate and effective treatment. Currently, although rapid molecular tests for diagnosing BV and AV are being developed (118, 119), we lack an accurate, rapid, and affordable test to identify women at high risk of infection-related PTB (120, 121).

Nevertheless, despite these uncertainties and qualifications, there is encouraging evidence that public health screening for BV or other forms of abnormal vaginal microbiota using traditional techniques can be an effective primary prevention strategy in reducing PTB rates. A recently published Austrian study, implemented following the results of a randomized clinical trial, retrospectively analyzed the pregnancy outcome data of over 17,000 women at high risk of PTB (based on general, family, and obstetric risk factors) across a 10-year timespan following introduction of a voluntary antenatal infection "screen and treat" program (122). All women received standard antenatal care; 49.5% entered the screen and treat program, which consisted of testing of vaginal swabs at 10- to 16-week gestation for detection of BV and presence of *Candida* spp. or *Trichomonas vaginalis*, followed by antimicrobial treatment with either clindamycin, clotrimazole, or metronidazole as appropriate. Recurrent infections were retreated and women with BV were given probiotics after treatment to prevent recurrence. Women in the treated group had a significantly lower rate of stillbirth (0.4 vs. 2.0%), miscarriage (0.5 vs. 1.4%), PTB <37 weeks (9.7 vs. 22.3%), and PTB <32 weeks gestation (1.9 vs. 8.3%). The effect of the program on the rates of early/extreme PTB was particularly impressive, with a more than 77% reduction observed; this is consistent with the known role of intrauterine infection in the majority of deliveries <32 weeks. The major weakness in this study is its retrospective, non-randomized design, although confounding is minimal and unlikely to alter the findings (122).
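The "more than 77% reduction" can be checked directly from the rates quoted above. The following sketch is a minimal illustration, not an analysis from the Austrian study (122): the helper `relative_reduction` and the dictionary layout are hypothetical, and the comparison is crude and unadjusted, so it inherits the limitations of the retrospective, non-randomized design.

```python
def relative_reduction(rate_program, rate_comparison):
    """Relative reduction in an outcome rate, as a proportion (0-1)."""
    return 1.0 - rate_program / rate_comparison


# Outcome rates (%) quoted in the text: (screen-and-treat, standard care only)
austrian_rates = {
    "stillbirth": (0.4, 2.0),
    "miscarriage": (0.5, 1.4),
    "PTB <37 weeks": (9.7, 22.3),
    "PTB <32 weeks": (1.9, 8.3),
}

reductions = {
    outcome: relative_reduction(program, comparison)
    for outcome, (program, comparison) in austrian_rates.items()
}

for outcome, reduction in reductions.items():
    print(f"{outcome}: {100 * reduction:.1f}% lower in the program group")
```

For PTB <32 weeks this gives 1 − 1.9/8.3, or about 77%, matching the figure in the text.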

These data are similar to the achievements of an earlier public health program implemented in Germany in late 1997 (123). The program consisted of a free self-test vaginal pH kit offered to 2,722 women in obstetric care (>12-week gestation), with optional follow-up with obstetricians if the test was positive. Elevated vaginal fluid pH is a weak surrogate marker of BV and AV, typically associated with a lack of *Lactobacillus* spp. Treatment with antibiotics (clindamycin) was indicated if clinical symptoms of BV were present following obstetric examination. The program resulted in much lower rates of PTB <32 weeks in the women who participated (14% of the cohort) compared to women under the same care who did not engage in the pH testing program (0.3 vs. 4.1%). Subsequently, a large prospective trial was conducted, enrolling 8,000 women in the state of Thuringia over a 6-month period. Women who self-tested their vaginal pH and sought medical treatment based on the result (8% of the cohort) had lower rates of PTB at <32 weeks (0.3 vs. 1.6%) and at <37 weeks of gestation (5.3 vs. 8.5%). After discontinuation of the program, PTB rates returned to historical levels in the state (123).

In order to obtain greater precision, we need to refine our ability to identify women at high risk of infection-driven PTB (OR > 10) and target them with appropriate follow-up and treatment. Many of the bacteria found in the amniotic cavity of preterm deliveries are normal commensals of the urogenital tract (124, 125), so vaginal microbiological profiling alone is unlikely to have a high positive predictive value. It remains to be determined whether more refined and selective molecular techniques may help to improve diagnostic discrimination (109). It is likely that a combination of clinical risk factors (e.g., prior PTB, cervical imaging, and abnormality detection), high-resolution microbial profiling (possibly including bacterial strain identification), and immunological/inflammatory biomarker assessment will be needed, in combination with a highly effective antimicrobial regimen (97, 126), to enable a truly effective maternal "screen and treat" program to achieve the desired level of precision and effectiveness required for a primary prevention public health program (127–133).

# "OMICS" AND PRECISION PUBLIC HEALTH

# Identification of At-Risk Individuals: Genomic Approach

The contribution of genomic variation to the etiology of PTB is thought to be in the order of 40% by twin and family studies (134). Women who were born preterm, or who have close family members with a PTB, have a significantly higher risk of PTB. Rates of prematurity are influenced by ethnicity (135): studies have shown that the rate of PTB in African-Americans is significantly higher than in other racial groups in similar socioeconomic settings, and that women married to African-American men have a higher prevalence of PTB. Determining the precise nature of the genomic variants responsible for determining risk of PTB is hampered by the complex biology of preterm labor (135). From an evolutionary perspective, PTB is detrimental to the survival of the species, and there are multiple levels of redundancy in the biological process of labor initiation, which combine to reduce the incidence of PTB. The genetic basis of PTB is, therefore, unlikely to be monogenic. Rather, in all but the most extreme of phenotypes, multiple changes within gene pathways are required to overcome physiological redundancy and culminate in PTB.

The advantage of genome-based screening tests is that they may be applied prior to pregnancy and allow ample time to initiate primary prevention, rather than attempting to slow or reverse the premature activation of parturition which leads to spontaneous preterm labor. Many gene-targeted analyses and genome-wide association studies (GWAS) have been carried out in an attempt to identify genetic variants associated with PTB (135). Sheikh et al., in a recent review of the literature, identified 119 candidate genes with SNPs that had potential association with PTB in an evaluation of 92 different studies. Many studies have found associations between PTB and SNPs in parturition-associated genes such as the progesterone receptor, oxytocin receptor, relaxin, the prostaglandin EP3 receptor, and the CRH receptor 1, although the level of risk associated with these polymorphisms is not sufficiently high to be useful clinically (136). Other target genes have yielded more promising SNPs, in particular heat-shock protein 47 (SERPINH1), which is involved in the maturation of collagen molecules and is enriched in African and African-American populations. Polymorphisms in other tissue remodeling-related genes like metallopeptidase inhibitor-1 and 2, COL1A2, COL5A2, and COL5A1 also significantly increase the risk of PTB (136). Overall, many of the SNPs significantly associated with PTB are related in some way to inflammation. Lack of replication and population-based heterogeneity remain major hurdles to be overcome.

Identification of combinations of multiple subtle genomic contributors is now feasible with advances in high-throughput genomic sequencing and bioinformatics. In a recent study of women with 2–3 generations of PTB using a meta-genomic, bi-clustering algorithm, Uzun et al. (137) identified variations in 33 genes within five genetic pathways associated with altered PTB risk. Brubaker et al. employed protein network analysis with tissue-specific gene expression data to identify functionally important candidate genes that would be overlooked by standard GWAS techniques. Their analysis identified significant subnetworks and genes not previously associated with PTB, including subnetworks associated with inflammation, muscle function, and ion channels (138). It is likely that extensions of such work will ultimately permit the characterization of an individual's overall genomic risk of PTB as a screening test to identify those at risk, with the molecular consequences of the variation being used to guide preventive interventions.

Further progress is likely to originate from advances in the use of phenotypic data from rare genetic diseases to identify variants associated with common pathways shared by multiple disorders (139). To this end, the recently expanded Human Phenotype Ontology (http://human-phenotype-ontology.github.io) now contains 250,000 phenotypic annotations for over 10,000 rare and common diseases (140, 141), which can be used to examine the phenotypic overlap among common diseases with shared risk alleles or those linked by genomic location. Other databases and platforms have been developed to allow accurate assessment of the causal relationship between genetic variants and phenotype; these are becoming critical tools in clinical genetic diagnostics (140), for comparing phenotypes between patient cohorts (142) and for identifying new disease genes *via* the linkage of novel variants with well-defined phenotypes (141, 143). Application of such approaches to understanding the genetic causes of PTB and identifying populations at risk remains to be explored. However, several examples have already been identified. Insights may be expected from analyzing the links between PTB and Prader–Willi Syndrome (144, 145) or by studying Beckwith–Wiedemann Syndrome, which is associated with increased rates of PTB, gestational diabetes, polyhydramnios, and intrauterine bleeding (146). Advances in this area may lead to improved knowledge regarding genetic variants and pathways to PTB which can be exploited to enhance screening programs and develop and target interventions.

# Identification of At-Risk Individuals: Transcriptomic Approach

Transcriptomic methods assess RNA in tissue and quantify the extent to which genes are functioning, rather than inferring variations in function related to variations in sequence. By examining expression of genes related to inflammation, for example, one can find evidence of inflammation prior to the development of clinical manifestations. In PTB, a screening test could be developed to identify individuals with premature activation of parturition at a stage where treatment can reverse such activation prior to the tipping point to inevitable preterm labor (147).

As was the case for genomic methods, alterations in the transcriptome in the lead up to PTB are likely to arise from multiple pathways. Heng et al. (148) described a method considering the expression levels of multiple genes from multiple pathways in maternal whole blood and their relationship to subsequent PTB. Applied to asymptomatic pregnant women at 28-week gestation and including clinical factors, their model predicted PTB with sensitivity of 65%, specificity of 88%, and false positive rate of 11%; their birth cohort had been enriched, with a PTB rate of 31%. This method requires further validation in average-risk populations before being considered for clinical application. Alternative methods may employ samples other than maternal blood such as cervico-vaginal fluid, or analysis of micro RNAs and other related molecules which may give insights into placental pathologies and associated risk factors. With refinement, this approach could be useful in the identification of the woman heading toward PTB in whom interventions could arrest this course and allow safe delivery at term.
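
The need for validation in average-risk populations can be illustrated with a short calculation. The sketch below applies Bayes' rule to the sensitivity and specificity reported above and compares the positive predictive value in the enriched cohort (PTB rate of 31%) with that in a lower-prevalence population. The 8% prevalence is an assumed figure chosen purely for illustration and is not taken from the study, and the calculation assumes sensitivity and specificity carry over unchanged to the new population.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity, specificity, and cohort PTB rate as reported in the text.
ppv_enriched, npv_enriched = predictive_values(0.65, 0.88, 0.31)

# Hypothetical average-risk population: 8% prevalence is an assumption.
ppv_average, npv_average = predictive_values(0.65, 0.88, 0.08)

print(f"Enriched cohort: PPV={ppv_enriched:.2f}, NPV={npv_enriched:.2f}")
print(f"Assumed 8% prevalence: PPV={ppv_average:.2f}, NPV={npv_average:.2f}")
```

Under these assumptions, the same test yields a markedly lower positive predictive value at the lower prevalence, which is why performance in an enriched cohort cannot be extrapolated directly to routine screening.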

# Identification of At-Risk Individuals: Proteomic Approach

Whereas transcriptomics looks at the actual expression of genes rather than their sequence, proteomics goes a step further by examining the protein end products of gene function, giving insight into the physiological alterations related to gene sequence and expression. High-throughput proteomic assessments allow the identification of differentially produced proteins in association with clinically relevant phenotypes. From a PTB perspective, proteomic variation may herald early delivery prior to the development of clinically apparent symptoms.

The most commonly employed protein assessment is fetal fibronectin in women who present with symptoms suggestive of preterm labor. This protein is found in greater quantities in the cervico-vaginal fluid of women who will deliver preterm than in those whose pregnancy will continue to term (149). Quantitative or qualitative assessment of fetal fibronectin levels permits the rationalization of therapies aimed at delaying delivery or reducing the adverse sequelae of prematurity.

A natural extension of the use of fetal fibronectin testing to stratify risk of PTB in symptomatic women is its application to asymptomatic women. To date, there is little evidence to support the adoption of such screening into routine clinical practice in low-risk pregnant women (150). It may have greater utility in the screening of women with other established risk factors (151–153).

Kim et al. (154) assessed the utility of amniotic fluid MMP-8 as a screening test for subsequent PTB in women undergoing diagnostic amniocentesis in the mid-trimester. The test was highly specific (100%) for subsequent PTB, albeit with relatively low sensitivity (42%). The major limitation of this approach is the invasive nature of the test which is unlikely to be acceptable to the majority of women due to the low but significant risk of pregnancy loss. Several other studies of amniotic fluid proteome have been carried out, with a number of candidate biomarkers identified for predicting PTB and neonatal adverse outcomes secondary to intrauterine inflammation (155, 156), but these have not been clinically exploited (157).

A more acceptable, less invasive testing strategy based on analysis of cervico-vaginal fluids may have greater prognostic potential and acceptance. Gravett et al. (158) undertook a proteomic study of the cervico-vaginal fluid in a non-human primate model of iatrogenic intra-amniotic infection, a major contributor to PTB, with the aim of identifying a non-invasive biomarker for this condition. Twenty-six proteins were found to be differentially expressed in the presence of intra-amniotic infection compared to controls, with a preponderance of proteins involved in inflammatory regulation. Of these, IGFBP1 was increased 16-fold in the presence of intra-amniotic infection, and this is a potential biomarker for clinical application. Many other studies have investigated levels of inflammatory cytokines and proteins in cervico-vaginal fluids, and candidate proteins such as IL-6 have been consistently identified (159, 160), but these have not yielded biomarkers with sufficient prognostic utility to be useful clinically. Georgiou and colleagues employed proteomic analysis of cervico-vaginal fluid samples in at-risk asymptomatic women to identify candidate biomarkers of impending preterm delivery (161). They found that thioredoxin and interleukin 1 receptor antagonist levels were significantly reduced up to 90 days prior to preterm labor compared with women who delivered at term. Both proteins had a positive predictive value of >72% and negative predictive value of >95%. However, the prognostic value of these biomarkers has yet to be demonstrated in independent studies and populations, and to date, proteomic approaches have not yielded clinically useful tests that have been commercialized and widely adopted (162).

# PRECISION REFINEMENT USING NEW TECHNOLOGIES

"Omic" approaches may be suitable both for primary screening and for precision refinement in women previously identified as being at risk of PTB by other screening modalities (**Figure 1**). Gene–environment interactions underlie almost all responses of complex organisms to external stimuli. This principle is the basis of pharmacogenomics, whereby an individual's genetic susceptibility to the effects of a drug determines whether or not that drug is used. However, the same principle could be applied on a larger scale than at present to direct interventions in those who screen at increased risk of PTB. For example, women who are genetically more likely to succeed in smoking cessation with nicotine replacement therapy may be offered this intervention, while those genetically likely to fail may avoid the potential adverse outcomes of this therapy (163).

As high-throughput genomic sequencing technology becomes more affordable and rapid and point-of-care devices are developed, future technological advances may provide exciting new avenues for further refinement of risk. Other risk factors likely to be amenable to "omic" precision refinement include bacterial dysbiosis, previous inflammatory and infection-related PTB, as well as cervical dysfunction (**Figure 1**).

Other technological advances are likely to be exploited in the near future to gain greater precision in PTB prevention strategies. Wearable mobile sensor technologies have been developed for a number of applications requiring real-time monitoring of physiological parameters, allowing monitoring of health status/responses and identification of individuals at risk. Examples include remote monitoring of the elderly after transfer to a community care setting (164, 165). Sophisticated systems have been developed and trialed for measuring multiple physiological parameters (166), and it could be envisaged that such systems could be adapted and used in high-risk women to identify changes, for example in uterine activity, suggestive of early onset of labor (164).

In addition, E-registries and web-based surveillance systems are exciting developments in health information systems that have applications in monitoring maternal health trends and outcomes together with changes in population characteristics and risks (167, 168). A simple example that illustrates the potential is the use of a mobile SMS-based system for monitoring maternal health in low-resource settings (169).

# AUTHOR CONTRIBUTIONS

JN: first draft of introduction, populations in transition, cervix length screening and non-medically indicated preterm birth, and responsible for overall manuscript. MK: first draft of smoking section and review of entire manuscript. SW: first draft of genetics and -omic section and review of entire manuscript. CA: coordination of authors and review and submission of manuscript. RH: contributed to writing of overall manuscript and review. JK: first draft of infection-associated preterm birth, preparation of figure, and review of overall manuscript.

### FUNDING

This work was funded in part by the Australian National Health and Medical Research Council grant number APP1077931.

# REFERENCES


Medicine Unit Network. *N Engl J Med* (1996) 334:567–72. doi:10.1056/NEJM199602293340904


effectiveness of 3 approaches to change and the impact on neonatal intensive care admission and stillbirth. *Am J Obstet Gynecol* (2010) 203:.e1–6. doi:10.1016/j.ajog.2010.05.036


preterm labor and delivery and those with a normal delivery at term. *Microbiome* (2014) 2:18. doi:10.1186/2049-2618-2-18


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer, GB, declared a shared affiliation, though no other collaboration, with the authors to the handling Editor, who ensured that the process nevertheless met the standards of a fair and objective review.

*Copyright © 2017 Newnham, Kemp, White, Arrese, Hart and Keelan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Spatially Enabling the Health Sector

*Tarun Stephen Weeramanthri1,2\* and Peter Woodgate2,3*

*1Department of Health, Government of Western Australia, Perth, WA, Australia, 2Cooperative Research Centre for Spatial Information, Carlton, VIC, Australia, 3Global Spatial Network Board, Cooperative Research Centre for Spatial Information, Carlton, VIC, Australia*

Spatial information describes the physical location of either people or objects, and the measured relationships between them. In this article, we offer the view that greater utilization of spatial information and its related technology, as part of a broader redesign of the architecture of health information at local and national levels, could assist and speed up the process of health reform, which is taking place across the globe in richer and poorer countries alike. In making this point, we describe the impetus for health sector reform, recent developments in spatial information and analytics, and current Australasian spatial health research. We highlight examples of uptake of spatial information by the health sector, as well as missed opportunities. Our recommendations to spatially enable the health sector are applicable to high- and low-resource settings.

Keywords: spatial information, health sector, health reform, health information, innovation, technology, end-user development

### *Edited by:*

*Paul Russell Ward, Flinders University, Australia*

### *Reviewed by:*

*Mariastella Pulvirenti, Flinders University, Australia Ronan Foley, Maynooth University, Ireland*

### *\*Correspondence:*

*Tarun Stephen Weeramanthri tarun.weeramanthri@health.wa.gov.au*

### *Specialty section:*

*This article was submitted to Public Health Policy, a section of the journal Frontiers in Public Health*

*Received: 03 June 2016 Accepted: 17 October 2016 Published: 04 November 2016*

### *Citation:*

*Weeramanthri TS and Woodgate P (2016) Spatially Enabling the Health Sector. Front. Public Health 4:243. doi: 10.3389/fpubh.2016.00243*

# THE IMPETUS FOR HEALTH SECTOR REFORM

Spatial information describes the physical location of either people or objects, and the measured relationships between them. We argue in this article that well-established geographic information systems (GIS), as well as more recent advances in spatial technologies, analytics, and visualization, have the potential to enrich our understanding of health systems and drive strategies for health sector reform.

In defining "health systems," the World Health Organization (WHO) includes "all the activities whose primary purpose is to promote, restore and/or maintain health" as well as associated "people, institutions and resources."<sup>1</sup> The use of the word "sector" emphasizes the industry aspects, including its organization (public and private), financing, and performance dimensions. Health sector reforms have been defined as "sustained, purposeful changes to improve the efficiency, equity and effectiveness of the health sector" (1).

There are a number of pressures on health systems that are present in many countries, regardless of their level of development. These include an aging population, with an increasing burden of chronic disease; introduction of new technologies (services, drugs, and medical devices); and increased consumer expectations (which tend to rise with increasing wealth). In short, demand for health services is increasingly outstripping supply, and as a result, costs are rising in an unsustainable way. In the five decades to 2008, spending on health care grew by 2 percentage points in excess of GDP growth across all Organisation for Economic Cooperation and Development (OECD) countries. From 2005 to 2009, average annual health spending growth across the OECD was 3.4%, though this slowed to 0.6% in the period from 2009 to 2013, in the aftermath of the global financial crisis (2).

<sup>1</sup>http://www.who.int/healthsystems/hss\_glossary/en/.

Senkubuge and colleagues (3) argue that there is no single global or regional policy formula for health sector reform and that it will depend on a country's history, values, and culture. On the one hand, many developed countries are approaching health reform from a cost-efficiency perspective, aiming to reduce costs while maintaining quality of care and improvements in longevity. On the other hand, there is a strong agenda for change in less wealthy countries, which emphasizes universal health insurance, equity, strengthening of primary care, and addressing of social and environmental determinants of health.

The WHO describes how a well-functioning health information system allows for reliable and timely decision making at different levels of the health system (4). It outlines three domains of health information: health status; determinants of health; and health system performance. These three domains have been used by many countries, including Australia, to develop performance frameworks,<sup>2</sup> which can also then serve as ways to measure the success or otherwise of health reform efforts.

Having made the argument that information is a key pillar for health reform, we now turn to the value of spatial information in particular.

### THE FUTURE IS SPATIAL, BUT HEALTH SECTOR UPTAKE IS PATCHY

Spatial information is a broad term that describes the connection between data on positioning and location with that of people, objects (both built and natural), and activities. It includes tools such as aerial and satellite remote sensors, Global Navigation Satellite Systems (e.g., GPS), and computerized GIS software. Traditionally, spatial data have been uniquely characterized as geographic (e.g., longitude and latitude) or map-based coordinates. In more recent years, the concepts of location and place ("the railway station" and "my home," respectively) have gained broader attention. For example, "what3words"<sup>3</sup> has divided the world into a 3 m-by-3 m grid and assigned a unique combination of three words to each grid cell to enable a fine granularity of positioning using language, not coordinates. So "spatial" is not "just maps."

Spatial analytics is also changing. Traditional approaches using GIS software have typically undertaken distance and proximity analyses on single and multiple datasets that are co-registered and "stacked" in a single database. More recently, there has been a trend for spatial analytics to become increasingly web-based, accessing both data and tools from multiple sources.

Adoption of spatial technology by the public *via* Internet-enabled and mobile devices is now commonplace, and health professionals are no exception in terms of their use for personal purposes. Case studies of potential uses in the health sector include mapping and matching of needs and services, and evaluation of outcomes (including adverse events and medical errors), depending on location of residence or work (5). For example, a recent study looked at the variation in common hospital procedures (such as arthroscopies, cesarean sections, and cardiac procedures) across regions in Australia and found that much of the variation was unwarranted, unrelated to demonstrable need, and hence not a good use of scarce resources (6).

Spatial tools have also been used for many years to explore environmental determinants of cancer, describe risk factors for chronic disease, investigate disease transmission, and plan for and respond to natural disasters, including in low-resource settings, where application to infectious disease surveillance and outbreak response predominates (7). Indeed, modern public health, in the English-speaking world, was founded in the work of John Snow and his carefully drawn cholera maps in the London of the 1850s (8).

Our argument is that despite all of the above, there is little evidence that such technologies are being used in a comprehensive or planned way across the health sector to drive health reform. Certainly, in developed economies, the hospital and acute care sector has seen the benefits of precision location technology through sophisticated *within the body* imaging techniques (CT, MRI, etc.). But this has not translated into a consistent desire for location precision *outside the body*.

As a result, good spatial health practice remains the exception rather than the rule, GIS practitioners remain relatively isolated from other information analysts, the role of location or spatial information is rarely discussed at an executive level in health organizations, and it is often neglected in strategic planning. We miss many opportunities to utilize spatial technology in the health sector, and its potential remains significantly under-utilized.

### AUSTRALASIAN SPATIAL HEALTH RESEARCH

The geoservices market is dynamic and expanding faster than the global economy (9). Many sectors, including defense, engineering, transport, energy, agriculture, environment, and mining and resources, have been utilizing spatial technology in a systematized way for many decades, and several were instrumental in a successful bid in 2003 for a Cooperative Research Centre for Spatial Information (CRC-SI).<sup>4</sup> The Australian Government's Cooperative Research Centres Programme<sup>5</sup> supports industry-led collaborations between researchers, industry, and the community, and forms part of the National Innovation and Science Agenda.<sup>6</sup> In 2010, a Health Program was included in a successful second-phase funding bid of the CRC-SI.

The CRC-SI aims for end-user-driven research, and this same principle underpins the Health Program, which has major programs on visualization and privacy protection (based in Western Australia), spatial statistical modeling (based in Queensland), and healthy cities and recovery from natural disasters (based in New Zealand).<sup>7</sup>

<sup>2</sup>http://meteor.aihw.gov.au/content/index.phtml/itemId/435314.

<sup>3</sup>http://what3words.com/.

<sup>4</sup>http://www.crcsi.com.au/.

<sup>5</sup>http://www.industry.gov.au/industry/IndustryInitiatives/IndustryResearchCollaboration/CRC/Pages/default.aspx.

<sup>6</sup>http://www.innovation.gov.au/page/agenda.

<sup>7</sup>http://www.crcsi.com.au/research/4-4-health/.

It is helpful for end-users if spatial tools are available and accessible to more than a few GIS specialists. Therefore, the Department of Health in Western Australia (DoHWA) has developed an online geovisualisation tool called *HealthTracks*, with both interactive mapping and reporting functions, that makes a broad range of demographic, health, and environmental data available *via* a web interface to all employees (10). Non-expert users, including clinicians, program managers, policy makers, and planners, can generate their own local area reports, tables, and maps.

*HealthTracks* has successfully extended the number of users of GIS information in DoHWA. Prior to its development, around six epidemiologists and GIS analysts were regular users of spatial technologies. The semi-automation of analytics, coupled with a largely plain language interface, has seen 150 users generate over 7000 maps and reports in the last year. However, this number remains a fraction of the total workforce of over 40,000.

A related project called *Epiphanee* has focused on generating dynamic privacy protections, based on a complex algorithm that takes into account small number analysis and reporting – a particularly important issue for health data, and of particular concern to data custodians charged with protecting health privacy (11). The program allows the user, when making a single data request from multiple datasets, to trade-off competing dimensions of area size, disease specificity, and demographic composition, and generate a report that falls within probabilistic privacy limits set by the data custodians. This project highlights the distinctive and highly sensitive nature of health information.
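
The trade-off between area size and disclosure risk can be illustrated with a much simpler rule than the one described above. The sketch below is not the *Epiphanee* algorithm, which is probabilistic and is not specified here; it applies a fixed small-cell threshold (an assumed value of 5) to hypothetical area counts, and then shows how aggregating to larger regions allows counts to be released at the cost of spatial detail. All area names, region groupings, and counts are invented for illustration.

```python
def suppress_small_cells(counts, threshold=5):
    """Replace non-zero counts below the threshold with None (suppressed)."""
    return {
        area: (None if 0 < n < threshold else n)
        for area, n in counts.items()
    }


def aggregate(counts, groups):
    """Sum area counts into coarser regions; groups maps region -> areas."""
    return {
        region: sum(counts[area] for area in areas)
        for region, areas in groups.items()
    }


# Hypothetical case counts for four small areas.
cases_by_area = {"Area A": 42, "Area B": 3, "Area C": 0, "Area D": 5}

# At fine scale, Area B falls below the threshold and is suppressed.
released_fine = suppress_small_cells(cases_by_area)

# At coarser scale, every region meets the threshold and can be released.
regions = {"Region 1": ["Area A", "Area B"], "Region 2": ["Area C", "Area D"]}
released_coarse = suppress_small_cells(aggregate(cases_by_area, regions))

print(released_fine)
print(released_coarse)
```

A dynamic system of the kind described would presumably let the user choose where to sit on this trade-off for each request, rather than fixing the threshold and geography in advance.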

A good spatial analytics program for health relies on fundamental spatial concepts, including scale, accuracy, and geocoding uncertainty (12), as well as a critical approach to spatial thinking, reasoning, and language (13). Data analysis also requires specialized statistical expertise to deal with the heterogeneity of spatial data, and its tendency to exhibit autocorrelation. Using the release of the Atlas of Cancer in Queensland (14) as a foundation, Queensland researchers have developed new spatiotemporal modeling techniques to examine cancer incidence and survival within small areas, important for understanding and reducing population-level inequities. Analysis based on the Atlas led the Queensland government to double its rural travel subsidy for patients to attend screening and treatment facilities.
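
Spatial autocorrelation, mentioned above, is commonly summarized with Moran's I, which compares the values of neighboring areas to the overall mean. The following is a minimal sketch of that statistic on a toy example of four areas arranged in a line; it is a generic illustration and not the spatiotemporal modeling approach used by the Queensland researchers. The neighbor matrix and values are assumptions made up for the example.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix.

    weights[i][j] > 0 indicates that areas i and j are neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    sum_sq = sum(d * d for d in dev)
    return (n / total_weight) * (cross / sum_sq)


# Four areas in a line; each area neighbors the one(s) next to it.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_gradient = morans_i([1, 2, 3, 4], line_weights)   # similar neighbors
alternating = morans_i([1, 4, 1, 4], line_weights)       # dissimilar neighbors

print(smooth_gradient, alternating)
```

Positive values indicate that neighboring areas tend to resemble each other, which violates the independence assumed by many standard statistical tests and motivates the specialized methods referred to above.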

Epidemiological studies have traditionally looked at the aggregate relationship between geographic areas and disease risk factors or outcomes (e.g., a certain neighborhood may have an elevated level of obesity or diabetes). Longitudinal spatial studies are much less frequent. In New Zealand, University of Canterbury researchers have used fine-grained spatial tools and modeling, to examine the medium to long-term health and mental health impacts of the 2010/2011 earthquakes, particularly as they relate to place of exposure and subsequent mobility (15). The New Zealand research builds on a joint Ministry of Health-University of Canterbury venture, called the GeoHealth Laboratory (16), a partnership that helps ensure research results inform the targeting and ongoing design of social support and health services.

Even less frequently explored is how geocoded social determinants can be used to improve patient care at the community health center level (17). Research has identified the spatial clustering of older patients with poorly controlled diabetes within a large Australian general practice using individual-level data (18). Such analyses have the potential to promote new preventive strategies and stimulate better targeted approaches to patient management at a community level.

Australia has also focused on building capacity through international partnerships and is a member of the Global Spatial Network (GSN),<sup>8</sup> which has identified the health sector as a priority sector for growth and application of spatial technology. The GSN is made up of research organizations that specialize in collaborative research. Member organizations must have partners drawn from the research sector, the private sector, and the government sector. Partnering organizations have come from Sweden, the EU, the US, Canada, Mexico, Korea, New Zealand, and Australia. The GSN seeks to promote international collaboration in complex, multi-organization spatial research and facilitate information sharing. As a result of the Network's activities, Sweden now hosts an annual workshop of spatial and health researchers as part of their "Geolife Region program."<sup>9</sup>

Training is central to building capacity, and the teaching of spatial skills encompasses much more than the use of GIS software. Specially commissioned "GIS Awareness for Health Professionals" training courses, which use case studies and scenarios to promote the intelligent application of spatial data and analysis within the health sector, have now begun to be developed.<sup>10</sup>

These projects and products have not yet, however, been game changers in the health sector in Australia. A number of factors may explain this, including the relatively modest size of the program (less than one million Australian dollars per year), and its positioning as a "research" rather than a "services" program within a very large industry. But there may be other factors. In thinking through this issue, we adopted a model of technology use as the product of "context, tool and user," and a "spatial maturity" model as a *set of capabilities* required for the effective use of spatial technology (19).

The CRC-SI has developed and trialed this model in a health sector setting, in order to benchmark performance and drive organizational improvement. The messages were clear: there is more to technology adoption than just "cool tools"; use of a generic framework can help assess and improve "capacity to use"; and mixed methods analysis (combining quantitative and qualitative approaches) can generate critical organizational insights (20).

### CHALLENGES AND FUTURE DIRECTIONS

Spatial technology encompasses much more than static maps. The global public are already familiar with an array of new dynamic location tools from GPS-enabled mobile phones to Google Earth. Newer trends in data sharing made possible through wearable sensors, crowdsourcing, and interactive social media platforms are developing quickly and will stimulate debate about the social and contextual dimensions of *place*, to complement the technical considerations of *space* and *location* (21).

<sup>8</sup>http://www.globalspatial.org/.

<sup>9</sup>http://geoliferegion.com/about-geo-life-region/about-geo-life-region-2/.

<sup>10</sup>http://ngis.com.au/gis-awareness-for-health-professionals/.

Developments in remote sensing and positioning technology will provide more precise environmental data, wearable sensors will provide a wealth of individual data, and informatics will allow the analysis of such big and complex datasets in much closer to real time. Importantly, computing power has increased, and technology costs have fallen, leading to potential applications even in low-resource settings.

There is likely to be increasing public demand for more tailored risk communication and personalized medicine, which will depend, in part, on more accurate and accessible spatial data. But there is no guarantee that advances in the use of spatial technology for individuals will aggregate neatly into improvements at a population level. In other words, precision medicine or precision "wellness" that is available only to a minority, and does not give consideration to other determinants of health, can aggravate inequity in a population (22). Hence, there is need for a broader performance framework for successful health reform.

In the health sector, uptake of spatial technology will also be affected by parallel developments in e-Health, telehealth, business intelligence, "Big Data," and the web (3.0 and beyond). Over time, more datasets will be linked, more health information will be available online, and more use made of off-the-shelf software and automated processing. As governments make more of their data open and freely available, the potential to combine such data with open analytic tools and personal data from other sources will also increase, leading to potentially greater insights but also foreseeable dangers. Privacy, data sharing, and data security concerns will need to be continuously addressed. In this new environment, the semantic web<sup>11</sup> has the potential to empower users to access sophisticated programs through plain language queries. For example, it may be feasible to perform a geographic search for all cancer screening facilities within a radius of a chosen location and combine this with patient screening behavior, sociodemographic information, and cancer outcomes, so as to better target interventions to increase screening rates – all from a web browser.
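
The geographic part of such a query is already straightforward. As a minimal sketch, the code below finds all facilities within a given radius of a chosen point using the haversine great-circle distance. The facility names and coordinates are hypothetical, the 25 km radius is arbitrary, and the linkage to screening behavior, sociodemographic data, and outcomes described above is not shown.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def facilities_within(facilities, lat, lon, radius_km):
    """Return (name, distance_km) pairs within radius_km, nearest first."""
    hits = [
        (name, haversine_km(lat, lon, f_lat, f_lon))
        for name, (f_lat, f_lon) in facilities.items()
    ]
    return sorted(
        [(name, d) for name, d in hits if d <= radius_km],
        key=lambda pair: pair[1],
    )


# Hypothetical screening facilities (names and coordinates are invented).
screening_facilities = {
    "Clinic A": (-31.9500, 115.8600),
    "Clinic B": (-32.0569, 115.7439),
    "Clinic C": (-33.3271, 115.6414),
}

# Search within 25 km of an assumed point of interest in central Perth.
nearby = facilities_within(screening_facilities, -31.9523, 115.8613, 25.0)
nearby_names = [name for name, _ in nearby]
print(nearby_names)
```

The harder problems are the ones the text identifies: linking such results to patient-level data while meeting privacy, data sharing, and security requirements.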

### MISSED OPPORTUNITIES

The public, clinicians, health system planners, and policy makers each have a stake in improving both the spatial specificity of the information that underpins advanced analytics and our ability to visually communicate that information for a variety of purposes, including risk communication, service delivery, planning, and policy.

However, we think that much more needs to be done to catalyze a transformation of what is a massive and complex industry sector, so that spatial information can become integral to evidence-based, data-rich, and patient-centered health reform. Supporting spatial data infrastructures (such as the European Union INSPIRE<sup>12</sup> initiative) are well established in some regions, but it is the culture and capabilities within the health sector that are poorly developed.

We would like to see such initiatives, and indeed all large agencies that handle health data, undertake a "spatial maturity" review to identify their existing operational capabilities, and any measures that could be readily adopted or adapted to improve information handling and analysis in a systematic way and in a strategic context. This would transform the potential of spatial information – from an optional extra to an essential ingredient of a strong information strategy underpinning health reform. Spatial maturity reviews also serve as a form of future due diligence, setting up a pathway that links agencies to both the activities of the spatial analytics research community and the proprietary tools of the private sector.

### RECOMMENDATIONS

In summary, based on our experience in health delivery and spatial health research, we believe that the core technology is present and developing rapidly for spatial information to contribute to health sector reform. No major technological breakthrough is needed. What is missing is an attitude change to see the potential and make the most of spatial data and analytics, as well as to incorporate spatial thinking into strategic thinking.

We therefore make the following recommendations to spatially enable the health sector:


As a result, the health community misses clear opportunities to add value to information from a spatial perspective. Two recent Australian examples are the Personally Controlled Electronic Health Record<sup>13</sup> and the National Disability Insurance Scheme.<sup>14</sup> Neither of these large, potentially transformative national programs, critical to health sector reform, considered or built detailed spatial specifications into their initial roll-out plans.

<sup>11</sup>https://www.w3.org/standards/semanticweb/.

<sup>12</sup>http://inspire.ec.europa.eu/.

<sup>13</sup>http://www.health.gov.au/internet/main/publishing.nsf/content/ehealth-record.

<sup>14</sup>https://myplace.ndis.gov.au/ndisstorefront/index.html.

# AUTHOR CONTRIBUTIONS

TW created the first draft of this article, based on a set of discussions with PW over many years. Both TW and PW have contributed ideas, text, and references to subsequent and final versions.

# ACKNOWLEDGMENTS

The authors would like to gratefully acknowledge the contribution of all members of the CRC-SI Health Program since its inception (program managers, researchers, science directors, board members, administrators, and partners), as well as the support of the broader CRC-SI administration and board.

### FUNDING

The Cooperative Research Centre for Spatial Information (CRC-SI) is funded through the Australian Government Cooperative Research Centre Programme that supports industry-led collaborations between researchers, industry, and the community. PW is the salaried Chief Executive Officer of the CRC-SI, and TW acts as the Chair of the Health Program Board in an unpaid capacity.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer MP and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

*Copyright © 2016 Weeramanthri and Woodgate. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*