Edited by: Hee Jeong Kim, Asan Medical Center, South Korea
Reviewed by: Praveen Vikas, University of Iowa Hospitals and Clinics, United States; Jaime D. Lewis, University of Cincinnati, United States; Angela Toss, University of Modena and Reggio Emilia, Italy
*Correspondence: Kevin S. Hughes,
This article was submitted to Women’s Cancer, a section of the journal Frontiers in Oncology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Pathogenic variants in cancer susceptibility genes can increase the risk of a spectrum of diseases, which clinicians must manage for their patients. We evaluated the disease spectrum of breast cancer susceptibility genes (BCSGs) with the aim of developing a comprehensive resource of gene-disease associations for clinicians.
Twelve genes (
Forty-two diseases were found to be associated with one or more of the 12 BCSGs for a total of 86 gene-disease associations, of which 90% (78/86) were verified by ClinGen and/or NCCN. Four gene-disease associations could not be verified by either ClinGen or NCCN but were verified by at least three of the other four genetic resources. Four gene-disease associations were verified by the NLP procedure alone.
This study is unique in that it systematically investigates the reported disease spectrum of BCSGs by surveying multiple genetic resources and the literature with the aim of developing a single consolidated, comprehensive resource for clinicians. This innovative approach provides a general guide for evaluating gene-disease associations for BCSGs, potentially improving the clinical management of at-risk individuals.
Hereditary predisposition is found in approximately 10% of all breast cancer cases (
Pathogenic variants in a BCSG can also increase the risk of other diseases. For instance,
A variety of existing resources, in addition to NCCN and ClinGen, describe the diseases associated with each gene (
In addition, the rapidly growing medical literature makes it impossible for clinicians to extract useful information precisely and quickly. Natural language processing (NLP), a technology that trains a computational algorithm on many annotated examples so that the computer can “learn” and “predict” the meaning of human language, may offer a promising solution to this challenge. Our previous studies illustrate how to train and evaluate an NLP algorithm and incorporate it into a semi-automated procedure that accurately identifies penetrance studies based on abstracts (
Relying on a patchwork of resources is cumbersome, time-consuming, and can lead to errors of omission. A single comprehensive resource is critically needed to streamline this process. In light of these issues, we have developed a novel approach to identify, evaluate, and curate the diseases or complex syndromes associated with cancer susceptibility genes based on six genetic resources and the NLP literature review.
Germline genetic testing is performed on non-cancer cells, mostly from blood or saliva samples, and a germline pathogenic variant in a cancer susceptibility gene indicates the possibility that other family members have a hereditary susceptibility to developing cancer. In contrast, somatic testing is performed on cancer cells (e.g., tumor tissue), and a somatic variant may guide targeted therapy and other treatment decisions. The present study focused on germline BCSGs, and only monoallelic BCSGs were included. The BCSGs were initially identified using ClinGen (
Flow chart for identifying and evaluating gene-disease association. The number ‘1’ indicates that the gene was associated with BCSG in the resource. The number ‘0’ indicates that the gene’s association with BCSG was refuted in the resource. The number ‘9’ indicates that the gene’s association with BCSG was unclear in the resource. Uncertain association indicates that the gene’s association with BCSG is unclear, and further studies are required to refute or accept the association. BCSGs, breast cancer susceptibility genes; NLP, natural language processing.
Diseases associated with BCSGs were initially identified in the six genetic resources (ClinGen, NCCN, OMIM, Genetics Home Reference, GeneCards, and Gene-NCBI) and by reviewing the literature. For each of these sources, each potential association was coded in our database as ‘1’ if the association was definitive, ‘9’ if the association was possible, and ‘0’ if there was no association, as shown in
ClinGen is a database curated by the Clinical Genome Resource. It uses a standardized clinical validity framework to assess evidence to validate a gene-disease association and to define disease management. We extracted data regarding gene-disease associations directly from the ‘Gene-Disease Validity’ reports in ClinGen (
The strength of ‘Gene-Disease Validity’ was classified by ClinGen as ‘Definitive’, ‘Strong’, ‘Moderate’, ‘Limited’, ‘Refuted’, ‘Disputed’, or ‘No Reported Evidence’ based on the level of evidence. If an association was classified as ‘Definitive’, ‘Strong’, or ‘Moderate’, it was coded in our database as ‘1’ in the field ClinGen Validity. If an association was classified as ‘Limited’, it was coded in our database as ‘9’. If an association was classified as ‘Refuted’, ‘Disputed’ or ‘No Reported Evidence’, it was coded in our database as ‘0’.
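The coding rule described above amounts to a simple lookup from a ClinGen classification to a database code. The Python sketch below is illustrative only: the names `CLINGEN_CODES` and `code_clingen_validity` are hypothetical and are not taken from the study's database.

```python
# Hypothetical lookup illustrating the ClinGen coding rule described above.
# 1 = association, 9 = possible/unclear association, 0 = no association.
CLINGEN_CODES = {
    "Definitive": 1,
    "Strong": 1,
    "Moderate": 1,
    "Limited": 9,
    "Refuted": 0,
    "Disputed": 0,
    "No Reported Evidence": 0,
}


def code_clingen_validity(classification: str) -> int:
    """Map a ClinGen 'Gene-Disease Validity' label to the 1/9/0 code."""
    try:
        return CLINGEN_CODES[classification]
    except KeyError:
        # Fail loudly on labels outside the seven categories listed in the text.
        raise ValueError(f"Unknown ClinGen classification: {classification!r}")
```

Raising an error on an unrecognized label, rather than defaulting to ‘0’, is a design choice of this sketch so that a mistyped classification is not silently recorded as a refuted association.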
We also reviewed the ‘Actionability’ reports in ClinGen, where the gene-disease associations were identified indirectly (
Data were extracted from the NCCN Guidelines on Genetic/Familial High-Risk Assessment: Breast, Ovarian and Pancreatic (Version 2.2021) (
Other reputable databases such as ‘OMIM’, ‘Genetics Home Reference’, ‘GeneCards’, and ‘Gene-NCBI’ (described in detail below) were also used to identify gene-disease associations. If a gene-disease association was present in one of these resources, this association was coded as ‘1’ in our database.
‘OMIM’ is an online compendium of human genes and genetic phenotypes that is written and regularly updated by the McKusick-Nathans Institute of Genetic Medicine. The “Clinical Synopses” table for each gene was used to identify gene-disease associations.
‘Genetics Home Reference’ is a free online resource that was created after the announcement of the human genome map in 2003 and is maintained by the National Library of Medicine. It is designed to make the connection between genetics and disease more transparent for the general public. The “health conditions related to the Genetic Changes” section for each gene was used to identify gene-disease associations. Of note, as of October 1, 2020, Genetics Home Reference was retired as a stand-alone website, and most of its content has been transferred to MedlinePlus Genetics (
‘GeneCards’ is a comprehensive database of human genes. The content of this database is reviewed and updated by the GeneCards Suite Project Team. The “disorders” table for each gene was used to identify gene-disease associations.
‘Gene-NCBI’ is a resource of the National Center for Biotechnology Information (NCBI), which centralizes gene-related information into individual records. Many different types of gene-specific data are connected to the record including gene products and their attributes, expression, interactions, pathways, variation, and phenotypic consequences. The “Phenotypes” section for each gene was used to identify gene-disease associations.
The process of validating the gene-disease association is outlined in
All ‘uncertain’ gene-disease associations were further evaluated by literature review using an abstract classifier NLP procedure, which classifies abstracts as being relevant to cancer penetrance or not (
In this study, we used standard gene and disease PubMed search terms (
If no relevant penetrance abstract was identified, the association was designated ‘no association’. If relevant penetrance studies were identified, they were presented in a group consensus meeting with our principal investigator (KSH), one surgery resident, and four clinical researchers participating (two attending surgical oncologists and two research fellows in surgical oncology). The attendees selected high-quality penetrance studies based on study design, patient population, number of pathogenic variant carriers, and ascertainment mechanism, and reached a final consensus based on evaluating these high-quality studies. As a rule of thumb, we considered a gene-cancer association to be real if at least one high-quality penetrance study reported at least a two-fold increased risk that was statistically significant. If the attendees could not reach a consensus, the gene-disease association remained ‘uncertain’. Of note, to ensure accuracy, the group meeting not only discussed the potential controversial gene-cancer associations but also examined all the evidence regarding every gene-cancer association reported in the study.
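The rule of thumb above can be written as a small predicate. The sketch below is a simplification under stated assumptions: the `PenetranceStudy` class, its field names, and the function `meets_rule_of_thumb` are hypothetical, and the final designation in the study was reached by group consensus rather than by an automated check.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PenetranceStudy:
    """Hypothetical summary of one penetrance study reviewed by the group."""
    high_quality: bool     # judged by the consensus group (design, population, etc.)
    relative_risk: float   # reported risk in carriers relative to non-carriers
    significant: bool      # whether the reported increase was statistically significant


def meets_rule_of_thumb(studies: Iterable[PenetranceStudy],
                        min_fold: float = 2.0) -> bool:
    """True if at least one high-quality study reports a statistically
    significant risk increase of at least `min_fold` (two-fold by default)."""
    return any(
        s.high_quality and s.significant and s.relative_risk >= min_fold
        for s in studies
    )
```

For example, a single high-quality study reporting a significant 2.4-fold risk would satisfy the predicate, whereas a low-quality study reporting a three-fold risk would not.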
As shown in
Associations between the 12 susceptibility genes and breast cancer in six genetic resources.
Gene | Genetic Resources | ||||||
---|---|---|---|---|---|---|---|
No. of resources | ClinGen | NCCN | OMIM | GHR | GeneCards | Gene-NCBI | |
|
6 | Definitive | Strong | 1 | 1 | 1 | 1 |
|
6 | Definitive | Strong for | 1 | 1 | 1 | 1 |
|
6 | Definitive | Very strong | 1 | 1 | 1 | 1 |
|
6 | Definitive | Very strong | 1 | 1 | 1 | 1 |
|
6 | Definitive | Strong | 1 | 1 | 1 | 1 |
|
6 | Definitive | Strong | 1 | 1 | 1 | 1 |
|
4 | Definitive | Strong | 1 | 1 | ||
|
4 | Definitive | Strong | 1 | 1 | ||
|
4 | Definitive | Strong | 1 | 1 | ||
|
3 | Definitive | Strong | 1 | |||
|
1 | Strong | |||||
|
1 | Moderate |
The number ‘1’ indicates that the gene was associated with breast cancer in the resource.
GHR, Genetics Home Reference; NCBI, National Center for Biotechnology Information.
There were 66 unique diseases initially identified, of which 42 diseases were determined to be associated with BCSGs by our evaluation (
The disease spectrum of each BCSG is shown in
Diseases associated with the 12 breast cancer susceptibility genes.
BCSGs | Disease Spectrum | ||
---|---|---|---|
Malignant | Benign | Borderline | |
|
Breast Cancer, Colorectal Cancer, Gastric Cancer, Pancreatic Cancer, Prostate Cancer | ||
|
Breast Cancer | ||
|
Breast Cancer, Ovarian Cancer, Pancreatic Cancer, Prostate Cancer | ||
|
Breast Cancer, Melanoma, Ovarian Cancer, Pancreatic Cancer, Prostate Cancer | ||
|
Breast Cancer, Gastric Cancer | BCD Syndrome* | |
|
Breast Cancer, Colorectal Cancer, Gastric Cancer, Kidney Cancer, Prostate Cancer, Osteosarcoma, Thyroid Cancer | ||
|
Brain Tumor, Breast Cancer, Leukemia, Sarcoma | Bone Dysplasia, Cafe-Au-Lait Spots, Intellectual Disability, Iris Hamartoma, Neurofibroma, Pulmonary Stenosis, Skin | GIST, Paraganglioma, Pheochromocytoma |
|
Breast Cancer, Ovarian Cancer, Pancreatic Cancer, Prostate Cancer | ||
|
Brain Tumor, Breast Cancer, Colorectal Cancer, Endometrial Cancer, Kidney Cancer, Melanoma, Thyroid Cancer | Acral Keratoses, Autism, Cerebrovascular Malformation, Facial Papules, GI Hamartomatous Polyps, Lipoma, Macrocephaly, Macular Pigmentation, Oral Mucosal Papillomatosis, Palmoplantar Keratoses, Thyroid, Trichilemmoma, Uterine Fibroid | |
|
Breast Cancer | ||
|
Breast Cancer, Cervical Cancer, Colorectal Cancer, Endometrial Cancer, Gastric Cancer, Hepatobiliary Cancer, Lung Cancer, Pancreatic Cancer, Small Intestine Cancer | GI Hamartomatous Polyps, Skin | Non-Epithelial Ovarian Tumor, Ovarian SCST, Testicular SCST |
|
Adrenocortical Carcinoma, Brain Tumor, Breast Cancer, Colorectal Cancer, Hepatobiliary Cancer, Pancreatic Cancer, Osteosarcoma, Soft Tissue Sarcoma |
GI, gastrointestinal; BCD, blepharocheilodontic; SCST, sex cord-stromal tumor; GIST, gastrointestinal stromal tumor.
*BCD syndrome consists of facial dysmorphism, hypertelorism, imperforate anus, distichiasis, clinodactyly, hypoplastic nails, choanal atresia, cleft palate, and benign teeth disorder.
A total of 160 gene-disease associations were initially identified in the six genetic resources and literature (
Disease spectrum of breast cancer susceptibility genes. “†” refers to both female and male breast cancer. The three colors represent malignant disease (black), benign disease (grey), and borderline disease (orange), respectively. NLP, natural language processing; GI, gastrointestinal; BCD, blepharocheilodontic syndrome; SCST, sex cord-stromal tumor; GIST, gastrointestinal stromal tumor; NEOT, non-epithelial ovarian tumor.
Although hereditary breast cancer is mainly associated with
One of the authoritative resources used for this study is the NIH-funded ClinGen. In contrast to “expert panel” consensus assessments used by NCCN, ClinGen creates a framework that provides evidence for the strength of the association between a gene and a disease risk through semi-quantitative classification (
Four other genetic resources (OMIM, Genetics Home Reference, GeneCards, and Gene-NCBI) are also considered reputable and contain a comprehensive compendium of relationships between phenotypes and genotypes. However, these resources lack the strict curation processes for evaluating strength of evidence utilized by ClinGen or the expert panels employed by NCCN. Therefore, we rated the level of evidence from these four resources lower than ClinGen and NCCN, and the gene-disease association was designated ‘verified’ only if it was established by at least three of these sources when the relationship was not found in ClinGen or NCCN. Meanwhile, we understand that the likely valid gene-disease associations we identified that were not present in ClinGen or NCCN may be explained in part by the observation that the latter entities work in a slow and deliberate manner that might not yet have allowed a full review of all associations.
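The verification rule described above can be sketched as follows. This is a minimal illustration, not the study's actual implementation: the resource keys, the function name `classify_association`, and the decision to route every non-verified association to literature review are assumptions made for the example.

```python
# Hypothetical sketch of the verification rule described above.
PRIMARY = ("ClinGen", "NCCN")
SECONDARY = ("OMIM", "GHR", "GeneCards", "Gene-NCBI")


def classify_association(codes: dict) -> str:
    """Classify one gene-disease association from its per-resource codes.

    `codes` maps a resource name to 1 (association), 9 (possible), or
    0 (no association); resources with no entry are simply omitted.
    """
    # Rule 1: ClinGen and/or NCCN supports the association.
    if any(codes.get(resource) == 1 for resource in PRIMARY):
        return "verified"
    # Rule 2: at least three of the four other resources support it.
    support = sum(1 for resource in SECONDARY if codes.get(resource) == 1)
    if support >= 3:
        return "verified"
    # Assumption: everything else is treated as uncertain and sent to the
    # NLP-assisted literature review.
    return "uncertain"
```

With this rule, an association coded ‘1’ only in OMIM and GeneCards would be returned as ‘uncertain’, while one coded ‘1’ in OMIM, GHR, and Gene-NCBI would be ‘verified’ even without ClinGen or NCCN support.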
Forty-two unique diseases were verified as being associated with BCSGs by our procedure. Each BCSG was associated with at least three diseases except
Generally speaking, the BCSGs are thought to affect female breast cancer risk, but some are also associated with male breast cancer (MBC). Tai et al. evaluated 97 men with breast cancer from 1939 families. The cumulative risk of breast cancer was higher in both
Notably, we found that
In the present study, 90% (78/86) of gene-disease associations were verified by ClinGen and/or NCCN, underscoring the credibility of these two major resources. Nevertheless, four gene-disease associations were not found in ClinGen or NCCN but were instead identified in at least three of the other four genetic resources. These associations were also supported by published studies with strong evidence of the association, underscoring the reliability of our review criteria.
Of note, four gene-disease associations, i.e.,
The NCCN guidelines for considering risk-reducing mastectomy and breast MRI are well established for carriers of high-risk genes (e.g.,
Evaluation based on six genetic resources could result in omissions of some phenotypes associated with BCSGs. We attempted to lessen this effect by including a literature review as an additional step. Another limitation is that the strict criteria we set for gene-disease associations (e.g., verified by ClinGen/NCCN, or at least three genetic resources) could mean that some diseases are overlooked. By reviewing the literature using NLP, we reevaluated those uncertain gene-disease associations to lessen this effect as much as possible. Although the comprehensiveness of our data seems to be conducive to more individualized care, this raises the problem of absence of management guidelines for patients who carry such variants. Additionally, the clinical utility of identifying potential diseases in BCSG carriers may conflict with current cost-efficacy constraints (i.e., interpreting variants, genetic counseling, overdiagnoses, and resulting anxiety in patients). Of note, we are making assumptions based on the available evidence, and we recognize that authoritative sources, such as ClinGen and NCCN guidelines, are updated periodically. Thus, this study represents a snapshot of current knowledge and understanding, rather than a definitive conclusion.
In 2016, we built a clinical decision support tool for cancer susceptibility genes, called Ask2Me.Org (
To the best of our knowledge, this is the first study to collate the disease spectrum of BCSGs from multiple sources and make it available in a single resource. Notably, we developed an innovative assessment process based on six genetic resources and literature review using an NLP procedure. Throughout our evaluation process, we have kept in mind that frequent updates of the disease spectrum will be necessary to adjust for new data in these genetic resources. Our study provides a reference point for future studies, shows that BCSG mutation carriers should also be alert to diseases beyond breast cancer, and highlights the need to broaden management criteria and improve outcomes for at-risk individuals.
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
We used public databases containing no patient data, so individual informed consent was waived.
JW, KY, DB, and KSH were involved in the conceptualization and design of this study. JW, PS, KY, JZ, KP, and SKM collected the data. YB and MW were responsible for maintaining the natural language processing abstract classifier. JW and PS analyzed the data and interpreted the results. JW, PS, and KY drafted the initial manuscript with critical feedback from DB and KSH. All authors contributed to the article and approved the submitted version.
KH receives honoraria from Hologic (surgical implant for radiation planning with breast conservation and wire-free breast biopsy) and Myriad Genetics and has a financial interest in CRA Health (formerly Hughes RiskApps). CRA Health develops risk assessment models/software with a particular focus on breast cancer and colorectal cancer. KH is a founder and owns equity in the company. KH is the Co-Creator of Ask2Me.Org, which is freely available for clinical use and is licensed for commercial use by the Dana Farber Cancer Institute and the MGH. KH’s interests in CRA Health and Ask2Me.Org were reviewed and are managed by Massachusetts General Hospital and Partners Health Care in accordance with their conflict of interest policies. DB co-leads the BayesMendel laboratory, which licenses software for the computation of risk prediction models. She does not derive any personal income from these licenses. All revenues are assigned to the lab for software maintenance and upgrades.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors acknowledge Ann S. Adams (Department of Surgery, Massachusetts General Hospital) for editorial and writing assistance.
The Supplementary Material for this article can be found online at: