About this Research Topic
The decreasing cost of genomic sequencing is making the analysis of genetic variation accessible on a global scale, with potential benefits for understanding the origins of human disease and for patient-specific treatment. It is therefore of great interest to integrate the analysis of genetic variation into clinical pipelines.
Information about genetic variation comes from high-throughput studies and is typically published in databases or curated from the scientific literature. Data from high-throughput methods must be interpreted, and this interpretation usually relies on curated databases to identify potential disease-causing variants. These curated databases in turn depend on the manual transfer of information from the scientific literature. Because it is manual, this process is slow and cannot keep pace with the rate at which new information is published.
Because the scientific literature contains relevant information about genetic variation and the interpretation of genetic variants, it is pivotal to develop new automated processes to extract meaningful data from scientific papers. Text-mining techniques can process this literature automatically and can make curation faster and more efficient, while also supplying relevant information to clinical pipelines that might benefit from the interpretation of genetic variants. There have been advances in automatically identifying genetic variants in the scientific literature, but the interpretation of these variants, e.g., how they are linked to disease mechanisms, still requires further research and development. Supporting such interpretation is likely to require additional information from existing structured data, so combining multiple data sources might support the discovery of the functions of specific variants. Moreover, the reuse of linked or combined data sets will significantly support further predictive analytics targeted towards patient care.
This Research Topic therefore aims to collect novel scientific contributions on topics including, but not limited to, the following:
- Novel methods to extract genetic variation from scientific literature;
- Novel methods to link or ground variant mentions to database identifiers or a standard nomenclature (e.g., HGVS);
- Extraction of information relevant to the relation between genetic variation and disease and/or phenotypes from scientific literature;
- Development of new data sets or corpora to be used by text mining methods;
- Combinations of methods that include text mining for the interpretation of genetic variants;
- Methodologies that reuse extracted genetic variation information from the literature to predict disease phenotypes and protein properties;
- Novel methods to combine information extracted from text about genes, variants, diseases, and proteins with information from existing databases or high-throughput assays.
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.