TECHNOLOGY AND CODE article
Front. Genet.
Sec. Statistical Genetics and Methodology
Automatic detection of n-degree family members
Provisionally accepted
- 1National Centre for Register-based Research, Aarhus Universitet, Risskov, Denmark
- 2Mental Health Center - Sct Hans, Institute for Biological Psychiatry, Roskilde, Denmark
- 3Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- 4Aarhus Universitet Bioinformatics Research Center, Aarhus, Denmark
- 5Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, the Broad Institute of MIT and Harvard, Massachusetts, United States
Summary
Family-based genetic studies often require identifying relatives up to a specified degree, but existing tools are restricted to second-degree relatives, return entire connected pedigrees, or require multiple pre- or post-processing steps. We implemented three new functions, prepare_graph, get_kinship, and graph_to_trio, in the R package LTFHPlus to address these limitations. prepare_graph constructs a directed graph from population-level trio data using the igraph package and supports attaching additional attributes to individuals. From this graph, relatives of arbitrary degree can be identified efficiently. get_kinship calculates a kinship matrix for all individuals in a (sub)graph, and graph_to_trio reconstructs trio information from identified families, enabling downstream use with other pedigree tools. In addition, familial relations can be labelled from the graph with the function get_relatives, and the total and average number of each relation type per proband can be plotted with Relationship_per_proband_plot. Using the publicly available minnbreast dataset, we constructed a graph containing 28,081 individuals and 30,720 familial edges. Across 1,000 repetitions, the median run-time for identifying all relatives up to the third degree for 500 randomly selected probands was 0.03 seconds, and kinship matrix calculation had a median run-time of 1.57 seconds (single-threaded execution). These functions provide a reproducible, scalable, and interoperable solution for integrating family information into genetic analyses.
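The idea of finding relatives up to a given degree from trio data can be illustrated with a minimal sketch. This is not the LTFHPlus implementation or its API: it is written in Python with the standard library only, and the function names `build_graph` and `relatives_within`, the toy pedigree, and the choice to link full siblings directly (so that they sit one step apart) are all assumptions made for illustration. The sketch treats graph distance as the degree of relatedness, so two unrelated co-parents who share a child will also appear at distance 2.

```python
from collections import deque


def build_graph(trios):
    """Build an undirected adjacency map from (child, father, mother) trios.

    None marks an unknown parent. Full siblings (same father AND mother)
    are linked directly; this is an illustrative assumption so that they
    count as first-degree relatives rather than two steps via a parent.
    """
    adj = {}
    children_of = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for child, father, mother in trios:
        adj.setdefault(child, set())
        for parent in (father, mother):
            if parent is not None:
                link(child, parent)
        if father is not None and mother is not None:
            children_of.setdefault((father, mother), []).append(child)

    # Connect every pair of full siblings.
    for sibs in children_of.values():
        for i, a in enumerate(sibs):
            for b in sibs[i + 1:]:
                link(a, b)
    return adj


def relatives_within(adj, proband, max_degree):
    """Breadth-first search returning {individual: distance} for everyone
    within max_degree steps of the proband (the proband is excluded)."""
    dist = {proband: 0}
    queue = deque([proband])
    while queue:
        node = queue.popleft()
        if dist[node] == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[proband]
    return dist


# Hypothetical toy pedigree: grandparents, two of their children,
# and the grandchildren.
trios = [
    ("dad", "gf", "gm"),
    ("aunt", "gf", "gm"),
    ("me", "dad", "mom"),
    ("sis", "dad", "mom"),
    ("cousin", "aunt", "unc"),
]
adj = build_graph(trios)
third_degree = relatives_within(adj, "me", 3)
```

In this toy pedigree, `third_degree` places parents and the full sibling at distance 1, grandparents and the aunt at distance 2, and the cousin at distance 3, while the aunt's partner, who is four steps away, is left out.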
Keywords: pedigree analysis, kinship matrix, graph theory, trio data, family-based studies, genetic epidemiology, R package
Received: 18 Sep 2025; Accepted: 24 Nov 2025.
Copyright: © 2025 Pedersen, Steinbach, Pedersen, Schork, Krebs, Vilhjálmsson and Privé. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Emil Michael Pedersen, emp@au.dk
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
