In genetic and genomic studies, it is often difficult to access and obtain individual-level data such as genotypes and trait values for each participant. The reasons include privacy and consent issues, the considerable technical difficulties of data transfer, storage, and harmonization, and other logistical considerations. To overcome such limitations, it has become common practice to share summary-level data across different cohorts and genetic communities. For example, in genome-wide association studies (GWAS), summary statistics for individual genetic variants, including effect size estimates and their standard errors, are readily available for a wide range of phenotypes in databases such as the GWAS Catalog.
As large volumes of genetic and genomic data, including multi-omics, whole-genome, and whole-exome sequencing data, become available in national and institutional biobanks, it is important to analyze them jointly and at scale to improve the generalizability of genetic discoveries. Summary-level data-based approaches, including meta-analysis and federated learning methods, provide an attractive way to leverage large sample sizes to discover the genetic basis of human diseases and traits while addressing biobank data privacy and consent concerns. However, these approaches face many challenges, such as the computational scalability issues that arise with hundreds of millions of variants. They also have limited ability to account for relatedness and population structure. Thus, there is a pressing need to develop powerful, scalable, and resource-efficient methods that leverage summary-level data from large-scale genetic studies and biobanks to study the impact of genetic variants on diseases and traits, improve risk prediction, and estimate the causal effects of biomarkers on diseases.
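To illustrate how summary-level data can be combined without access to individual-level data, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for a single variant. The function name `ivw_meta` and the example values are hypothetical. The sketch assumes independent cohorts with effect estimates aligned to the same effect allele, and it does not address relatedness or population structure.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-cohort effect size estimates (same effect allele assumed).
    ses:   per-cohort standard errors of those estimates.
    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided p-value under a standard normal approximation.
    z_score = pooled_beta / pooled_se
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, p_value


# Hypothetical summary statistics for one variant in two cohorts.
pooled_beta, pooled_se, z_score, p_value = ivw_meta([0.10, 0.20], [0.05, 0.10])
```

Only the per-cohort effect estimates and standard errors are needed here, which is why this family of methods is compatible with the data-sharing constraints described above.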
This Research Topic is inclusive of both novel methods and novel applications for summary statistics in genetics research. Potential topics include, but are not restricted to:
• Integration of summary statistics across multiple phenotypes to better understand the etiology of related diseases.
• Utilizing summary statistics in causal inference approaches, e.g. to perform mediation analysis.
• Adapting summary statistic methodology developed for GWAS arrays to sequencing studies, for example by accounting for lack of normality when summary statistics are computed using rare variants.
• Fine-mapping approaches using summary statistics.
• Methods for creating polygenic risk scores with summary statistics.
• Development of biological network models using summary statistics.
• Application of summary statistics from understudied ethnic groups to better understand racial and ethnic differences in disease.
• Application of summary statistics from massive modern datasets to uncover new genetic risk factors for complex phenotypes.
• Tools or pipelines to automatically generate summary statistics from contemporary genetic compendiums with large numbers of outcomes.