AUTHOR=Shaver Amanda O., Garcia Brianna M., Gouveia Goncalo J., Morse Alison M., Liu Zihao, Asef Carter K., Borges Ricardo M., Leach Franklin E., Andersen Erik C., Amster I. Jonathan, Fernández Facundo M., Edison Arthur S., McIntyre Lauren M.
TITLE=An anchored experimental design and meta-analysis approach to address batch effects in large-scale metabolomics
JOURNAL=Frontiers in Molecular Biosciences
VOLUME=9
YEAR=2022
URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2022.930204
DOI=10.3389/fmolb.2022.930204
ISSN=2296-889X
ABSTRACT=
Untargeted metabolomics studies are unbiased, but identifying the same feature across studies is complicated by environmental variation, batch effects, and instrument variability. Ideally, several studies that assay the same set of metabolic features would be used to select recurring features to pursue for identification. Here, we developed an anchored experimental design. This generalizable approach enabled us to integrate three genetic studies consisting of 14 test strains of Caenorhabditis elegans prior to the compound identification process. An anchor strain, PD1074, was included in every sample collection, resulting in a large set of biological replicates of a genetically identical strain that anchored each study. This design enabled us to estimate treatment effects within each batch and to apply straightforward meta-analytic approaches to combine treatment effects across batches, without the need to estimate batch effects or apply complex normalization strategies. We collected 104 test samples for three genetic studies across six batches to produce five analytical datasets from two complementary technologies commonly used in untargeted metabolomics. Using the model system C. elegans, we demonstrate that an augmented design combined with experimental blocks and other metabolomic QC approaches can be used to anchor studies and enable comparisons of stable spectral features across time without the need for compound identification. This approach is generalizable to systems where the same genotype can be assayed in multiple environments and provides biologically relevant features for downstream compound identification efforts. All methods are included in the newest release of the publicly available SECIMTools based on the open-source Galaxy platform.
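
The abstract's central idea, estimating a test-versus-anchor effect inside each batch and then pooling those effects across batches, can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance meta-analysis of mean differences, which is one common choice and not necessarily the model used in the paper or in SECIMTools. The function names `batch_effect` and `fixed_effect_meta` and all intensity values are hypothetical.

```python
import math
from statistics import mean, variance


def batch_effect(test, anchor):
    """Return (effect, variance) for the test-minus-anchor mean difference
    of one spectral feature within a single batch."""
    effect = mean(test) - mean(anchor)
    # Variance of a difference between two independent sample means.
    var = variance(test) / len(test) + variance(anchor) / len(anchor)
    return effect, var


def fixed_effect_meta(effects):
    """Pool per-batch (effect, variance) pairs with inverse-variance weights.
    Assumption: a fixed-effect model, chosen here only for illustration."""
    weights = [1.0 / v for _, v in effects]
    pooled = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se


# Hypothetical log-scale intensities for one feature. The second batch is
# shifted upward as a whole, mimicking an additive batch offset.
batches = [
    {"test": [10.2, 10.6, 10.4], "anchor": [9.8, 10.0, 9.9, 10.1]},
    {"test": [12.3, 12.5, 12.7], "anchor": [11.9, 12.0, 12.1, 11.8]},
    {"test": [10.5, 10.3, 10.8], "anchor": [10.0, 9.9, 10.2, 10.1]},
]

per_batch = [batch_effect(b["test"], b["anchor"]) for b in batches]
pooled, se = fixed_effect_meta(per_batch)
print(f"pooled effect = {pooled:.3f}, standard error = {se:.3f}")
```

Because each effect is a difference taken within one batch, an additive offset shared by the test and anchor samples of that batch cancels before pooling, which is why no separate batch-effect term appears in the sketch.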