Data collection and transcriptome analysis of Triticum aestivum samples from Alabama revealed the presence of barley yellow dwarf virus

Livingston, Rachel M.; Mayfield, Michael; Zhao, Chaoyang; Strayer-Scherer, Amanda; Martin, Kathleen M.

doi:10.3389/fagro.2025.1721047

DATA REPORT article

Front. Agron., 08 December 2025

Sec. Disease Management

Volume 7 - 2025 | https://doi.org/10.3389/fagro.2025.1721047

Rachel M. Livingston¹

Michael Mayfield¹

Chaoyang Zhao²

Amanda Strayer-Scherer¹

Kathleen M. Martin¹*

¹Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
²National Soil Dynamics Laboratory, United States Department of Agriculture – Agricultural Research Service (USDA-ARS), Auburn, AL, United States

Introduction

Winter wheat (Triticum aestivum L.) is an important crop in the Poaceae family, which includes other grasses such as barley, rye, and oats. In 2023, the United States was the fourth-largest producer of wheat globally, and wheat was the third most exported commodity, with a production value of $12.3 billion USD (FAOSTAT, 2023; USDAQuickstats, 2023). More than 100 wheat diseases are caused by pathogens, and half of these are widespread and cause significant damage and economic losses (Savary et al., 2019). An estimated 21.5% of global wheat production is lost annually to these diseases (Savary et al., 2019). Diagnosing wheat diseases can be complicated by the wide range of pathogens and the different symptoms they cause, but accurate diagnosis is essential for managing and preventing the spread of pathogens in fields.

Barley yellow dwarf (BYD) is caused by a complex of RNA viruses from the genera Luteovirus and Polerovirus (Walls et al., 2019). The luteoviruses that can cause BYD are barley yellow dwarf virus species PAV, PAS, SGV, MAV, kerII, and kerIII (BYDV; Luteovirus pavhordei, Luteovirus pashordei, Luteovirus sgvhordei Miller, Luteovirus mavhordei, Luteovirus kerbihordei, Luteovirus kertrihordei). The poleroviruses that cause BYD are cereal yellow dwarf virus-RPV and RPS (CYDV; Polerovirus CYDVRPV, Polerovirus CYDVRPS), and maize yellow dwarf virus-RMV (Polerovirus MYDVRMV) (Walls et al., 2019). The BYDV-GAV serotype is also commonly associated with BYD but has not been approved as an official species (Domier, 2011). These viruses cause damage to grasses worldwide, and symptoms may include chlorosis, red or purple discoloration, stunting, and yield loss (D’Arcy, 1995). These symptoms resemble typical nutritional deficiencies, making visual disease diagnosis challenging. BYDV was identified in Alabama more than 20 years ago, but no studies on the virus’s progression have been conducted since then (Bowen et al., 2003). No statewide incidence surveys of the virus in wheat have been published in recent years, although follow-up studies in the region have focused on aphid vectors and alternative grass hosts associated with BYDV and CYDV transmission (Hadi et al., 2011). Considering this gap, the present study aimed to determine whether BYDV is still present in Alabama wheat fields and document other pathogens that may produce symptoms resembling BYD. Field samples exhibiting BYD-like symptoms were collected, and RNA sequencing was performed. These datasets provide a resource for determining viral presence, broader microbial community impact, and host-pathogen interactions in symptomatic wheat cultivars in Alabama.

Methods

Sample collection and RNA extraction

In April 2023, four winter wheat leaf samples were collected from mature plants at harvest-ready growth stages in Alabama. The samples represented four different cultivars (GA 161, GA Gore, 26R33, and AGS4043) and were taken from plants displaying symptoms consistent with BYD infection, including yellowing and purple streaking of leaf tissue. To preserve RNA integrity, all samples were stored at −80°C until RNA extractions were performed. Total RNA was extracted from 100 mg of symptomatic leaf tissue using the IBI Mini Total RNA Kit for Plants (IBI Scientific, Dubuque, IA, USA). To enrich viral RNA, tissue sections were excised from regions showing visible symptoms. Samples were flash-frozen in liquid nitrogen and ground with a mortar and pestle prior to extraction. The manufacturer’s protocol was followed, including an on-column DNase I digestion step to eliminate residual genomic DNA. RNA concentration and quality were checked with the NanoPhotometer® NP80 (Implen) (Table 1, row 2). Purified RNA was stored at −80°C until sequencing analysis.

Table 1. Metadata information for each wheat cultivar dataset.

Library preparation and sequencing

For sequencing, total RNA was ribo-depleted following the manufacturer’s guidelines using the QIAseq FastSelect-rRNA Plant Kit (QIAGEN). Concentration and quality of samples were checked again as previously described (Table 1, row 3). Ribo-depleted RNA samples were submitted to the NC State Genomic Sciences Laboratory (Raleigh, NC, USA) for library preparation and sequencing. Libraries were constructed with the NEBNext® Ultra™ II Directional RNA Library Prep Kit for Illumina® (New England Biolabs) and sequenced on a NovaSeq 6000 platform using an S4 flow cell with 150 bp paired-end reads.

Taxonomic analysis of raw reads with Kraken2

Raw sequencing reads were processed with Trimmomatic (v0.39) (Bolger et al., 2014) to remove adapters and low-quality bases using the parameters LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, and MINLEN:36. Read quality was assessed with FastQC (v0.12.1) using default parameters before and after Trimmomatic to determine the improvement in dataset quality (Andrews, 2010). For taxonomic classification of the datasets, Kraken2 indexes were obtained from the Ben Langmead GitHub repository at https://benlangmead.github.io/aws-indexes/k2. Taxonomic classification was performed with Kraken2 (v2.0.8-beta) using the PlusPF-16 index (v2025-04-02) with a confidence threshold of 0.1 and a minimum base quality of 20, in addition to default settings (Wood et al., 2019). Classified and unclassified reads were written to separate output files. To increase the sensitivity of viral detection, Kraken2 was also run with the Viral RefSeq index (v2025-10-15) under the same parameters. Kraken2 reports were parsed and filtered with Python (v3.12.3) scripts, and taxonomic distributions were visualized as bar graphs. For samples with fewer than ten reads classified as Luteovirus or BYDV, those reads were extracted and submitted to the NCBI blastn web server to confirm their taxonomic status. Reads that remained unclassified after comparison with the two indexes were run against a custom Kraken2 index constructed from the Triticum aestivum reference genome (GCF_018294505.1).
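The report-parsing step can be illustrated with a minimal sketch. It is not the authors' script: it assumes the standard six-column, tab-separated Kraken2 report layout (percentage, clade reads, direct reads, rank code, taxid, indented name), and the function names, taxids, and toy report below are hypothetical.

```python
import io
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percentage of reads in the clade
    clade_reads: int    # reads assigned to the taxon or its descendants
    direct_reads: int   # reads assigned directly to the taxon
    rank: str           # rank code, e.g. "G" (genus) or "S" (species)
    taxid: int
    name: str


def parse_kraken2_report(handle):
    """Parse a six-column Kraken2 report into a list of ReportRow."""
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append(ReportRow(float(fields[0]), int(fields[1]), int(fields[2]),
                              fields[3], int(fields[4]), fields[5].strip()))
    return rows


def top_taxa(rows, rank="G", n=20):
    """Return the n taxa at the given rank with the most clade-level reads."""
    selected = [r for r in rows if r.rank == rank]
    selected.sort(key=lambda r: r.clade_reads, reverse=True)
    return selected[:n]


# Toy report with made-up taxa and counts (illustration only).
EXAMPLE_REPORT = (
    " 40.00\t400\t400\tU\t0\tunclassified\n"
    " 60.00\t600\t50\tR\t1\troot\n"
    " 30.00\t300\t100\tG\t1001\t    Genus_A\n"
    " 20.00\t200\t200\tS\t1002\t      Genus_A species_1\n"
    " 25.00\t250\t250\tG\t2001\t    Genus_B\n"
)

rows = parse_kraken2_report(io.StringIO(EXAMPLE_REPORT))
for row in top_taxa(rows, rank="G", n=2):
    print(row.name, row.clade_reads)
```

Ranking by clade-level reads rather than direct reads is a design choice made here so that reads assigned to species roll up into their genus; the original scripts may have filtered differently.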

Taxonomic classification of contigs with BLAST+ and DIAMOND/MEGAN

After host and pathogen reads were removed with the T. aestivum and PlusPF-16 Kraken2 indexes, the remaining unclassified reads were assembled into contigs using SPAdes (v3.15.5) with default parameters (Prjibelski et al., 2020). Contig depth was determined with CoverM (v0.7.0) using the reads per base method (Aroney et al., 2025). The assembled contigs were then reanalyzed against the three previous Kraken2 indexes with the same parameters to remove sequences that may not have been classified initially due to quality thresholds. Remaining contigs ≥400 bp were queried against the BLAST+ core_nt database with default parameters and a tabular output format. Results were filtered with Python scripts to retain only the best match for each contig based on alignment length and percent identity.
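The best-match filtering of BLAST+ results can be sketched as below. This is an illustration under stated assumptions rather than the authors' script: it assumes the default 12-column tabular layout (qseqid, sseqid, pident, length, ...), and it ranks hits by alignment length first and percent identity second, which is one possible reading of "alignment length and percent identity". The contig and subject identifiers are hypothetical.

```python
import csv
import io


def best_hits(handle):
    """Keep one hit per query contig: longest alignment, then highest identity."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        key = (length, pident)  # assumed ranking: length first, identity second
        if qseqid not in best or key > best[qseqid][0]:
            best[qseqid] = (key, sseqid)
    return {q: {"sseqid": s, "length": k[0], "pident": k[1]}
            for q, (k, s) in best.items()}


# Toy tabular output with hypothetical identifiers (illustration only).
EXAMPLE_HITS = (
    "contig_1\tsubj_A\t95.0\t400\t20\t0\t1\t400\t1\t400\t1e-50\t500\n"
    "contig_1\tsubj_B\t99.0\t450\t5\t0\t1\t450\t1\t450\t1e-80\t700\n"
    "contig_2\tsubj_C\t88.0\t600\t72\t0\t1\t600\t1\t600\t1e-40\t450\n"
    "contig_2\tsubj_D\t91.5\t600\t51\t0\t1\t600\t1\t600\t1e-45\t480\n"
)

for contig, hit in best_hits(io.StringIO(EXAMPLE_HITS)).items():
    print(contig, hit["sseqid"], hit["length"], hit["pident"])
```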

Unclassified contigs that remained after BLAST filtering were subjected to taxonomic annotation using the DIAMOND/MEGAN workflow as outlined by Zhao et al. (2025). Contigs were searched against the NCBI non-redundant protein database (downloaded August 28, 2023) with DIAMOND (v2.1.8) in BLASTX mode (Buchfink et al., 2015). The resulting alignments were assigned to taxa using MEGAN (v6.25.3) with the megan-map-Feb2022.db in long-read mode through MEGANIZER (Huson et al., 2016). Final taxonomic assignments were inspected, and the taxonomic category “Virus” was exported using MEGAN’s interactive interface.

Data description

Sample sequencing and quality

Sequencing was performed on four wheat cultivars: GA 161, GA Gore, 26R33, and AGS4043. The raw reads are available from NCBI under BioProject number PRJNA1250571, and the accession number for each dataset is listed in Table 1. The datasets contained 47.2, 46.0, 40.8, and 38.9 million raw reads, respectively, before trimming and filtering (Table 1, row 5). Before trimming and filtering, the datasets contained 7.1 Gbp (GA 161), 6.9 Gbp (GA Gore), 6.1 Gbp (26R33), and 5.8 Gbp (AGS4043). Per-base sequence quality (Phred score) remained above 20 for at least the first 130 bases in each dataset before Trimmomatic. Per-base N content was negligible in each dataset before any filtering was applied. High sequence duplication was detected in all four datasets. After filtering, the datasets contained 5.0, 4.9, 4.2, and 4.3 Gbp, respectively. No sequences were flagged as poor quality by FastQC before or after trimming and filtering. All samples were considered suitable for downstream analyses following Trimmomatic filtering. The total numbers of raw and filtered reads and their respective GC contents are summarized in Table 1, rows 5-8.
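As a quick consistency check, the raw data volumes follow from the read counts and the 150 bp read length. The sketch below assumes that the reported counts are individual reads (not read pairs) and that every raw read is a full 150 bp; the function name is hypothetical.

```python
READ_LENGTH_BP = 150  # from the sequencing configuration (150 bp paired-end)

# Raw read counts in millions, as reported in the text.
raw_reads_millions = {
    "GA 161": 47.2,
    "GA Gore": 46.0,
    "26R33": 40.8,
    "AGS4043": 38.9,
}


def total_gbp(reads_millions, read_length_bp=READ_LENGTH_BP):
    """Convert a read count in millions to total sequence in gigabase pairs."""
    return reads_millions * 1e6 * read_length_bp / 1e9


for cultivar, reads in raw_reads_millions.items():
    print(f"{cultivar}: {total_gbp(reads):.1f} Gbp")
# GA 161: 7.1 Gbp
# GA Gore: 6.9 Gbp
# 26R33: 6.1 Gbp
# AGS4043: 5.8 Gbp
```

Under these assumptions the computed values match the reported raw data volumes.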

Taxonomic annotation with Kraken2

The total numbers of reads classified by each Kraken2 index for each dataset are listed in Table 1, rows 9-11. Host-classified reads for each dataset were 61% (GA 161), 63% (GA Gore and 26R33), and 67% (AGS4043) (Table 1, row 11). Taxonomic classifications are depicted for the top 20 genera from the PlusPF-16 index (Figures 1A, C, E, G) and top 10 species from the Viral RefSeq index (Figures 1B, D, F, H). Across all samples, bacteria and fungi accounted for the largest proportion of classified reads from the PlusPF-16 index (Table 1, row 9), while cultivar 26R33 additionally contained a high proportion of reads assigned to the viral genus Luteovirus (Figure 1E). The Viral RefSeq index identified at least one read in each dataset as BYDV. For cultivar GA 161, four reads were classified in the genus Luteovirus, with two reads classified as BYDV-PAV. For cultivar GA Gore, two reads were classified in the genus Luteovirus, with one of those reads classified as BYDV-PAV. Cultivar 26R33 had 53,673 reads classified as Luteovirus, of which 51% were classified as BYDV-PAV, 12% as BYDV-MAV, and 6% as BYDV-PAS (Figure 1F). This cultivar also contained 637 reads that were categorized as the unclassified Luteovirus BYDV-GAV. For cultivar AGS4043, only one read was classified as BYDV-PAV. Because low-coverage reads from next-generation sequencing are typically considered artifacts, it was of interest to confirm the low-coverage reads in three of the samples through additional methods (Schloss et al., 2011). In blastn searches, all reads classified as Luteovirus and BYDV from GA 161, GA Gore, and AGS4043 aligned to various BYDV-PAV genomes with percent identities between 84% and 100%. No reads were classified as poleroviruses in any dataset.
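Pulling out the handful of virus-classified reads for blastn confirmation can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes the standard Kraken2 per-read output (status, read ID, taxid, length, LCA mapping) produced without the --use-names option, and the taxid and read names are placeholders, not real NCBI identifiers.

```python
import io


def read_ids_for_taxa(handle, taxids):
    """Return IDs of classified reads assigned directly to any of the taxids."""
    wanted = {str(t) for t in taxids}
    ids = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue  # skip malformed lines and unclassified ("U") reads
        if fields[2] in wanted:
            ids.append(fields[1])
    return ids


# Toy per-read output; 11111 and 22222 are placeholder taxids.
EXAMPLE_OUTPUT = (
    "C\tread_1\t11111\t150|150\t0:10 11111:5\n"
    "U\tread_2\t0\t150|150\t0:15\n"
    "C\tread_3\t11111\t150|150\t11111:8 0:7\n"
    "C\tread_4\t22222\t150|150\t22222:12\n"
)

VIRUS_TAXIDS = {11111}  # placeholder for the taxa of interest
print(read_ids_for_taxa(io.StringIO(EXAMPLE_OUTPUT), VIRUS_TAXIDS))
```

Note that this sketch matches only reads assigned directly to the listed taxids; reads assigned to descendant taxa would need their taxids added to the set. The resulting read IDs could then be used to pull the sequences from the FASTQ files for a blastn search.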

Figure 1. Kraken2 taxonomic classifications depicting the top 20 genus-level taxa and top 10 species-level virus taxa from each sample. The top taxa are those with the highest number of reads, shown on a common logarithmic (log10) scale. Output for each cultivar is shown in panels (A, B) (GA 161), (C, D) (GA Gore), (E, F) (26R33), and (G, H) (AGS4043).

Taxonomic annotation with BLAST+ and DIAMOND/MEGAN

After Kraken2 taxonomic classification, the remaining unclassified reads were assembled into contigs to assess whether additional taxonomic assignments could be made. The total number of contigs generated per sample and contig quality are listed in Table 1, rows 12-15. The proportion of contigs classified by Kraken2 was 85% (GA 161), 47% (GA Gore), 82% (26R33), and 57% (AGS4043). Analysis of these contigs against the previous Kraken2 indexes yielded genera that had already been identified from the trimmed reads. The remaining contigs were queried against the BLAST+ core_nt database, which primarily identified additional host-derived sequences as well as bacterial and fungal taxa previously detected with the PlusPF-16 index (Table 1, row 17). Some sequences from each dataset aligned with non-plant viruses, such as coronaviruses, which may be sequence classification artifacts. Further analysis and updates to the classification indexes may resolve these artifacts.

Contigs that remained unclassified after BLAST+ annotation were further analyzed with DIAMOND and MEGAN. For GA 161, classifications included fungi and Viridiplantae, with 24 contigs remaining unclassified (Table 1, column 2, rows 18-19). In GA Gore, contigs were assigned as bacteria, fungi, Heunggongvirae, and Viridiplantae, leaving 85 contigs unclassified (Table 1, column 3, rows 18-19). For 26R33, contigs were classified as metazoan, Orthornavirae, and Viridiplantae, with 25 contigs remaining unclassified (Table 1, column 4, rows 18-19). In AGS4043, classifications included bacteria, fungi, and Viridiplantae, leaving 73 contigs unclassified (Table 1, column 5, rows 18-19). Notably, one contig from the 26R33 cultivar was classified at the species level as BYDV-PAV.

Value of the data

These datasets represent the first publicly available RNA sequences of wheat exhibiting BYD-like symptoms from Alabama. These datasets can be used for wheat genome annotation and population genetic analyses across diverse cultivars. The detection of multiple species of BYDV in varying levels in all four samples underscores the potential of these datasets for investigating viral coinfection dynamics and host-pathogen interactions involving BYDV. These datasets may also enable variant calling and SNP analysis of BYDV species present in Alabama. The deposited data may also allow for investigation into the microbial community and the potential of co-infections in symptom development. Cultivar-specific transcriptome data can be used to explore host-pathogen interactions and differential gene expression. The data presented may also be informative for the design of bioinformatic pipelines for plant virome analyses.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/genbank/, PRJNA1250571.

Author contributions

RL: Writing – original draft, Data curation, Investigation, Writing – review & editing. MM: Writing – review & editing, Methodology, Resources. CZ: Writing – review & editing, Formal Analysis, Conceptualization. AS-S: Investigation, Writing – review & editing, Resources. KM: Methodology, Conceptualization, Writing – review & editing, Funding acquisition.

Funding

The author(s) declared that financial support was received for this work and/or its publication. Funding was provided by the Department of Entomology and Plant Pathology at Auburn University.

Acknowledgments

We would like to thank Dr. Neha Potnis for her technical support and advice in preparing this report.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Generative AI was used in the creation of this manuscript. It was used for grammar correction and for assistance in generating code to visualize the data.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Andrews S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data. Available online at: https://github.com/s-andrews/FastQC (Accessed November 17, 2025).

Aroney S. T. N., Newell R. J. P., Nissen J. N., Camargo A. P., Tyson G. W., and Woodcroft B. J. (2025). CoverM: read alignment statistics for metagenomics. Bioinformatics 41. doi: 10.1093/bioinformatics/btaf147

Bolger A. M., Lohse M., and Usadel B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. doi: 10.1093/bioinformatics/btu170

Bowen K. L., Murphy J. F., Flanders K. L., Mask P. L., and Li R. (2003). Incidence of viruses infecting winter wheat in Alabama. Plant Dis. 87, 288–293. doi: 10.1094/pdis.2003.87.3.288

Buchfink B., Xie C., and Huson D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60. doi: 10.1038/nmeth.3176

D’Arcy C. J. (1995). “Symptomology and host range of barley yellow dwarf,” in Barley yellow dwarf: 40 years of progress (Minnesota, USA: American Phytopathological Society), 9–28.

Domier L. L. (2011). Family: Luteoviridae (International Committee on Taxonomy of Viruses).

FAOSTAT (2023). Countries by Commodity. Available online at: https://www.fao.org/faostat/en/rankings/countries_by_commodity (Accessed October 6, 2025).

Hadi B. A., Flanders K. L., Bowen K. L., Murphy J. F., and Halbert S. E. (2011). Species composition of aphid vectors (Hemiptera: Aphididae) of barley yellow dwarf virus and cereal yellow dwarf virus in Alabama and western Florida. J. Econ. Entomol. 104, 1167–1173. doi: 10.1603/ec10425

Huson D. H., Beier S., Flade I., Górska A., El-Hadidi M., Mitra S., et al. (2016). MEGAN community edition - interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput. Biol. 12, e1004957. doi: 10.1371/journal.pcbi.1004957

Prjibelski A., Antipov D., Meleshko D., Lapidus A., and Korobeynikov A. (2020). Using SPAdes de novo assembler. Curr. Protoc. Bioinf. 70, e102. doi: 10.1002/cpbi.102

Savary S., Willocquet L., Pethybridge S. J., Esker P., McRoberts N., and Nelson A. (2019). The global burden of pathogens and pests on major food crops. Nat. Ecol. Evol. 3, 430–439. doi: 10.1038/s41559-018-0793-y

Schloss P. D., Gevers D., and Westcott S. L. (2011). Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS One 6, e27310. doi: 10.1371/journal.pone.0027310

USDAQuickstats (2023). National statistics for wheat. Available online at: https://www.nass.usda.gov/Statistics_by_Subject/result.php?802D3389-EF8B-3115-BE2A-6D69D86F9B23&sector=CROPS&group=FIELD%20CROPS&comm=WHEAT (Accessed October 6, 2025).

Walls J., Rajotte E., and Rosa C. (2019). The past, present, and future of barley yellow dwarf management. Agriculture 9, 23. Available online at: https://www.mdpi.com/2077-0472/9/1/23 (Accessed September 10, 2025).

Wood D. E., Lu J., and Langmead B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257. doi: 10.1186/s13059-019-1891-0

Zhao C., Escalante C., Jacobson A. L., Balkcom K. S., Conner K. N., and Martin K. M. (2025). Metatranscriptomic and metagenomic analyses of cotton aphids (Aphis gossypii) collected from cotton fields in Alabama, USA. Front. Insect Sci. 5. doi: 10.3389/finsc.2025.1461588

Keywords: wheat, barley yellow dwarf, Luteovirus, bioinformatics, RNAseq

Citation: Livingston RM, Mayfield M, Zhao C, Strayer-Scherer A and Martin KM (2025) Data collection and transcriptome analysis of Triticum aestivum samples from Alabama revealed the presence of barley yellow dwarf virus. Front. Agron. 7:1721047. doi: 10.3389/fagro.2025.1721047

Received: 08 October 2025; Accepted: 21 November 2025; Revised: 20 November 2025;
Published: 08 December 2025.

Edited by:

Humberto J. Debat, Instituto Nacional de Tecnología Agropecuaria, Argentina

Reviewed by:

Osamah Alisawi, University of Kufa, Iraq
Zohreh Moradi, Sari Agricultural Sciences and Natural Resources University, Iran

Copyright © 2025 Livingston, Mayfield, Zhao, Strayer-Scherer and Martin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kathleen M. Martin, kmm0170@auburn.edu
