DATA REPORT article
Front. Agron.
Sec. Disease Management
Data collection and transcriptome analysis of Triticum aestivum samples from Alabama revealed the presence of barley yellow dwarf virus
Provisionally accepted
- 1 Auburn University, Auburn, United States
- 2 USDA Agricultural Research Service National Soil Dynamics Laboratory, Auburn, United States
Winter wheat (Triticum aestivum L.) is an important crop in the Poaceae family, which includes other grasses such as barley, rye, and oats. In 2023, the United States was the fourth-largest producer of wheat globally, and wheat was the third most exported commodity, with a production value of US$12.3 billion (FAOSTAT, 2023; USDA Quickstats, 2023). More than 100 wheat diseases are caused by pathogens; half of these are widespread and cause significant damage and economic losses (Savary et al., 2019). An estimated 21.5% of global wheat production is lost annually to these diseases (Savary et al., 2019). Diagnosing wheat diseases can be complicated by the wide range of pathogens and the different symptoms they cause, but accurate diagnosis is essential for managing and preventing the spread of pathogens in fields.
Barley yellow dwarf (BYD) is caused by a complex of RNA viruses from the genera Luteovirus and Polerovirus (Walls et al., 2019). The luteoviruses that can cause BYD are barley yellow dwarf virus species PAV, PAS, SGV, MAV, kerII, and kerIII (BYDV; Luteovirus pavhordei, Luteovirus pashordei, Luteovirus sgvhordei Miller, Luteovirus mavhordei, Luteovirus kerbihordei, Luteovirus kertrihordei). The poleroviruses that cause BYD are cereal yellow dwarf virus-RPV and RPS (CYDV; Polerovirus CYDVRPV, Polerovirus CYDVRPS) and maize yellow dwarf virus-RMV (Polerovirus MYDVRMV) (Walls et al., 2019). The BYDV-GAV serotype is also commonly associated with BYD but has not been approved as an official species (Domier, 2011). These viruses damage grasses worldwide, and symptoms may include chlorosis, red or purple discoloration, stunting, and yield loss (D'Arcy, 1995). Because these symptoms resemble those of common nutritional deficiencies, visual disease diagnosis is challenging.
BYDV was identified in Alabama more than 20 years ago, but no studies on the virus's progression have been conducted since then (Bowen et al., 2003). No statewide incidence surveys of the virus in wheat have been published in recent years, although follow-up studies in the region have focused on aphid vectors and alternative grass hosts associated with BYDV and CYDV transmission (Hadi et al., 2011). Considering this gap, the present study aimed to determine whether BYDV is still present in Alabama wheat fields and to document other pathogens that may produce symptoms resembling BYD. Field samples exhibiting BYD-like symptoms were collected, and RNA sequencing was performed. These datasets provide a resource for determining viral presence, broader microbial community impact, and host-pathogen interactions in symptomatic wheat cultivars in Alabama.
In April 2023, four winter wheat leaf samples were collected from mature plants at harvest-ready growth stages in Alabama. The samples represented four different cultivars (GA 161, GA Gore, 26R33, and AGS4043) and were taken from plants displaying symptoms consistent with BYD infection, including yellowing and purple streaking of leaf tissue. To preserve RNA integrity, all samples were stored at -80 °C until RNA extractions were performed.
Total RNA was extracted from 100 mg of symptomatic leaf tissue using the IBI Mini Total RNA Kit for Plants (IBI Scientific, Dubuque, IA, USA). To enrich viral RNA, tissue sections were excised from regions showing visible symptoms. Samples were flash-frozen in liquid nitrogen and ground with a mortar and pestle prior to extraction. The manufacturer's protocol was followed, including an on-column DNase I digestion step to eliminate residual genomic DNA. Concentrations of RNA samples were checked with the NanoPhotometer® NP80 (Implen) to ensure good quality (Table 1, row 2). Purified RNA was stored at -80 °C until sequencing analysis. For sequencing, total RNA was ribo-depleted using the QIAseq FastSelect-rRNA Plant Kit (QIAGEN) following the manufacturer's guidelines. Concentration and quality of samples were checked again as previously described (Table 1, row 3). Ribo-depleted RNA samples were submitted to the NC State Genomics Sciences Laboratory (Raleigh, NC, USA) for library preparation and sequencing. Libraries were constructed with the NEBNext® Ultra™ II Directional
Raw sequencing reads were processed with Trimmomatic (v0.39) (Bolger et al., 2014) to remove adapters and low-quality bases using the parameters LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, and MINLEN:36. Read quality was assessed with FastQC (v0.12.1) (Andrews, 2010) using default parameters before and after Trimmomatic to determine the improvement in dataset quality. For taxonomic classification of the datasets, Kraken2 indexes were obtained from the Ben Langmead GitHub repository at https://benlangmead.github.io/aws-indexes/k2.
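To illustrate what the Trimmomatic quality settings described above do to a single read, the following is a minimal Python sketch. It is not Trimmomatic's own code: it assumes the steps run in the order LEADING, TRAILING, SLIDINGWINDOW, MINLEN, it omits adapter clipping, and it simplifies the sliding-window step. The function name `trim_read` is hypothetical.

```python
def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, minlen=36):
    """Simplified illustration of LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36.

    quals: list of per-base Phred scores for one read.
    Returns the (start, end) slice to keep, or None if the read is dropped.
    """
    start, end = 0, len(quals)

    # LEADING: remove bases from the 5' end while quality is below the threshold.
    while start < end and quals[start] < leading:
        start += 1

    # TRAILING: remove bases from the 3' end while quality is below the threshold.
    while end > start and quals[end - 1] < trailing:
        end -= 1

    # SLIDINGWINDOW: scan from the 5' end and cut at the first window
    # whose mean quality falls below min_avg (simplified behaviour).
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break

    # MINLEN: drop reads that are too short after trimming.
    if end - start < minlen:
        return None
    return start, end


# Toy reads with made-up quality scores (not taken from the datasets).
kept = trim_read([30] * 40 + [10] * 20)     # low-quality tail is cut off
dropped = trim_read([30] * 20 + [10] * 20)  # too short after trimming
```

In this sketch the first toy read keeps its 40 high-quality bases, while the second falls below the 36-base minimum and is discarded.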
Taxonomic classification was performed with Kraken2 (v2.0.8-beta) using the PlusPF-16 index (v2025-04-02) with a confidence threshold of 0.1 and a minimum base quality of 20, in addition to default settings (Wood et al., 2019). Kraken2 output was split into separate files for classified and unclassified reads. To increase the sensitivity of viral detection, Kraken2 was also run with the Viral RefSeq index (v2025-10-15) under the same parameters. Kraken2 reports were parsed and filtered with Python (v3.12.3) scripts, and taxonomic distributions were visualized as bar graphs. For samples with fewer than ten reads classified as Luteovirus or BYDV, the reads were extracted and submitted to the NCBI blastn web server to confirm their taxonomic assignment. Reads that remained unclassified after comparison with the two indexes were run against a custom Kraken2 index constructed from the Triticum aestivum reference genome (GCF_018294505.1). After host and pathogen reads were removed with the T. aestivum and PlusPF-16 Kraken2 indexes, the remaining unclassified reads were assembled into contigs using SPAdes (v3.15.5) with default parameters (Prjibelski et al., 2020). Contig depth was determined with CoverM (v0.7.0) using the reads-per-base method (Aroney et al., 2025). The assembled contigs were then reanalyzed against the three previous Kraken2 indexes with the same parameters to remove sequences that may not have been classified initially because of quality thresholds.
Remaining contigs ≥400 bp were queried against the BLAST+ core_nt database with default parameters and a tabular output format. Results were filtered with Python scripts to retain only the best match for each contig based on alignment length and percent identity.
Contigs that remained unclassified after BLAST filtering were subjected to taxonomic annotation using the DIAMOND/MEGAN workflow outlined by Zhao et al. (2025).
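The best-match filtering of tabular BLAST results described above could be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script: it assumes the default 12-column tabular layout (`-outfmt 6`) and ranks hits by alignment length first, with percent identity as the tie-breaker; the source does not specify the ordering. The function name `best_hits` and the example rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Keep one hit per query contig: longest alignment, then highest identity.

    The ranking order is an assumption made for this sketch.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        key = (int(hit["length"]), float(hit["pident"]))
        current = best.get(hit["qseqid"])
        if current is None or key > current[0]:
            best[hit["qseqid"]] = (key, hit)
    return {query: hit for query, (_, hit) in best.items()}


# Made-up example rows (hypothetical contig and subject names).
example = "\n".join([
    "\t".join(["contig_1", "subjA", "91.0", "420", "38", "0", "1", "420", "100", "519", "1e-150", "560"]),
    "\t".join(["contig_1", "subjB", "99.0", "400", "4", "0", "1", "400", "10", "409", "0.0", "720"]),
    "\t".join(["contig_1", "subjC", "95.5", "420", "19", "0", "1", "420", "5", "424", "0.0", "650"]),
    "\t".join(["contig_2", "subjD", "88.0", "510", "61", "0", "1", "510", "1", "510", "1e-160", "600"]),
])
hits = best_hits(io.StringIO(example))
```

With this ordering, a longer alignment is preferred over a shorter one with higher identity; swapping the two elements of the sort key would reverse that preference.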
Contigs were searched against the NCBI non-redundant protein database (downloaded August 28, 2023) with DIAMOND (v2.1.8) in BLASTX mode (Buchfink et al., 2015). The resulting alignments were assigned to taxa using MEGAN (v6.25.3) with the megan-map-Feb2022.db in long-read mode through MEGANIZER (Huson et al., 2016). Final taxonomic assignments were inspected, and the taxonomic category "Virus" was exported using MEGAN's interactive interface.
Sequencing was performed on four wheat cultivars: GA 161, GA Gore, 26R33, and AGS4043. The raw reads are available in the NCBI repository under BioProject number PRJNA1250571, and the accession number for each dataset is listed in Table 1. The total number of raw reads before trimming and filtering was 47.2, 46, 40.8, and 38.9 million, respectively (Table 1, row 5). Before trimming and filtering, the datasets contained 7.1 Gbp (GA 161), 6.9 Gbp (GA Gore), 6.1 Gbp (26R33), and 5.8 Gbp (AGS4043). Per-base sequence quality remained above 20 for at least the first 130 bases in each dataset before Trimmomatic. Per-base N content was negligible in each dataset before any filtering was applied. High sequence duplication was detected in all four datasets. After filtering, the datasets contained 5 Gbp, 4.9 Gbp, 4.2 Gbp, and 4.3 Gbp, respectively. No sequences were flagged as poor quality by FastQC before or after trimming and filtering. All samples were considered suitable for downstream analyses following Trimmomatic filtering. The total numbers of raw and filtered reads and their respective GC contents are summarized in Table 1, rows 5-8. The total numbers of reads classified by each Kraken2 index for each dataset are listed in Table 1, rows 9-11. Host-classified reads accounted for 61% (GA 161), 63% (GA Gore and 26R33), and 67% (AGS4043) of each dataset (Table 1, row 11).
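As a quick consistency check on the raw read counts and base yields reported above, the implied mean number of bases per counted read can be computed directly. The short Python sketch below uses only the rounded values given in the text; the variable and function names are hypothetical, and the read length itself is not stated in the source.

```python
# Rounded values as reported in the text (raw data, before trimming).
raw_reads_millions = {"GA 161": 47.2, "GA Gore": 46.0, "26R33": 40.8, "AGS4043": 38.9}
raw_yield_gbp = {"GA 161": 7.1, "GA Gore": 6.9, "26R33": 6.1, "AGS4043": 5.8}


def mean_bases_per_read(reads_millions, yield_gbp):
    """Implied mean read length: total bases divided by total reads."""
    return (yield_gbp * 1e9) / (reads_millions * 1e6)


implied_length = {
    cultivar: mean_bases_per_read(raw_reads_millions[cultivar], raw_yield_gbp[cultivar])
    for cultivar in raw_reads_millions
}

for cultivar, length in implied_length.items():
    print(f"{cultivar}: about {length:.0f} bases per read")
```

All four values come out close to 150 bases per read, so the reported read counts and yields are mutually consistent within rounding.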
Taxonomic classifications are depicted for the top 20 genera from the PlusPF-16 index (Figure 1, panels A, C, E, G) and the top 10 species from the Viral RefSeq index (Figure 1, panels B, D, F, H). Across all samples, bacteria and fungi accounted for the largest proportion of classified reads from the PlusPF-16 index (Table 1, row 9), while cultivar 26R33 additionally contained a high proportion of reads assigned to the viral genus Luteovirus (Figure 1, panel E). The Viral RefSeq index identified at least one read in each dataset as BYDV. For cultivar GA 161, four reads were classified in the genus Luteovirus, two of which were classified as BYDV-PAV. For cultivar GA Gore, two reads were classified in the genus Luteovirus, one of which was classified as BYDV-PAV. Cultivar 26R33 had 53,673 reads classified as Luteovirus, of which 51% were classified as BYDV-PAV, 12% as BYDV-MAV, and 6% as BYDV-PAS (Figure 1, panel F). This cultivar also contained 637 reads assigned to BYDV-GAV, an unclassified luteovirus. For cultivar AGS4043, only one read was classified as BYDV-PAV. Because low-coverage reads from next-generation sequencing are typically considered artifacts, it was of interest to confirm the low-coverage reads in three of the samples through additional methods (Schloss et al., 2011). All reads classified as Luteovirus and BYDV from GA 161, GA Gore, and AGS4043 aligned to various BYDV-PAV genomes with percent identities between 84% and 100%. No reads were classified as poleroviruses in any dataset.
After Kraken2 taxonomic classification, the remaining unclassified reads were assembled into contigs to assess whether additional taxonomic assignments could be made. The total number of contigs generated per sample and contig quality are presented in Table 1, rows 12-15.
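Summaries such as the top genera per dataset can be derived from Kraken2 report files. The Python sketch below shows one way this might be done, assuming the standard six-column, tab-delimited report layout (percentage, reads in clade, reads assigned directly, rank code, taxonomy ID, indented name). It is not the authors' script; the function name `top_taxa` is hypothetical, and the example counts and taxonomy IDs are made-up placeholders rather than values from these datasets.

```python
def top_taxa(report_text, rank_code="G", n=20):
    """Return the n taxa of a given rank with the most clade-level reads.

    Assumes the standard six-column Kraken2 report layout.
    """
    rows = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        _pct, clade_reads, _direct_reads, rank, _taxid, name = fields[:6]
        if rank.strip() == rank_code:
            # Names are indented to show tree depth, so strip the padding.
            rows.append((name.strip(), int(clade_reads)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


# Toy report with placeholder counts and taxonomy IDs.
toy_report = "\n".join([
    "\t".join([" 50.00", "500", "500", "U", "0", "unclassified"]),
    "\t".join([" 50.00", "500", "0", "R", "1", "root"]),
    "\t".join([" 30.00", "300", "10", "G", "1001", "      Pseudomonas"]),
    "\t".join([" 12.00", "120", "5", "G", "1002", "      Fusarium"]),
    "\t".join(["  5.00", "50", "2", "G", "1003", "      Luteovirus"]),
    "\t".join(["  3.00", "30", "30", "S", "1004", "        Barley yellow dwarf virus PAV"]),
])

top_two_genera = top_taxa(toy_report, rank_code="G", n=2)
top_species = top_taxa(toy_report, rank_code="S", n=10)
```

Passing `rank_code="S"` selects species-level rows instead, which corresponds to the species-level view used for the viral index.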
The proportion of contigs classified by Kraken2 was 85% (GA 161), 47% (GA Gore), 82% (26R33), and 57% (AGS4043). Analysis of these contigs against the previous Kraken2 databases yielded genera that had already been identified from the trimmed reads. The remaining contigs were queried against the BLAST+ core_nt database, which primarily identified additional host-derived sequences as well as bacterial and fungal taxa previously detected with the PlusPF-16 database (Table 1, row 17). Some sequences from each dataset aligned with non-plant viral sequences, such as coronaviruses, which may be sequence classification artifacts. Further analysis and classification index updates may resolve these artifacts.
Contigs that remained unclassified after BLAST+ annotation were further analyzed with DIAMOND and MEGAN. For GA 161, classifications included fungi and Viridiplantae, with 24 contigs remaining unclassified ( 1,column 5,). Notably, one contig from the 26R33 cultivar was classified at the species level as BYDV-PAV.
These datasets represent the first publicly available RNA sequencing datasets from Alabama wheat exhibiting BYD-like symptoms. They can be used for wheat genome annotation and population genetic analyses across diverse cultivars. The detection of BYDV at varying levels in all four samples, including multiple BYDV species in cultivar 26R33, underscores the potential of these datasets for investigating viral coinfection dynamics and host-pathogen interactions involving BYDV. These datasets may also enable variant calling and SNP analysis of the BYDV species present in Alabama. The deposited data may further allow investigation of the microbial community and the potential role of coinfections in symptom development. Cultivar-specific transcriptome data can be used to explore host-pathogen interactions and differential gene expression. The data presented may also inform the design of bioinformatic pipelines for plant virome analyses.
Keywords: wheat, Barley yellow dwarf, Luteovirus, bioinformatics, RNAseq
Received: 08 Oct 2025; Accepted: 21 Nov 2025.
Copyright: © 2025 Livingston, Mayfield, Zhao, Strayer-Scherer and Martin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Kathleen M Martin, kmm0173@auburn.edu
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
