Illumina Sequencing of Common (Short) Ragweed (Ambrosia artemisiifolia L.) Reproductive Organs and Leaves

INTRODUCTION
Ambrosia artemisiifolia L. (A. artemisiifolia, common ragweed) is one of the most aggressive, rapidly spreading, and highly allergenic weeds found in agricultural settings across the temperate zone. Chemical control of ragweed is limited in some crops, so the weed can reduce yields in both Europe and North America. The most affected crops are maize, sunflower, soya bean, and pea in Europe (Chollet et al., 1999; Konstantinovic et al., 2005; Chauvel et al., 2006) and grains, tobacco, and root crops in North America (Bassett and Crompton, 1975). Common ragweed pollen allergy also impairs quality of life in the human population. Its pollen allergens are considered major elicitors of type I allergy during late summer and fall, inducing respiratory conditions such as allergic rhinitis and seasonal asthma; ragweed pollen has also been linked to eczema, ear infections in children, and sinusitis (bacterial infection of the sinuses) in adults (Dykewicz, 2003). The spread of this weed therefore causes severe agricultural and public health problems that need to be addressed globally.
To better understand the genetic regulation of common ragweed reproductive biology, we sequenced mRNA from flower tissues and leaves at different developmental stages using the Illumina platform. To this end, male and female flowers of this monoecious, dicotyledonous invasive weed were collected from a natural Ambrosia population in a highly infested West-Transdanubian region of Hungary.
The sequence data were assembled de novo to create a reference transcriptome for our future work on this species. The raw reads used for the transcriptome assembly have been deposited in NCBI's Sequence Read Archive (SRA) database under the accession numbers SRR3995704 (male flower), SRR3995703 (female flower), and SRR3995705 (leaves). SRA accession: SRP080078. BioProject ID: PRJNA335689. The Transcriptome Shotgun Assembly project has been deposited at DDBJ/ENA/GenBank under the accession GEZL00000000. The version described in this paper is the first version, GEZL01000000.
The presented data were recently used for the first time to determine the complete coding sequence and putative signal peptide of an Amb a 3 allergen isoform of A. artemisiifolia (Taller et al., 2016).

Plant Materials
Three sample types, namely male (♂) and female (♀) flowers from initial to final developmental stages and leaves of different developmental stages and positions, were collected from A. artemisiifolia plants growing beside an agricultural field in the West-Transdanubian region of Hungary (46°44′55.4″N, 17°14′20.1″E) throughout the flowering period, from mid-July to the end of August. Samples were frozen immediately in liquid nitrogen and stored at −80°C until further use. Highly pure total RNA was isolated from the plant tissues using the TaKaRa Plant RNA Extraction Kit according to the manufacturer's instructions (Takara Bio Inc., Japan). The purity and concentration of all RNA samples were quantified spectrophotometrically using an Agilent 2100 Bioanalyzer (Agilent Technologies, USA).

Enrichment of mRNA, cDNA Synthesis and Library Preparation for Illumina HiSeq Paired-End Sequencing
For poly-A-based mRNA enrichment and cDNA synthesis, the Illumina TruSeq™ RNA Sample Preparation Kit (Low-Throughput protocol) was used according to the manufacturer's instructions. Briefly, 1.5 µg of total RNA from each of the male flower, female flower, and leaf samples was used for poly-A mRNA selection with streptavidin-coated magnetic beads. Two rounds of poly-A mRNA enrichment were performed, followed by thermal mRNA fragmentation. cDNA was synthesized from the enriched and chemically fragmented RNA using reverse transcriptase (SuperScript II) and random primers. The cDNA was converted into double-stranded (ds) DNA using the reagents supplied in the kit, and ∼0.5-1 µg from each sample was used for library preparation. RNA-Seq was performed on an Illumina HiSeq 2000 system. Prior to hybridization onto a flow cell, the dsDNA fragments were blunt-ended through an end-repair reaction, and both ends were ligated to platform-specific double-stranded barcoded adapters. The samples were run in one lane using multiple indexing adapters. For library amplification, an adapter-selective PCR was performed. To avoid skewing the library representation, the number of PCR cycles was limited to 15. The libraries were quantified by qPCR according to the qPCR Quantification Protocol Guide to obtain the optimum cluster density. The size and purity of the samples were checked with an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). The PCR products were approximately 260 bp in size. The DNA libraries were normalized to 10 nM and multiplexed.
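The normalization to 10 nM can be illustrated with a short calculation. The following is a minimal sketch, not the protocol used in this work: it assumes the common approximation of 660 g/mol per base pair of dsDNA, and the function names and the example concentration of 10 ng/µL are hypothetical values chosen for illustration. Only the ∼260 bp fragment size and the 10 nM target come from the text.

```python
# Sketch: convert a dsDNA library concentration to molarity and work out
# the dilution needed to reach a target molarity.
# ASSUMPTION: average mass of 660 g/mol per base pair of dsDNA.

AVG_BP_MASS = 660.0  # g/mol per bp (approximation, not from the paper)


def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Return the library molarity in nM.

    ng/uL equals 1e-3 g/L; dividing by the molar mass (g/mol) gives mol/L,
    and multiplying by 1e9 gives nM, hence the overall factor of 1e6.
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dilution_volumes(stock_nM, target_nM, final_volume_ul):
    """Return (stock volume, diluent volume) in uL using C1*V1 = C2*V2."""
    if target_nM > stock_nM:
        raise ValueError("target molarity exceeds stock molarity")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical library: 10 ng/uL at a mean fragment size of 260 bp.
    stock = library_molarity_nM(10.0, 260)
    stock_ul, diluent_ul = dilution_volumes(stock, 10.0, 20.0)
    print(f"stock: {stock:.2f} nM")
    print(f"mix {stock_ul:.2f} uL library with {diluent_ul:.2f} uL diluent")
```

In practice the qPCR-derived concentration would normally replace the mass-based estimate, since qPCR counts only adapter-ligated, amplifiable fragments.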

De Novo Assembling and Analysis of High Throughput Sequencing Data
Quality control and preprocessing of the 2 × 100 bp raw reads were performed with FastQC (Andrews, 2010); the results are summarized in Table 1. After quality trimming at a Phred score ≥ 28, a de novo assembly of the three combined transcriptome sequencing datasets was performed using Trinity (Haas et al., 2013) with a k-mer size of 25. The resulting contigs were subsequently used as the reference transcriptome of common ragweed. The assembly statistics are summarized in Table 2. The reads from each sample were then mapped separately back to the reference using the short-read aligner Bowtie (Langmead et al., 2009). Transcript abundances were estimated with RSEM (http://deweylab.github.io/RSEM/) using the scripts provided in the Trinity package. A collection of 162494 unigenes (average length of 391 bp) was generated and their annotation
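To make the Phred ≥ 28 trimming threshold concrete, the sketch below converts a Phred score to its base-calling error probability and trims low-quality bases from the 3′ end of a read. The text does not name the trimming tool, so this is not the procedure used here: `trim_3prime` is a hypothetical, simplified trimmer, and the Phred+33 quality encoding is an assumption.

```python
# Sketch: Phred scores and a simple 3'-end quality trimmer.
# ASSUMPTIONS: Phred+33 encoding; trailing-base trimming only. The actual
# trimming tool and its algorithm are not specified in the text.


def phred_to_error_prob(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_3prime(seq, qual, min_q=28, offset=33):
    """Remove trailing bases whose Phred score is below min_q.

    seq  -- nucleotide string
    qual -- FASTQ quality string of the same length
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end while the last kept base is below threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # Q28 corresponds to roughly 1.6 expected errors per 1000 base calls.
    print(f"P(error) at Q28: {phred_to_error_prob(28):.5f}")
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(trim_3prime("ACGTAC", "IIII##"))
```

A threshold of Q28 is fairly strict: it keeps bases with an estimated error rate below about 0.16%, at the cost of shortening reads whose quality decays toward the 3′ end.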

DIRECT LINK TO DEPOSITED DATA AND INFORMATION TO USERS
The raw reads are available in FASTQ format at http://www.ncbi.nlm.nih.gov/sra/SRP080078. The transcriptome shotgun assembly is available in FASTA format at http://www.ncbi.nlm.nih.gov/nuccore/GEZL00000000. Users can download and use the data freely for research purposes only, citing this paper as the reference for the data.
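Users who download the assembly in FASTA format may want to check basic summary statistics such as the number of sequences and their mean length. The sketch below shows one way to do this using only the Python standard library. The function names and the file name in the usage comment are hypothetical, and the N50 calculation is included for convenience rather than taken from the text.

```python
# Sketch: summary statistics for a FASTA file (sequence count, mean length,
# N50), standard library only.
from statistics import mean


def read_fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before 1st header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover half of all bases."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def summarize(lengths):
    return {"n": len(lengths), "mean": mean(lengths), "n50": n50(lengths)}


if __name__ == "__main__":
    # Tiny in-memory example; for a real file (hypothetical name) use:
    #   with open("assembly.fasta") as handle:
    #       print(summarize(read_fasta_lengths(handle)))
    demo = [">c1", "ACGTACGT", "ACGT", ">c2", "ACGT", ">c3", "AC"]
    print(summarize(read_fasta_lengths(demo)))
```

Because the reader streams line by line and stores only lengths, it should cope with an assembly of this size without loading the sequences into memory.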

AUTHOR CONTRIBUTIONS
EV: wrote the manuscript and performed cDNA library preparation, bioinformatics, and data analysis; EB: performed transcriptome assembly and bioinformatics; GH: performed statistical analysis and informatics operations; EN, BK, and KM: collected the samples and performed RNA isolation; JT: supervised the project and acquired funding.