Genomic Resource of Rice Seed Associated Bacteria

Plants host a diverse microbiome that may have co-evolved with them over millions of years. This resident microbiota can act as an extended genome by contributing to plant growth, development, and protection from biotic and abiotic stresses. Rice (Oryza sativa) is a staple food consumed by more than 50% of the world's population. Here we targeted the bacterial community associated with healthy rice seeds. To this end, we isolated 100 bacterial isolates and carried out whole genome sequencing on them. These isolates belong to three major bacterial phyla, Proteobacteria, Firmicutes, and Actinobacteria, and are spread over 15 distinct genera and 29 species. A phylogenetic tree based on a broad set of phylogenomic marker genes confirmed the evolutionary relationships amongst the strains and their phylogenetic grouping. Average Nucleotide Identity was also used to establish the species identity of isolates that form a particular phylogenetic and taxonomic grouping. The data generated in the present study constitute one of the first major genomic resources in the field of phytobiome research. The whole genome sequences of these members will be invaluable in this era of big-data-driven research. Moreover, the majority of the genera and species identified in this study are already known for plant probiotic properties. These annotated genomic data will aid comparative, evolutionary, and ecological studies of bacteria associated with plants, or of multi-kingdom bacteria associated with nosocomial infections.

Isolation of Bacteria from Seed
Rice seeds were collected from a farmer's field in Fazilka, Punjab, India, where a basmati variety is grown under conventional farming. Bacterial isolations were done from pooled seeds from the same field, grown and harvested in three successive years. The first isolation was from seeds harvested in 2011, the next three isolations were from the crop harvested in 2012, and the last was from the 2013 harvest. For bacterial extractions, 5 g of seeds were partially crushed (∼80%) in normal saline (0.85% NaCl) using a sterile pestle and mortar and suspended in 50 ml of the solution (10%; Cottyn et al., 2001). These suspensions were incubated for 2 h at 4 °C/28 °C and then dilution plating was done up to 10⁻⁶. Samples from each dilution were plated in triplicate on six different media: Peptone Sucrose Agar (PSA), Glucose Yeast extract Calcium carbonate Agar (GYCA), Luria broth (LB) agar, King's B (KB) agar, Nutrient broth (NB) agar, and Potato Dextrose Agar (PDA). Plates were incubated at 28 °C and growth was checked for up to 6 days. Control plates with and without saline solution were also incubated for up to 1 month to check for contamination. Bacterial colonies were selected on the basis of diverse morphology and processed further, as the aim was to capture the maximum diversity associated with rice seeds. Bacterial cultures were frozen in 15% glycerol at −80 °C.
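The arithmetic behind the suspension and the dilution series can be sketched as follows. This is only an illustration: the function names are hypothetical, and the colony count and plated volume in the usage lines are placeholder values, since neither is reported in the text.

```python
def suspension_percent_w_v(seed_mass_g, volume_ml):
    """Weight/volume percentage of the seed suspension (g per 100 ml)."""
    return 100.0 * seed_mass_g / volume_ml


def dilution_factors(steps, fold=10.0):
    """Dilution factors of a serial dilution: fold^-1, fold^-2, ..., fold^-steps."""
    return [fold ** -i for i in range(1, steps + 1)]


def cfu_per_gram(colonies, dilution, plated_volume_ml, seed_mass_g, volume_ml):
    """Back-calculate colony forming units per gram of seed from one plate.

    colonies         -- colonies counted on the plate
    dilution         -- dilution factor of the plated sample (e.g. 1e-3)
    plated_volume_ml -- volume spread on the plate (assumed, not from the text)
    """
    cfu_per_ml = colonies / (dilution * plated_volume_ml)
    return cfu_per_ml * volume_ml / seed_mass_g


if __name__ == "__main__":
    # 5 g of seed in 50 ml of saline, as described in the text.
    print(suspension_percent_w_v(5, 50))
    # Six ten-fold steps reach the 10^-6 dilution mentioned in the text.
    print(dilution_factors(6))
    # Placeholder plate: 30 colonies, 10^-3 dilution, 0.1 ml plated.
    print(cfu_per_gram(30, 1e-3, 0.1, 5, 50))
```

Averaging the triplicate plates of one dilution before calling `cfu_per_gram` would be the natural extension of this sketch.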

Identification by 16S rDNA Sequencing
Bacterial isolates were streaked on nutrient broth (NB) agar to obtain single colonies, and 3-4 colonies of each bacterium were suspended in 50 µl of water. The bacteria were subjected to freeze-thaw shock by freezing the vial at −80 °C for 10 min and then incubating it at 95 °C for 5 min. Samples were then centrifuged at 10,000 rpm for 1 min to collect the supernatant, and this step was repeated once before proceeding further. DNA in the samples was quantified using NanoDrop (Thermo Scientific) and PCR was performed using the universal 16S rRNA amplification primers 27F (AGAGTTTGATCMTGGCTCAG) and 1492R (GGTTACCTTGTTACGACTT). After amplification was checked on a 1% agarose gel, samples were treated with Exo-Sap (USB, Affymetrix Inc., Cleveland, Ohio, USA) to remove single-stranded DNA primers and unused dNTPs, and were then subjected to Sanger sequencing on the in-house ABI DNA sequencer. The resulting ABI files were visualized in Finch TV v1.4.0 to select high-quality sequences, which were analyzed using Ez-BioCloud (Kim et al., 2012) to identify the closest bacterial species.
From five different isolations, 469 colonies were obtained as pure cultures. On the basis of morphological characteristics on the agar plates, the sample size was then reduced to 147 for species identification by 16S rRNA sequencing. Out of these 147 cultures, 100 isolates were shortlisted for genome sequencing, comprising at least one representative of each species from each lot to represent the seed-associated bacterial diversity.
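The 27F primer quoted above contains the IUPAC ambiguity code M (A or C), so it stands for a mixture of two oligonucleotides. A minimal sketch of how such a primer can be expanded, and how the reverse primer maps back onto the forward strand, is given below; the helper names are illustrative and are not part of any tool used in this study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every unambiguous sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    for oligo in expand_degenerate("AGAGTTTGATCMTGGCTCAG"):  # 27F
        print(oligo)
    # Site on the forward strand that 1492R anneals opposite to.
    print(reverse_complement("GGTTACCTTGTTACGACTT"))  # 1492R
```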

Genome Sequencing, Assembly, and Analysis
Bacterial cultures were revived from −80 °C stocks and DNA was isolated from them using the ZR Fungal/Bacterial DNA isolation kit (Zymo Research). DNA quality was checked using NanoDrop (Thermo Scientific) and agarose gel electrophoresis, and DNA was quantified using a Qubit 2.0 Fluorometer (Life Technologies). Sequencing libraries were prepared using the Nextera XT sample preparation kit (Illumina, Inc., San Diego, CA, USA) and loaded onto the in-house Illumina MiSeq platform (Illumina, Inc., San Diego, CA, USA) using company-supplied paired-end sequencing kits. Adapter trimming was done automatically by MiSeq Control Software (MCS), and additional adapter contamination identified by the NCBI server was removed by manual trimming. De novo assembly of the sequences was done using CLC Genomics Workbench v7.5 (CLC bio, Aarhus, Denmark) with default settings. Sequences were annotated using the NCBI Prokaryotic Genome Annotation Pipeline (http://www.ncbi.nlm.nih.gov/genome/annotation_prok/). The RNAmmer 1.2 server was used to annotate the RNA sequences and Ez-BioCloud to identify the closest bacterial species. Protein sequences of 10 known phylogenomic marker genes (infC, rplB, rplC, rplD, rplE, rplF, rplM, rplN, and rplP) were extracted from the genomic sequences, aligned, and concatenated to obtain a multilocus strain phylogeny. These are single-copy, universally distributed genes with core housekeeping functions (Wu and Eisen, 2008) and, importantly, have been found to be relatively immune to horizontal gene transfer (Jain et al., 1999). Sequences were aligned using MEGA v6.0 (Tamura et al., 2013) and a phylogenetic tree was constructed using the Neighbor-Joining method with 500 bootstrap replicates. JSpecies 1.2.1 software was used to calculate Average Nucleotide Identity (ANI) amongst different strains (Richter and Rosselló-Móra, 2009).
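The concatenation step described above can be illustrated with a short sketch. It is not the MEGA workflow used in this study: it assumes each marker gene has already been aligned separately, and the data layout (a dictionary of gene name to per-strain aligned sequences), the function names, and the toy sequences are all hypothetical.

```python
def concatenate_alignments(alignments, gene_order):
    """Join per-gene alignments into one supermatrix row per strain.

    alignments -- {gene: {strain: aligned_sequence}}; all sequences of one
                  gene must have the same length.
    gene_order -- order in which the genes are concatenated.
    Strains missing a gene are padded with gap characters for that gene.
    """
    strains = sorted({s for gene in gene_order for s in alignments[gene]})
    parts = {s: [] for s in strains}
    for gene in gene_order:
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences of %s are not aligned" % gene)
        width = lengths.pop()
        for s in strains:
            parts[s].append(aln.get(s, "-" * width))
    return {s: "".join(chunks) for s, chunks in parts.items()}


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignments with made-up residues, for illustration only.
    toy = {
        "infC": {"strainA": "MKT-", "strainB": "MKTA"},
        "rplB": {"strainA": "AV", "strainB": "AI"},
    }
    matrix = concatenate_alignments(toy, ["infC", "rplB"])
    print(matrix)
    print(p_distance(matrix["strainA"], matrix["strainB"]))
```

A pairwise distance matrix built this way is the kind of input a Neighbor-Joining implementation takes; bootstrap support would come from resampling the columns of the supermatrix and rebuilding the tree for each replicate.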

Nucleotide Sequences Accession Numbers
The data have been submitted to NCBI GenBank under accession nos. LDPZ00000000-LDTU00000000, and assembly statistics for the 100 bacterial genomes sequenced are provided in Table 1.

Interpretation of Data Set
High-quality sequencing data generated for each strain (Table 1) were de novo assembled, with coverage ranging from 49x to 270x. Analysis based on the complete 16S rRNA sequences extracted from the whole genome sequences assigned the strains to 15 distinct genera and 29 species. It is also pertinent to note that the genus/species assignment was validated by a new QA protocol of NCBI during the submission process. Here, the "input fasta sequences are BLASTed against a collection of 23 bacterial ribosomal protein COG families during submission." The multilocus phylogenetic tree based on marker genes further supports the distinction between different groups of bacteria down to strain level (Figure 1). A description of the strains assigned to the 15 different genera is provided below:

Genus: Kocuria
Kocuria is a gram-positive bacterium belonging to phylum Actinobacteria. Seven isolates (SA11, SA12, SA13, SA14, SA15, RSA5, and RSA28) belonging to this genus were sequenced from two different libraries and two different year lots. Complete 16S rRNA typing assigned the seven isolates to the same species, Kocuria kristinae. ANI analysis further showed that the seven strains have genome-level identity >99.8%, well above the 94-96% cut-off for species delineation (Konstantinidis and Tiedje, 2005; Richter and Rosselló-Móra, 2009), which suggests their monophyletic/clonal nature. Interestingly, only in the case of Kocuria was a single species detected despite multiple strains being sequenced.
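The way the 94-96% ANI band is applied throughout this section can be written down as a simple rule. The sketch below is an illustrative reading of that cut-off, not part of JSpecies: the function name and the three labels are assumptions, and values falling inside the band are deliberately reported as borderline rather than forced into either class.

```python
def classify_ani(ani, lower=94.0, upper=96.0):
    """Interpret a pairwise ANI value (in percent) against the 94-96% band.

    Above the band  -> "same species"
    Below the band  -> "different species"
    Inside the band -> "borderline" (needs further evidence)
    """
    if not 0.0 <= ani <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if ani > upper:
        return "same species"
    if ani < lower:
        return "different species"
    return "borderline"


if __name__ == "__main__":
    # ANI values quoted in this section.
    for label, value in [
        ("Kocuria strains", 99.8),
        ("Exiguobacterium RSA11 vs RSA42", 94.33),
        ("Aureimonas NS226 vs NS365", 96.91),
        ("Enterobacter vs E. cancerogenus", 86.0),
    ]:
        print(label, value, classify_ani(value))
```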

Genus: Curtobacterium
Curtobacterium is also a gram-positive bacterium, belonging to phylum Actinobacteria and family Microbacteriaceae. Four isolates of Curtobacterium were obtained from the seed microbiome. On the basis of 16S rRNA sequences they belong to three different species, i.e., Curtobacterium luteum (NS184), Curtobacterium citreum (NS330), and Curtobacterium oceanosedimentum (NS263, NS359), and were sequenced from two different libraries. ANI values amongst the genomes of the four strains also support the presence of three species.

Genus: Leucobacter
Leucobacter is another gram-positive member of Actinobacteria, belonging to family Microbacteriaceae. One isolate belonging to this genus, NS354, was isolated from rice seeds and was assigned to Leucobacter chromiiresistens on the basis of 16S rRNA sequences.

Genus: Microbacterium
Microbacterium is another gram-positive member of Actinobacteria, and six isolates from rice seeds were assigned to this genus. These isolates were obtained from three different libraries and two different rice lots. 16S rRNA sequences assigned them to two different species, Microbacterium testaceum (NS183, NS206, NS220, NS283, and RSA3) and Microbacterium oxydans (NS234), while ANI values suggest that NS220 is a different species, as its values are less than 87.5% with all the other strains.

Genus: Exiguobacterium
Exiguobacterium is a gram-positive bacterium assigned to phylum Firmicutes. Two isolates (RSA11 and RSA42) belonging to this genus were isolated from one library and assigned to the same species, Exiguobacterium indicum, on the basis of 16S rRNA gene sequences. However, the ANI value between these two strains is 94.33%, very close to the cut-off for species delineation, suggesting that they may belong to two different species.
Estimated ANI values also support the species distinction between the four groups as they are above the cut-off for species delineation.

Genus: Paenibacillus
Paenibacillus is a gram-variable bacterium belonging to phylum Firmicutes. One strain, NS115, belonging to Paenibacillus jamilae was isolated from the rice seed environment.

Genus: Aureimonas
Aureimonas is a gram-negative bacterium belonging to α-Proteobacteria. Two strains belonging to the species Aureimonas ureilytica (NS226, NS365) were isolated from the rice seeds in two different preparations, and they have an ANI value of 96.91%.

Genus: Methylobacterium
Methylobacterium, another gram-negative bacterium belonging to α-Proteobacteria, was also isolated from the rice seed environment. Five isolates belonging to two different species were obtained from rice lots of two different years: Methylobacterium radiotolerans (SB2, SB3) and Methylobacterium aquaticum (NS228, NS229, NS230). ANI values also confirmed the delineation of the two species.

Genus: Novosphingobium
One isolate, NS277, was assigned to Novosphingobium barchaimii, which is also a gram-negative member of α-Proteobacteria.

Genus: Pseudacidovorax
Pseudacidovorax is a gram-negative β-Proteobacteria. One isolate belonging to Pseudacidovorax intermedius NS331 was isolated from rice seeds.

Genus: Enterobacter
Enterobacter is a gram-negative bacterium belonging to family Enterobacteriaceae of γ-Proteobacteria. Bacterial isolates belonging to this genus were isolated from three different libraries and two rice lots. Eighteen isolates were sequenced, belonging to three different species on the basis of 16S rRNA sequences, i.e., Enterobacter asburiae (NS7, NS23, and NS34), Enterobacter xiangfangensis (NS19, NS24, NS28, NS29, NS49, NS57, NS64, NS75, NS80, NS371, RSA8), and Enterobacter cancerogenus (NS31, NS104, NS111, NS188). ANI values amongst all Enterobacter asburiae and E. xiangfangensis strains are more than 99.9%, suggesting that these isolates actually belong to one species only, while their ANI values with the E. cancerogenus strains are around 86%.

FIGURE 1 | Multilocus sequence analysis of rice seed associated bacterial isolates constructed using 10 phylogenomic marker genes with the Neighbor-Joining method and 500 bootstrap replications. Bacterial strains belonging to different phyla, Proteobacteria (α, β, and γ), Firmicutes, and Actinobacteria, are highlighted with different background colors. Strains belonging to each species are grouped together with high bootstrap values.

AUTHOR CONTRIBUTIONS
SM, KB, and VC carried out the bacterial isolations from rice seeds and their identification using 16S rRNA gene sequences. DNA isolation, QC, and sequence assembly were performed by KB and SS. SM, KB, and SS did the library preparation for high-throughput sequencing, ran the analysis, and analyzed strains for genome-based taxonomy. NK and PPP carried out the revival of strains for sequencing and helped in Sequin file preparation. PBP conceived the study and participated in its design and coordination.

ACKNOWLEDGMENTS
SM, VC, and NK are supported by fellowships from the Council of Scientific and Industrial Research (CSIR). KB, SS, and PP are supported by fellowships from the University Grants Commission (UGC). We acknowledge funding from CSIR Network projects (BSC-402H and BSC-117/PMSI). We thank Girish Sahni, director, IMTECH, for encouragement and support.