AUTHOR=Marla Soma S. , Mishra Pallavi , Maurya Ranjeet , Singh Mohar , Wankhede Dhammaprakash Pandhari , Kumar Anil , Yadav Mahesh C. , Subbarao N. , Singh Sanjeev K. , Kumar Rajesh
TITLE=Refinement of Draft Genome Assemblies of Pigeonpea (Cajanus cajan)
JOURNAL=Frontiers in Genetics
VOLUME=Volume 11 - 2020
YEAR=2020
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.607432
DOI=10.3389/fgene.2020.607432
ISSN=1664-8021
ABSTRACT=Genome assembly of short reads from large plant genomes remains a challenge in computational biology despite major developments in next-generation sequencing. Of late, multiple draft assemblies of plant genomes have been reported. The draft genome assemblies of Cajanus cajan have different levels of genome completeness and large numbers of repeats, gaps and segmental duplications. Draft assemblies with portions of the genome missing are shorter than the referenced original genome. These assemblies come with low map accuracy, which affects further functional annotation and prediction of the gene component desired by crop researchers. Genome coverage, i.e., the number of sequenced raw reads mapped onto certain locations of the genome, is an important indicator of completeness and assembly quality in draft assemblies. The present work aimed to improve coverage in the reported de novo sequenced draft genomes (GCA_000340665.1 and GCA_000230855.2) of pigeonpea, a legume widely cultivated in India. The two recently sequenced assemblies comprised 72% and 75% of the estimated genome coverage, respectively. We employed an assembly reconciliation approach to compare the draft assemblies and merge them, and filled the gaps using an algorithm that size-sorts mate-pair libraries, to generate a high-quality, near-complete assembly with enhanced contiguity. A majority of the gaps within scaffolds were filled with correctly sized mate-pair reads.
The finished assembly has fewer gaps than reported in the draft assemblies, with improved genome coverage of 82.4%. Map accuracy of the finished assembly was evaluated using various quality metrics and for the presence of specific trait-related functional genes. The paired-end and mate-pair local libraries employed helped to reduce gaps, repeats and other sequence errors, resulting in longer scaffolds compared to the two draft assemblies. We report the prediction of putative host resistance genes against Fusarium wilt disease and their evaluation under both wet-laboratory and field phenotypic conditions.