Identification of Edible Fish Species of Pakistan Through DNA Barcoding

Fish is a fundamentally healthy food, rich in essential nutrients, protein, vitamin D, and omega-3 fatty acids. Mislabeling is a common problem in the fish industry that causes price imbalances and market fluctuation. DNA barcoding is a promising technique for authenticating mislabeled and misidentified fish species. In this study, 11 freshwater and 6 marine fish species were used for DNA barcoding and further authentication using the mitochondrial cytochrome b (Cyt b) gene. Cyt b was amplified using PCR, producing an average read length of 1,141 bp. The obtained sequences were compared to the National Center for Biotechnology Information (NCBI) database using the Basic Local Alignment Search Tool (BLAST). The average AT content (55.20%) was higher than the average GC content (44.78%) in marine and freshwater fish species. The mean genetic Kimura 2-parameter distances for species, genus, families, and orders were 0.311, 0.308, 0.023, and 0.337, respectively. Phylogenetic tree analysis revealed that most of the freshwater fish species clustered together because they belonged to the same order or family, while the marine fish species clustered distantly. Single nucleotide polymorphism (SNP) analysis of all species in the study revealed distinct features regarding unique sites. All fish species could be identified based on their unique SNP profiles. Based on SNP data, DNA sequence-based QR codes were developed for accurate identification of fish species. This is the first study to develop DNA-based QR barcodes for proper authentication of species during the chain of custody using simple technology.


INTRODUCTION
Fish are the most abundant vertebrate group on Earth, comprising 50% of vertebrate species. Fish is often a staple of the human diet, with high digestibility and good taste. Fisheries also play an essential role in generating income for many communities (Rafique, 2007; Rafique and Khan, 2012). So far, 33,000 fish species have been identified throughout the world (Di Pinto et al., 2015). In Pakistan, 531 species of fish have been identified, of which 233 are freshwater and the remaining 298 are marine fish species. According to studies conducted by Rafique (2007) and Rafique and Khan (2012), 78 of the 233 freshwater fish in Pakistan are economically important species.
Recent studies conducted by Armani et al. (2015) and Pollack et al. (2018) identified numerous challenges in the fish market, including mislabeling, fraud, and substitution, that prevent the expansion of the market. Some mislabeling results from the close resemblance between different fish in terms of appearance, topology, texture, taste, and other morphometric characters. However, in some cases, low-quality fish is deliberately mixed with or mislabeled as higher quality fish to fetch a better price for otherwise commercially unimportant fish species (Cawthorn et al., 2012). These fraudulent practices negatively impact the fish market and call for suitable control measures to protect the local food industry. Initiatives are required to raise public awareness and develop effective authentication programs that can detect and prevent fish mislabeling (Ali et al., 2018).
The authentic and reliable identification of fish is essential to prevent mislabeling in the fish markets. One of the leading techniques for authentication of fish is to identify species based on morphological and morphometric features (Bottero and Dalmasso, 2011). Fish have extremely diverse morphological characteristics as they transition through ontogenetic metamorphism, and thus, morphometric characteristics change during the process of ontogenetic development (Zhang and Hanner, 2011). Similarly, convergent and divergent adaptations impose further challenges in the identification process (Keskin and Atar, 2013). The use of molecular approaches for identifying fish species has been suggested to mitigate the limitations associated with morphology-based identification systems and the lack of local fish identification expertise (Zhang and Hanner, 2011; Keskin and Atar, 2013; Di Pinto et al., 2015). With advancements in the modern taxonomic system, features such as internal anatomy, physiology, genes, isozymes, behavior, and geography have been introduced for appropriate identification (Costa and Carvalho, 2007). DNA barcoding, a technique that applies genetically variable DNA sequences with low intraspecific but high interspecific variability to discriminate between species, has been used as a practical approach in food traceability (Galimberti et al., 2013). Because DNA can be isolated even from processed meat, DNA barcoding can be performed at any stage within the chain of custody (Khaksar et al., 2015).
Various DNA biomarkers have been used for fish identification. The DNA barcoding approach has high reproducibility and can be tested or verified at any point in a chain of custody, as long as the bridge between DNA sequences and voucher specimens is validated (Nicolè et al., 2011). Additionally, genomic DNA extraction and amplification of genetic markers are technically simple and usually nondestructive; thus, this approach does not require the destruction of valuable samples (Nicolè et al., 2013). DNA barcoding has been extensively applied in fish authentication and labeling, as well as in biodiversity, conservation, ecological, and forensic studies (Sullivan et al., 2013; Di Pinto et al., 2015; Verzeletti et al., 2015; Pollack et al., 2018).
It can be difficult to recover a sufficient quantity and quality of nuclear DNA molecules from raw or processed meat; thus, the use of nuclear DNA is limited compared to organelle DNA (Asif and Cannon, 2005). More than 500 species have been targeted, and most of them belong to gadoids, scombroids, and salmonids. One of the most familiar and most targeted DNA markers is mitochondrial cytochrome b, which is commonly applied in forensic, taxonomic, and ecological fields (Beamish and Rothschild, 2009; Teletchea, 2009; Kochzius et al., 2010). The Cyt b gene is a sound choice for identification of fish species, chickens, and praomyin rodents, and many researchers have reported its wide acceptance in systematics and molecular ecology (Kartavtsev, 2011; Nicolas et al., 2012; Yacoub et al., 2015; Fernandes et al., 2017). Other studies have used Cyt b regions for phylogenetic and population analyses in fish species (Beamish and Rothschild, 2009; Li et al., 2018). However, other genes such as cytochrome c oxidase subunit I (COI) have also proven useful (Hebert et al., 2003; Prieto et al., 2003). Compared to nuclear genes, mitochondrial DNA (mtDNA) is more suitable for DNA barcoding due to high copy numbers, lack of introns, low recombination, and maternal inheritance (Nicolè et al., 2013). Hebert et al. (2003) used the mitochondrial COI gene sequence for DNA barcoding. The intraspecific diversity of the COI gene in animals had lower resolving power than interspecific diversity as a DNA barcode. The COI gene is used extensively for DNA barcoding in other biological groups, but less so for fish (Doña et al., 2015).
The Cyt b gene has been used extensively in fish barcoding studies (Fernandes et al., 2017) and is considered the best mitochondrial gene for phylogenetic analysis concerning protein function and structure (Degli Esposti et al., 1993). The slowly evolving codon positions and variable domains of Cyt b are ideal for examining the systematic diversity of phylogeny (Kumazawa and Nishida, 2000). The aim of this study is to determine the efficacy of the Cyt b gene for the identification of Pakistan's freshwater and marine fish species. Moreover, the DNA sequence data generated from this study was used to develop a "Quick Response Code" (QRC).

Fish Sample Collection
The research was conducted at the Center for Advanced Studies in Agriculture and Food Security (CAS-AFS), University of Agriculture, Faisalabad, Pakistan. The fish for this experiment were collected from two cities in Pakistan: Faisalabad, Punjab (31.42° N, 73.08° E), and Karachi, Sindh (24.91° N, 67.08° E) (Figure 1). Overall, eleven freshwater fish species belonging to six families and five orders, and six marine fish species belonging to five families and one order, were collected (Table 1). The raw fish samples were thoroughly washed, immediately transported to the laboratory in polythene bags, and stored at -80 °C until DNA extraction. These 17 individuals (11 freshwater + 6 marine fish species) were then used for DNA extraction, PCR amplification, sequencing, and DNA barcoding.

DNA Extraction, Visualization, and Quantification
DNA was extracted from a 30 mg muscle tissue sample using the GeneJet Genomic DNA Purification Kit (Thermo Fisher Scientific Cat. # K0721). Genomic DNA was visualized on a 1% agarose gel and stored at -20 °C for downstream applications. The quantity and purity of the extracted DNA were determined using a NanoDrop® ND-8000 (Thermo Scientific, Waltham, MA).

Amplification of Conserved Regions of Cyt b Gene and Sequencing
High-quality DNA was used for PCR amplification, as reported by Sevilla et al. (2007). Amplification was performed using a C1000 Touch Thermal Cycler (Bio-Rad). For this purpose, a 20 µl reaction mixture was combined in PCR tubes with 50 ng DNA template, 0.5 µl Taq DNA polymerase (5 U/µl, Thermo Scientific, America), 2 µl Taq Buffer (10X, Thermo Scientific, America), 2 µl MgCl2 (25 mM), 2 µl dNTPs (10 mM, Thermo Scientific, America), 8 µl Milli-Q H2O, and 1 µl of each primer (10 mM) (forward, 5′-AACCACCGTTGTTATTCAACTACAA-3′, and reverse, 5′-CCGACTTCCGGATTACAAGACCG-3′). The PCR amplification of Cyt b consisted of an initial denaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 94 °C for 30 s, annealing at 50 °C for 35 s, and extension at 72 °C for 120 s, then a final extension at 72 °C for 4 min and an infinite hold at 4 °C. The amplified PCR products were visualized and sized on a 1% agarose gel. Before Sanger sequencing, the amplified PCR products were purified using the FavorPrep PCR Clean-Up Mini Kit (Cat. # FAPCK001-1). Sanger sequencing was performed uni-directionally for discrimination of freshwater and marine fish species.
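As a quick consistency check on the protocol above, the following Python sketch adds up the listed reagent volumes and the programmed cycling time. It assumes that the 50 ng template occupies whatever volume remains of the 20 µl reaction, which the text does not state, and it ignores temperature ramping between steps; all variable names are illustrative.

```python
# Reagent volumes (µl) as listed in the text; the template volume is not given.
REACTION_VOLUME_UL = 20.0
reagents_ul = {
    "Taq DNA polymerase": 0.5,
    "Taq buffer (10X)": 2.0,
    "MgCl2 (25 mM)": 2.0,
    "dNTPs (10 mM)": 2.0,
    "Milli-Q water": 8.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
}

reagent_total_ul = sum(reagents_ul.values())

# Assumption: the DNA template fills the remaining reaction volume.
template_volume_ul = REACTION_VOLUME_UL - reagent_total_ul

# Minimum stock concentration needed to deliver 50 ng in that volume.
min_template_conc_ng_per_ul = 50.0 / template_volume_ul

# Programmed hold times only (seconds); ramping between steps is ignored.
initial_denaturation_s = 30
cycle_s = 30 + 35 + 120  # denaturation + annealing + extension
final_extension_s = 4 * 60
cycling_seconds = initial_denaturation_s + 40 * cycle_s + final_extension_s

print(f"Reagents: {reagent_total_ul} µl, template: {template_volume_ul} µl")
print(f"Programmed time: {cycling_seconds / 60:.1f} min")
```

Under these assumptions the template stock would need to be at least about 14 ng/µl, and the programmed steps alone take a little over two hours.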

SNP Detection and DNA Barcoding
A sequence file including only the experimental sequences (11 freshwater, 6 marine) was aligned in MEGAX using the MUSCLE alignment tool. Additionally, all sequences were edited manually (i.e., similar sites, highly mismatched sites, and gaps were removed), and each base of the spliced sequence was checked using SeqMan (DNAStar) before submission to GenBank (Bingpeng et al., 2018). Based on the alignment data and manual editing, single nucleotide polymorphisms (SNPs) were detected to estimate unique sites, as described by Fatima et al. (2019).
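The idea of a species-specific unique site can be illustrated with a short Python sketch. It assumes that a "unique site" is an alignment column at which one species carries a base found in no other aligned species; the study's exact criterion follows Fatima et al. (2019) and may differ. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def unique_sites(alignment):
    """Return, per species, the 1-based alignment positions where that
    species carries a base that no other species has at that position."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    result = {name: [] for name in names}
    for pos in range(length):
        column = [alignment[name][pos].upper() for name in names]
        if "-" in column:
            continue  # skip gapped columns, mirroring the gap removal step
        counts = Counter(column)
        for name, base in zip(names, column):
            if counts[base] == 1:
                result[name].append(pos + 1)
    return result


# Toy alignment with hypothetical species labels (not data from the study).
toy_alignment = {
    "species_a": "ACGTAC",
    "species_b": "ACGTTC",
    "species_c": "GCGTTC",
    "species_d": "ACGTTC",
}
toy_unique = unique_sites(toy_alignment)
```

In this toy alignment, `species_a` is unique at position 5 and `species_c` at position 1, while the other two sequences have no unique sites.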
TABLE 1 | Identification of freshwater and marine fish species sampled from a local market based on Cyt b gene sequence homology.
A QR code is an easily accessible two-dimensional barcode that is readable by smartphones and can encode over 4,000 characters. SNP data were used to develop DNA barcodes for each species using an online QR code generator. Each fish SNP sequence was pasted into the online generator, and the corresponding QR code was generated.
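The capacity figure can be made concrete. The sketch below does not generate a QR code (the study used an online generator); it only checks whether a sequence string would fit in a single symbol. The limits used are those of the largest QR symbol (version 40) at the lowest error correction level (L): 4,296 characters in alphanumeric mode and 2,953 bytes in byte mode. The function name is illustrative.

```python
# Characters allowed in QR alphanumeric mode (uppercase letters only).
QR_ALPHANUMERIC_CHARS = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Maximum capacity of a version 40 symbol at error correction level L.
MAX_ALPHANUMERIC_CHARS = 4296
MAX_BYTES = 2953


def qr_capacity_check(payload):
    """Return (mode, fits) for a payload in the largest QR symbol."""
    if all(ch in QR_ALPHANUMERIC_CHARS for ch in payload):
        return "alphanumeric", len(payload) <= MAX_ALPHANUMERIC_CHARS
    # Anything else (for example lowercase bases) needs byte mode.
    return "byte", len(payload.encode("utf-8")) <= MAX_BYTES


# An uppercase DNA string uses the compact alphanumeric mode.
mode, fits = qr_capacity_check("ACGT" * 10)
```

An uppercase sequence of the 1,141 bp read length reported here fits comfortably; higher error correction levels reduce the available capacity.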

Data Analyses and BLAST Annotation
The Basic Local Alignment Search Tool (BLAST) is a highly efficient tool for determining sequence similarities with reference sequences from GenBank. The edited sequences were confirmed by our expert taxonomist from the Department of Zoology, Wildlife and Fisheries, University of Agriculture, Faisalabad, Pakistan, and submitted to BLASTn (nucleotide BLAST) at the National Center for Biotechnology Information (NCBI) for validation and identification of the fish species. The input sequences were compared with the most similar fish sequences, ranked by the lowest significant E-values of the pairwise alignments. Species were then validated by our expert taxonomist based on a high BLAST identity percentage with the lowest E-value.
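The selection rule described above (lowest E-value, then highest identity) can be sketched in Python as follows. This is only an illustration of the ranking logic, not the procedure actually run on the NCBI website; the `BlastHit` structure, the accession labels, and the scores are all hypothetical.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    accession: str   # hypothetical label, not a real GenBank accession
    identity: float  # percent identity
    evalue: float    # expectation value


def best_hit(hits):
    """Pick the hit with the lowest E-value; break ties by highest identity."""
    return min(hits, key=lambda hit: (hit.evalue, -hit.identity))


# Made-up hits for a single query sequence.
example_hits = [
    BlastHit("HYPOTHETICAL_1", 98.7, 0.0),
    BlastHit("HYPOTHETICAL_2", 99.4, 0.0),
    BlastHit("HYPOTHETICAL_3", 91.2, 1e-150),
]
top = best_hit(example_hits)
```

Here two hits share an E-value of 0.0, so the one with the higher identity (`HYPOTHETICAL_2`) is selected; in the study, the final call was additionally checked by a taxonomist.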
The 17 validated reference sequences for all fish species were downloaded from GenBank for utilization in the construction of a phylogenetic evolutionary tree (neighbor-joining tree). Additionally, genetic distances between fish species were calculated from the neighbor-joining tree using MEGAX. The genetic Kimura 2-Parameter (K2P) distances of the Cyt b nucleotide bases between the fish species were also analyzed with MEGAX using the pairwise genetic distance method.
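For reference, the Kimura 2-parameter distance between two aligned sequences is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below implements this formula directly; it is not the MEGAX implementation, and it simply skips sites containing gaps or ambiguous bases, which is an assumption about the handling of such sites.

```python
import math

# Transitions are purine-purine (A<->G) or pyrimidine-pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumed handling)
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError if the sequences are too divergent
    # for the model (log argument <= 0).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition in ten sites: P = 0.1, Q = 0.
d_example = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Averaging such pairwise values within a taxonomic level gives mean distances of the kind reported for species, genera, families, and orders.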

Sequencing and Composition
The Cyt b primers produced a single amplification product with a read length of 1,141 bp (Figure 2). The sequence files were prepared in two ways. The file with gaps removed after alignment was used for analyzing the evolutionary relationship among experimental species with reference to sequences downloaded from the NCBI database. The information generated from this file was used to trace the phylogeny of freshwater and marine fish species. The sequence file containing only the experimental sequences was used to generate scannable QR codes. Nucleotide discrimination of the freshwater and marine fish species revealed varied AT (adenine + thymine) and GC (guanine + cytosine) contents. Among the 11 freshwater fish species, the observed nucleotide base composition of all analyzed sequences was 56.0% AT (range: 309-659) and 43.96% GC (range: 207-546) (Table 2). Similarly, in marine fish species, the nucleotide composition was 53.73% AT (range: 301-478) and 46.26% GC (range: 222-447) (Figure 3). The results demonstrated that for these freshwater and marine fish species, the total nucleotide composition consisted of more AT than GC bases (Table 2).
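AT and GC percentages of the kind reported above can be computed from a sequence as in the following minimal Python sketch. It assumes that only unambiguous A, C, G, and T bases are counted; the function name and the toy sequence are illustrative and not taken from the study.

```python
def base_composition(sequence):
    """Return (AT %, GC %) of a nucleotide sequence, counting only A, C, G, T."""
    seq = sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    total = at_count + gc_count
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * at_count / total, 100.0 * gc_count / total


# Toy sequence: four A/T bases and two G/C bases.
at_percent, gc_percent = base_composition("AATTGC")
```

The two values always sum to 100 under this counting rule, so small departures from 100 in reported totals reflect rounding.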
The interspecies genetic distances were calculated with the K2P model using pairwise comparison to trace the evolutionary relationship between species. The K2P genetic distances between

Evolutionary Relationship of Experimental Species
BLAST was used to perform a similarity-based search of the GenBank databases. Sequence-specific BLAST was performed for all fish (freshwater and marine) separately, and the species with the maximum percent identity (ID) score and query cover were selected for further analysis. Additionally, sequences with maximum similarity (reference sequences with accession numbers) from the BLAST search were downloaded from the NCBI database for comparison to the experimental species. The BLAST search for the experimental sequences was performed under the supervision of our expert taxonomist to resolve any doubts about GenBank sequences. [...] lentjan (AF381269.1) for the marine fish (Table 4). After arranging all the experimental sequences, a complete file was uploaded to MEGAX for further analysis, alignment, and phylogenetic tree construction. An evolutionary neighbor-joining tree was used to validate all species (Figure 4). The sum of the tree branch lengths was 2.43, with branch lengths in the same units as the evolutionary distances; 500 bootstrap replicates were used in the phylogeny test. The Kimura 2-parameter method was used to compute evolutionary distances. For the phylogenetic tree construction, all gaps were removed in order to determine the ancestral relationships among the species. A total of 17 nucleotide sequences were involved in the phylogenetic analysis, and 287 positions were present in the final dataset (Figure 4). The evolutionary relationships among species revealed that most of the fish species were clustered together, except for the marine species. The results reflected no taxonomic deviation, indicating that the majority of species can be authenticated using a barcode approach.
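The gap removal step mentioned above can be illustrated with a small Python sketch of complete deletion, in which every alignment column containing a gap in any sequence is dropped before distances are computed. This is an assumption about how gaps were removed and is not the MEGAX code; the function name and toy alignment are hypothetical.

```python
def remove_gap_columns(alignment):
    """Drop every alignment column in which at least one sequence has a gap."""
    names = list(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    # Transpose to columns, keep gap-free ones, then rebuild each sequence.
    columns = zip(*(alignment[name] for name in names))
    kept = [column for column in columns if "-" not in column]
    return {
        name: "".join(column[index] for column in kept)
        for index, name in enumerate(names)
    }


# Toy alignment with hypothetical labels (not data from the study).
gapped = {"seq_a": "AC-GT", "seq_b": "ACTGT", "seq_c": "A-TGT"}
ungapped = remove_gap_columns(gapped)
```

In the toy example, the second and third columns are removed, leaving three positions per sequence; the same kind of filtering explains why the final dataset can be much shorter than the original reads.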

Labeo calbasu T T A C T T G A T C A T T A C G A T G T G C T T C T G T G C A G A A A G G G A G C A A G A A A A A A G G A A A A G A A C A A A A A A A A G C A A T A G A A A A A A
TABLE 6 | Identification of marine fish species based on single nucleotide polymorphism data analysis.

GA CC GAT A T T T T A A ACCT CCAGCA GT T CCT T CT CCT GA T A T C GCGT AT A GT C T A A A AT C T CT A A ACGCT T GCGT GGA T CT A GGCCGA T CT CT C C C T CA GA T A AT AT G
Frontiers in Marine Science | www.frontiersin.org

Single Nucleotide Polymorphism Screening and Generation of Scannable QR Codes
Moreover, the sequences selected for SNP detection revealed single base pair differentiation in all freshwater and marine fish species (Table 1). In the case of freshwater fish species, a total of 52 unique sites were found in Labeo calbasu with Cyt b, more than all other species (Table 5). In Labeo rohita, Ctenopharyngodon idella, and Hypophthalmichthys molitrix, only one unique site was found, while Channa marulius contained two unique sites, Chitala chitala contained three sites, Labeo gonius contained four sites, Ompok bimaculatus and Oreochromis niloticus contained five sites each, and Mystus cavasius contained seven unique sites based on SNPs. Interestingly, no unique sites were identified in the Wallago attu sequence using SNP detection, which means it cannot be validated using the SNP method. For the marine fish species, 56 unique sites were found in Pampus argenteus, Tenualosa ilisha and Scomberoides commersonianus contained five unique sites each, Carangoides malabaricus had eighteen sites, Lactarius lactarius had ten sites, and Scomberomorus commerson contained thirteen unique sites (Table 6). Finally, all freshwater and marine fish SNP sequences were used to generate scannable QR codes. DNA sequence based QR codes for freshwater and marine fish species are given in Figures 5, 6, which can be scanned with simple mobile device applications.

DNA Barcoding
Mitochondrial DNA fragments can be used for the authentic identification and discrimination of unknown or closely related species (Dawnay et al., 2007). Moreover, variations between populations can be detected through changes in mitochondrial DNA sequences such as cytochrome oxidase subunit 1 (COI) and Cyt b (Avise et al., 1987). Parson et al. (2000) reported the use of Cyt b for efficient identification of species from five major vertebrate groups, including fish, tracing similarities between species of interest through BLAST similarity searches. Our study differs from that of Parson et al. (2000) in that we used BLAST, phylogeny testing, SNP detection, and DNA barcoding to authenticate freshwater and marine fish species. In another study, Vergara-Chen et al. (2009) reported PCR-RFLP-based identification of Cynoscion species in the Bay of Panama, amplifying the mitochondrial Cyt b gene. The Cyt b marker showed promise for accurate identification of larval Cynoscion species. PCR-RFLP is an attractive, restriction enzyme-based approach to species identification. Our study differs from that of Vergara-Chen et al. (2009) in its approach to species discrimination: we used modern sequencing, alignment, and SNP detection methods for accurate identification of fish species. In addition, the RFLP method does not always work for species authentication. We therefore consider our approach more reliable than that of Vergara-Chen et al. (2009).
Barcode analysis using the cytochrome b locus can delineate fish species, allowing unknown specimens to be identified and unexpected diversity among them to be recognized (Meyer and Paulay, 2005; Kerr et al., 2009). The Cyt b gene sequences had no insertions, deletions, or stop codons, indicating that all amplified sequences were obtained from the functional mitochondrial gene. Amplification of the Cyt b DNA fragment using PCR to achieve an average read length of 1,141 bp in 11 freshwater and 6 marine fish species is a significant indicator that DNA barcoding could be applied as a global standard for identifying fish species.

Nucleotide Discrimination Among Freshwater and Marine Fish Species
Our analysis revealed that the average nucleotide base composition was 56% AT and 43.96% GC in freshwater fish species. Similarly, the average AT content in marine fish species was 53.73% and the GC content was 46.26%. Overall, in freshwater and marine fish species, the average AT content (55.20%) was higher than the average GC content (44.78%). This result is consistent with previous studies that reported higher AT content (59.60%) than GC content with Cyt b gene amplification in Clupisoma garua (Nei and Kumar, 2000; Saraswat et al., 2014).

Genetic Divergence (K2P) Among Taxa
In this study, the K2P model was used to evaluate the genetic distance at different taxonomic levels. The average interspecific genetic distance among species was 0.311%, compared with 0.308% for genera. Moreover, the mean genetic distance among families was 0.369% and among orders was 0.337%. In our study, the mean genetic distance among families was higher than those among orders, genera, and species. Our results are consistent with previous studies by Ardura et al. (2013), Ward et al. (2005), Hubert et al. (2008), and Lara et al. (2010), which report high interspecific genetic distances in marine fish species. Thus, the genetic distances sufficiently discriminated all freshwater and marine fish species.

Tree Construction and Lineage
The constructed phylogenetic tree provided a classification consistent with taxonomy and morphology, with only insignificant differences at the taxonomic levels. Our results highlighted the efficacy of barcoding for the identification and authentication of Pakistani fish. In this study, 11 freshwater and 6 marine fish species comprising 9 orders, 11 families, 15 genera, and 17 species of Pakistani fish were categorized. The phylogenetic relationship demonstrated that all morphologically similar or closely related species were clustered under the same nodes, while the distant species were clustered in distinct nodes.
In the phylogenetic tree, Labeo rohita and Labeo gonius are sister species that originate from the same cluster. The same is true for Hypophthalmichthys molitrix and Ctenopharyngodon idella, and all four species are closely related to each other, belonging to the same order (Cypriniformes) and family (Cyprinidae). However, Labeo calbasu is distantly related and clustered separately with respect to the other freshwater fish species. In addition, freshwater species Mystus cavasius, Wallago attu, and Ompok bimaculatus are also closely related to each other and belong to the same order (Siluriformes) and family (Siluridae).
FIGURE 5 | QR codes generated using unique single nucleotide polymorphism data for freshwater fish species.

DNA Sequence-Based Development of QR Codes
We have developed DNA sequence- and SNP-based QR codes that can be scanned using mobile phone applications in the same way that barcodes are scanned in supermarkets (Figures 5, 6). To our knowledge, this is the first study to develop QR codes for the identification of fish species based on molecular approaches. Previously, Yang et al. (2019) developed a DNA barcode as an example for the precise identification of teleost fish species. Our approach differs from that of Yang et al. (2019) as we developed DNA sequence-based QR codes instead of using a Bio-Rad DNA barcode generator for generating barcodes. The use of species authentication supported by DNA barcoding could provide an effective approach for monitoring, management, and conservation of the fisheries sector. This study was pioneering research, targeting 17 commercially available freshwater and marine fish species of Pakistan, based on a molecular approach rather than visible morphology. Species-level fish identification in Pakistan is not common; here we validate the DNA barcoding approach as a gateway for identification and authentication using QR barcodes.

CONCLUSION
The increased consumption of fish and fish products and the morphological similarities between species have led to the inadvertent and deliberate mislabeling of fish in markets. Barcoding provides a novel technique for the authentication of fish species using sequencing of the Cyt b gene of mitochondrial DNA, without relying on morphological and meristic characteristics. Thus, DNA barcoding has been proven as a reliable tool for the detection of fish and the enhancement of food safety. Despite the high success rate of this technique, it is still in its infancy. The International Barcode of Life previously stated that "DNA sequence can be used to identify the various species, just as a supermarket scanner can use a familiar black strip that encodes the Universal Product Code (UPC) to recognize the purchase products". A digital barcode hologram is ultimately needed to identify fish species swiftly using a barcode reader. The digital data collected by next-generation storage systems can also be used to complement the barcode sequences for all fish species.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: Figshare (https://doi.org/10.6084/m9.figshare.12994073).

ETHICS STATEMENT
The fish used in this study were treated and handled according to the standard protocols and Ethics Committee of the University of Agriculture, Faisalabad, Pakistan.

AUTHOR CONTRIBUTIONS
MN, SA, and AA performed practical work. MG, MI, MJ, and SK wrote the manuscript. AA, MG, AU, ZK, and SAf revised the manuscript. AA provided funds for this study. NM helped in English editing. All authors contributed to the article and approved the submitted version.

FUNDING
Work in the lab of AA has been supported by the Higher Education Commission, Islamabad, Pakistan, under grant # NRPU-6350.