A bioinformatic approach to identify confirmed and probable CRISPR–Cas systems in the Acinetobacter calcoaceticus–Acinetobacter baumannii complex genomes

Introduction The Acinetobacter calcoaceticus–Acinetobacter baumannii complex, or Acb complex, consists of six species: Acinetobacter baumannii, Acinetobacter calcoaceticus, Acinetobacter nosocomialis, Acinetobacter pittii, Acinetobacter seifertii, and Acinetobacter lactucae. A. baumannii is the most clinically significant of these species and is frequently related to healthcare-associated infections (HCAIs). Clustered regularly interspaced short palindromic repeat (CRISPR) arrays and associated genes (cas) constitute bacterial adaptive immune systems and function as variable genetic elements. This study aimed to conduct a genomic analysis of Acb complex genomes available in databases to describe and characterize CRISPR systems and cas genes. Methods For Acb complex genomes available in the NCBI and BV-BRC databases, the identification and characterization of CRISPR–Cas systems were performed using CRISPRCasFinder, CRISPRminer, and CRISPRDetect. Sequence types (STs) were determined using the Oxford scheme and ribosomal multilocus sequence typing (rMLST). Prophages were identified using PHASTER and Prophage Hunter. Results A total of 293 genomes representing six Acb species exhibited CRISPR-related sequences. These genomes originate from various sources, including clinical specimens, animals, medical devices, and environmental samples. Sequence typing identified 145 ribosomal multilocus sequence types (rSTs). CRISPR–Cas systems were confirmed in 26.3% of the genomes and classified as subtypes I-Fa, I-Fb, and I-Fv. Probable CRISPR arrays and cas genes associated with CRISPR–Cas subtypes III-A, I-B, and III-B were also detected. Some of the CRISPR–Cas systems are associated with genomic regions related to Cap4 proteins and toxin–antitoxin systems. Moreover, prophage sequences were present in 68.9% of the genomes. Analysis revealed a connection between these prophages and CRISPR–Cas systems, indicating an ongoing arms race between the bacteria and their bacteriophages.
Furthermore, proteins associated with anti-CRISPR systems, such as AcrF11 and AcrF7, were identified in the A. baumannii and A. pittii genomes. Discussion This study elucidates CRISPR–Cas systems and defense mechanisms within the Acb complex, highlighting their diverse distribution and interactions with prophages and other genetic elements. This study also provides valuable insights into the evolution and adaptation of these microorganisms in various environments and clinical settings.


Introduction:
The Acb complex includes opportunistic pathogens, mainly related to healthcare-associated infections (HCAIs), multidrug-resistant phenotypes, and resistance to desiccation and disinfectants (Manchanda et al., 2010). Other authors have described epidemiological differences among these species, although they frequently share hospital environments (Wisplinghoff et al., 2012; Calix et al., 2019). A. baumannii is the species of greatest clinical importance; it is commonly isolated in intensive care units and is related to illnesses such as ventilator-associated pneumonia, meningitis, bloodstream infections, and urinary tract infections (Moradi et al., 2015). A. baumannii is resistant to a broad spectrum of antibiotics, which limits therapeutic options. More recently, A. baumannii has become resistant to carbapenems and has been included in the list of priority antibiotic-resistant pathogens published by the World Health Organization (World Health Organization, 2017).
The A. baumannii genome shows great plasticity, which exposes it not only to rapid changes due to mutations but also to the transfer and acquisition of exogenous material (Yakkala et al., 2019), that is, the dynamic conversion of genetic information for acquisition and removal (Bondy-Denomy and Davidson, 2014). A. baumannii acquires genetic material through bacteriophages via a transduction mechanism (Chevallereau et al., 2022).
Bacteria can develop adaptive defense mechanisms when exposed to different exogenous genetic material. The clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated enzyme (Cas) system has also been associated with the acquisition of mobile genetic material and is an adaptive mechanism of immunity and resistance used by bacteria and archaea against bacteriophages and/or plasmids (Dy et al., 2013; Makarova et al., 2020).
CRISPR systems comprise palindromic elements interspersed with sequences of constant length (Mojica and Montoliu, 2016) flanking the spacers acquired from bacteriophages, plasmids, or any genetic material. The CRISPR-Cas system comprises two main components: a guide RNA (crRNA or gRNA) and genes that encode Cas proteins, which are essential for the adaptation and incorporation of genetic material (Koonin et al., 2017).
The mechanism of action of the CRISPR-Cas system involves three stages: acquisition, expression, and interference. In the acquisition stage, elements that carry genetic material are identified and a fragment (the protospacer) is recognized; this fragment is cut and integrated into the CRISPR locus at the 5′ end, next to the leader sequence. In the expression stage, the sequences added to the CRISPR locus, now recognized as spacers, are expressed as primary transcripts (pre-crRNAs), which endonucleases cut into smaller fragments (crRNAs) (Waddington et al., 2016). Finally, in the interference stage, when the bacterium receives exogenous genetic material, the crRNA, accompanied by Cas proteins, binds via base complementarity to the previously acquired sequence, signaling to the nucleases that the external genetic element must be cleaved (Bhaya et al., 2011). Moreover, the CRISPR system mediates the transfer of genetic material between genomes (Marraffini and Sontheimer, 2008; O'Meara and Nunney, 2019). For genomes displaying a CRISPR array but lacking associated cas genes, it has been proposed that bacteria could capture information from bacteriophages and incorporate it into their genome as a prophage. In contrast, bacteria without a CRISPR-Cas system can be infected by bacteriophages, acquiring genes linked to resistance and virulence and thereby enhancing their adaptive capacity (Leungtongkam et al., 2020).
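The interference stage described above depends on base complementarity between a spacer-derived crRNA and a previously encountered sequence. The following is a minimal sketch of that idea only, not the procedure used in this study: it searches an invading sequence for an exact match to a spacer on either strand. The function names and the toy sequences are hypothetical, and real systems tolerate mismatches and also require a PAM, which this sketch ignores.

```python
# Toy model of spacer/protospacer recognition (exact matching only).
# Names and sequences are illustrative assumptions, not data from the study.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_protospacer(spacer: str, target: str):
    """Return (strand, 0-based position) of the first exact match, or None.

    The forward strand is checked first, then the reverse strand.
    """
    spacer = spacer.upper()
    target = target.upper()
    position = target.find(spacer)
    if position != -1:
        return ("+", position)
    position = target.find(reverse_complement(spacer))
    if position != -1:
        return ("-", position)
    return None


if __name__ == "__main__":
    invading_dna = "TTTACGTGGCATCCAAA"  # hypothetical phage fragment
    print(find_protospacer("ACGTGGCATCC", invading_dna))
    print(find_protospacer("GGGGGGGG", invading_dna))
```

A spacer that returns `None` here would, in this simplified picture, provide no recognition of the invading element.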
CRISPR-Cas subtype I-Fb has been proposed as a method for subtyping A. baumannii strains, allowing identification of the route of origin and dissemination (Touchon et al., 2014; Karah et al., 2015). The CRISPR-Cas system identified in A. baumannii is characterized by the presence of the cas1, cas2, cas3, cas5, cas7, and cas8 genes, which have been identified in chromosome and plasmid sequences (Mangas et al., 2019). Only a few studies have provided information about the prevalence of these systems in clinical strains of the Acb complex. This study aimed to perform a genomic analysis of the Acb complex to identify and characterize CRISPR arrays and/or cas genes using bioinformatics programs.

Materials and methods
A flowchart for data collection for the Acb complex genomes, detection and analysis of the CRISPR-Cas systems is shown in Figure 1.
The metadata of each strain were manually extracted, including the date, GenBank accession numbers, species, genetic material, strain ID, year of isolation, associated disease, sample, host, collection place, rST, ST, origin, site of isolation, and size (Supplementary Table 1).
The CRISPR-Cas flanking genomic regions were manually reviewed, considering the 20,000 bp sequences upstream and downstream of the arrays. The sequences, predicted structures, and functional annotations of the proteins were analyzed using the UniProt database and the I-TASSER server.
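The flanking regions in this study were reviewed manually. Purely as an illustration of how the 20,000 bp windows could be delimited, the sketch below computes upstream and downstream coordinates for an array and clamps them to the contig boundaries. The function name, the 0-based half-open coordinate convention, and the example coordinates are assumptions made for the example.

```python
# Illustrative calculation of flanking windows around a CRISPR array.
# Coordinates are 0-based and half-open: [start, end). This convention and
# the function name are assumptions made for this sketch.

FLANK_BP = 20_000  # window size stated in the text


def flanking_windows(array_start: int, array_end: int,
                     contig_length: int, flank: int = FLANK_BP):
    """Return ((up_start, up_end), (down_start, down_end)) for an array."""
    if not (0 <= array_start < array_end <= contig_length):
        raise ValueError("array coordinates must lie within the contig")
    # Clamp each window so it never runs past the ends of the contig.
    upstream = (max(0, array_start - flank), array_start)
    downstream = (array_end, min(contig_length, array_end + flank))
    return upstream, downstream


if __name__ == "__main__":
    # Hypothetical array at 5,000-6,000 bp on a 30,000 bp contig.
    print(flanking_windows(5_000, 6_000, 30_000))
```

Clamping matters for short plasmid sequences, where a full 20,000 bp window may not exist on one or both sides of the array.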

Analysis of CRISPR arrays
The arrays were extracted manually from the output file. The number, size, and location of RSs (repeated sequences) and SSs (spacer sequences) were determined. The RSs were analyzed with the CRISPRDetect program to establish the variations within the RSs and to associate them with a consensus sequence. The SSs were analyzed using the CRISPRTarget program (http://crispr.otago.ac.nz/CRISPRTarget/crispr_analysis.html; Biswas et al., 2013) to identify the protospacer adjacent motif (PAM) and the genes associated with each spacer. The types and subtypes of the CRISPR systems were analyzed with the CRISPRmap program (http://rna.informatik.uni-freiburg.de/CRISPRmap/Input.jsp; Lange et al., 2013) using the consensus RS file as the input file.
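The counting of RSs and SSs described above was done from the program output. As a rough sketch of the underlying idea, the example below splits a toy array into repeats and spacers and reports their number and sizes. It assumes that every repeat matches the given sequence exactly and that the array begins and ends with a repeat; real arrays contain repeat variants, which is why dedicated tools such as CRISPRDetect are used. The function name and sequences are hypothetical.

```python
# Toy decomposition of a CRISPR array into repeats (RSs) and spacers (SSs).
# Assumes exact, non-degenerate repeats and an array that starts and ends
# with a repeat; both are simplifying assumptions for illustration.


def split_array(array_seq: str, repeat: str) -> dict:
    """Summarize the repeats and spacers of a simple CRISPR array."""
    array_seq = array_seq.upper()
    repeat = repeat.upper()
    # Segments between consecutive repeats are taken as spacers.
    spacers = [segment for segment in array_seq.split(repeat) if segment]
    return {
        "n_repeats": array_seq.count(repeat),
        "repeat_length": len(repeat),
        "spacers": spacers,
        "spacer_lengths": [len(s) for s in spacers],
    }


if __name__ == "__main__":
    toy_repeat = "GTTCAT"  # shortened, hypothetical repeat
    toy_array = toy_repeat + "AAACCC" + toy_repeat + "TTGGA" + toy_repeat
    print(split_array(toy_array, toy_repeat))
```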

FIGURE
Flowchart for the search for CRISPR-Cas systems. Flowchart indicating data collection for Acb complex genomes, detection and analysis of the CRISPR-Cas systems, resistance and virulence genes, and prophage-associated sequences.
was inferred using an approximately-maximum-likelihood phylogenetic with FastTree.

GenBank accession number and data availability
The GenBank accession numbers of the genome sequences used in this study are listed in Supplementary Table 1. The data generated in this work are available at the following link: https://github.com/JetsiMancilla/System-CRISPR-Cas.

Genome description and identification of ST and rSTs
The analyzed chromosomal genomes (213) had sizes between 3.4 and 4.3 Mb, and the plasmid sequences (80) had sizes between 0.004437 and 0.33 Mb (Supplementary Table 1). The genomes exhibited a GC content of 39% (±0.1). The RAST annotation showed between 3,690 and 4,181 coding regions in the genomes, linked to between 303 and 323 subsystems. In contrast, the Prokka annotation revealed a range of 3,427 to 5,415 coding regions (https://github.com/JetsiMancilla/System-CRISPR-Cas).
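For readers who wish to reproduce a summary statistic such as the GC content reported above, a minimal sketch follows. It is not the pipeline used in this study. It assumes that ambiguous bases (for example N) are excluded from the denominator, and the function name is hypothetical.

```python
# Minimal GC-content calculation. Excluding ambiguous bases from the
# denominator is an assumption of this sketch, not a statement about
# how the study computed its values.


def gc_content(sequence: str) -> float:
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc_bases = sum(base in "GC" for base in counted)
    return 100.0 * gc_bases / len(counted)


if __name__ == "__main__":
    print(round(gc_content("ATGCATGCNN"), 1))
```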
The number of RSs ranged from three to 158, with lengths of 24 to 29 bp in the CRISPR arrays. The RS consensus sequences obtained by CRISPRDetect were compared with the RS sequences identified in this study (Figure 2, Supplementary Table 2).
The RS consensus sequences were clustered into 14 distinct groups, with the sequence "GTTCATGGCGGCATACGCCATTTAGAAA" being the most frequent. Interestingly, the consensus RS described in this study showed a minimum of four variations and a maximum of 11 variations compared with the consensus type I-Fb RS reported previously (Karah et al., 2015). Additionally, four RSs were detected in three plasmids, only one of which was closely related (IX) to the RS identified on chromosomes (Supplementary Figure 2). Furthermore, 2,768 SSs with lengths between 29 and 38 bp were identified in this study. Interestingly, the A. baumannii and A. calcoaceticus genomes had the greatest numbers of SSs (Supplementary Table 3).
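Counting variations between a repeat and a consensus, as in the comparison above, can be illustrated with a simple position-by-position comparison. The sketch below assumes the two sequences have equal length and differ only by substitutions; because the RSs here range from 24 to 29 bp, a real comparison would need an alignment that allows gaps. The variant sequence is invented for the example, and only the consensus string is taken from the text.

```python
# Position-by-position comparison of a repeat against a consensus.
# Equal length and substitution-only differences are assumptions of this
# sketch; the variant below is hypothetical.

MOST_FREQUENT_RS = "GTTCATGGCGGCATACGCCATTTAGAAA"  # sequence quoted in the text


def list_variations(repeat: str, consensus: str):
    """Return (1-based position, consensus base, repeat base) for each mismatch."""
    if len(repeat) != len(consensus):
        raise ValueError("sequences must have equal length; align them first")
    return [
        (index + 1, cons_base, rep_base)
        for index, (rep_base, cons_base)
        in enumerate(zip(repeat.upper(), consensus.upper()))
        if rep_base != cons_base
    ]


if __name__ == "__main__":
    hypothetical_variant = "ATTCATGGCGGCATACGCCATTTAGAAG"
    variations = list_variations(hypothetical_variant, MOST_FREQUENT_RS)
    print(len(variations), variations)
```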
The SSs were clustered into 30 groups according to their sequence. A total of 30.5% (845/2,768) of the SSs were shared among the genomes, and 69.4% (1,923/2,768) of the SSs were exclusive, which means that they were not commonly found in the other analyzed genomes. Interestingly, in 45.0% (36/85) of the CRISPR arrays, the same spacer was identified two or more times (Figure 3, Supplementary Table 3).
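The distinction between shared and exclusive SSs, and the detection of arrays in which a spacer recurs, can be sketched as follows. This is one possible reading of those definitions and is not confirmed as the authors' procedure: a spacer occurrence is counted as shared if the same sequence appears in more than one array, and exact sequence identity is assumed. The function name and the toy data are hypothetical.

```python
# Classify spacer occurrences as shared or exclusive across arrays and flag
# arrays that contain the same spacer more than once. Exact sequence identity
# is assumed; the definitions used here are an interpretation, not the
# authors' documented method.
from collections import Counter


def classify_spacers(arrays: dict) -> dict:
    """arrays maps an array or genome name to its ordered list of spacers."""
    # In how many different arrays does each spacer sequence occur?
    arrays_per_spacer = Counter()
    for spacers in arrays.values():
        arrays_per_spacer.update(set(spacers))

    total = sum(len(spacers) for spacers in arrays.values())
    shared = sum(
        1
        for spacers in arrays.values()
        for spacer in spacers
        if arrays_per_spacer[spacer] > 1
    )
    repeated_within = sorted(
        name for name, spacers in arrays.items()
        if len(set(spacers)) < len(spacers)
    )
    return {
        "total": total,
        "shared": shared,
        "exclusive": total - shared,
        "pct_shared": 100.0 * shared / total if total else 0.0,
        "arrays_with_repeated_spacer": repeated_within,
    }


if __name__ == "__main__":
    toy_arrays = {
        "genome_1": ["AAA", "CCC", "AAA"],
        "genome_2": ["CCC", "GGG"],
        "genome_3": ["TTT"],
    }
    print(classify_spacers(toy_arrays))
```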
The Acb complex genomes showed the presence of CRISPR arrays together with adjacent sequences corresponding to six cas genes (cas1, csy1, csy2, csy3, csy4, and cas3), which identified subtypes I-Fa, I-Fb, and I-Fv (Supplementary Figures 3-12). The size of the CRISPR-Cas system in the genome ranged from 4,617 to 750,328 bp (Table 1).
The flanking regions in CRISPR-Cas systems were linked to Cap4 and the toxin-antitoxin system
Analysis of the genes upstream and downstream of the arrays (the flanking regions) revealed sequences that encode Cap4 proteins, which are associated with cyclic oligonucleotide-based antiphage signaling systems (CBASSs), in two A. baumannii CRISPR arrays (the 10_3 and 10_4 genomes). Similarly, CRISPR-associated primase-polymerases (CAPPs) were identified in both genomes (Figure 4, Supplementary Figure 12).
Forty-two percent (1180/2760) of the SSs were associated with regions of prophages and plasmids belonging to the Acinetobacter genus, as well as 22 other genera, including mainly Staphylococcus, Aeromonas, Bacillus, Pseudoalteromonas, Salmonella, and Escherichia. Each spacer targeted a specific sequence within a prophage; however, different SSs matched the same bacteriophage (Supplementary Table 7). While some genomes possess SSs that provide immunity against specific prophages, the same SS was also detected in different CRISPR arrays (Figure 5).
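Relationships of this kind, in which several spacers from one genome target the same prophage, are naturally summarized as a weighted genome-to-prophage edge list that a network program can load. The sketch below is an illustration under stated assumptions, not the authors' workflow: the input tuples, genome and prophage names, and function names are hypothetical, and the Source, Target, and Weight column headers follow the edge-table convention commonly used for Gephi CSV import.

```python
# Build a weighted genome-to-prophage edge list from spacer hits.
# Input records and names are hypothetical; the weight is the number of
# distinct spacers in a genome that match a given prophage.
import csv
import io
from collections import defaultdict


def build_edges(hits):
    """hits: iterable of (genome, spacer_id, prophage) tuples."""
    spacers_by_edge = defaultdict(set)
    for genome, spacer_id, prophage in hits:
        spacers_by_edge[(genome, prophage)].add(spacer_id)
    return sorted(
        (genome, prophage, len(spacer_ids))
        for (genome, prophage), spacer_ids in spacers_by_edge.items()
    )


def edges_to_csv(edges) -> str:
    """Serialize edges as CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


if __name__ == "__main__":
    toy_hits = [
        ("genome_A", "spacer_1", "prophage_X"),
        ("genome_A", "spacer_2", "prophage_X"),
        ("genome_B", "spacer_3", "prophage_X"),
        ("genome_A", "spacer_1", "prophage_X"),  # duplicate hit, counted once
    ]
    print(edges_to_csv(build_edges(toy_hits)))
```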
Toxin-antitoxin systems and antibiotic resistance elements associated with A. baumannii plasmids
Analysis of the Acb complex genomes revealed genes involved in colonization and virulence, including genes associated with adherence, biofilm formation, immune evasion, iron absorption, regulation, quorum sensing, and serum resistance. Among the genes encoded on the chromosome, the gene for the OmpA protein, which is involved in adhesion, invasion, persistence, and dissemination, was detected in 96.7% (206/213) of the genomes. Genes associated with biofilm formation were also detected, including those encoding efflux systems (AdeFGH pumps), Csu fimbriae (82.6%, 176/213), and PNAG polysaccharides (97.2%, 207/213).
Only 68.1% (145/213) of the genomes encoded the Bap protein, which also participates in biofilm formation. On the other hand, 97.9% (208/213) of the Acb complex genomes encoded genes related to two-component systems and quorum sensing proteins involved in biofilm formation and cell adhesion (BfmRS). In the Acb complex genomes, operons related to capsule Lpx and LPS were identified in 84.0% (179/213). Interestingly, 72.8% (155/213) of the genomes of A. baumannii, A. pittii, and A. nosocomialis contained genes associated with iron acquisition and acinetobactin biosynthesis (Supplementary Table 9).
Toxin-antitoxin systems were identified in only 12.5% (10/80) of the plasmids. Twenty percent (16/80) of the plasmids encoded stress response proteins, and 8.7% (7/80) were associated with resistance to toxic compounds.
The cephalosporin resistance genes blaADC-25, blaPER-1, and blaNDM-1 were identified in one plasmid of A. baumannii and A. pittii (Supplementary Table 9).
Several regions encoding copper ATPases, copper chaperones, regulatory proteins, and copper oxidases identified in plasmids have also been associated with the transport and oxidation of copper. The data suggested that these elements interfere with the processes of colonization and immune evasion in A. baumannii; however, these elements were located only in plasmids from A. seifertii and A. pittii. Our data indicate that A. seifertii and A. pittii plasmids encode more resistance mechanisms than A. baumannii plasmids.

FIGURE
Correlation between spacers and intact prophages in the genomes. The correlations and interactions between intact prophages and the spacers targeting them are shown. The purple circles represent the genomes with CRISPR-Cas systems, and the colored circles represent the intact prophages identified among the studied genomes. The networks show the interactions among the genomes that carry spacers associated with the identified prophages. The analysis and visualization of the networks were carried out with Gephi (https://gephi.org/; Bastian et al.).

FIGURE
Anti-CRISPR proteins encoded in the genomes of the Acb complex.Identification of the AcrF protein in the LAC , , , and genomes.The position of the protein coincides with the position of the PHAGE_Acinet_YMC / /R in the genomes of the Acb complex.

Discussion
A. baumannii has been implicated in healthcare-associated infections (HCAIs), such as ventilator-associated pneumonia, meningitis, bloodstream infections, and urinary tract infections. A. baumannii, together with A. pittii, A. nosocomialis, A. seifertii, and A. lactucae, belongs to the Acb complex. A. pittii, A. nosocomialis, A. seifertii, and A. lactucae also cause infections and have been associated with resistance to multiple drugs (Fitzpatrick et al., 2015; Li et al., 2021; Alonso et al., 2023; Bajaj et al., 2023; Kang et al., 2023). A. calcoaceticus is mainly found in environmental samples and may also carry antibiotic resistance determinants (Al Atrouni et al., 2016).
The use of bioinformatics tools and the availability of genome sequences have facilitated the analysis and characterization of A. baumannii genomes. These studies have contributed to understanding A. baumannii genome dynamics and the functions of its different genetic determinants, including its response to selective environmental pressure during the evolutionary process. The analysis of A. baumannii genomes has allowed us to explore the presence of repeated sequences associated with CRISPR-Cas systems. However, these systems have yet to be investigated in other members of the Acb complex. The arrays that make up the repeated sequences associated with CRISPR-Cas systems belong to type I-F, characterized by the presence of the cas1, cas2, cas3, cas5, cas6, and cas7 genes (Karah et al., 2015; Mangas et al., 2019; Yadav and Singh, 2022).
A CRISPR-Cas system was found in one plasmid. It has been established that CRISPR-Cas systems in plasmids protect bacteria against bacteriophage infections with an efficacy similar to that of systems encoded in the chromosome (Siedentop et al., 2024). Additionally, we identified other plasmids characterized by a series of orphan arrays. This phenomenon has been observed in plasmids from other bacteria and archaea, where systems on plasmids are often incomplete or composed of small orphan arrays (Pinilla-Redondo et al., 2022). The prevalence of CRISPR-Cas systems in these genomes may also be associated with the evolution and adaptation of these species in diverse environments (Westra et al., 2019).
The availability of several bioinformatic tools (online and for command-line use) has facilitated the identification of CRISPR-Cas systems. This study identified and confirmed 85 CRISPR arrays and cas genes according to the CRISPRCasFinder, CRISPRDetect, and CRISPRminer programs.
This study identified CRISPR arrays or cas genes of subtypes I-B and III-B, as previously described by other authors (Yadav and Singh, 2022). In addition, CRISPR arrays or genes associated with subtypes III-A and III-B in A. pittii were described for the first time. These systems are considered non-functional. However, there are reports suggesting that these small arrays could be evolutionarily preserved (Shmakov et al., 2023). Various phenomena could explain the isolation of these arrays; one possibility is the emergence of de novo arrays, and another is that their acquisition is linked to transfer through mobile elements (Westra et al., 2019; Shmakov et al., 2023), which could explain the presence of transposases adjacent to these arrays in our study.
An interesting aspect of the CRISPR arrays was their diversity. The RSs exhibited a clear modularity, although it is not preserved in all the identified systems. RS sequences can exhibit modularity; that is, they are uniform and repeated throughout the CRISPR array (Yair and Gophna, 2019). Although these sequences are not directly associated with the exogenous material that makes up the system, it is important to demonstrate their modularity, since they play a crucial role in the marking and organization of the spacer sequences, as well as in the expression and interference processes (Nethery et al., 2021). The diversity of the RSs may also be related to the diversity of spacer sequences identified.
Spacer sequences in CRISPR-Cas systems provide information on the prior exposure of bacteria to various elements (Garrett, 2021). Our data showed that the analysis of these sequences corresponded to spacers and regions of interest, thereby ensuring potential immunity against prophages and plasmids, as described by other authors (Maniv et al., 2016).
Analysis of the SSs in the CRISPR-Cas systems revealed a correlation between the bacteriophages identified in the genomes and those for which the bacteria store information within the spacers of the CRISPR-Cas system. Furthermore, groups of genomes were observed that, although they do not share identical spacer sequences, may contain information related to the same prophage.
The variability of intact prophages in the Acb complex genomes analyzed in this study suggests diversity. When the environmental bacteriophage diversity is high, CRISPR-Cas systems confer an advantage to bacteria harboring these genetic elements. As already described, spacers can be effective against a specific bacteriophage fraction. In this context, the low diversity of prophages among genomes suggests that genomes carrying CRISPR-Cas systems have greater fitness than those lacking them (Zaayman and Wheatley, 2022).
The identification of CRISPR-Cas systems involves exploring the flanking regions of the genome. Importantly, these flanking regions were not always conserved across the genomes carrying CRISPR-Cas systems. Our study revealed that the flanking regions encoded Cap4 proteins associated with cyclic oligonucleotide-based antiphage signaling systems (CBASSs) (Lowey et al., 2020), which interfere with the replication of prophages, providing bacteria with a protective mechanism against bacteriophage infections. Our data suggested that there could be genomes that harbor more than two defense mechanisms against bacteriophages in parallel, leading to a decrease in the fitness of the bacteria. In contrast, other authors have proposed that a CRISPR-Cas system in association with another mechanism enhances the fitness of bacteria; therefore, several systems can coexist, e.g., CRISPR-Cas systems, restriction-modification (RM) systems, and Argonaute systems (Makarova et al., 2009; Oliveira et al., 2014; Lisitskaya et al., 2018).
Interestingly, bacteria carrying CRISPR-Cas systems have the ability to defend themselves against infection by bacteriophages. However, bacteriophages have mechanisms that allow them to evade the action of CRISPR-Cas systems. Recent data indicate that these elements act at the DNA level or interfere with the binding of DNA to Cas proteins (Parsaeimehr et al., 2022), and anti-CRISPR proteins have been reported in the genomes of A. baumannii (Yadav and Singh, 2022). However, more specific data detailing the characterization of A. baumannii and A. pittii, where anti-CRISPR proteins have also been found, are still needed. These anti-CRISPR proteins seem to have different modes of action; in A. baumannii, they have been shown to act via recognition of the PAM sequence, and in A. pittii, they act by inhibiting the activity of the Cas protein (Niu et al., 2020; Forsberg, 2023). Interestingly, this study determined how bacteriophages carry anti-CRISPR-Cas information; in contrast, other bacteria do not contain anti-CRISPR-Cas information to act against the CRISPR-Cas system.

Conclusion
This study elucidates the diversity and complexity of CRISPR-Cas systems and other defense mechanisms in strains of the Acb complex. These systems play a crucial role in the adaptation and fitness of these microorganisms in various environments. The confirmed arrays displayed size and sequence variations, with consensus sequences proving difficult to link to specific Acb complex species.

FIGURE
Identification of CRISPR-Cas systems through bioinformatics tools. Characteristics of the CRISPR-Cas systems detected in Acb complex chromosomes, showing the RS consensus sequence and its variations. (A) Subtypes of CRISPR-Cas systems identified in A. baumannii chromosomes. (B) Subtypes of CRISPR-Cas systems identified in A. nosocomialis chromosomes. (C) Subtypes of CRISPR-Cas systems identified in A. pittii chromosomes. (D) Subtypes of CRISPR-Cas systems identified in A. calcoaceticus chromosomes. A graphic of the sequences was made with WebLogo (https://weblogo.berkeley.edu/; Crooks et al.).

FIGURE
Identification of SSs in CRISPR-Cas systems. The number of spacers is indicated on each confirmed array. Shared SSs between genomes are highlighted in light green boxes, and unique spacers are shown in purple boxes. SSs that appear multiple times in the array and are shared between genomes are represented by light blue boxes. Unique SSs that appear multiple times in the array are shown in navy blue. Visualization was performed with the iTOL program (Letunic and Bork).

FIGURE
CRISPR-flanking genomic regions of the Acb complex. This figure shows the regions associated with the arrays, highlighting the regions encoding Cap4, CAPPs, transposons, and toxin-antitoxin systems.

TABLE
CRISPR-Cas systems confirmed in Acb complex genomes.