Reverse Transcriptase Mechanism of Somatic Hypermutation: 60 Years of Clonal Selection Theory

The evidence for the reverse transcriptase mechanism of somatic hypermutation is substantial and multifactorial. In this 60th anniversary year of the publication of Sir Macfarlane Burnet’s Clonal Selection Theory, the evidence is briefly reviewed and updated.


OVERVIEW
The molecular mechanism underlying somatic hypermutation (SHM) of rearranged immunoglobulin (Ig) genes (V[D]J) has been controversial for some time. Although the process of DNA deamination has dominated discussion in recent years, insufficient attention has been paid to a mechanism based on reverse transcription. One reason for writing this Perspective is therefore to counterbalance a widely held view in the Ig SHM field that all relevant studies on the molecular mechanism deal only with the "DNA Deamination Model" and ended in complete consensus more than 10 years ago, sometime between 2004 and 2007 [Table 1 and Ref. (1) in particular]. The other is a personal tribute, in this anniversary year, to the founder of modern immunology, Sir Macfarlane Burnet. It is now 60 years since the publication of the first iteration of "The Clonal Selection Theory of Acquired Immunity" (2), the foundation stone of modern immunology. It was fully expounded in his 1959 book (3), where the main idea was clonal antigenic selection from a pre-existing diverse antibody repertoire from which somatic mutations might emerge as "forbidden" anti-self clones. Joshua Lederberg then gave the concept sharp molecular focus (4), as did Melvin Cohn and colleagues (5-7). Alastair Cunningham's concept of "clonal variation around a theme" placed antigen-driven SHM firmly within the context of expanding B lymphocyte clones (8). Somatic mutation of Ig variable region genes has therefore been part and parcel of Burnet's clonal selection concept since its inception and is central to a rational understanding of immunological diversification, self-tolerance, and the emergence of cancer. We now have a very good idea of the molecular mechanism of SHM. I have chosen to fit this scientific progress within 60 key publications since the late 1950s (Table 1).
The most plausible central molecular mechanism of Ig SHM, one that fits with and explains all the evidence (9-11), is based on "Reverse Transcription" of the base-modified Ig pre-mRNA (Figure 1). That is, error-prone reverse transcription, by DNA Polymerase-η, of the Ig pre-mRNA template intermediate at rearranged V[D]J gene somatic loci. The Ig pre-mRNA encoding the V[D]J region is copied off the transcribed DNA strand carrying prior AID C-to-U deamination lesions (Uracils and Abasic sites), and it also accumulates ADAR-deaminase mediated A-to-I RNA editing modifications. This already base-modified pre-mRNA sequence is then copied back to the B lymphocyte genomic DNA and integrated at the rearranged V[D]J site (concurrent with antigen-mediated selection of Ig receptor-bearing B lymphocytes, Centrocytes, in the Germinal Center). This is essentially the "Reverse Transcriptase Mechanism" which Jeff Pollard and I first published 30 years ago (12). The mechanistic steps, many logical, are clearly outlined in Figure 1, which shows that the characteristic A >> T and G >> C strand bias-generating mutagenic activity is firmly focused on the nascent RNA intermediate in the context of the Transcription Bubble (9-11, 13, 14). Recent publications should be consulted for further definitive evidence of ADAR A-to-I editing of both RNA and DNA moieties at RNA:DNA hybrids within Transcription Bubbles (11, 14, 15). Understanding the correct molecular mechanism of SHM is important not only for cancer diagnosis and detection (16, 17) but also for current efforts to better understand (18, 19) the origin of Ig diversity, including the mechanism of evolution of the sets of germline V segments and the long IGHV and IGLV haplotypes in individual human beings (20, 21).

Figure 1. Adapted from Lindley and Steele (10), as well as from figures in Steele (9, 11) and Steele and Lindley (14). This is also an adaptation of the target site reverse transcription process of Luan et al. (66). Shown is an RNA Polymerase II-generated Transcription Bubble with C-site and A-site substrate deamination events by AID and ADAR proteins, which generate the strand-biased mutation signatures A-to-G, G-to-A, G-to-T, and G-to-C (9, 11, 14). DNA strands are shown as black lines, pre-mRNA as red lines, and cDNA strands synthesized by DNA polymerase-η as thick blue lines (59). Green bars are Inosines. Shown also is the action of the RNA exosome (64), allowing access of AID deaminase to cytosines on the transcribed strand (TS). The ssDNA regions on the displaced non-transcribed strand (NTS) are established targets of AID action (53-56).
Note that DNA mutations are first introduced as AID-mediated C-to-U, followed by excision of uracils by uracil DNA glycosylase (UNG), which creates Abasic sites in the TS (these can mature into single-strand nicks with 3′-OH ends via the action of AP endonuclease). These template Uracil and Abasic sites can be copied into pre-mRNA by RNA Pol II, generating G-to-A and G-to-C modifications as shown (67). Following target site reverse transcription (66), this results in G-to-A and G-to-C mutations in the NTS, in a strand-biased manner (9-11, 14). Separately, at WA targets in nascent dsRNA substrates, adenosine-to-inosine (A-to-I) RNA editing events, mediated by ADAR1 deaminase, are copied back into DNA by reverse transcription via Pol-η (59). In theory, ADARs can also deaminate the RNA and DNA moieties in the RNA:DNA hybrid (14, 15). The strand invasion and integration of newly synthesized cDNA TS, as well as random-template mismatch repair (68), are hypothesized additional steps (not shown here). In short, RNA Pol II introduces modifications in the Ig pre-mRNA as it copies TS DNA with AID lesions, and this is coupled to A-to-I editing in dsRNA stem-loops near the transcription bubble (62) as well as in RNA:DNA hybrids within the bubble (14, 15). Next, an RT-priming substrate is formed when the nicked TS strand with an exposed 3′-OH end anneals with the base-modified pre-mRNA copying template, allowing cDNA synthesis by the Y Family translesion DNA polymerase-η (48), now acting in its reverse transcriptase mode (59). These 3′-OH annealed priming sites could arise due to excisions at previous AID-mediated Abasic sites. Alternatively, they could arise due to an endonuclease excision associated with the MSH2-MSH6 heterodimer engaging a U:G mispaired lesion (61). Shown is an A-to-T transversion generated at the RT step at a template Inosine.
Abbreviations: ADAR, Adenosine Deaminase that acts on RNA; AP, an Abasic, or apurinic/apyrimidinic, site; APOBEC family, generic abbreviation for the dC-to-dU deaminase family of which AID is a member (e.g., APOBEC1; APOBEC3 A, B, C, D, F, G, H); AID, activation-induced cytidine deaminase causing C-to-U lesions at WRCY/RGYW C-site motifs in ssDNA; W, A or U/T; WA-site, target motif for ADAR deaminase, including DNA polymerase-η error-prone incorporation in vitro (50, 51); Y, pyrimidine T/U or C; R, purine A or G.
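The target motifs defined above (WRCY/RGYW for AID, WA for ADAR and Pol-η) are short degenerate sequence patterns, so their positions in a V[D]J sequence can be listed mechanically. The following is a minimal Python sketch, not a published analysis pipeline: the function name `find_target_sites` and the choice to report the 0-based position of the targeted base are assumptions made for illustration, and the sequence is treated as DNA (T rather than U).

```python
import re

# Degenerate codes as defined in the text: W = A or T, R = A or G, Y = C or T.
# Each entry: (regular expression, offset of the targeted base within the motif).
MOTIFS = {
    "WRCY": ("[AT][AG]C[CT]", 2),  # targeted C is the third base
    "RGYW": ("[AG]G[CT][AT]", 1),  # reverse complement of WRCY; targeted G is the second base
    "WA": ("[AT]A", 1),            # targeted A is the second base
}

def find_target_sites(seq):
    """Return {motif name: [0-based positions of the targeted base]} for a DNA string."""
    seq = seq.upper()
    sites = {}
    for name, (pattern, offset) in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        regex = re.compile("(?=(" + pattern + "))")
        sites[name] = [m.start() + offset for m in regex.finditer(seq)]
    return sites
```

For the hypothetical six-base input `"AGCTAA"`, the call returns `{"WRCY": [2], "RGYW": [1], "WA": [4, 5]}`: the palindromic AGCT matches both AID motifs, and the overlapping TA and AA dinucleotides are both reported as WA-sites.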

CRITICAL FOCUS ON THE RNA/RT-MECHANISM
The author has comprehensively reviewed the detailed evidence for the reverse transcription-based mechanism of SHM in previous and current studies (9-11). However, as flagged at the start of this article, many immunology researchers describe the mechanism of Ig SHM as DNA Polymerase-η-mediated DNA lesion repair, independent of pre-mRNA, in the context of the AID-initiated "DNA Deamination Model." It will therefore be informative not only to refer to this literature but also to summarize the evidence directly supporting an Ig pre-mRNA intermediate and reverse transcription (Figure 1).
The alternative to the RNA/RT-based mechanism is the "DNA Deamination Model," which is assumed to be coupled to direct DNA-based error-prone repair via translesion DNA polymerase-η acting solely by error-prone copying of DNA templates (50, 51) during gap-repair surrounding AID-generated lesions (Uracils, Abasic sites, ssDNA nicks), as outlined in detail by Neuberger and associates (1, 58), Gearhart and associates (61, 65), and many other laboratories (53-57, 63), published mainly in the period 2002-2011. Quite apart from all the data at odds or inconsistent with this alternative theory, there have been three direct published tests of the Reverse Transcriptase Mechanism since 2001: one study was inconclusive, and two studies reported positive data directly consistent with the RNA/RT-based mechanism.
In the first direct test of the RT model, Sack et al. (69) treated immunized mice with the retroviral RT inhibitors AZT and ddC and determined mutation frequencies in the anti-NP response of the rearranged VH186.2 sequence from control and test mice. They showed a systematic lowering of the somatic mutation frequency by about 33-35% in both test groups compared to the control [see Table 2 in Ref. (69)]. The authors, however, concluded that these retroviral RT inhibitors had no statistically significant effect (the P values were P = 0.056 and P = 0.069, respectively), thus claiming that "standard reverse transcription is not required for antibody V region hypermutation in the mouse" (69). This study and the conclusions drawn have been critically evaluated, and the present author considers that the data published in Sack et al. (69) have been misinterpreted (9, 11, 70).
In the next test, Franklin et al. [(59), Figure 1 and legend] showed that the sole known error-prone DNA polymerase involved in Ig SHM, DNA Polymerase-η (52, 63), is a very efficient reverse transcriptase, as indeed are human DNA Polymerases iota (-ι) and kappa (-κ), although these are less active than eta (-η).
Lastly, Steele et al. (62) tested directly whether a quantitative relationship exists between the number of appropriate Ig VκOx1Jκ5 mRNA secondary structures bearing WA target sites (where W = A or T) for the ADAR1 RNA editor (adenosine to inosine, A-to-I) and the recorded incidence, across the full length of the in vivo mutated VκOx1Jκ5 sequence, of A-to-G mutations (the standard proxy for A-to-I RNA editing). We showed that a highly significant and specific correlation (P < 0.002) existed between the frequency (or number) of WA-to-WG mutations and the number of mRNA hairpins that could potentially form at such WA mutation sites. This is still the best direct data-driven evidence for an RNA intermediary in Ig SHM, as it implies a direct role for both RNA editing and reverse transcription during SHM in vivo, occurring at the highest frequency in the nascent RNA stem-loops presenting WA-sites in dsRNA substrates just emergent from the Transcription Bubble. We now also know that both the RNA and DNA moieties in the RNA:DNA hybrid in the Transcription Bubble can potentially be A-to-I edited and contribute to A-to-G and T-to-C somatic mutations (14, 15).
These two sets of positive results consistent with the RNA/RT-based model are completely outside the ambit of the "DNA Deamination Model," being neither explained nor predicted by it (9, 11). This fact was pointed out explicitly in 2008 (71).
The reader is referred to the considerable detail reviewed in Steele (9, 11) and Lindley and Steele (10), but attention should also be drawn to an awkward fact that cannot be explained by the "DNA Deamination Model" yet is readily explained and predicted by the RNA/RT-mechanism (Figure 1): the clear strand biases of somatic mutations, whereby mutations off A exceed mutations off T (A >> T, mainly A-to-G >> T-to-C) and yet, paradoxically, in the same data set or experiment, mutations off G exceed mutations off C (G >> C, mainly G-to-A >> C-to-T). We have illustrated the contradictions of this paradox clearly in Lindley and Steele (10), as these characteristic strand biases are noted not only in Ig SHM datasets but also in AID/APOBEC-driven "Ig-SHM-like responses" in cancer genomes (10, 16).
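The strand-bias comparison described here reduces to a simple tally: count mutations by the base they arise from, as read on the NTS, and compare A with T and G with C. The sketch below illustrates that arithmetic only. The function name `strand_bias` is hypothetical, the toy mutation list is invented for illustration and is not data from any cited study, and no correction for the base composition of the target sequence is attempted.

```python
from collections import Counter

def strand_bias(mutations):
    """Tally mutations by the base they arise from, as read on the NTS.

    mutations: iterable of (original_base, mutant_base) pairs.
    Returns the counts off A, T, G, and C plus the A/T and G/C ratios.
    """
    off = Counter()
    for original, mutant in mutations:
        original, mutant = original.upper(), mutant.upper()
        if original != mutant:  # skip entries that are not mutations
            off[original] += 1
    counts = {base: off[base] for base in "ATGC"}

    def ratio(numerator, denominator):
        # None when the denominator count is zero (ratio undefined).
        if counts[denominator] == 0:
            return None
        return counts[numerator] / counts[denominator]

    return {
        "counts": counts,
        "A_over_T": ratio("A", "T"),
        "G_over_C": ratio("G", "C"),
    }

# Hypothetical toy data, invented purely for illustration.
toy = [("A", "G")] * 6 + [("T", "C")] * 2 + [("G", "A")] * 5 + [("C", "T")] * 3
result = strand_bias(toy)
```

With this toy list, `result["A_over_T"]` is 3.0 and `result["G_over_C"]` is 5/3, so both ratios exceed 1 in the same data set, which is the pattern the text calls paradoxical. A process acting equally on both DNA strands would be expected to give ratios near 1 in a target of balanced base composition.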
The other foundation inspiration for our work is the series of discoveries, begun in the 1950s (72,73), which led to the demonstration in 1970 of reverse transcription in RNA tumor viruses by Howard Temin and David Baltimore (74,75).
In summary, the DNA-based model of Neuberger and Gearhart, or the "DNA Deamination Model," is based on AID-induced C-to-U lesions and short-patch error-prone DNA repair by DNA Polymerase-η operating around such lesions (1, 61, 65). However, the RNA/RT-based mechanism ("Reverse Transcriptase Model") actually subsumes this initiating AID-mediated step and then couples it to the production of the full spectrum of strand-biased mutations at both G:C and A:T base pairs: error-prone cDNA synthesis via an RNA-dependent DNA polymerase (Pol-η) copying the base-modified Ig pre-mRNA template, leading to this now error-filled cDNA copy being integrated back into the normal chromosomal site (Figure 1). The modern form of this mechanism thus depends both on initiating AID C-to-U lesions in DNA and then on long-tract error-prone cDNA synthesis of the TS by DNA Polymerase-η acting in its reverse transcriptase mode (59). There are several possible tests. The first could involve measuring the outcome of ADAR A-to-I editing of the RNA and DNA moieties at RNA:DNA hybrids (15) during SHM in vivo. Thus, on a DNA polymerase-η deficient background (52, 63), the lowered number of mutations at A:T base pairs may allow A-to-I editing of the RNA:DNA hybrid and nascent dsRNA stem-loops (Figure 1), but the lack of an RNA-to-DNA copying step could show that T-to-C mutations now balance or exceed A-to-G mutations. Furthermore, a direct test of ADAR deamination in Ig SHM in vivo could be achieved in either ADAR1-deficient Aicardi-Goutières Syndrome (AGS) patients (76, 77) or catalytically inactive ADAR1 mouse strains, such as Adar1 E861A/E861A Ifih1 −/− (78). The caveat to both approaches is the need for a statistically sufficient number of A/T mutations and for a strategy to avoid or minimize strand-bias-blunting PCR recombinant artifacts (9).

ACKNOWLEDGMENTS
The author appreciates the critical comments of reviewers and editors in preparation of this MS for publication.