Computational investigations on target-site searching and recognition mechanisms by thymine DNA glycosylase during DNA repair process

DNA glycosylase, a member of the DNA repair machinery, plays an essential role in correcting mismatched/damaged DNA nucleotides by cleaving the N-glycosidic bond between the sugar and the target nucleobase in the base excision repair (BER) pathway. Efficient correction of these DNA lesions is critical for maintaining genome integrity and preventing premature aging and cancers. The target-site searching/recognition mechanisms and the subsequent conformational dynamics of DNA glycosylase, however, remain challenging to characterize using experimental techniques. In this review, we summarize our recent studies of the sequential structural changes of thymine DNA glycosylase (TDG) during the DNA repair process, achieved mostly by molecular dynamics (MD) simulations. Computational simulations allow us to reveal the atomic-level structural dynamics of TDG as it approaches the target-site and to pinpoint the key structural elements responsible for regulating the translocation of TDG along DNA. Subsequently, upon locating the lesions, TDG adopts a base-flipping mechanism to extrude the mispaired nucleobase into the enzyme active-site. The constructed kinetic network model elucidates six metastable states during the base-extrusion process and suggests an active role of TDG in flipping the intrahelical nucleobase. Finally, the molecular mechanism of product release dynamics after catalysis is also summarized. Taken together, we highlight to what extent computational simulations advance our knowledge and understanding of the molecular mechanism underlying the conformational dynamics of TDG, as well as the limitations of current theoretical work.


Introduction
The human genome is constantly under threat from external or internal damaging agents, such as ultraviolet (UV) radiation and/or detrimental factors from host cells, that can lead to chemical modifications of the nucleobases, including deamination, oxidation and alkylation [1]. These DNA lesions/errors can cause devastating diseases, including cancer and premature aging, by compromising genome stability [2]. Fortunately, cells have evolved advanced strategies to correct the above damages via various repair mechanisms according to the nature of the lesions. Among these, base excision repair (BER), one of the critical DNA repair pathways, targets and then corrects damaged or mismatched nucleobases by consecutively recruiting and employing several enzymes, including DNA glycosylase, endonuclease, DNA polymerase and DNA ligase, along with other accessory proteins [3].
The base excision by TDG can be broadly viewed as a multi-step process, namely target searching, base-flipping, cleavage of the N-glycosidic bond and product release (Figure 1A). As a repair enzyme, TDG must first pinpoint the target nucleobases among millions of canonical ones in a highly efficient way. Considering genome folding and the crowding effects imposed by the cellular environment, a combination of 3D random collision, intersegmental transfer, sliding, and hopping has been proposed as a highly efficient target-searching mechanism (Figure 1B) [15]. After locating the target sites, TDG adopts a base-flipping mechanism to finally recognize the nucleobases in the active site, as also observed for many DNA-binding enzymes, including methyltransferases (e.g., M.HhaI, M.HaeIII, and M.TaqI), other DNA glycosylases [e.g., human alkyladenine DNA glycosylase (AAG), bacterial MutM, human OGG1, human uracil DNA glycosylase (UDG), and T4 EndoV], and endonucleases (e.g., E. coli endonuclease IV and human APE1) [16][17][18][19][20][21][22][23][24][25][26][27]. Then, catalysis takes place by employing one water molecule as a nucleophile to attack the anomeric carbon C1′, resulting in the cleavage of the N-glycosidic bond between the target base and the sugar [4]. Finally, the excised nucleobase is released from the TDG active-site, and the resulting TDG-DNA complex is then handed over to other BER enzymes.
Previous experimental studies have successively resolved the structures of several static TDG-DNA complexes involved in the abovementioned base excision process. The first glimpse of the TDG-bound DNA complex was obtained by Maiti's group in 2008 [28]. In this structure, TDG is trapped in complex with either specific or nonspecific DNA; the latter likely represents a form in which TDG interrogates the nucleobases before base flipping. In the specific complex, TDG binds to one DNA duplex with a flipped abasic nucleotide (AP-site), and one critical intercalation residue, Arg275, penetrates into the base step and occupies the space vacated by the flipped nucleobase. This structure thus corresponds to the product complex right after catalysis. Notably, the follow-up
Figure 1. Schematic diagram of the target searching and recognition mechanism underlying the base excision repair process performed by TDG. (A) Sequential conformational changes are required in the TDG-involved base excision repair process, including target-site searching/interrogation, base-flipping of the target nucleobase, and product release following catalysis. Notably, Arg275, the intercalation residue in TDG (illustrated as a blue rectangle), was found to stabilize the partially flipped nucleobase at the early stage of base eversion, which further promotes the subsequent base-flipping process. Moreover, Arg275 can occupy the void space left by the completely flipped nucleobase, thereby locking the TDG-DNA complex in a fully base-flipped state. (B) A facilitated diffusion mechanism has been proposed to describe the target-searching process performed by TDG, whereby a combination of 3D diffusion, intersegmental transfer, and short-range sliding/hopping along the DNA strand might take place.
Despite previous efforts to elucidate the structural features of TDG, the conformational dynamics of TDG involved in the various stages of the DNA repair process are still inaccessible to experimental techniques owing to their limited spatiotemporal resolution. Molecular dynamics (MD) simulation, as a powerful computational tool, has been applied to investigate critical conformational dynamics of a wide range of biomolecular systems at atomic resolution. Notably, one can now explore comparatively long-timescale dynamics (i.e., hundreds of microseconds) of certain structural changes in complex biomolecules by constructing a Markov state model (MSM) [36][37][38][39][40]. The general pipeline for MSM construction is shown in Figure 2. An MSM decomposes the conformational space sampled by MD simulations into a set of states, where transitions within each state are fast compared with the inter-state transitions [41][42][43][44][45][46]. This separation of timescales allows the construction of a Markovian model, in which the probability of transiting from state i to state j depends only on the identity of i and not on previously visited states. An MSM can be built from many short MD trajectories (e.g., hundreds of nanoseconds each), and the dynamics obtained from these short MD simulations can then be propagated to longer timescales based on the following equation:
P(n∆t) = [T(∆t)]^n P(0)
where P(n∆t) is a vector of state populations at time n∆t, ∆t is the lag time, and T is the transition probability matrix. In recent years, MSMs have been successfully applied to elucidate the conformational mechanisms of many biological molecules [47][48][49][50][51][52][53][54][55][56][57][58][59].
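As a minimal sketch of the propagation step described above, the following Python snippet applies a transition probability matrix repeatedly to an initial population vector. The three-state matrix, its values and the helper name propagate are illustrative assumptions and are not taken from the TDG studies. A row-stochastic convention is assumed (T[i, j] is the probability of moving from state i to state j within one lag time), so the populations are treated as a row vector multiplied from the left.

```python
import numpy as np

def propagate(p0, T, n):
    """Return state populations after n lag times: p(n*dt) = p(0) @ T^n."""
    p0 = np.asarray(p0, dtype=float)
    T = np.asarray(T, dtype=float)
    # Each row of a transition probability matrix must sum to one.
    if not np.allclose(T.sum(axis=1), 1.0):
        raise ValueError("rows of T must sum to 1")
    return p0 @ np.linalg.matrix_power(T, n)

# Hypothetical 3-state model (values chosen only for illustration).
T_example = np.array([
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
])
p_start = np.array([1.0, 0.0, 0.0])  # all population starts in state 0

# Populations relax towards the stationary distribution as n grows.
for n_steps in (1, 10, 100, 1000):
    print(n_steps, propagate(p_start, T_example, n_steps))
```

Because each step only requires a matrix product, timescales far beyond the length of any individual trajectory become accessible once T has been estimated at a lag time for which the Markovian assumption holds.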
In this review, we summarize the current understanding and our recent studies of the conformational dynamics of TDG involved in the DNA repair process from the perspective of computational simulations [60][61][62]. The topics covered include how TDG searches for its targets when approaching the lesions, how the DNA conformation, i.e., DNA bending, minor-groove width, and roll angle, affects TDG recognition, and which key structural motifs in TDG are responsible for recognizing a given DNA deformation. We also discuss the molecular mechanisms and detailed conformational dynamics of the base-flipping and excised-thymine release processes. More in-depth reviews regarding the catalysis and biological functions of TDG can be found elsewhere [6,63].

Target-searching Mechanism of TDG
How a DNA-binding protein searches for its target site among millions of normal nucleobases is an intriguing question that has been the focus of extensive experimental and theoretical studies. A widely accepted model called "facilitated diffusion" has been proposed to describe the dynamics of target-searching proteins [15,[64][65][66][67][68][69]. That is, proteins first bind to a nonspecific site on DNA via 3D diffusion, which is likely affected by the crowded cellular environment. Then, sliding or hopping motions of the protein along DNA take place until the protein finally locates the target-site. Sliding involves constant association between protein and DNA, while hopping requires transient micro-dissociation of the DNA-binding protein from DNA and rebinding a few base pairs (bps) away. In addition, intersegmental transfer is also a possible strategy, whereby the protein transfers to another DNA strand or to a site on the same DNA molecule across a large bp interval.
To date, many experimental techniques, such as NMR [70], biochemical [71][72][73][74] and single-molecule fluorescence studies [75,76], have been utilized to investigate the target-searching mechanisms of DNA glycosylases. Early biochemical work indicated that UDG employs a processive mode for lesion search [77]. Likewise, AAG can also undergo 1D diffusion along DNA, and increasing the ion concentration can profoundly affect the efficiency of the search [78]. Intriguingly, AAG can bypass the roadblock molecule EcoRI and bind to other target sites [79]. Similar results have also been observed for other DNA glycosylases, including methyl-CpG-binding domain protein 4 (MBD4) [80], adenine DNA glycosylase (MutY) and bacterial formamidopyrimidine-DNA glycosylase (Fpg or MutM) [81]. In particular, Stivers's group designed a "molecular clock" strategy to differentiate associative from dissociative diffusion and applied this method to UDG and hOGG1 [71,73]. Their results suggest that both systems can slide along DNA via a 1D translocation mode within a range of a few bps, although this finding differs from that of the former single-molecule fluorescence study [73]. Moreover, further in vivo assays revealed that the cellular environment, such as the ion concentration and microenvironment, can significantly influence the diffusion rate of proteins along DNA [82]. In addition, the presence of nucleosomes can also strongly affect the DNA-repair efficiency [83][84][85].
Considering the similar structural folds of TDG and UDG, it can be expected that TDG might also be capable of sliding along DNA in an associative mode, as previously observed for UDG [71]. Recent atomic force microscopy (AFM) and fluorescence studies showed that TDG binding can bend both nonspecific and specific DNA into two dominant conformations, with bend angles of ~30° and ~60°, respectively [86]. Specifically, the former conformation is also present for the lesion-containing DNA before TDG binding, suggesting that the 30° conformation is an intrinsic property of the free DNA and that TDG binding can further induce a more severely bent DNA conformation. Despite the extensive structural studies of TDG in complex with various mismatched/damaged nucleobases [29-35], the detailed mechanisms of how TDG scans for and targets the lesion sites before base-flipping remain elusive. Previous studies have obtained one TDG-DNA complex in which the interrogated bp is intrahelical but is also a canonical bp (PDB ID: 2rba). Therefore, the 2rba structure is unable to reveal the true binding process between TDG and DNA prior to base extrusion [28].
To reveal how TDG locates the target-site when approaching the lesions, we investigated a rotation-coupled sliding process of TDG along a 9-bp DNA segment containing a G:T mismatch by constructing MSMs based on extensive unbiased MD simulations [60] (Figure 3). We first built nine TDG-bound DNA complexes in which TDG binds at different bp sites. We then performed a number of targeted MD (TMD) simulations to derive an initial TDG sliding pathway along DNA. The resulting TMD trajectories were then subjected to extensive unbiased MD simulations. A dataset of around 25 μs of MD simulations was finally collected, and an MSM was constructed. The MSM clearly identifies nine metastable states, in each of which TDG is located at a particular bp site. Thus, the transition of TDG between two adjacent bp sites is expected to require crossing an energy barrier. Notably, the thermodynamically most favorable state comprises the conformations in which TDG targets the G:T mispair. From a kinetic view, TDG is found to diffuse rapidly when it is distant from the target-site, whereas it slows down as it approaches the mispaired site. This change in the sliding rate of TDG originates from the profound structural changes of TDG induced by the altered interaction interfaces between TDG and DNA during the process of locating the target. In particular, one key intercalation loop in TDG (residues Ala274-Ala277) can switch between various conformations during the sliding process [60]. When TDG is located at nonspecific sites, the intercalation loop tends to insert shallowly into the DNA minor groove, thereby adopting more flexible and solvent-exposed conformations. However, as TDG gets close to the target-site, the intercalation loop penetrates deeply into the DNA minor groove, adopting a conformation resembling that in the crystal structures (e.g., 2rba). This interrogating loop conformation potentially promotes the opening of the G:T mispair, DNA bending and widening of the minor groove. In addition, two nearby TDG residues, Phe252 and Tyr288, are found to stabilize the specific loop conformation via nonpolar interactions. Moreover, three positively charged TDG residues, i.e., Lys161, Lys232, and Arg281, serve as key electrostatic anchor points for facilitating the TDG transfer between adjacent bp sites. That is, TDG tends to establish stable interactions with the DNA backbone via the above three residues when interrogating a certain bp site, whereas transiting to the adjacent sites necessitates breaking these salt-bridge contacts, resulting in loosened interactions between TDG and DNA. Our computational modeling therefore warrants further experimental tests.
It is noteworthy that the observed conformational switches of TDG during target searching have also been highlighted in other DNA-binding proteins, e.g., AAG [87], hOGG1 [23,73,75], and UDG [88,89]. Specifically, a two-state model (open and closed) has been proposed to describe the interplay between protein and DNA, whereby nonspecific binding favors an 'open' conformation of the DNA-targeting protein, whereas binding to the specific site leads to a stable, 'closed' protein conformation [88,89]. Additionally, TDG binding at the target site can bend the DNA backbone by ~20°, which is consistent with a former simulation result showing that TDG prefers to recognize a DNA conformation that bends by ~20° [90]. In that work, by constructing several TDG-DNA complexes with varied DNA bending angles (ranging from 0° to 60°), we discovered the key TDG residues responsible for recognizing a given bent DNA conformation. More strikingly, detailed structural analyses indicate that the roll-angle patterns of consecutive bps are well correlated with the DNA bend angle. This work therefore provides structural insights into the molecular mechanisms underlying TDG-DNA recognition before base-flipping.
Figure 2. [...] Then, extensive unbiased molecular dynamics (MD) simulations can be performed to explore the phase space along the above initial transition path. The collected simulation dataset can finally be used to construct an MSM, that is, by clustering the MD conformations into hundreds of microstates using time-structure-based independent component analysis (tICA) and k-centers/k-means, etc., followed by lumping the microstates into several macrostates according to their kinetic properties.
Figure 3. Nine metastable states (S1-S9) identified by MSM during the target-searching process performed by TDG. Representative conformations for S1-S9 are shown. The representative conformation of each metastable state was randomly selected from the most populated microstates belonging to this state. For each state, the investigated 9 base pairs (bps) are shown in orange cartoon and the interrogated bp is depicted in orange sticks, except that the mismatched G:T in S5 is highlighted with violet and green sticks. The intercalation loop of TDG is highlighted with blue cartoon. The zoomed-in view of the intercalation loop is shown in the right panel, with the key intercalated residue Arg275 highlighted with blue sticks. In addition, the sliding direction of TDG and the starting point (at S1) are labeled by a grey arrow and a black dashed line, respectively. The figures are modified from ref. 60.
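To illustrate how state populations and relative free energies such as those discussed above can be obtained once MD conformations have been assigned to discrete states, the following Python sketch estimates a transition matrix from state-index trajectories. It is a simplified illustration, not the pipeline used in ref. 60: the toy trajectory, the function names and the symmetrized-count estimator are assumptions, and every state is assumed to be visited at least once. Production studies typically rely on dedicated MSM software, dimensionality reduction and careful lag-time validation.

```python
import numpy as np

def count_matrix(dtrajs, n_states, lag):
    """Count transitions i -> j separated by `lag` frames in each trajectory."""
    C = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj)
        for i, j in zip(traj[:-lag], traj[lag:]):
            C[i, j] += 1
    return C

def transition_matrix(C):
    """Row-normalise symmetrised counts (a simple way to enforce detailed balance)."""
    C_sym = C + C.T
    return C_sym / C_sym.sum(axis=1, keepdims=True)

def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalised to sum to one."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()

def free_energies_kT(pi):
    """Free energy of each state relative to the most populated one, in k_B T."""
    F = -np.log(pi)
    return F - F.min()

# Hypothetical discrete trajectory over three states (illustration only).
dtrajs_example = [[0, 0, 1, 1, 0, 2, 2, 1]]
C = count_matrix(dtrajs_example, n_states=3, lag=1)
T = transition_matrix(C)
pi = stationary_distribution(T)
print(free_energies_kT(pi))
```

In this picture, the thermodynamically most favorable state is simply the one with the largest stationary population, i.e., a relative free energy of zero.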

Base-flipping Dynamics of TDG Substrates
Upon locating the lesions, TDG adopts a base-flipping strategy to extrude the target nucleobases from the DNA duplex into the TDG active-site, as observed in many DNA-binding proteins [21][22][23][24][25]. The base eversion is accompanied by the deep insertion of the TDG intercalation loop into the DNA minor groove. In particular, the key loop residue Arg275 can occupy the void space left by the flipped base, thereby preventing the flipped base from swinging back into the DNA duplex. Early studies have demonstrated that intrahelical nucleobases can spontaneously flip out of the DNA helix, even for canonical bps [91][92][93]. Introduction of a mismatched bp significantly promotes the base-flipping event [94]. It can be expected that the presence of DNA-binding proteins imposes significant structural distortions on DNA (e.g., backbone bending and bp opening), which in turn facilitate the base-extrusion process [29,95,96]. As described above, although many base-flipped TDG-DNA complexes have been obtained using crystallographic methods, the TDG-DNA complex before base-flipping and the complete base-flipping dynamics remain unclear.
An interrogation complex of TDG and DNA, in which TDG inspects an intrahelical bp, was built by computational modeling and employed as the starting structure to generate initial base-flipping pathways [61]. A series of MSMs were then constructed, based on extensive MD simulations, to reveal the complete TDG-mediated base-flipping dynamics of one mismatched thymine. The resulting MSM captured the key intermediates of the flipped thymine during the complete base-extrusion process and revealed the critical TDG residues responsible for mediating the inter-state transitions, including Gly142, Asn157, Ser272, Ser273, and Cys276 (Figure 4A). Additional comparison studies were conducted to evaluate how different TDG substrates, i.e., dU, 5fC, and 5caC, influence the base-flipping dynamics. The transition-state analyses suggest that the base-flipping rates for the various nucleobases likely follow the order dU≈5fC>5caC>dT, owing to their different chemical groups (Figure 4B). The above findings not only agree with existing experimental evidence [29,97], but also motivate additional experimental validation.
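One generic way to compare how quickly a target state is reached in a kinetic network model is the mean first-passage time (MFPT). The Python sketch below is offered only as an illustration of that idea; it is not the transition-state analysis used in ref. 61, and the three-state matrix (loosely labeled intrahelical, partially flipped and fully flipped), its values and the function name are hypothetical. A row-stochastic transition matrix is assumed.

```python
import numpy as np

def mfpt_to_target(T, target, lag_time=1.0):
    """Mean first-passage time from every state to `target`.

    Solves m_i = lag_time + sum_{k != target} T[i, k] * m_k, with m_target = 0.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    others = [s for s in range(n) if s != target]
    # Linear system restricted to the non-target states.
    A = np.eye(n - 1) - T[np.ix_(others, others)]
    m_others = np.linalg.solve(A, np.full(n - 1, lag_time))
    m = np.zeros(n)
    m[others] = m_others
    return m

# Hypothetical model: 0 = intrahelical, 1 = partially flipped, 2 = fully flipped.
T_flip = np.array([
    [0.95, 0.05, 0.00],
    [0.10, 0.85, 0.05],
    [0.00, 0.02, 0.98],
])
print(mfpt_to_target(T_flip, target=2))  # in units of the lag time
```

Comparing such passage times between models built for different substrates would give relative rates, provided the models share comparable state definitions and lag times.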
One intriguing question in base-flipping studies is whether proteins recognize the target nucleobases in an active or a passive mode. The active mode describes a searching mechanism whereby the protein first inspects an intrahelical bp and then proactively promotes base-flipping. In the alternative scenario, spontaneously flipped nucleobases are transiently captured by the searching protein (passive mode). Former NMR work has suggested a passive searching mechanism for UDG [70]. An active base-extrusion mechanism, however, has been proposed for Fpg/MutM using single-molecule and crystallographic techniques, and also for bacterial 3-methyladenine glycosylase (AlkA) and MutY [98,99]. Likewise, our computational work supports an active involvement of TDG in triggering base flipping. Specifically, the intercalated residue Arg275 plays an essential role in stabilizing the partially flipped thymine via cation-π interactions, resulting in a low-energy flipping path along the DNA minor groove [61].

Dynamics of the Product Release from TDG Active-site after Catalysis
Upon entering the TDG active-site, the flipped nucleobase can be cleaved by employing one water molecule as a nucleophile [1,100,101]. Then, the excised base has to be released from the active site to generate a stable TDG-DNA complex with an AP-site. The bound TDG is responsible for protecting the AP-site from undesired damage and for recruiting other BER enzymes. Former experimental efforts attempted to trap the ternary TDG complex after catalysis; however, all failed to capture any excised base in the active site, suggesting that the product is prone to dissociating from the binding pocket right after cleavage [31,33,34]. Nevertheless, potential product-releasing channels have been proposed by former crystallographic studies [31].
Computational simulations have been conducted to identify the key intermediates of the excised thymine and to characterize the complete product-release pathway and the associated thermodynamic/kinetic properties [62]. The constructed MSM revealed the detailed interaction networks between the product and the TDG-DNA complex, and pinpointed the rate-limiting transition during the whole product release process (Figure 5A,B). Moreover, structural inspections demonstrated strong interplay between TDG and the DNA chains that, via a conformational selection mode, facilitates the transfer of the excised thymine through the narrow releasing channel (Figure 5C-E) [62]. More intriguingly, the study identified a key TDG residue, Gly142, as a gating residue lying along the product release pathway. Gly142 therefore serves as a potential substitution site to trap the product in the active site. In our previous study, we performed free energy calculations to evaluate the relative product-releasing rates of the wild-type (WT) and the G142Y mutant [62]. Specifically, the potential of mean force along the dominant releasing path was calculated by conducting constant-velocity steered molecular dynamics simulations for each system. The distance between the center of mass of the product and one backbone P atom of DNA was defined as the reaction coordinate (RC). The final free energy profiles were obtained based on Jarzynski's equality, which evaluates the free energy difference between two states from the work performed [102]. The results indicate that the G142Y mutant shows a higher transition barrier during the product release process than the WT, with a free-energy difference of ~7.3 kBT, which corresponds to a ~1000-fold decrease in the transition rate. This work thus provides a potential strategy to successfully obtain the ternary complex using experimental techniques.
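The two quantitative steps mentioned above, estimating a free energy difference from nonequilibrium work values and converting a barrier difference into a rate ratio, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the protocol of ref. 62: the function names are hypothetical, work values are assumed to be expressed in units of kBT, and the rate ratio assumes an Arrhenius-like dependence with identical pre-exponential factors for both systems.

```python
import numpy as np

def jarzynski_free_energy(work_kT):
    """Estimate dF (in k_B T) from work values (in k_B T) via Jarzynski's equality.

    dF = -ln < exp(-W) >, evaluated with a log-sum-exp shift for numerical stability.
    """
    w = np.asarray(work_kT, dtype=float)
    shift = (-w).max()
    return -(shift + np.log(np.mean(np.exp(-w - shift))))

def rate_ratio(barrier_difference_kT):
    """Fold change in rate for a barrier raised by the given amount (in k_B T),
    assuming equal pre-exponential factors: k_ref / k_mutant = exp(ddG)."""
    return float(np.exp(barrier_difference_kT))

# Hypothetical work values from repeated pulling runs (illustration only).
work_samples = [5.2, 4.8, 6.1, 5.5, 4.9]
print(jarzynski_free_energy(work_samples))

# Barrier difference reported in the text for G142Y versus WT.
print(rate_ratio(7.3))
```

With a barrier difference of 7.3 kBT, the exponential factor is roughly 1.5 × 10^3, which matches the order of magnitude of the ~1000-fold slowdown quoted above. Because the exponential average is dominated by rare low-work trajectories, estimates of this kind generally need many pulling repeats to converge.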
To examine how different bases (e.g., U, 5hmU, 5fC, and 5caC) affect the release kinetics, the ideal approach would be to construct an MSM for each base system, as we did for the G:T mispair; this, however, requires tremendous computational resources to obtain a reliable kinetic model. Instead, Da et al. [62] evaluated the relative product-releasing rates for the above TDG substrates by employing thymine as a reference system. For each system, the RC and the detailed setup of the free-energy calculation are the same as those used for the G142Y mutant. The results show that U, 5caC, and 5hmU exhibit faster product-releasing rates than thymine, whereas 5fC displays similar release kinetics. Such a methodology thus provides an efficient way to evaluate the relative releasing rates for various TDG substrates. Taken together, the computational work demonstrates that the overall releasing time for thymine is 10 μs. Considering the similar structural folds of TDG and UDG, we hypothesize that the product-releasing mechanism identified in TDG is likely universal among different UNG members.

Concluding Remarks
Here, we reviewed the current progress towards understanding the critical conformational dynamics of TDG involved in the BER pathway from the perspective of computational simulations. The main focus includes the searching dynamics of TDG along DNA that contains one specific target-site, the base-flipping dynamics of one thymine nucleotide promoted by TDG, and the product release process after catalysis by TDG. By constructing kinetic models, we are able to identify the key intermediate states involved in the conformational transitions and to pinpoint the critical structural elements and residues responsible for regulating the state-to-state transitions. As a result, the theoretical and computational work supports the design of new experimental tests for further model validation. Nevertheless, the current computational studies have some limitations. In the target-searching work, only one possible searching mechanism, i.e., the rotation-coupled sliding motion, was investigated. Alternative scenarios, such as the hopping model, have not been taken into account. Therefore, comparisons of different searching strategies are yet to be achieved. In addition, we examined the relative kinetics of various TDG substrates for the base-flipping or product-release process by assuming that all the substrates undergo similar transition paths. This assumption, however, is not necessarily true, as TDG substrates with different chemical groups may exhibit entirely different conformational transitions. Additional efforts are needed to develop cutting-edge computational methods, for example machine learning methods, to explore the protein-DNA conformational space more efficiently.

Funding
This work was supported by the grants from the Natural Science Foundation of Shanghai (Nos. 20ZR1425400 and 21JC1403100), and the Startup Funding from Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University (No. WF220441503).