First-principles anharmonic quantum calculations for peptide spectroscopy: VSCF calculations and comparison with experiments.

First-principles quantum calculations for anharmonic vibrational spectroscopy of three protected dipeptides are carried out and compared with experimental data. Using hybrid HF/MP2 potentials, the Vibrational Self-Consistent Field with Second-Order Perturbation Correction (VSCF-PT2) algorithm is used to compute the spectra without any ad hoc scaling or fitting. All of the vibrational modes (135 for the largest system) are treated quantum mechanically and anharmonically using full pair-wise coupling potentials to represent the interaction between diﬀerent modes. In the hybrid potential scheme the MP2 method is used for the harmonic part of the potential and a modified HF method is used for the anharmonic part. The overall agreement between computed spectra and experiment is very good and reveals diﬀerent signatures for diﬀerent conformers. This study shows that first-principles spectroscopic calculations of good accuracy are possible for dipeptides hence it opens possibilities for determination of dipeptide conformer structures by comparison of spectroscopic calculations with experiment.


Introduction
Biological functions of peptides and proteins largely depend on their 3D structures. The high conformational diversity in biomolecules sets a challenge to the high-resolution experiments and theoretical approaches to predict their accurate structures and the intramolecular force-fields. The gas phase high-resolution IR spectroscopy along with theoretical calculations can reliably reveal the intrinsic structure of small biomolecules. However, with the increasing size and conformational complexity, structural identifications become more challenging. In recent years, considerable advancements have been made in highresolution measurements of mid-size peptides by using cryogenic cooling of biomolecular ions, 1-6 resonance two-photon IR, 7 fluorescence-dip spectroscopy, 8 and by other techniques. [9][10][11][12][13] These advanced methods are capable of yielding the IR spectroscopy of individual conformers which are free from interference from one another. The vibrational spectra obtained by these experiments provide many characteristic features and constraints which have to be met by theoretical treatments. This challenges computational approaches to investigate the vibrational spectroscopy of biomolecules with adequate accuracy but reasonable computational effort. Several theoretical methods are available to compute the IR spectra. The first-principles approaches, which use suitable quantum chemical potentials and vibrational methods, have the importance of being free of adjustable parameters and of fitting to experiment. The simplest common approach in this direction is the harmonic oscillator (HO) approximation. While it is easily applicable for large systems, its accuracy is inadequate. Therefore, anharmonic treatment is essential for better accuracy. As an alternative approach, scaling corrections are used for the harmonic frequencies to match with experimental results. However, this approach is empirical and does not reflect on the potential energy surface (PES) underlying the system. On the other hand, rigorous vibrational methods are not feasible for large systems due to very high computational cost. Therefore, a suitable approximate method is needed which can be applied to large systems like peptides and proteins with adequate accuracy. One such inexpensive yet practical method is the VSCF approximation. [14][15][16][17][18][19][20][21] It has been successfully used for small to medium sized biomolecules. [22][23][24][25][26] Other alternatives such as perturbation theory 27,28 and classical MD simulations 29 are also available. The former has been applied with good accuracy for small biomolecules. 30 The Born-Oppenheimer MD (BOMD) simulations are applied for biomolecules where the nuclei are treated classically and the electrons are treated quantum mechanically using a DFT method. 31,32 However, at low temperature where the quantum effects are significant the classical MD calculations may result large errors especially for intensities.
In this contribution, we calculate the first-principles based anharmonic IR spectra of three protected dipeptides by means of quantum mechanically calculated PES using quantum VSCF-PT2 approximation 33 of a level that turns out to be sufficiently accurate for comparison with experiments. So far, VSCF calculations were carried out for peptides and a protein using empirical 34 and semi-empirical 25 potentials. However, these potentials are unrealistic at least for the purpose of spectroscopy. Recently Roy et al. 35 showed the possibilities to calculate the anharmonic spectroscopy of a decapeptide antibiotic gramicidine S. Due to the very large size (176 atoms) of the decapeptide, a few subsets of important vibrational modes (12-16 modes per set) were considered to construct the pairwise coupling potential. The results showed very good accuracy in comparison with experiment. In the present study, we used quantum chemical full pair-wise coupling potentials considering all the vibrational modes of the three dipeptides. Until now, VSCF-PT2 calculations using quantum chemical potentials of suitable accuracy were carried out for amino acids, 22,36 their complexes with water molecules 37 and other biomolecules 38 of similar sizes. The present paper extends the state-of-the-art quantum VSCF-PT2 calculations to dipeptides. Most of the earlier works on vibrational calculations of dipeptides were based on the HO 39 or use of the empirical scaling factors 7,8,40,41 or on retaining up to cubic coupling terms of the Taylor series expansion of the vibrational potential. 6 The three protected dipeptides chosen here contain no free acid (COOH) and/or basic (NH 2 ) groups ( Fig. 1). Such protections are extensively used in experimental studies. As a result, these are suitable to consider as model systems in quest of understanding the 3D structures of polypeptides. The first two systems are the two lowest energy conformers of methyl-capped dipeptides: N-acetyl tryptophan amide (NATA) represented by two different structural motifs, say C5 and C7. 8 The C5 structure (NATA-C5) consists of extended peptide backbone and the C7 structure (NATA-C7) contains a hydrogen bond of c-amide NH to f-amide carbonyl group forming a seven member ring. The third structure is a dipeptide of Ac-Val-Phe-OMe (AVPO), protected at both the terminal positions by an acetyl and methyl group. 7 All the three structures were measured in gas phase and a few characteristic spectral data were reported. In those studies 7,8 the harmonic IR spectra at HF and DFT levels were computed and scaled by empirical scaling factors to bring the calculated frequencies close to the experiment. For AVPO, different scaling factors were used for higher and lower frequency ranges. 7 It is often found that such multiple scaling factors are needed depending on the degree of anharmonic couplings in vibrational transitions. While use of such multiple scaling allows a good fit for many of the vibrational modes it does not provide any specific details of the molecular PES since the factors are empirical and not obtained from an anharmonic potential. The empirical nature of the anharmonic correction factors reduces the confidence in the vibrational prediction. In comparison with experiment, the 3D structure determinations thus may not be completely reliable. That motivates to compute the anharmonic vibrational spectra of large biomolecules using quantum mechanical VSCF approximation without any ad hoc adjustments.

Theory
The basic variant of VSCF theory uses a separable approximation. 14,15 It assumes that the full vibrational wave function is a product of single mode wave functions corresponding to each normal mode. Using a variational principle the final working equations for the wave functions and vibrational energy levels are derived. These involve mean field potentials that represent the average effects of the other modes on each mode. The corresponding equations are solved self-consistently using numerical techniques to get the vibrational states. The major challenge for this approach is the calculations of the multi-dimensional potentials which depend on its typical mathematical form. The VSCF potential can be written as a sum of one-mode, two-mode, three-mode terms and so on in mass weighted normal coordinate by (Q), The first term in eqn (1) is a sum of terms, where each is a single normal-mode (the so-called ''diagonal approximation''). The second term consists of pair-wise couplings between different normal modes, etc. As a result of this form of the potential, the total number of grid points (NP) can be written as, where, NG is the number of grid points along a normal mode and NV is the number of vibrational modes. The first term gives one-dimensional grid points and yields intrinsic anharmonicity. The next term gives the total number of two-dimensional grid points and hence pair-wise coupling and so on. It can be easily seen that the number of grid points increases very rapidly with the number of normal modes, number of grid points per normal modes and the dimensionality of the potential. In this work the full PES is simplified as a sum of single-mode plus a sum of pair-wise coupling interactions between the normal modes to carry out the calculations for the dipeptides of 93 and 135 vibrational modes in a reasonable time scale. It is already found that this pair-wise coupling approximation leads to good accuracy in many applications. 22,23,38 To bring in effect of non-separability, the vibrational levels can be further corrected by second-order perturbation theory (VSCF-PT2). 33,38 This method is typically more accurate than the basic VSCF approximation. Any perturbative methods may suffer from degeneracy issues which are discussed later. The corresponding vibrational band intensities (in km mol À1 ) are computed by standard expression using anharmonic wave functions and the transition moments are calculated directly from the given electronic structure method. The value of the dipole moments of the three components are considered at the same points as the potential. Using an Eckart frame the integrated absorption coefficients for vibrational band intensities were computed as, In this equation it is considered that all the initial population is in the vibrational ground state. Ignoring the thermal effect on population for simplicity the (n 0 À n m ) is set equal to one, where n 0 and n m are fractions of molecules in the initial and final states corresponding to zero temperature. c (0) i and c (m) i are VSCF wave functions for the ground and excited vibrational states. The vector u, is a function of the nuclear coordinates and evaluated from electronic structure theory. Considering the equilibrium position as the origin, u can be expanded in powers of normal coordinates. It is a good approximation for fundamental and overtone transitions, at least for the low lying states, to consider only u(0,. . .,Q i ,. . .,0) which we denote by u(Q i ). The intensities are evaluated by numerical integration using the numerical anharmonic wavefunctions computed from the VSCF equations. More details of the VSCF and VSCF-PT2 theories can be found in a recent review. 38 As the main computational bottleneck of the VSCF-PT2 calculations is the construction of the pair-wise coupling potentials which require calculations of the multi-dimensional grid points, the choice of the quantum mechanical methods to construct the potential is very crucial. It was found that both the DFT and post HF methods (such as MP2) are suitable for the VSCF calculations to get good accuracy. While the DFT based methods are more practical to apply for large systems than MP2, the latter gives slightly better results than the former. 22 However, calculations of the full pair-wise couplings by MP2 or DFT based methods are computationally intensive for such large systems. The harmonic part of the potential requires the greatest accuracy, since the harmonic contribution to each frequency is much larger than the anharmonic one.
Considering this fact we used a hybrid potential scheme by ''upgrading'' PESs of a lower level to produce more accurate anharmonic frequencies. 36 This improved potential can be written as Here V LM and V ULM are the potentials for N-mode systems at the low-level method (LM) and upgraded low-level method (ULM), respectively. The scaling coefficients l i are the ratio of the harmonic frequencies of the LM (w LM i ) to the high-level method (w HM i ). These scaling coefficients are chosen such that the upgraded potential reproduces the harmonic frequencies of the high level potential. It was applied successfully for several small systems. 26,36 The upgraded potentials constructed here are calculated from the much faster HF method while the harmonic potentials at this level are upgraded by much more accurate MP2 method. We used double-zeta Dunning type cc-pVDZ basis and 8 grid points per normal modes throughout this study and calculated the fundamental bands for all the vibrational modes. All the calculations are performed using locally modified GAMESS program. 42

Results and discussions
The structures of the two conformers of NATA and AVPO are taken from the previous studies. 7,8 It is found that, similar to the preceding studies, the NATA-C7 is the global minima and it is more stable than NATA-C5 by 1.2 kcal mol À1 (with ZPE) at MP2/cc-pVDZ level of theory. Geometry optimizations were performed using gradient convergence tolerance of 0.0001 Hartree per Bohr and subsequent harmonic frequency calculations were performed to confirm that each structure is in the true minimum.
Peptides or proteins give rise to characteristic IR absorption bands namely N-H stretching, C-H stretching, CQO stretching (amide I) and N-H bending (amide II). These are most important IR bands to reveal the conformational changes of proteins and peptides. The first two modes are found generally above 2700 cm À1 and the last two modes are found below 1700 cm À1 .
In Table 1, the computed HO frequencies in HF and MP2 potentials and VSCF-PT2 along with VSCF frequencies in HF/ MP2 potentials are presented and compared with the experiment for NATA-C5. In Fig. 2a, the corresponding VSCF-PT2 spectra are given and compared with experiment. The VSCF values are given in the table to show the extent of corrections due to PT2 approximation. As stated earlier, any perturbative method occasionally suffers from resonance problem particularly for low frequency modes where the spectral congestion is very high. Considering the large size of the three peptides, it is expected that these must contain several low frequency vibrations which ought to be more vulnerable as far as resonance is concerned, leading to numerical instabilities. For example, we found numerical convergence problem in VSCF-PT2 calculations 1610 | Phys. Chem. Chem. Phys., 2016, 18, 1607--1614 This journal is © the Owner Societies 2016 for the second lowest mode (torsional mode of B25 cm À1 ) of the NATA-C5 mostly due to resonance. Keeping this mode as harmonic the convergence problem is removed. 43 However, the level of accuracy in the peak positions showed that the resonances contributed weakly at least to the peaks we assigned. NATA has four NH stretching transitions, two from NH 2 group, one from indole group and one from f-amide NH. Only those VSCF-PT2 transitions are given in the tables for which the experimental data were observed (see ESI † for the full list of computed vibrations and intensities). The comparison between theoretical and experimental transitions is made for each type of vibrations considering peak positions and intensity pattern. The theoretical spectra of NATA-C5 in comparison to experiment are given in the Fig. 2a. Note that, VSCF algorithm can compute frequencies and intensities but not the spectral line shapes. It requires to treat spectral width and line broadening which yield spectral line shapes. The Lorentzian broadening function with HWHM of 5 cm À1 is used to calculate the spectral line shapes in this study. As it can be seen, very good agreements are found for peak positions of the NH stretches. The bands located at 3538 and 3417 cm À1 corresponding to asymmetric and symmetric NH 2 stretching modes match extremely well in terms of positions and band shapes with errors of only 1 and 11 cm À1 , respectively for the band positions. The calculated f-amide NH stretching mode at 3424 cm À1 is accurately located at the experimental band position (3430 cm À1 ) with similar intensity. Only the indole NH stretching is blue shifted by 45 cm À1 . It may happen due to the fact that indole NH stretch has hardly any interaction with the different dipeptide backbones and hence coupled weakly with other modes. That reduces the degree of anharmonic corrections to retain this peak blue of the experiment. Two adjacent peaks of f-amide NH and symmetric NH 2 stretches which are separated by 13 cm À1 in experiment are found to be separated by 4 cm À1 in the computed spectra.
The CH stretching vibrations of NATA include pyroll, aromatic and alkyl CH stretches. The experimental spectra showed that this region suffers from spectral congestion with mixing of Fermi resonances due to CH bending modes. For such cases the computed spectra can provide support to assign the measured spectra more clearly. For example, two aromatic CH stretches were experimentally measured at 3074 and 3059 cm À1 in NATA-c5. In theoretical calculations it is found that four such transitions are possible with three intense peaks and one very weak peak (Table 1 and Table S1, ESI †). Comparing the experimental values with the theoretical ones, very good agreement was found for both the peaks with an error of only 3 and 5 cm À1 , respectively. The intensity pattern is also good. A third computed peak at 3050 cm À1 with similar intensity of 3054 cm À1 indicates that one more experimental peak very close to 3059 cm À1 may be present which might be overlapped with the former to give a single peak instead. In the alkyl CH stretching region (B2930-3010 cm À1 ) three peaks were experimentally identified out of six possible peaks. Looking at the experimental intensity pattern we found that the two extreme peaks can be characterized by their higher intensity than the others. Our computed intensities nicely agree with the same trend. The overall match in this region is very good where the first two peaks result in good agreement and the last peak is red shifted with an error of 1.5%. Though the intensity pattern in this region matches very well the experiment, the overall line heights of the computed CH intensities are relatively smaller than the experiment. As stated earlier, we obtained the spectral line shapes by fitting to Lorentzian function. This is arbitrary in nature and consequently often yields unsatisfactory result compared to experiment. In this study we also found that the line shapes are unsatisfactory in some cases while the intensities and particularly the peak positions are in good agreement with experiment. Overall good agreement is found for NATA-C5 with mean absolute percentage error (MAPE) of only 0.46% (see Table S1, ESI †) for the peak positions.
Next, we turn to the NATA-C7 conformer and its comparisons with experiment and also with NATA-C5. At first glance  Fig. 2a and b), it can be seen that the spectral pattern of the two NATA conformers show distinct features in particular for NH stretch regions. However, comparing with experiment, NATA-C7 results less satisfactory agreement particularly in the NH stretching regions. In NATA-C7, the measured asymmetric and symmetric stretching modes for NH 2 are found at 3516 and 3334 cm À1 while the computed values are located at 3521 and 3406 cm À1 , respectively (Table 2). In comparison to NATA-C5, large experimental red shift in the symmetric NH 2 stretch is due to the intra-molecular hydrogen bonding. While the corresponding computed asymmetric mode matches very well with the deviation of only 5 cm À1 , the symmetric mode yields large blue shift with an error of 2.16%. This high error is probably due to the inaccuracy in the electronic structure method itself (HF in this case) which may be inadequate to describe this hydrogen bond and hence the anharmonicity accurately. As a simple test to assess this error, we compared the corresponding NH and the adjacent hydrogen bond distances at HF and MP2 optimized structures. While the NH and hydrogen bond distances by MP2 are 1.02 and 1.98 Å, HF yields 1.00 and 2.15 Å, respectively. Clearly, HF shows more localized interactions by binding the hydrogen more tightly towards the nitrogen and consequently underestimates the hydrogen bond interactions. As a result the corresponding NH vibration shows large blue shift. The indole NH stretch is once again blue shifted, similar to NATA-C5, with an error of 1.47%. These two large blue shifts in NATA-C7 lead to the final spectral pattern rather unsatisfactorily. The remaining f-amide NH stretch shows reasonable accuracy with a deviation of 32 cm À1 to the blue of the experiment. Due to such blue shifts the final computed spectra in this region (Fig. 2b) show some inconsistencies in comparison with experiment. The separation between the experimental asymmetric NH 2 and indole NH stretching peaks is only 5 cm À1 . However, the large blue shift in the computed indole NH stretch increases this splitting by 52 cm À1 . Similarly, due to the large blue shift in the computed symmetric NH 2 mode, the extent of separation between it and the indole NH stretching is narrowed down in comparison with measured spectra. However, the overall intensity pattern is good in this region. The aromatic CH stretches of NATA-C7 show very good agreement with the measured spectra. The deviations are less than 20 cm À1 for all the three transitions. The same good agreement is found for the alkyl CH stretches where the errors are within 25 cm À1 for all the cases. Although there are some large errors in the NH stretching regions, the overall good accuracy is found with the MAPE only 0.83% (see Table S2, ESI †) due to good accuracy in rest of the modes. It is found that the VSCF and the VSCF-PT2 values of the second lowest mode of NATA-C7 (torsional mode, about 45 cm À1 ) converge to negative frequency values, which are obviously unphysical. The harmonic and diagonal values are reasonable. In cases such as this, the VSCF and VSCF-PT2 values for this frequency should be disregarded. This is a case where the VSCF algorithm fails to converge to a physical value, but frequencies of other modes, if physical can be retained. 38 The intensities, as can be seen, agree nicely in comparison with the experimental vibrational pattern.
The third dipeptide AVPO is the largest system studied here and experimental data is available for both the high and mid-IR ranges. We tested a few other possible conformers by B3LYP/cc-pVDZ method and found several low energy structures, but the reported one remains as the global minima. The measured and calculated spectra are shown in Fig. 3 (corresponding spectral data are given in Table 3 and Table S3, ESI †). As it was found for NATA, two distinct high frequency regions are observed in experiment due to NH stretches around 3400-3450 cm À1 and CH stretch in the range of B2950-3100 cm À1 . There are two NH stretches located above 3400 cm À1 which indicates that the molecule is free from any hydrogen bonding (i.e. CQOÁ Á ÁH-N).
33 cm À1 . Comparing with the intensities we found that the other three aromatic CH stretches match well with experiment and the error range is around 1.1%. Below 3000 cm À1 , in the region of aliphatic CH stretches, three transitions were measured at 2974, 2965 and 2941 cm À1 respectively, with a maximum of 2974 cm À1 . These vibrations attributed to intense CH stretches of the methyl group of Val. The corresponding calculated spectra are found at 2998, 2974 and 2942 cm À1 respectively, with a maximum of 2998 cm À1 . As it can be seen two out of three peaks match very well with the experiment. Intensity pattern is also good in this region. The experimental transitions in the mid-IR ranges are observed for AVPO. In this region three CQO stretches due to Phe, acetyl and Val were measured and located at 1765, 1711 and 1696 cm À1 , respectively. The corresponding calculated frequencies are found at 1789, 1753 and 1730 cm À1 where all of them are blue shifted with deviations B20-40 cm À1 . The MAPE for AVPO is within 1.1% where most of the transitions match well with the experiment (Fig. S1, ESI †).
Overall it is found that for most of the cases we have accuracy within 10-20 cm À1 . For some cases we have deviations up to 40-50 cm À1 . Ideally, if one increases the level of electronic structure theory (i.e., basis set) the accuracy should increase by paying much higher computational cost. But practically that may not happen always as occasionally the cancellation/ addition of errors nullify the expected improvement. On the whole the level of agreement with experiment found here seems to be roughly of the level of agreement obtained so far for much smaller molecules, e.g. amino acids. This is therefore a significant step forward in this field.

Conclusions
This work shows that the approximate first-principles calculations for anharmonic molecular vibrations using ab initio PESs offer very good agreement with high resolution experiments for three dipeptides of as large as 135 vibrational modes. To our knowledge, so far these are the largest systems where the full pair-wise couplings for all the modes are used in quantum mechanical VSCF-PT2 calculations using first-principles potentials.
The overall comparison with experiment is very good for most cases and shows distinctive features for each conformer. The choice of potential (hybrid) plays the key role for accurate and affordable calculations of such large systems. Only for a very few transitions, comparatively larger errors are obtained due to inadequacy in the HF potential itself. However, keeping in mind of the large size of the peptides we have considered, very fast algorithm like HF is desirable to pursue the computations in reasonable time scale with the payment of some accuracy. Resonances are another possible setback which may invoke numerical instabilities in the VSCF-PT2 calculations and may yield errors in the final spectra.
A statistical analysis of % errors (Fig. 4) of the computed frequencies to the available experimental frequencies for all the three systems shows that more number of computed frequencies are slightly blue shifted than the red. For example, it is found that, most of the calculated hydrogenic stretches are distributed between À1.5 to 1.0% error range to the experiment. It shows that HF/MP2 hybrid potential is stiffer and binds stronger for such transitions. The same trend was found for pure MP2 potential 22 and it is no surprise that the hybrid HF/MP2 potential also reflects the same trend. It is also expected that HF, being a lower level electronic structure method than MP2, can be less accurate but more suitable to carry out the calculations for such large systems within reasonable computational time. We expect that quantum calculations, as the VSCF-PT2 done here, will be more accurate than the classical MD at low temperature. As it was found in some previous studies while the less anharmonic far-IR spectra showed good accuracy by MD, the mid-IR and high frequency region showed considerable deviations in peak positions as well as intensities. 31,32 It is anticipated that the commonly used BLYP potential for MD calculations should be less accurate than the B3LYP or MP2 and the latter methods are computationally costlier for MD. We also note that empirically scaled HO values can yield good agreement using different scaling factors to fit the experiment best. The present anharmonic method is a first-principles one, based on quantum calculations for both the potentials and the vibrations, with no adjustable parameters. The approach is thus unbiased unlike any empirical scaling. The level of matching with experiment highly increases the confidence about the accuracy of the corresponding vibrational spectra and molecular structure. Thus such approach can be used for even more challenging systems e.g. peptide-water complexes. The method used leads to the determination of an accurate anharmonic part of the potential, as the test of spectroscopy establishes. Moreover, knowledge of the anharmonic part of the potential is important for a range of molecular properties, such as intramolecular vibrational energy flow. 44 In the harmonic approximation, there is no coupling between different normal modes, thus the effect of vibrational energy flow between different modes is all due to the anharmonic part of the potential. Finally, we show the limitations of the state-of-the-art anharmonic calculations with the increase in molecular size. This paper shows that the calculations for small peptides (dipeptides) can be done by anharmonic quantum calculations, using good enough potentials (the hybrid HF/MP2) almost as a matter of routine and hence is applicable for determining the geometry of conformers and interactions for such systems. This opens a way to study mid-size peptides in gas-phase, peptide-water complexes and other biomolecules.