Python DNA Free Energy, Enthalpy, and Energy Analyzer


I need someone to write this code for me ASAP. I heard that this company writes code for others. Please let me know if you can do this (and please make sure when you commit, you commit) and how much it would cost. Realistically, it should not take too long. I would need it by 9/21/2020 11:59 PM (but I can be a little flexible). I am willing to pay around $140 for this project. Thank you.



DNA Thermodynamic Project Instructions:


Take the attached MATLAB code (AllawiThermo) and port it to Python.


That is the whole project, and it should be simple: it is just taking the MATLAB program and COPYING it over into Python.


Then you should be able to input these values below and get the following outputs:


For example, if I input 'AATTG', I should get -29.1000, -87.6000, and -1.9200 for deltaH, deltaS, and deltaG37, respectively.


'AATTG': deltaH = -29.1000; deltaS = -87.6000; deltaG37 = -1.9200

'AAAAAAA': deltaH = -42.8000; deltaS = -126.4000; deltaG37 = -3.5400

'AATTGGCC': deltaH = -54.9000; deltaS = -151.8000; deltaG37 = -7.8400


If you stop reading RIGHT HERE, you should have enough information to do this project. However, if you want to learn more about it, keep reading.


I attached the paper that the values come from (bp thermo 1997), the Python code that I wrote (which is incomplete), and the correct MATLAB code that he wrote.





Alright here is the project and what I have done so far.


"Take a look at the attached paper (bp thermo 1997). It's quite dense, but the important part is the values in Table 1. Take a look and see if you can understand how to calculate the three values: deltaG, deltaH and deltaS (free energy, enthalpy and entropy) from an arbitrary DNA sequence. Let's do it in Python so it's portable--I use Jupyter, but feel free to use whatever build works for you. I can check your answers in Matlab, or you can use the oligo analyzer tool at IDT DNA (https://www.idtdna.com/site/account/login?returnurl=%2Fcalc%2Fanalyzer) to check.


For now, just practice reading the paper and see if you can build a thermodynamics method in Python that takes in a DNA sequence and spits out the three numbers. Don't forget to add the strand initiation energies. From it, see if you can calculate the parameters of the three unique DNA strands in the second paper I attached. Feel free to ask any questions you have--I'm happy to guide."
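A nearest-neighbor calculation walks the sequence in overlapping two-base windows and looks each window up in Table 1. A minimal sketch of that first step, assuming a plain string input; `nearest_neighbor_pairs` is a hypothetical helper name, not something from the attached files:

```python
def nearest_neighbor_pairs(seq):
    """Return the overlapping 5'->3' dinucleotides of a DNA sequence."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


print(nearest_neighbor_pairs("AATTG"))  # ['AA', 'AT', 'TT', 'TG']
```

A sequence of N bases yields N-1 pairs, and each pair contributes one set of increments to deltaH, deltaS, and deltaG37.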


I attached the bp thermo 1997 paper below (TABLE 1 REMEMBER).


Alright, I wrote a method to do this called "Python Free Energy, Enthalpy, and Entropy" and attached the python file for that.


Then, this is the most recent update I got.


"We're getting close. Check your syntax at the beginning--using OR at the beginning for both initiation and termination means that the correction will only happen once. You should check the first letter, decide on the correction, and also check the last letter. This means that there will be a correction on each end.


Also, the checking for pairs is close, but some of the complements are off ('CT' and 'AG' are in the same if loop, etc.). I've attached my version of the code in matlab, and you can see what syntax I'm using (with different coding language). See if you can try to match the two. It should open in notepad or the like.


Some numbers to calibrate the code:

 'AATTG' :deltaH = -29.1000; deltaS = -87.6000; deltaG37 = -1.9200

'AAAAAAA': deltaH = -42.8000; deltaS = -126.4000; deltaG37 = -3.5400

'AATTGGCC': deltaH = -54.9000; deltaS = -151.8000; deltaG37 = -7.8400"
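The two points in this feedback can be sketched in Python roughly as follows. The names `end_correction`, `total_initiation`, and `reverse_complement` are hypothetical, and the initiation values are the ones used in the attached MATLAB code; this is only an illustration of the logic, not the finished project.

```python
# Initiation increments as (deltaH, deltaS, deltaG37), taken from the MATLAB code.
INIT_GC = (0.1, -2.8, 0.98)
INIT_AT = (2.3, 4.1, 1.03)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def end_correction(base):
    """Initiation increment for a single terminal base."""
    if base in ("G", "C"):
        return INIT_GC
    if base in ("A", "T"):
        return INIT_AT
    raise ValueError(f"non-canonical terminal base: {base!r}")


def total_initiation(seq):
    """Check the first and the last base separately, so each end gets a correction."""
    first = end_correction(seq[0])
    last = end_correction(seq[-1])
    return tuple(round(a + b, 2) for a, b in zip(first, last))


def reverse_complement(pair):
    """Opposite strand read 5'->3'; it shares parameters with the given pair."""
    return "".join(COMPLEMENT[base] for base in reversed(pair))


print(total_initiation("AATTG"))   # (2.4, 1.3, 2.01)
print(reverse_complement("CT"))    # AG
print(reverse_complement("GA"))    # TC
```

Two dinucleotides belong in the same branch exactly when one is the reverse complement of the other. This matches the attached MATLAB code, where 'CT' shares a branch with 'AG' and 'GA' shares one with 'TC'.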


I attached his MATLAB code (AllawiThermo) for you to work from. Basically, take the MATLAB program that he wrote, port it to Python (following my coding guidelines), and calibrate the code using the numbers that I gave you.


Let me know if you have any questions. I need this code done ASAP.


Additional Instructions:
function [ deltaH, deltaS, deltaG37 ] = AllawiThermo( oligo )
%Calculates nearest-neighbor model thermodynamics based on Allawi 1997
%Assumes 1M NaCl
%Assumes WC pairing (for now?)

numBases = length(oligo);
deltaH = 0.0;    %kcal/mol
deltaS = -1.4;   %eu + symmetry correction
deltaG37 = 0.4;  %kcal/mol + symmetry correction

%strand initiation
if strcmp(oligo(1), 'G') || strcmp(oligo(1), 'C')
    deltaH = deltaH + 0.1;
    deltaS = deltaS - 2.8;
    deltaG37 = deltaG37 + 0.98;
elseif strcmp(oligo(1), 'A') || strcmp(oligo(1), 'T')
    deltaH = deltaH + 2.3;
    deltaS = deltaS + 4.1;
    deltaG37 = deltaG37 + 1.03;
else
    error('non-canonical terminal base');
end

%strand initiation (terminal)
if strcmp(oligo(numBases), 'G') || strcmp(oligo(numBases), 'C')
    deltaH = deltaH + 0.1;
    deltaS = deltaS - 2.8;
    deltaG37 = deltaG37 + 0.98;
elseif strcmp(oligo(numBases), 'A') || strcmp(oligo(numBases), 'T')
    deltaH = deltaH + 2.3;
    deltaS = deltaS + 4.1;
    deltaG37 = deltaG37 + 1.03;
else
    error('non-canonical terminal base');
end

%nearest-neighbor calculations
for i = 1:numBases-1
    pair = [oligo(i) oligo(i+1)];
    if strcmp(pair, 'AT')
        deltaH = deltaH - 7.2;
        deltaS = deltaS - 20.4;
        deltaG37 = deltaG37 - 0.88;
    elseif strcmp(pair, 'TA')
        deltaH = deltaH - 7.2;
        deltaS = deltaS - 21.3;
        deltaG37 = deltaG37 - 0.58;
    elseif strcmp(pair, 'CG')
        deltaH = deltaH - 10.6;
        deltaS = deltaS - 27.2;
        deltaG37 = deltaG37 - 2.17;
    elseif strcmp(pair, 'GC')
        deltaH = deltaH - 9.8;
        deltaS = deltaS - 24.4;
        deltaG37 = deltaG37 - 2.24;
    elseif strcmp(pair, 'AA') || strcmp(pair, 'TT')
        deltaH = deltaH - 7.9;
        deltaS = deltaS - 22.2;
        deltaG37 = deltaG37 - 1;
    elseif strcmp(pair, 'CA') || strcmp(pair, 'TG')
        deltaH = deltaH - 8.5;
        deltaS = deltaS - 22.7;
        deltaG37 = deltaG37 - 1.45;
    elseif strcmp(pair, 'GT') || strcmp(pair, 'AC')
        deltaH = deltaH - 8.4;
        deltaS = deltaS - 22.4;
        deltaG37 = deltaG37 - 1.44;
    elseif strcmp(pair, 'CT') || strcmp(pair, 'AG')
        deltaH = deltaH - 7.8;
        deltaS = deltaS - 21.0;
        deltaG37 = deltaG37 - 1.28;
    elseif strcmp(pair, 'GA') || strcmp(pair, 'TC')
        deltaH = deltaH - 8.2;
        deltaS = deltaS - 22.2;
        deltaG37 = deltaG37 - 1.3;
    elseif strcmp(pair, 'GG') || strcmp(pair, 'CC')
        deltaH = deltaH - 8.0;
        deltaS = deltaS - 19.9;
        deltaG37 = deltaG37 - 1.84;
    else
        %num2str is needed so the index is printed as a number, not a character code
        error(['orthogonal base detected at ' num2str(i)]);
    end
end
end
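One possible Python port of the MATLAB function above is sketched below. The function name `allawi_thermo`, the dictionary layout, and the rounding to four decimals are choices made for this sketch rather than part of the attached files; the numeric values are copied from the MATLAB branches.

```python
# Nearest-neighbor increments keyed by the 5'->3' dinucleotide, as
# (deltaH kcal/mol, deltaS eu, deltaG37 kcal/mol); values copied from the MATLAB code.
NN_PARAMS = {
    "AT": (-7.2, -20.4, -0.88),
    "TA": (-7.2, -21.3, -0.58),
    "CG": (-10.6, -27.2, -2.17),
    "GC": (-9.8, -24.4, -2.24),
    "AA": (-7.9, -22.2, -1.00), "TT": (-7.9, -22.2, -1.00),
    "CA": (-8.5, -22.7, -1.45), "TG": (-8.5, -22.7, -1.45),
    "GT": (-8.4, -22.4, -1.44), "AC": (-8.4, -22.4, -1.44),
    "CT": (-7.8, -21.0, -1.28), "AG": (-7.8, -21.0, -1.28),
    "GA": (-8.2, -22.2, -1.30), "TC": (-8.2, -22.2, -1.30),
    "GG": (-8.0, -19.9, -1.84), "CC": (-8.0, -19.9, -1.84),
}
INIT = {
    "G": (0.1, -2.8, 0.98), "C": (0.1, -2.8, 0.98),
    "A": (2.3, 4.1, 1.03), "T": (2.3, 4.1, 1.03),
}
SYMMETRY = (0.0, -1.4, 0.4)  # always added, exactly as the MATLAB function does


def allawi_thermo(oligo):
    """Return (deltaH, deltaS, deltaG37) for a DNA sequence written 5'->3'."""
    if not oligo:
        raise ValueError("empty sequence")
    terms = [SYMMETRY]
    for end in (oligo[0], oligo[-1]):      # one initiation term per end
        if end not in INIT:
            raise ValueError("non-canonical terminal base")
        terms.append(INIT[end])
    for i in range(len(oligo) - 1):        # nearest-neighbor pairs
        pair = oligo[i:i + 2]
        if pair not in NN_PARAMS:
            raise ValueError(f"non-canonical base in pair {pair!r} at position {i + 1}")
        terms.append(NN_PARAMS[pair])
    # Round to 4 decimals (a choice of this sketch) to hide floating-point noise.
    return tuple(round(sum(column), 4) for column in zip(*terms))


for seq in ("AATTG", "AAAAAAA", "AATTGGCC"):
    dH, dS, dG = allawi_thermo(seq)
    print(f"{seq!r}: deltaH = {dH:.4f}; deltaS = {dS:.4f}; deltaG37 = {dG:.4f}")
```

Run as a script, the loop at the bottom prints the three calibration lines quoted earlier. Like the MATLAB function, the sketch adds the symmetry correction to every sequence, whereas the paper describes that term as applying to self-complementary sequences.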
Thermodynamics and NMR of Internal G·T Mismatches in DNA

Hatim T. Allawi and John SantaLucia, Jr.*
Department of Chemistry, Wayne State University, Detroit, Michigan 48202
Received October 16, 1996; Revised Manuscript Received June 18, 1997 [X]

ABSTRACT: Thermodynamics of 39 oligonucleotides with internal G·T mismatches dissolved in 1 M NaCl were determined from UV absorbance versus temperature profiles. These data were combined with literature values of six sequences to derive parameters for 10 linearly independent trimer and tetramer sequences with G·T mismatches and Watson-Crick base pairs. The G·T mismatch parameters predict ∆G°37, ∆H°, ∆S°, and TM with average deviations of 5.1%, 7.5%, 8.0%, and 1.4 °C, respectively. These predictions are within the limits of what can be expected for a nearest-neighbor model. The data show that the contribution of a single G·T mismatch to helix stability is context dependent and ranges from +1.05 kcal/mol for AGA/TTT to -1.05 kcal/mol for CGC/GTG. Several tests of the applicability of the nearest-neighbor model to G·T mismatches are described. Analysis of imino proton chemical shifts show that structural perturbations from the G·T mismatches are highly localized. One-dimensional NOE difference spectra demonstrate that G·T mismatches form stable hydrogen-bonded wobble pairs in diverse contexts. Refined nearest-neighbor parameters for Watson-Crick base pairs are also presented.

Mismatches occur naturally in DNA as a result of errors from misincorporation of bases during replication (Goodman et al., 1993), due to heteroduplex formation during homologous recombination (Bhattacharyya et al., 1989), and from mutagenic chemicals (Leonard et al., 1990a; Plum et al., 1995), ionizing radiation (Brown, 1995), and spontaneous deamination. In addition to Watson-Crick base pairing, there are eight possible mispairs, namely A·A, A·C, A·G, C·C, C·T, G·G, G·T, and T·T.
Repair of these mismatches requires the recognition and excision of mismatched bases by proofreading enzymes or by postreplication mismatch repair systems (Modrich & Lahue, 1996). Understanding the thermodynamics of mismatches in DNA duplexes will improve our understanding of these processes (Aboul-ela et al., 1985; Werntges et al., 1986; Petruska et al., 1988; Mendelman et al., 1989; Johnson, 1993).

Several molecular biological techniques require accurate prediction of hybridization thermodynamics to “matched” versus “mismatched” sites (Wallace et al., 1979; Aboul-ela et al., 1985; Kawase et al., 1986; Ikuta et al., 1987) including PCR¹ (Saiki et al., 1988), Kunkel mutagenesis (Kunkel et al., 1987), sequencing by hybridization (Fodor et al., 1993), and gene diagnostics (Freier, 1993). In each of these techniques, the choice of a nonoptimal sequence or temperature can lead to amplification or detection of wrong sequences (Steger, 1994; SantaLucia et al., 1996). In addition, knowledge of mismatch stability is an important step toward acquiring a parameter database for DNA secondary-structure prediction algorithms (Allawi, Peyret, and SantaLucia, unpublished experiments).

Previously, we and others (SantaLucia et al., 1996; Sugimoto et al., 1994; Doktycz et al., 1995) showed that, despite the structural variability observed in DNA structures (Callidine & Drew, 1984; Hunter, 1993), a nearest-neighbor model is sufficient to reliably predict the stability of DNA duplexes with Watson-Crick pairs. We hypothesized that a nearest-neighbor model could also apply for DNA duplexes with internal G·T mismatches. To test this hypothesis, thermodynamic measurements of 39 G·T mismatch-containing DNA oligonucleotides were combined with six literature values to derive G·T mismatch nearest-neighbor parameters in 1 M NaCl buffer.
The availability of nearest-neighbor parameters for G·T mismatches along with refined parameters for Watson-Crick pairs allows the reliable prediction of duplex stability from sequence. Exchangeable proton one-dimensional NMR spectra show that G·T mismatches form a wobble hydrogen-bonded structure in diverse contexts.

MATERIALS AND METHODS

DNA Synthesis and Purification. Oligonucleotides were supplied by Hitachi Chemical Research and were synthesized on solid support using standard phosphoramidite chemistry (Brown & Brown, 1991). DNA oligomers were removed from the solid support and deblocked by treatment with concentrated ammonia at 50 °C overnight. Each sample was evaporated to dryness, and the crude mixture was dissolved in 250 mL of water and purified on a Si500F thin-layer chromatography plate (Baker) by eluting for 5 h with n-propanol/ammonia/water (55:35:10 by volume) (Chou et al., 1989). Bands were visualized with a UV lamp, and the least mobile band was cut out and eluted three times with 3 mL of distilled deionized water. The sample was then evaporated to dryness. Oligonucleotides were desalted and further purified with a Sep-pak C-18 cartridge (Waters). The DNA was eluted with 30% acetonitrile buffered with 10 mM ammonium bicarbonate, pH 7.0. Purities were checked by analytical C-8 HPLC (Perceptive Biosystems) and were greater than 95%.

* To whom correspondence should be sent. Phone: (313) 577-0101. FAX: (313) 577-8822. E-mail: jsl@chem.wayne.edu.
[X] Abstract published in Advance ACS Abstracts, August 1, 1997.
¹ Abbreviations: Na2EDTA, disodium ethylenediaminetetraacetate; eu, entropy units (cal/K mol); HPLC, high-performance liquid chromatography; NOE, nuclear Overhauser enhancement; PCR, polymerase chain reaction; SVD, singular value decomposition; TM, melting temperature; UV, ultraviolet.

Biochemistry 1997, 36, 10581-10594. S0006-2960(96)02590-1 CCC: $14.00 © 1997 American Chemical Society

Melting Curves.
Absorbance versus temperature profiles (melting curves) were measured at 280 or 260 nm with a heating rate of 0.8 °C min⁻¹ on an AVIV 14DS UV-vis spectrophotometer as described previously (SantaLucia et al., 1996). The buffer used for thermodynamic studies was 1.0 M NaCl, 10 mM sodium cacodylate, and 0.5 mM Na2EDTA, pH 7.0. Oligonucleotide samples were “annealed” and degassed by raising the temperature to 85 °C for 5 min. While at 85 °C, the absorbance of each sample was measured at 260 nm for determination of oligonucleotide concentrations (CT) using extinction coefficients calculated from dinucleoside monophosphates and nucleotides, as described previously (Richards, 1975).

Data Analysis. Thermodynamic parameters for duplex formation were obtained from absorbance versus temperature melting curves using the program MELTWIN v2.1 (McDowell & Turner, 1996) by two methods: (1) enthalpies and entropies from fits of individual melting curves with sloping base lines were averaged (Petersheim & Turner, 1983), and (2) plots of reciprocal melting temperature (TM⁻¹) vs ln CT according to eq 1 (Borer et al., 1974) were made.

$$T_M^{-1} = \frac{R}{\Delta H^\circ}\,\ln C_T + \frac{\Delta S^\circ}{\Delta H^\circ} \tag{1}$$

For non-self-complementary molecules, CT in eq 1 was replaced by CT/4. Both methods assume that the transition equilibrium involves only two states (i.e., duplex and random coil) and that the difference in heat capacities (∆Cp°) of these states is zero (Petersheim & Turner, 1983; Freier et al., 1986). Agreement of parameters derived by the two methods is a necessary, but not sufficient, criterion to establish the validity of the two-state approximation (see below; SantaLucia et al., 1990; Marky & Breslauer, 1987).

Design of Sequences. Sequences were designed to have a melting temperature (TM) between 30 and 60 °C and to minimize the possibility of forming stable alternative secondary structures such as slipped duplexes or hairpins; this maximizes the likelihood of observing two-state thermodynamics. In addition, sequences were chosen to provide uniform representation of the 11 different G·T mismatch containing nearest-neighbors. Throughout this paper nearest-neighbor base pairs are represented with a slash separating the strands in antiparallel orientation and with mismatches underlined (e.g., AT/TG means 5′AT3′ paired with 3′TG5′). The 11 G·T nearest-neighbors occur in this study with the following frequencies: AG/TT = 12, AT/TG = 12, CG/GT = 17, CT/GG = 12, GG/CT = 9, GT/CG = 20, TG/AT = 23, TT/AG = 17, GT/TG = 3, TT/GG = 3, TG/GT = 3. 18 of the 45 sequences used to derive nearest-neighbor parameters form self-complementary duplexes with two nonadjacent G·T mismatches.

Three-State Equilibrium Calculations. Some of the sequences in this study are better described by a three-state model rather than a two-state model. We follow the method described by Longfellow et al. (1990) to carry out three-state equilibrium calculations. Self-complementary sequences have the potential to form both duplex and hairpin species as described by the following coupled equilibria:

$$2\mathrm{A} \rightleftharpoons \mathrm{A}_2 \qquad K_1 = [\mathrm{A}_2]/[\mathrm{A}]^2 \tag{2a}$$

$$\mathrm{A} \rightleftharpoons \mathrm{A_H} \qquad K_2 = [\mathrm{A_H}]/[\mathrm{A}] \tag{2b}$$

where A, A2, and AH represent A in the random coil, duplex, and hairpin states, respectively. The total strand concentration, CT, is given by

$$C_T = [\mathrm{A}] + 2[\mathrm{A}_2] + [\mathrm{A_H}] \tag{3}$$

and the equilibrium constants are given by

$$K_i = \exp(-\Delta H^\circ_i/RT + \Delta S^\circ_i/R) \tag{4}$$

where i = 1 or 2 and ∆H°i and ∆S°i are measured or predicted thermodynamic parameters for the individual equilibria. Substituting eqs 3 and 2b into eq 2a gives the quadratic equation

$$0 = C_T - (1 + K_2)[\mathrm{A}] - 2K_1[\mathrm{A}]^2 \tag{5}$$

The concentration of A is given by the analytical solution of the quadratic equation (only one root is physical). The concentrations of A2 and AH are calculated from eqs 2a and 2b. Similar analytical solutions can be derived for non-self-complementary sequences that have hairpin intermediates (Allawi and SantaLucia, unpublished results).

Determination of the G·T Mismatch Contribution to Helix Formation. The thermodynamic increments associated with the folding of the mismatch portion of an oligonucleotide cannot be directly measured. Instead, each thermodynamic measurement provides the total energy change for strands going from the random coil state to duplex state. According to the nearest-neighbor model, the total energy change is the sum of energy increments for helix initiation (see below), helix symmetry, and nearest-neighbor interactions between base pairs (Freier et al., 1986). The nearest-neighbor model can be extended to include parameters for interactions between mismatches and neighboring base pairs (He et al., 1991). For example (underlined residues are mismatched):

[eq 6]

Thus, to derive the mismatch contribution to duplex formation, ∆G°37(mismatch), the contributions from the Watson-Crick pairs, helix initiation, and helical symmetry (for self-complementary sequences) are subtracted from the total free energy. In the example above, this amounts to simply rearranging eq 6:

[eq 7]

Inserting the experimental free energy (Table 3) and the Watson-Crick nearest-neighbor numbers (Table 1) into eq 7, we obtain

[eq 8]

The nearest-neighbors CT/GG and CG/GT in eq 8 are unknowns. Similar calculations for ∆H° and ∆S° are carried out to calculate ∆H°(mismatch) and ∆S°(mismatch).

Alternatively, the mismatch contribution can be determined by measuring the thermodynamics of a “core sequence” and adding back the nearest neighbor that is interrupted by the mismatch (Wu et al., 1995), as shown below:

[eq 9]

where CTG/GGC = CT/GA + CG/GT. The two methods in eqs 7 and 9 are equally reliable due to uncertainties in measurements and in the nearest neighbor parameters (Table 1). We chose the former method because the measurement of the thermodynamics of all the core sequences is not required.

Analysis of the G·T Mismatch Contribution in Terms of Linearly Independent Sequences.
Previous work by Gray and Tinoco (1970), Vologodskii et al. (1984), and Goldstein and Benight (1992) have shown that nucleic acid thermodynamics should be analyzed in terms of linearly independent sequences. For duplexes with Watson-Crick base pairs, there are 10 different nearest neighbors and two initiation parameters that can be determined from a carefully designed set of oligonucleotides (D. M. Gray, personal communication). If constraints are imposed on all the sequences in the data set, however, such as fixing the oligonucleotide length, fixing the ends of the duplex, or making measurements in polymers that formally do not contain ends, then each sequence constraint results in one fewer parameter that can be derived from the set of sequences (Gray & Tinoco, 1970). A fundamental assumption of the nearest-neighbor model is that terminal nearest neighbors make the same contribution as internal neighbors. For Watson-Crick pairs in RNA and DNA (see below), it appears that this assumption is reasonable as evidenced by the fact that the nearest-neighbor parameters make accurate predictions for sequences with diverse nucleotide content and termini. However, for G·T mismatches in DNA this assumption is invalid. Terminal G·T mismatches always make favorable contributions to helix stability (Jenkins and SantaLucia, unpublished results), while internal G·T mismatches can make favorable or unfavorable contributions, depending on the context. Hence, our data set for internal G·T mismatches does not contain sequences that have terminal G·T mismatches. As a result, the maximum number of linearly independent parameters that can be derived from our data set is 10. This is verified from the column rank of the stacking matrix, which is 10 for our data set. These 10 uniquely determined parameters are linear combinations of the 11 G·T nearest-neighbor dimers.
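The idea of checking the column rank of a stacking matrix can be illustrated with a small sketch. It is an assumption-laden toy, not the authors' procedure: it counts the 10 Watson-Crick nearest neighbors and the two initiation terms (rather than the G·T trimers and tetramers), the helper names `canonical`, `stacking_row`, and `rank` are hypothetical, and the example sequences are arbitrary.

```python
from fractions import Fraction

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# The 10 unique Watson-Crick nearest neighbors, named by their top-strand
# dinucleotide as in Table 1 (for example "AA" stands for AA/TT).
NEIGHBORS = ["AA", "AT", "TA", "CA", "GT", "CT", "GA", "CG", "GC", "GG"]


def canonical(pair):
    """Map a dinucleotide to its Table 1 name (itself or its reverse complement)."""
    if pair in NEIGHBORS:
        return pair
    return COMPLEMENT[pair[1]] + COMPLEMENT[pair[0]]


def stacking_row(seq):
    """Counts of each nearest neighbor, then init G-C and init A-T counts."""
    row = [0] * (len(NEIGHBORS) + 2)
    for i in range(len(seq) - 1):
        row[NEIGHBORS.index(canonical(seq[i:i + 2]))] += 1
    for end in (seq[0], seq[-1]):
        row[-2 if end in ("G", "C") else -1] += 1
    return row


def rank(matrix):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    rank_count = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank_count, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / rows[rank_count][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count


print(stacking_row("AATT"))  # [2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
print(rank([stacking_row(s) for s in ("AATT", "AATTAATT", "AT")]))  # 3
```

Each sequence contributes one row, and the rank tells how many parameters that set of sequences can determine independently. Three sequences can fix at most three, which is why a data set needs many sequences with varied composition.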
A simple way to construct a set of sequences that are linearly independent is to start with the 11 G·T mismatch containing dimers and add a Watson-Crick base pair to each end of the base pair doublet that has a G·T mismatch. For example, the neighbor AT/TG becomes ATC/TGG (again, the “/” indicates pairing of strands in the antiparallel orientation) by the addition of a 5′G-C3′ base pair to the left side. The trimer sequence ATC/TGG is simply the sum of the nearest neighbors AT/TG and GG/CT. For neighbors with two G·T mismatches such as GT/TG, the linearly independent sequence is GGTC/CTGG which is the sum of GG/CT + GT/TG and GG/CT. When this is done for all 11 nearest neighbors, two of the trimers are identical (GTC/CGG occurs twice) so the total number of unique parameters is reduced to 10 and these are linearly independent (see Tables 4 and 5). The choice of which base pair to place at the left is arbitrary, but placing the same pair at the end of each sequence simplifies the analysis of sequences in terms of linearly independent sequences. For example, eq 6 can be rewritten in terms of the following linearly independent sequences:

[eq 10]

Note the term GTC/GGC is subtracted to account for the extra terms for TC/GG and GC/TG that are found in the sequences CTC/GGG and CGC/GTG but are not found in the actual sequence. This equation can be rearranged to give the mismatch contribution to helix stability:

[eq 11]

It is important to note that the 10 linearly independent parameters obtained still conform to a nearest-neighbor model and do not account for next-nearest-neighbor interactions.

Duplex Initiation Parameter. Recently, it has been shown that the duplex initiation parameter contains contributions from the terminal base pairs (D. M. Gray, personal communication). Gray’s work shows that the difference between DNA sequences with terminal G-C and T-A base pairs can be accounted for by introducing two initiation parameters.
This model assumes that the contribution to duplex stability for terminal G-C equals terminal C-G and terminal A-T equals terminal T-A. To accommodate this, two parameters are introduced: “initiation with terminal G-C” and “initiation with terminal A-T” (Table 1). A duplex with two terminal G-C pairs would use total initiation = 2 × (initiation with terminal G-C). A duplex with one terminal G-C and one terminal A-T would use total initiation = (initiation with terminal G-C) + (initiation with terminal A-T). A duplex with two terminal A-T pairs would use total initiation = 2 × (initiation with terminal A-T).

Table 1: Nearest-Neighbor Thermodynamic Parameters for Watson-Crick Base Pair Formation in 1 M NaCl (a)

propagation sequence     ∆H° (kcal/mol)   ∆S° (eu)        ∆G°37 (kcal/mol)
AA/TT                    -7.9 ± 0.2       -22.2 ± 0.8     -1.00 ± 0.01
AT/TA                    -7.2 ± 0.7       -20.4 ± 2.4     -0.88 ± 0.04
TA/AT                    -7.2 ± 0.9       -21.3 ± 2.4     -0.58 ± 0.06
CA/GT                    -8.5 ± 0.6       -22.7 ± 2.0     -1.45 ± 0.06
GT/CA                    -8.4 ± 0.5       -22.4 ± 2.0     -1.44 ± 0.04
CT/GA                    -7.8 ± 0.6       -21.0 ± 2.0     -1.28 ± 0.03
GA/CT                    -8.2 ± 0.6       -22.2 ± 1.7     -1.30 ± 0.03
CG/GC                    -10.6 ± 0.6      -27.2 ± 2.6     -2.17 ± 0.05
GC/CG                    -9.8 ± 0.4       -24.4 ± 2.0     -2.24 ± 0.03
GG/CC                    -8.0 ± 0.9       -19.9 ± 1.8     -1.84 ± 0.04
init. w/term. G-C (b)    0.1 ± 1.1        -2.8 ± 0.2      0.98 ± 0.05
init. w/term. A-T (b)    2.3 ± 1.3        4.1 ± 0.2       1.03 ± 0.05
symmetry correction      0                -1.4            0.4

(a) Errors are resampling standard deviations (see text). (b) See text for how to apply the initiation parameters.

Regression Analysis. The free energy and enthalpy contributions of the 10 linearly independent G·T mismatch containing trimer and tetramer sequences were determined by multiple linear regression using MATHEMATICA (Wolfram 1992). Thermodynamic parameters for 45 G·T mismatch containing duplexes (Table 2) were used to construct a list of 45 equations (analogous to eq 11 above) with 10 unknowns. These equations were then cast in the form of matrices for input into linear regression.
The ∆G°37(mismatch) for all 45 sequences formed the column matrix GMis with elements ∆Gi, where the subscript i denotes different oligonucleotides. The number of occurrences of each G·T mismatch containing trimer or tetramer sequence formed the “stacking matrix,” S, with dimensions of 45 × 10. The unknown values of the 10 G·T mismatch trimers and tetramers form the column matrix GNN with elements ∆Gj, where the subscript j denotes the 10 linearly independent sequences. Therefore, the data for all the sequences are written as

$$\mathbf{G}_{\mathrm{Mis}} = \mathbf{S}\cdot\mathbf{G}_{\mathrm{NN}} \tag{12}$$

The solution of eq 12 for the unknowns, GNN, was obtained using singular value decomposition (SVD) (Press et al., 1989) which effectively inverts the stacking matrix and minimizes the error weighted squares of the residuals (Bevington, 1969):

$$\chi^2 = \sum_{ij} \left| (\Delta G_i - S_{ij}\,\Delta G_j)/\sigma_i \right|^2 \tag{13}$$

where σi are the propagated errors in ∆Gi, and Sij are the matrix elements of S. The σi were calculated as the square root of the sum of the squares of the errors for ∆Gi(Total) and the standard errors for the nearest neighbors (Table 1). Analogous calculations were performed to obtain G·T mismatch nearest-neighbor ∆H° parameters. Entropic parameters for G·T mismatch contributions were calculated from ∆G°37 and ∆H° using the equation

$$\Delta S^\circ = (\Delta H^\circ - \Delta G^\circ_{37})/310.15 \tag{14}$$

To verify our calculation methodology, we derived the G·T mismatch nearest-neighbor entropic contributions as it was done for ∆G°37 and ∆H°, using SVD, and the results agreed with those obtained using eq 14.

Table 2: Thermodynamics of Duplex Formation of Oligonucleotides with G·T Mismatches (a)

(a) Listed in alphabetical order and by oligomer length. For self-complementary sequences only the top strand is given. For non-self-complementary duplexes, both strands are given in antiparallel orientation. Underlined residues are mismatched. Molecules listed as two-state had ∆H° agreement within 10% by two different methods. Molecules listed as marginally non-two-state had ∆H° agreement between 10 and 20% by two different methods. Molecules listed as non-two-state had ∆H° disagreement greater than 20% by two different methods. Solutions are 1 M NaCl, 10 mM sodium cacodylate, 0.5 mM Na2EDTA, pH 7. Errors are standard deviations from the regression analysis of the melting data. Extra significant figures are given to allow accurate calculation of ∆G°37 and TM. (b) Calculated for 10⁻⁴ M oligomer concentration for self-complementary sequences and 4 × 10⁻⁴ M for non-self-complementary sequences.

Error Analysis. The sampling errors reported in Table 2 for 1/TM vs log CT plots and for the fits of the shapes of melting curves were obtained by standard methods (SantaLucia et al., 1991; McDowell & Turner, 1996) and reflect the precision or reproducibility in the experimental measurement (Bevington, 1969). The accuracy of the ∆G°37, ∆H°, and ∆S° parameters derived from the van’t Hoff analysis of the UV melting curves are estimated as standard deviations of 4%, 5%, and 6%, respectively. These estimates are based on the typical agreement between model independent calorimetry and UV melting measurements of ∆H° and ∆S° (Albergo et al., 1981) and by measurements made by different laboratories on the same sequences (J. SantaLucia, unpublished results). The small errors observed for ∆G°37 and TM are the result of the fact that ∆H° and ∆S° determined from a van’t Hoff analysis of UV melting data are greater than 99% correlated (Petersheim & Turner, 1983; SantaLucia et al., 1991). Plots of experimental ∆H° vs ∆S° are provided in the Supporting Information. The error propagation from ∆H° and ∆S° to ∆G°37 and TM using standard methods [eq 4.8 of Bevington (1969)] are given by the following equations (SantaLucia et al., 1991):

$$(\sigma_{\Delta G^\circ_{37}})^2 = (\sigma_{\Delta H^\circ})^2 + T^2(\sigma_{\Delta S^\circ})^2 - 2T\,R_{\Delta H^\circ \Delta S^\circ}\,\sigma_{\Delta H^\circ}\sigma_{\Delta S^\circ} \tag{15}$$

$$(\sigma_{T_M})^2 = (\sigma_{\Delta H^\circ}T_M/\Delta H^\circ)^2 + (\sigma_{\Delta S^\circ}T_M^2/\Delta H^\circ)^2 - 2T_M^3\,R_{\Delta H^\circ \Delta S^\circ}\,\sigma_{\Delta H^\circ}\sigma_{\Delta S^\circ}/(\Delta H^\circ)^2 \tag{16}$$

where R∆H°∆S° is the correlation coefficient between ∆H° and ∆S° and the σ terms are the standard deviations in the measurements. The errors in experimental measurements are rigorously propagated to the nearest-neighbor parameters in the variance-covariance matrix given by the SVD analysis (Press et al., 1989; SantaLucia et al., 1996). The propagated errors in the nearest-neighbor parameters have been independently confirmed by resampling analysis of the data.

Resampling Analysis of the Data. Since our data set contains 45 equations with 10 unknowns, the problem is overdetermined. We took advantage of this and used a resampling analysis (Efron & Tibshirani, 1993) of our data to determine the uncertainties of the 10 linearly independent sequences. We performed 30 resampling trials. For each trial, a different set of 35 randomly selected sequences was used in the SVD analysis to calculate the 10 unknowns. For each trial, the rank of the matrix was confirmed to be 10. Then for each of the 10 unknowns the 30 trial values were averaged and the standard deviations determined. This resampling analysis was performed for ∆G°37, ∆H°, and ∆S°. The errors obtained from the resampling analysis have the advantage that no assumption about the magnitudes of the experimental errors or knowledge of the correct method of error propagation are required, and yet highly reliable error estimates are obtained (Efron & Tibshirani, 1993).

¹H-NMR Spectroscopy. Oligomers were dissolved in 90% H2O and 10% D2O with 1 M NaCl, 10 mM disodium phosphate, and 0.1 mM Na2EDTA at pH 7. Sample concentrations were between 0.2 and 1.0 mM. ¹H-NMR spectra at 10 °C were recorded using a Varian Unity 500 MHz NMR spectrometer.
One dimensional exchangeable proton NMR spectra were recorded using the WATERGATE pulse sequence with “flip-back” pulse to suppress the water peak (Piotto et al., 1992; Lippens et al., 1995). Spectra were recorded with the carrier placed at the solvent frequency and with high-power and low-power pulse widths of 8.8 and 1700 ms, sweep width of 12 kHz, gradient field strength of 10.0 G/cm, and duration of 1 ms. 512 transients were collected for each spectrum. Data were multiplied by a 4.0 Hz line-broadening exponential function and Fourier transformed by a Silicon Graphics Indigo2 Extreme computer with Varian VNMR software. No base line correction or solvent subtraction was applied. 3-(Trimethylsilyl)propionic-2,2,3,3-d4 acid (TSP) was used as the internal standard for chemical shift reference. 1D-NOE difference spectra were acquired as described above, but with selective decoupling of individual resonances during the 1 s recycle delay. Each resonance was decoupled with a power sufficient to saturate <80% of the signal intensity so that spillover artifacts would be minimized. The spectra were acquired in an interleaved fashion in blocks of 16 scans to minimize subtraction errors due to long-term instrument drift, and 3200-6400 scans were collected for each FID.

RESULTS

Watson-Crick Nearest-Neighbor Parameters. Recently, two groups independently published improved nearest-neighbor parameters for predicting DNA duplex stability (SantaLucia et al., 1996; Sugimoto et al., 1996). The nearest-neighbor parameters derived by the two groups are similar in many respects; but, both the initiation parameters and the CG/GC neighbors are different [SantaLucia et al. reported ∆G°37(initiation) = +1.82 kcal/mol and ∆G°37(CG/GC) = -2.09 kcal/mol, whereas Sugimoto et al. reported +3.4 and -2.8 kcal/mol, respectively].
We have determined that these discrepancies are primarily due to two factors: (1) Sugimoto’s regression analysis method did not produce the linear least squares fit, and (2) Sugimoto included data for the sequence CGCGTACGCGTACGCG (Raap et al., 1985) which is a clear outlier in the fit. We also reported this sequence in our paper (SantaLucia et al., 1996) but did not include it in the regression analysis since it has a TM of 91 °C (at a strand concentration of 1 × 10⁻⁴ M) which leads to a large uncertainty in the derived thermodynamic parameters and also makes it unlikely that the two-state approximation is valid. When we removed the sequence CGCGTACGCGTACGCG and performed a least squares fit on the remainder of Sugimoto’s data set (64 sequences), the initiation parameter obtained is +2.34 kcal/mol and the CG/GC neighbor is -2.37 kcal/mol, which agree with what we reported (SantaLucia et al., 1996). Two important results of Sugimoto et al. (1996) are that (1) sequences with terminal T-A base pairs behave similarly as sequences with terminal G-C pairs, and (2) the helix initiation parameter for sequences with only A-T pairs appears to be the same as that for sequences with mixed G-C and A-T pairs. Our earlier conclusions for these parameters (SantaLucia et al., 1996) were incorrect due to insufficient/incorrect literature data for sequences with terminal A-T pairs.

In order to provide a unified set of thermodynamic parameters, we have combined data from Sugimoto’s laboratory (58 sequences), data from our laboratory (38 sequences), and data from the literature from other laboratories (37 sequences) for a total of 131 sequences (see Supporting Information). 108 of these sequences that melted with two-state thermodynamics were used to derive new nearest-neighbor parameters (Table 1).
The errors reported in Table 1 are from a resampling analysis of the data in which 30 trials with 78 sequences randomly selected from the 108 sequences were used to calculate the nearest-neighbor parameters (see Materials and Methods). For each trial the rank of the stacking matrix was verified to be 12. All of the parameters given in Table 1, including the initiation parameter and the CG/GC neighbor, agree within the reported error of our previously published parameters (SantaLucia et al., 1996). The parameters given in Table 1 predict the ∆G°37, ∆H°, ∆S°, and TM of the 108 sequences with average deviations of 3.9%, 6.4%, 7.5%, and 1.8 °C, respectively. A complete table with the experimental versus predicted thermodynamics of the unified data set of 131 sequences is provided in the Supporting Information.

Thermodynamic Data. Plots of TM^-1 versus ln CT were linear (correlation coefficient ≥ 0.98) over the entire 80-100-fold range in concentrations and are shown in Figure 1 and Supporting Information. Thermodynamic parameters derived from fits of individual melting curves and from TM^-1 versus ln CT plots are listed in Table 2. Sequences in which the ∆H° values from the two methods agree within 10% are listed as two-state transitions. Seven of the 45 sequences, for which the ∆H° values agree within 10-20%, are listed as “marginally non-two-state” transitions. These sequences were also included in the regression analysis since they did not significantly affect the derived nearest-neighbor parameters. Sequences with ∆H° differences greater than 20% are listed as non-two-state transitions and were not included in the regression analysis. For molecules with two-state transitions, thermodynamic data derived from averages of the fits and from TM^-1 vs ln CT plots are equally reliable (SantaLucia et al., 1996; Sugimoto et al., 1996); therefore, the averages of these parameters were used for the linear regression analysis (Table 3).

FIGURE 1: Reciprocal melting temperature vs ln CT plots for CGAGTCGATTCG (2), CGAGACGTTTCG (b), CGTGACGTTACG (9), and CTCGGATCTGAG (O).

10586 Biochemistry, Vol. 36, No. 34, 1997 Allawi and SantaLucia

Linear Regression Analysis of G·T Mismatches in Terms of 10 Linearly Independent Sequences. Table 4 lists parameters for 10 uniquely determined trimer and tetramer sequences with G·T mismatches obtained by multiple linear regression of the data listed in Table 3 (see Materials and Methods). The errors listed in Table 4 are from a resampling analysis of the data (see Materials and Methods). The resampling errors are within round-off of the errors obtained by propagating experimental errors in the SVD analysis (not shown). The parameters in Table 4 along with the Watson-Crick nearest-neighbor parameters (Table 1) predict the thermodynamics of the 45 sequences with two-state transitions in Table 3 with average deviations of 5.1%, 7.5%, 8.0%, and 1.4 °C for ∆G°37, ∆H°, ∆S°, and TM, respectively. This level of agreement between experiment and prediction (SantaLucia et al., 1996; Freier et al., 1986) indicates that the nearest-neighbor model is valid for DNA sequences containing G·T mismatches. The 18 self-complementary sequences in Table 3 are predicted as well as the 27 non-self-complementary sequences by the parameters in Table 4, demonstrating the consistency of our approach.

Linear Regression Analysis of G·T Mismatches in Terms of 11 Non-Unique Dimers. An alternative presentation of the 10 linearly independent parameters for G·T mismatches is to use SVD to calculate the 11 G·T nearest-neighbor dimer duplexes. The stacking matrix still has a rank of 10 and is thus singular; SVD analysis can still provide a solution that is a least squares fit, but the solution is not unique (Press et al., 1989).
The 11 nearest-neighbor dimer parameters (Table 5) make predictions that are equal within roundoff error to those made from the 10 linearly independent trimer and tetramer sequences (Table 4). In fact, the parameters in Table 4 are simple linear combinations of the parameters in Table 5. Conversely, the 10 parameters in Table 4 cannot be used to derive the 11 parameters in Table 5 unless an 11th parameter is provided. The SVD analysis in terms of 11 parameters essentially assumes that this additional parameter is zero. It is important to note, however, that trends in the 11 parameters should not be considered physically relevant. The non-uniqueness of the 11 dimers is readily verified by noting that by adding a constant, C, to one of the 11 dimers and then subtracting or adding C or zero to the remaining 10 dimers subject to the constraints imposed by the 10 uniquely determined and linearly independent sequences (Table 4), an alternative 11-parameter solution is obtained that makes equal predictions for oligonucleotide duplexes, but the thermodynamic trends of the dimers themselves are different from those shown in Table 5. It is worth noting that the GG/TT neighbor is uniquely determined since GG/TT = GGGC/CTTG − GGC/CTG. The 11 parameters in Table 5 are useful because they are easier to apply than 10 linearly independent sequences, since it is simpler to determine which dimers need to be added (analogous to eq 6) than it is to determine which trimers and tetramers need to be added and subtracted (analogous to eq 10) to predict the thermodynamics of an oligonucleotide duplex.

Molecules with Non-Two-State Thermodynamics. Six sequences are listed in Tables 2 and 3 that melt with non-two-state thermodynamics. We presume that these molecules are able to form structures other than the desired duplex or random coil states, including hairpins and “slipped duplexes”.
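As an illustration of how the Table 5 dimers are added, the sketch below sums their ∆G°37 values along a trimer or tetramer and compares the result with the corresponding Table 4 entry. The helper names `flip` and `sum_dimers`, and the 'top/bottom' string notation (top strand 5'→3', bottom strand 3'→5'), are assumptions made for this example rather than anything defined in the paper.

```python
# Table 5 dimer free energies (deltaG37, kcal/mol).
# Keys are 'top/bottom': top strand 5'->3', bottom strand 3'->5'.
TABLE5_DG = {
    "AG/TT": 0.71, "AT/TG": 0.07, "CG/GT": -0.47, "CT/GG": -0.32,
    "GG/CT": 0.08, "GG/TT": 0.74, "GT/CG": -0.59, "GT/TG": 1.15,
    "TG/AT": 0.43, "TG/GT": 0.52, "TT/AG": 0.34,
}


def flip(dimer):
    """Return the same duplex read from the other strand, e.g. GC/TG -> GT/CG."""
    top, bottom = dimer.split("/")
    return bottom[::-1] + "/" + top[::-1]


def sum_dimers(duplex):
    """Sum Table 5 dimers over a duplex whose every step contains a G.T mismatch."""
    top, bottom = duplex.split("/")
    total = 0.0
    for i in range(len(top) - 1):
        dimer = top[i:i + 2] + "/" + bottom[i:i + 2]
        if dimer not in TABLE5_DG:
            dimer = flip(dimer)  # look the step up from the opposite strand
        total += TABLE5_DG[dimer]
    return round(total, 2)


if __name__ == "__main__":
    for name in ("AGC/TTG", "TTC/AGG", "GGGC/CTTG", "GGTC/CTGG"):
        print(name, sum_dimers(name))
```

For example, `sum_dimers("AGC/TTG")` adds AG/TT and GC/TG (stored as GT/CG) and gives 0.12, the Table 4 value. A step that contains no G·T mismatch raises `KeyError`, because Watson-Crick dimers are deliberately left out of this small table.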
For four of these sequences, non-two-state behavior is manifested in differences greater than 20% for the van’t Hoff enthalpies derived from 1/TM vs ln CT plots and from the fits of individual melting curves (Table 2). Below, we show two sequences that melt with non-two-state (NTS) behavior: NTS-1, (CGTTGCGTAACG)2, and NTS-2, GCGTACGCATGCG/CGCATGTGTACGC (Plum et al., 1995), despite good agreement between ∆H° values derived by the two different van’t Hoff methods (Table 2). To test the hypothesis that NTS-1 is able to form a stable hairpin, we synthesized and melted the mutant hairpin sequence HP-1, CGTTGCATAACG (underlined residues are in the loop), which is unlikely to form a duplex. The mutant sequence melted with a concentration-independent TM of 58 °C (see Supporting Information) and with ∆H°, ∆S°, and ∆G°37 of -31.2 kcal/mol, -94.1 eu, and -1.97 kcal/mol, respectively. We assume that the hairpin form of NTS-1, CGTTGCGTAACG, would have the same thermodynamic properties as HP-1. Figure 2 shows a simulation of the melting of NTS-1 using eqs 2-5. The simulation used parameters measured for HP-1 and the predicted thermodynamics for the duplex to random coil equilibrium (Table 3). The simulation clearly shows significant population of hairpin at temperatures near the duplex TM. The results of the above simulation can be used to calculate a simulated melting curve using eq 17, where A(T) is the total temperature-dependent absorbance, b is the optical pathlength, and εRC, εDH, and εHairpin are the extinction coefficients of random coil, double helix, and hairpin, respectively. The calculated melting curve so obtained is in good agreement with the observed melting curve for NTS-1 (not shown). An alternative explanation for why NTS-1 is not well predicted is that the nearest-neighbor model does not apply in this case. The origin of such an effect would be structural perturbations that propagate more than 1 base pair away from the mismatch.
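For a unimolecular hairpin that melts in a two-state manner, the melting temperature does not depend on strand concentration and equals ∆H°/∆S°. The sketch below applies this standard relation to the HP-1 values quoted above and also gives the folded fraction at a chosen temperature. The function names and the two-state assumption are illustrative; the paper's eqs 2-5 for the full duplex/hairpin/random-coil simulation are not reproduced here.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(K*mol)

# HP-1 hairpin values quoted in the text.
DH_HP1 = -31.2      # kcal/mol
DS_HP1 = -94.1e-3   # kcal/(K*mol), i.e. -94.1 eu


def hairpin_tm_celsius(dh, ds):
    """Two-state unimolecular TM: the temperature at which deltaG = dh - T*ds = 0."""
    return dh / ds - 273.15


def hairpin_fraction(temp_c, dh, ds):
    """Fraction of strands folded as hairpin at temp_c (two-state assumption)."""
    t = temp_c + 273.15
    k = math.exp(-(dh - t * ds) / (R_KCAL * t))  # equilibrium constant for folding
    return k / (1.0 + k)


if __name__ == "__main__":
    tm = hairpin_tm_celsius(DH_HP1, DS_HP1)
    print(f"TM = {tm:.1f} C")
    for temp in (10.0, 37.0, tm, 80.0):
        print(f"{temp:6.1f} C  folded fraction = {hairpin_fraction(temp, DH_HP1, DS_HP1):.3f}")
```

With the quoted ∆H° and ∆S°, the computed TM is close to the reported 58 °C, and the folded fraction is 0.5 at that temperature by construction.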
To test this possibility, we synthesized the non-self-complementary duplex CCATGCGTAACG/GGTATGCGTTGC, which has the same base pairs next to the mismatches and the same base pair composition further away from the G·T pair as NTS-1, but does not have the potential to form hairpin structures. This duplex is well predicted (Table 3) by the parameters in Table 4. Thus, we reject non-nearest-neighbor effects as an explanation of the thermodynamics observed for NTS-1.

$$A(T) = b\left(\varepsilon_{RC}[A] + \varepsilon_{DH}\,2[A_2] + \varepsilon_{Hairpin}[AH]\right) \qquad (17)$$

Table 3: Experimental and Predicted Thermodynamic Parameters of Oligonucleotides with G·T Mismatches^a (columns: experimental and predicted values of ∆G°37, ∆H°, ∆S°, and TM; the table body is not reproduced here).

We also investigated NTS-2 to determine why this sequence is not well predicted by the parameters in Table 4. Plum et al. (1995) observed the following thermodynamics for NTS-2: ∆H°(av of fits) = -81.3 kcal/mol, ∆H°(1/TM vs ln CT) = -86.6 kcal/mol, ∆H°(calorimetry) = -89.6, TM(5 × 10^-6 M) = 55.3 °C. We melted this duplex and obtained the parameters shown in Table 2 that are in excellent agreement with those measured by Plum et al. (1995). This agreement by our lab and the Breslauer lab eliminates the possibilities that this sequence is not well predicted due to differences in instrumental calibration or sample preparation. Despite the agreement between the UV melting data and calorimetric data, Plum et al. were careful not to conclude that this sequence melts with two-state thermodynamics. We melted the individual single strands and found both strands were able to form stable concentration-dependent structures (Figure 3), probably consisting of partially self-complementary slipped duplexes. Therefore, the discrepancy between experiment and prediction in this case is most likely due to the presence of the alternative structures formed by the single strands.
The results for NTS-1 as well as those for NTS-2 suggest that caution is in order whenever the two-state approximation is applied.

NMR Spectroscopy of Molecules with Non-Two-State Thermodynamics. The NMR spectra of three non-two-state RNA sequences with G·U mismatches were either broadened (suggesting intermediate conformational exchange) or showed extra resonances suggestive of slow exchange with hairpin or other species (He et al., 1991). The work of He et al. (1991) suggests that 1D-NMR is a good way to assess the validity of the two-state approximation. Figure 4 shows the imino region of the 1D-NMR spectra of NTS-1 and NTS-2. We do not find evidence for extra resonances or that peaks are excessively broad at 10 °C. This suggests that at low temperatures hairpin or slipped duplex species are present in low concentrations or that their resonances are either at the same chemical shifts as the desired duplex or they are too broad to observe due to chemical exchange. However, as the temperature is raised, different resonances begin to broaden at different temperatures and the chemical shifts change with temperature (see Supporting Information), suggesting that the melting processes for the two duplexes are non-two-state.

Table 3 (Continued)
a Listed in alphabetical order and by oligomer length. Experimental values are the averages of the TM^-1 versus ln CT and the curve fit parameters given in Table 2.
b Sequences without a literature reference are from Table 2 of this work.
c Standard errors for experimental ∆G°37, ∆H°, and ∆S° are assumed to be 4%, 8%, and 8%, respectively.
d Calculated for 10^-4 M oligomer concentration for self-complementary sequences and 4 × 10^-4 M for non-self-complementary sequences.
e Aboul-ela et al. (1985).
f M. Arghavani, J. SantaLucia, Jr., and L. Romano, unpublished.
g Leonard et al. (1990b).
h Tibanyenda et al. (1984).
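Footnote d fixes the strand concentrations at which the TM values are calculated. A minimal sketch of that calculation follows, assuming the standard two-state bimolecular relation TM = ∆H°/(∆S° + R ln x), with x = CT for self-complementary and x = CT/4 for non-self-complementary duplexes. This relation is assumed here to match the paper's TM equation, which does not appear in this section; the function name and the input values are purely illustrative.

```python
import math

R_CAL = 1.987  # gas constant in cal/(K*mol), so deltaS can be given in eu


def predicted_tm_celsius(dh_kcal, ds_eu, ct_molar, self_complementary):
    """Two-state duplex TM in Celsius from deltaH (kcal/mol) and deltaS (eu).

    ct_molar is the total strand concentration. Non-self-complementary
    duplexes use CT/4 in the logarithm (assumed standard convention).
    """
    x = ct_molar if self_complementary else ct_molar / 4.0
    return dh_kcal * 1000.0 / (ds_eu + R_CAL * math.log(x)) - 273.15


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the function.
    dh, ds = -54.9, -151.8
    print(predicted_tm_celsius(dh, ds, 1e-4, self_complementary=True))
    print(predicted_tm_celsius(dh, ds, 4e-4, self_complementary=False))
```

Under this convention the two concentrations in footnote d give the same logarithmic term, since 4 × 10^-4 M divided by 4 equals 10^-4 M, so both calls above return the same TM.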
Table 4: Thermodynamic Parameters for Ten Linearly Independent Sequences with Internal G·T Mismatches in 1 M NaCl^a

propagation sequence   ∆H° (kcal/mol)   ∆S° (eu)        ∆G°37 (kcal/mol)
AGC/TTG                -3.4 ± 1.6       -11.4 ± 5.0      0.12 ± 0.15
ATC/TGG                 0.6 ± 1.8         1.6 ± 2.9      0.15 ± 0.14
CGC/GTG                -8.6 ± 1.4       -24.3 ± 4.5     -1.05 ± 0.10
CTC/GGG                 0.4 ± 1.1         2.2 ± 3.3     -0.24 ± 0.13
GTC/CGG                -1.2 ± 1.3        -2.4 ± 3.9     -0.51 ± 0.08
TGC/ATG                -4.5 ± 1.0       -14.1 ± 3.4     -0.16 ± 0.12
TTC/AGG                 1.9 ± 1.5         4.9 ± 3.5      0.42 ± 0.15
GGGC/CTTG               4.6 ± 2.8        14.1 ± 9.3      0.23 ± 0.05
GGTC/CTGG              10.6 ± 2.2        30.1 ± 5.1      1.31 ± 0.10
GTGC/CGTG              -9.7 ± 3.8       -29.2 ± 10.7    -0.66 ± 0.23

a Errors are resampling standard deviations (see text).

Table 5: Non-Unique Nearest-Neighbor Thermodynamic Parameters of G·T Mismatches in 1 M NaCl^a

propagation sequence   ∆H° (kcal/mol)   ∆S° (eu)   ∆G°37 (kcal/mol)
AG/TT                   1.0               0.9       0.71
AT/TG                  -2.5              -8.3       0.07
CG/GT                  -4.1             -11.7      -0.47
CT/GG                  -2.8              -8.0      -0.32
GG/CT                   3.3              10.4       0.08
GG/TT^b                 5.8              16.3       0.74
GT/CG                  -4.4             -12.3      -0.59
GT/TG                   4.1               9.5       1.15
TG/AT                  -0.1              -1.7       0.43
TG/GT                  -1.4              -6.2       0.52
TT/AG                  -1.3              -5.3       0.34

a These parameters are a linear least-squares fit of the data for a singular matrix with a rank of 10. These parameters make predictions that are within roundoff error of the parameters from Table 4. Linear combinations of the parameters in this table give the parameters in Table 4. Trends in these parameters should not be considered physically relevant (see text).
b The GG/TT nearest neighbor is uniquely determined (see text).

NMR Spectroscopy of Molecules with Two-State Thermodynamics. Figure 5 shows the exchangeable imino region (9-15 ppm) of the 1D proton NMR spectrum of 10 sequences. Resonances were assigned using 1D-NOE difference spectroscopy and the temperature-dependent broadening of imino protons from terminal base pairs (Figure 6 and Supporting Information). Figure 6 shows the 1D-NOE difference spectra used to assign the sequence (CGTGACGTTACG)2.
The assignments for the other sequences listed in Table 6 were determined by the same methods (see Supporting Information). We have assumed that the T imino proton resonance is downfield of the G imino proton of the G·T mismatch (Patel et al., 1984; Hare et al., 1986). In general, the imino protons from the G·T mismatches resonate between 10 and 12 ppm and are as sharp as imino protons in Watson-Crick pairs. G·T mismatches with different surrounding base pairs all show sharp imino resonances consistent with the formation of wobble pairs with stable hydrogen bonding in diverse contexts.

Imino Proton Chemical Shift Predictions. We used chemical shift data in Table 6 to test whether the shielding parameters of Arter and Schmidt (1976) could be extended to apply to sequences with G·T mismatches. We find that the Arter & Schmidt parameters make good predictions for the imino protons of DNA sequences with G·T mismatches if the following assumptions are applied: (1) B-form structure is formed (the use of A-form parameters does not give good chemical shift predictions). (2) G·T mismatches have the same shielding parameters as G·C base pairs. (3) The unperturbed shifts of imino protons in G·C and A·T base pairs are 13.67 and 14.57 ppm, respectively. The unperturbed shifts of G and T imino protons in G·T mismatches are 11.49 and 12.86 ppm, respectively. These numbers are based on a best fit analysis of the data in Table 6. Overall, the Arter & Schmidt shielding parameters with the above modifications predict the chemical shifts of G·C and A·T pairs with average deviations of 0.2 and 0.1 ppm, respectively. For G·T mismatches, the average deviations for G and T chemical shifts are 0.3 and 0.5 ppm, respectively.

DISCUSSION

Applicability of the Nearest-Neighbor Model to G·T Mismatches. If the nearest-neighbor model were not appropriate for G·T mismatches, a single set of energies that predict all sequences could not be found.
Table 3 compares the experimental results for 45 G·T mismatch containing oligonucleotides with those predicted using G·T mismatch parameters listed in Table 4 or 5 in conjunction with the Watson-Crick nearest-neighbors (Table 1). The parameters listed in Tables 4 and 5 predict sequences with two-state transitions with average deviations of ∆G°37 = 5.1%, ∆H° = 7.5%, ∆S° = 8.0%, and TM = 1.4 °C. Previously, we found that a nearest-neighbor model predicted DNA oligonucleotide thermodynamics with average deviations in ∆G°37, ∆H°, ∆S°, and TM of 4%, 7%, 8%, and 2 °C, respectively (SantaLucia et al., 1996). This indicates that the nearest-neighbor model applies equally well to oligonucleotides with only Watson-Crick pairs and those with both Watson-Crick and G·T mismatches. The validity of the nearest-neighbor model for G·T mismatches is consistent with previous NMR work on G·T mismatches that showed that structural perturbations from the mismatch are mainly in the vicinity of the mismatch and the nearest base pair (Patel et al., 1984; Hare et al., 1986).

FIGURE 2: Calculated fraction of species formed by CGTTGCGTAACG vs temperature at total strand concentration of 1 × 10^-4 M. The species represented are duplex (9), hairpin (2), and random coil (b).

FIGURE 3: Normalized absorbance curves for the single strands of NTS-2 (A) GCGTACGCATGCG, and (B) CGCATGTGTACGC in 1 M NaCl, 10 mM sodium cacodylate, and 0.5 mM Na2EDTA, pH 7.0. For plot A, the curves shown are at concentrations of 2.1 × 10^-4, 1.2 × 10^-4, 6.5 × 10^-5, 3.5 × 10^-5, and 1.7 × 10^-5 M. For plot B, the concentrations are 1.8 × 10^-4, 1.0 × 10^-4, 5.2 × 10^-5, 3.0 × 10^-5, and 1.4 × 10^-5 M.

FIGURE 4: 500-MHz 1H NMR spectra of the exchangeable imino region (9-15 ppm) at 10 °C in 1 M NaCl, 10 mM disodium phosphate, and 0.1 mM Na2EDTA at pH 7.0 in 90% H2O/10% D2O of (A) CGTTGCGTAACG and (B) GCGTACGCATGCG/CGCATGTGTACGC. Assignments are given in Table 6.
The agreement between the observed and predicted imino proton shifts (Table 6) also supports the nearest-neighbor model, since the predictions essentially assume a nearest-neighbor model. X-ray crystallographic data for G·T mismatch containing sequences also support the notion that structural perturbations from the mismatch are localized (Hunter et al., 1987).

Another way to test the applicability of the nearest-neighbor model for nucleic acid thermodynamics is to synthesize oligonucleotides with different sequences but with the same nearest-neighbor composition (Kierzek et al., 1986; Sugimoto et al., 1994, 1995). Five pairs of sequences listed as melting with two-state thermodynamics in Table 3 have this property. For example, the duplexes CGTCTGTCC/GCAGGCAGG and CGTCGGTCC/GCAGTCAGG have different sequences but the same nearest-neighbor composition, and their ∆G°37, ∆H°, ∆S°, and TM values agree within 0.41 kcal/mol, 6.0 kcal/mol, 18.4 eu, and 0.9 °C, respectively. The five pairs of sequences with the same nearest neighbors have average deviations from the mean for ∆G°37, ∆H°, and ∆S° of 3%, 7%, and 8%, and an average difference in TM of 1.2 °C. These deviations are similar to those observed in RNA (Kierzek et al., 1986) and DNA (Sugimoto et al., 1994). The largest differences are observed for the duplexes CGTGTCTCC/GCACGGAGG and GGAGTCACG/CCTCGGTGC, which show deviations from the mean for ∆G°37, ∆H°, and ∆S° of 6%, 11%, and 12%, and a TM difference of 2.6 °C. These data indicate the limits of what can be expected from a nearest-neighbor model. Thus, our mismatch parameters in Tables 4 and 5 make predictions within the limits of the nearest-neighbor model.

FIGURE 5: 500-MHz 1H NMR spectra of the exchangeable imino region (9-15 ppm) at 10 °C in 1 M NaCl, 10 mM disodium phosphate, and 0.1 mM Na2EDTA at pH 7.0 in 90% H2O/10% D2O of (A) CGAGCATGTTCG, (B) CGTGACGTTACG, (C) CGTGTCGATACG, (D) CTCGGATCTGAG, (E) CTTGCATGTAAG, (F) GCAGGTCTGC, (G) GCGATGTCGC, (H) GGAGTGCTCC, (I) GACCGTGCAC/CTGGTGCGTG, and (J) GACGTTGGAC/CTGCGGCCTG. Assignments are given in Table 6 (see also Supporting Information).

FIGURE 6: Imino proton region (9-15 ppm) of the 1D-NOE difference spectra of CGTGACGTTACG at 10 °C in 1 M NaCl, 10 mM disodium phosphate, and 0.1 mM Na2EDTA at pH 7.0 in 90% H2O/10% D2O. (A) Control spectrum (off-resonance irradiation at 15.0 ppm); (B-G) difference spectra between a control spectrum and spectra obtained with 1 s saturation at 13.9, 13.5, 13.0, 12.8, 12.7, 12.0, and 10.2 ppm, respectively. The saturated resonances are indicated by arrows while the observed NOEs are designated by asterisks. Assignments are shown above spectrum A.

Context Dependence of G·T Mismatch Thermodynamics. The data in Tables 4 or 5 can be used to predict the thermodynamics of G·T mismatches in all 16 different nearest-neighbor contexts. The most stable context is CGC/GTG, which contributes -1.05 kcal/mol to duplex free energy at 37 °C. The least stable context is AGA/TTT, which contributes +1.05 kcal/mol. The general trend for the nucleotide at the 5′ side of the G of a G·T mismatch in order of decreasing stability is C > G > T ≥ A. On the 3′ side of the G, the stability order is C ≥ G > T ≥ A. Interestingly, these trends are reflected in the parameters in Table 5, despite the fact that the parameters in Table 5 should not, strictly speaking, be interpreted physically (see Results).

Table 6: Observed and Predicted Chemical Shifts (ppm) of Exchangeable Imino Protons of Oligonucleotides with G·T Mismatches^a
a Observed chemical shifts in 90% H2O, 10% D2O, 1 M NaCl, 10 mM disodium phosphate, 0.1 mM Na2EDTA, pH 7.0, at 10 °C. The predicted shifts were calculated using the parameters of Arter and Schmidt (1976) and assuming that the unperturbed chemical shifts of imino protons in G-C and A-T pairs are 13.7 and 14.6 ppm, respectively. For G·T mismatches, the unperturbed chemical shifts of the imino protons for G and T were assumed to be 11.5 and 12.9 ppm, respectively. In addition, parameters for shielding by G·T mismatches were assumed to be the same as those for G-C base pairs (see text). Chemical shift assignments given in parentheses are tentative.
b For self-complementary sequences only the top strand is shown. For non-self-complementary sequences the two strands are listed in antiparallel orientation.
c Observed chemical shifts in 80% H2O, 20% D2O, 0.1 M phosphate, 2.5 mM EDTA, pH 6.37, at -5 °C (Patel et al., 1984).

Effects of Terminal and Penultimate G·T Mismatches. Preliminary data from our lab indicate that terminal G·T mismatches contribute between -0.4 and -1.0 kcal/mol, depending on the neighboring base pair (Jenkins and SantaLucia, unpublished results). On the other hand, a single G·T mismatch in the interior of a duplex contributes between +1.05 and -1.05 kcal/mol depending on the context. These data suggest that caution is required when predicting the thermodynamics of duplexes that have mismatches at the penultimate position, particularly when the terminal base pair is A-T. Consider the following self-complementary duplex structures: Our data predict that the structure on the right, without terminal A-T hydrogen bonding, is approximately 2.5 kcal/mol more stable than the structure on the left, which has terminal A-T hydrogen bonding. This effect is presumably due to unfavorable steric interactions that occur when a G·T mismatch is placed in the interior of a duplex.

Comparison of G·T Mismatch and Watson-Crick Base Pair Thermodynamics. A comparison between the 10 linearly independent sequences with G·T mismatches and sequences with only Watson-Crick pairs in which the T in each G·T mismatch is replaced by a C (i.e., G·T versus G-C) revealed that there is a relationship that can be drawn (Figure 7).
When double-mismatched nearest neighbors (GG/TT, GT/TG, and TG/GT) are excluded, a line with a correlation coefficient of 0.97 can be fit to eq 18, where ∆G°37(GC) and ∆G°37(GT) are the Watson-Crick and G·T mismatch nearest-neighbor free energies, respectively. On average, a G·T mismatch contributes 3.5 kcal/mol less to duplex stability than an equivalent duplex with a G-C base pair. A poorer correlation is observed when comparing G·T with A-T base pairs (R^2 = 0.56). One interpretation of this result is that the guanine stacking plays a more significant thermodynamic role in the G·T mismatch than thymine stacking does. The agreement between experimental and predicted NMR chemical shifts provides evidence that stacking in G-C base pairs is similar to G·T mismatches (see Results). Tandem G·T mismatches do not correlate with Watson-Crick base pairs (not shown); this suggests that unique stacking interactions are present in tandem G·T mismatches. Note that the imino proton chemical shifts of G·T mismatches are predicted better for sequences with single mismatches than for sequences with tandem mismatches (Table 6).

Comparison of DNA G·T Mismatch with RNA G·U Mismatch Nearest-Neighbors. He et al. (1991) reported nearest-neighbor analysis of G·U mismatches in RNA. They found that, with the exception of sequences containing GGUC, the nearest-neighbor analysis applied. Interestingly, we find that GGTC in DNA is not exceptional, and the sequence is well predicted by a single set of nearest-neighbor parameters. Figure 7 shows a plot of free energies of 7 linearly independent trimer sequences with G·T mismatches in DNA vs the equivalent sequence with G·U mismatches in RNA (excluding RNA G·U tandem mismatches) (He et al., 1991). The data in Figure 7 can be fitted to a line (R^2 = 0.68) given by eq 19, where ∆G°37(GU) and ∆G°37(GT) are the RNA and DNA trimer free energies, respectively.
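To make the two fitted lines concrete, the sketch below evaluates eqs 18 and 19 for the ∆G°37 values of the seven trimers in Table 4. The function names are illustrative, and the outputs are points on the fitted lines, not measured free energies.

```python
# deltaG37 (kcal/mol) of the seven G.T trimers listed in Table 4.
TABLE4_TRIMERS = {
    "AGC/TTG": 0.12, "ATC/TGG": 0.15, "CGC/GTG": -1.05, "CTC/GGG": -0.24,
    "GTC/CGG": -0.51, "TGC/ATG": -0.16, "TTC/AGG": 0.42,
}


def dg_gc_from_gt(dg_gt):
    """Eq 18: fitted free energy of the trimer with G.T replaced by G-C."""
    return 0.89 * dg_gt - 3.52


def dg_gu_from_gt(dg_gt):
    """Eq 19: fitted free energy of the equivalent RNA trimer with G.U."""
    return 0.75 * dg_gt - 2.79


if __name__ == "__main__":
    print("trimer     dG(GT)  eq18 dG(GC)  eq19 dG(GU)")
    for name, dg in TABLE4_TRIMERS.items():
        print(f"{name}  {dg:6.2f}  {dg_gc_from_gt(dg):11.2f}  {dg_gu_from_gt(dg):11.2f}")
```

Because both slopes are below 1 and both intercepts are negative, every trimer in the table maps to a more negative (more stable) free energy on the G-C and G·U lines.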
On average, a G·T mismatch contributes 2.7 kcal/mol less to DNA duplex stability than an analogous RNA duplex with a G·U mismatch. Differences observed for DNA versus RNA thermodynamics are most likely due to different stacking interactions observed in B-form versus A-form structures. For comparison, Watson-Crick G-C and A-T pairs in B-form DNA (Table 1) are also less stable than G-C and A-U pairs in A-form RNA (Freier et al., 1986) by 1.02 kcal/mol, on average. Since G·T and G·U form similar hydrogen-bonded wobble pairs, it is somewhat surprising how destabilizing the G·T mismatches are in DNA. One possible explanation of why G·U mismatches are stable in RNA is the presence of a water-mediated hydrogen bond between the G-2-amino and the U-2′-hydroxyl oxygen, as revealed by X-ray crystallography (Holbrook et al., 1978, 1991; Hingerty et al., 1978). Another possible explanation for the instability of G·T is unfavorable steric interactions involving the thymine methyl group. The relative instability of G·T mismatches in DNA compared with G·U mismatches in RNA suggests that in addition to DNA’s superior hydrolytic stability compared with RNA, DNA may also be inherently better suited than RNA for high-fidelity replication.

ACKNOWLEDGMENT

We thank David Hyndman (Advanced Gene Computing Technologies), Christine Chow, and Sandra Shaner for stimulating conversations and Meiko Ogura (Hitachi Chemical Research) for synthesizing oligonucleotides. We thank Jeff McDowell and Douglas H. Turner for providing the program MELTWIN v2.1 for analysis of melting curves.

FIGURE 7: Free energy comparison of seven linearly independent single G·T mismatch sequences (Table 4) vs the equivalent sequences with G·T mismatches replaced by G-C base pairs (9), A-T base pairs (b), and RNA G·U mismatches (2). The lines shown are the least-squares fit of the data (R^2 = 0.97 for G-C, 0.56 for A-T, and 0.68 for G·U) (see text for equations).
$$\Delta G^\circ_{37}(\mathrm{GC}) = 0.89\,\Delta G^\circ_{37}(\mathrm{GT}) - 3.52 \qquad (18)$$

$$\Delta G^\circ_{37}(\mathrm{GU}) = 0.75\,\Delta G^\circ_{37}(\mathrm{GT}) - 2.79 \qquad (19)$$

SUPPORTING INFORMATION AVAILABLE

One table showing experimental versus predicted (using Table 1) thermodynamics of 131 sequences with Watson-Crick base pairs; 10 figures showing 1/TM vs ln CT plots for the 33 sequences presented in Table 2 which are not shown in Figure 1; 11 figures showing the 1D-NOE difference spectra used for peak assignments listed in Table 6; two figures showing the 1D-NMR spectra at different temperatures for NTS-1 and NTS-2; one figure showing normalized melting curves for HP-1; and two figures showing plots of ∆H° vs ∆S° (31 pages). Ordering information is given on any current masthead page.

REFERENCES

Aboul-ela, F., Koh, D., Tinoco, I., Jr., & Martin, F. H. (1985) Nucleic Acids Res. 13, 4811-4824.
Albergo, D. D., Marky, L. A., Breslauer, K. J., & Turner, D. H. (1981) Biochemistry 20, 1409-1413.
Arter, D. B., & Schmidt, P. G. (1976) Nucleic Acids Res. 6, 1437-1447.
Bevington, P. R. (1969) Data Reduction and Error Analysis for the Physical Sciences, pp 164-186, 187-203, McGraw-Hill, New York.
Bhattacharyya, A., & Lilley, D. M. J. (1989) J. Mol. Biol. 209, 583-597.
Borer, P. N., Dengler, B., Tinoco, I., Jr., & Uhlenbeck, O. C. (1974) J. Mol. Biol. 86, 843-853.
Brown, T. (1995) Aldrichimica Acta 28, 15-20.
Brown, T., & Brown, D. J. S. (1991) in Oligonucleotides and Analogues (Eckstein, F., Ed.), pp 1-24, IRL Press, New York.
Callidine, C. R., & Drew, H. R. (1984) J. Mol. Biol. 178, 773-782.
Chou, S.-H., Flynn, P., & Reid, B. (1989) Biochemistry 28, 2422-2435.
Doktycz, M. J., Morris, M. D., Dormandy, S. J., Beattie, K. L., & Jacobson, K. B. (1995) J. Biol. Chem. 270, 8439-8445.
Efron, B., & Tibshirani, R. (1993) An Introduction to the Bootstrap, Chapman & Hall, London.
Fodor, S. P. A., Rava, R. P., Huang, X. C., Pease, A. C., Holmes, C. P., & Adams, C. L. (1993) Nature 364, 555-556.
Freier, S. M. (1993) in Antisense Research and Applications (Crooke, S. T., & Lebleu, B., Eds.) pp 67-82, CRC Press, Boca Raton, FL.
Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N., Caruthers, M. H., Neilson, T., & Turner, D. H. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 9373-9377.
Goldstein, R. F., & Benight, A. S. (1992) Biopolymers 32, 1679-1693.
Goodman, M. F., Creighton, S., Bloom, L. B., & Petruska, J. (1993) Crit. Rev. Biochem. Mol. Biol. 28, 83-126.
Gray, D. M., & Tinoco, I., Jr. (1970) Biopolymers 9, 223-244.
Hare, D., Shapiro, L., & Patel, D. J. (1986) Biochemistry 25, 7445-7456.
He, L., Kierzek, R., SantaLucia, J., Jr., Walter, A. E., & Turner, D. H. (1991) Biochemistry 30, 11124-11132.
Hingerty, B., Brown, R. S., & Jack, A. (1978) J. Mol. Biol. 124, 523-534.
Holbrook, S., Sussman, J. L., Warrant, R. W., & Kim, S.-H. (1978) J. Mol. Biol. 123, 631-660.
Holbrook, S. R., Cheong, C., Tinoco, I., Jr., & Kim, S.-H. (1991) Nature 353, 579-581.
Hunter, C. A. (1993) J. Mol. Biol. 230, 1025-1054.
Hunter, W. N., Brown, T., Kneale, G., Anand, N. N., Rabinovich, D., & Kennard, O. (1987) J. Biol. Chem. 262, 9962-9970.
Ikuta, S., Takagi, K., Wallace, R. B., & Itakura, K. (1987) Nucleic Acids Res. 15, 797-811.
Johnson, K. A. (1993) Annu. Rev. Biochem. 62, 685-713.
Kawase, Y., Iwai, S., Inoue, H., Miura, K., & Ohtsuka, E. (1986) Nucleic Acids Res. 14, 7727-7736.
Kierzek, R., Caruthers, M. H., Longfellow, C. E., Swinton, D., Turner, D. H., & Freier, S. M. (1986) Biochemistry 25, 7840-7846.
Klump, H. H. (1990) in Landolt-Bornstein, New Series, VII Biophysics, Vol. 1, Nucleic Acids, Subvol. c, Spectroscopic and Kinetic Data, Physical Data I (Saenger, W., Ed.) pp 241-256, Springer-Verlag, Berlin.
Kunkel, T. A., Roberts, J. D., & Zakour, R. A. (1987) Methods Enzymol. 154, 367-382.
Leonard, G. A., Booth, E. D., & Brown, T. (1990a) Nucleic Acids Res. 18, 5617-5623.
Leonard, G. A., Thomson, J., Watson, W. P., & Brown, T. (1990b) Proc. Natl. Acad. Sci. U.S.A. 87, 9573-9576.
Lippens, G., Dhalluin, C., & Wieruszeski, J.-M. (1995) J. Biomol. NMR 5, 327-331.
Longfellow, C. E., Kierzek, R., & Turner, D. H. (1990) Biochemistry 29, 278-285.
Marky, L. A., & Breslauer, K. J. (1987) Biopolymers 26, 1601-1620.
McDowell, J. A., & Turner, D. H. (1996) Biochemistry 35, 14077-14089.
Mendelman, L. V., Boosalis, M. S., Petruska, J., & Goodman, M. F. (1989) J. Biol. Chem. 264, 14415-14423.
Modrich, P., & Lahue, R. (1996) Annu. Rev. Biochem. 65, 101-133.
Patel, D. J., Kozlowski, S. A., Ikuta, S., & Itakura, K. (1984) Fed. Proc. Fed. Am. Soc. Exp. Biol. 43, 2663-2670.
Petersheim, M., & Turner, D. H. (1983) Biochemistry 22, 256-263.
Petruska, J., Goodman, M. F., Boosalis, M. S., Sowers, L. C., Cheong, C., & Tinoco, I., Jr. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 6252-6256.
Piotto, M., Saudek, V., & Sklenar, V. (1992) J. Biomol. NMR 2, 661-665.
Plum, G. E., Grollman, A. P., Johnson, F., & Breslauer, K. J. (1995) Biochemistry 34, 16148-16160.
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1989) Numerical Recipes, pp 52-64, 498-520, Cambridge University Press, New York.
Raap, J., van der Marel, G. A., van Boom, J. H., Joordens, J. J. M., & Hilbers, C. W. (1985) Fourth Conversation in Biomolecular Stereodynamics (Sarma, R. H., Ed.), p 122a, Adenine Press, Guilderland, NY.
Ratmeyer, L., Vinayak, R., Zhong, Y. Y., Zon, G., & Wilson, W. D. (1994) Biochemistry 33, 5298-5304.
Richards, E. G. (1975) in Handbook of Biochemistry and Molecular Biology: Nucleic Acids (Fasman, G. D., Ed.) 3rd ed., Vol. 1, p 597, CRC Press, Cleveland, OH.
Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S., Higuchi, R. H., Horn, G. T., Mullis, K. B., & Erlich, H. A. (1988) Science 239, 487-494.
SantaLucia, J., Jr., Kierzek, R., & Turner, D. H. (1990) Biochemistry 29, 8813-8819.
SantaLucia, J., Jr., Kierzek, R., & Turner, D. H. (1991) J. Am. Chem. Soc. 113, 4313-4322.
SantaLucia, J., Jr., Allawi, H. T., & Seneviratne, P. A. (1996) Biochemistry 35, 3555-3562.
Steger, G. (1994) Nucleic Acids Res. 22, 2760-2768.
Sugimoto, N., Honda, K., & Sasaki, M. (1994) Nucleosides Nucleotides 13, 1311-1317.
Sugimoto, N., Nakano, S., Katoh, M., Matsumura, A., Nakamuta, H., Ohmichi, T., Yonegama, M., & Sasaki, M. (1995) Biochemistry 34, 11211-11216.
Sugimoto, N., Nakano, S., Yoneyama, M., & Honda, K. (1996) Nucleic Acids Res. 24, 4501-4505.
Tibanyenda, N., De Bruin, S. H., Haasnoot, C. A. G., van der Marel, G. A., van Boom, J. H., & Hilbers, C. W. (1984) Eur. J. Biochem. 139, 19-27.
Vologodskii, A. V., Amirikyan, B. R., Lyubchenko, Y. L., & Frank-Kamenetskii, M. D. (1984) J. Biomol. Struct. Dyn. 2, 131-148.
Wallace, R. B., Shaffer, J., Murphy, R. F., Bonner, J., Hirose, T., & Itakura, K. (1979) Nucleic Acids Res. 6, 3543-3557.
Werntges, H., Steger, G., Riesner, D., & Fritz, H.-J. (1986) Nucleic Acids Res. 14, 3773-3790.
Wolfram, S. (1992) MATHEMATICA, Version 2.1, Wolfram Research, Inc.
Wu, M., McDowell, J. A., & Turner, D. H. (1995) Biochemistry 34, 3204-3211.

BI962590C
# This function calculates the enthalpy (h), entropy (s) and free energy (g) of a DNA sequence.
def calc(seq):
    # Start from the symmetry correction (deltaH 0, deltaS -1.4, deltaG 0.4).
    # It is applied to every sequence, which reproduces the expected outputs
    # given in the instructions (e.g. 'AATTG' gives -29.1, -87.6, -1.92).
    h = 0
    s = -1.4
    g = 0.4

    # Initiation: each terminal base adds its own term, so a sequence that
    # starts with A/T and ends with G/C gets one term of each kind.
    for end in (seq[0], seq[-1]):
        if end == 'A' or end == 'T':
            h = h + 2.3
            s = s + 4.1
            g = g + 1.03
        elif end == 'G' or end == 'C':
            h = h + 0.1
            s = s + (-2.8)
            g = g + 0.98

    # Taking first character of DNA
    temp = seq[0]

    # Take pairs from first element to last element
    for i in range(1, len(seq)):
        temp1 = f'{temp}{seq[i]}'

        # Using standard values from the chart given in the BP thermo paper.
        # Each pair is listed together with its reverse complement.
        if temp1 == 'AA' or temp1 == 'TT':
            h = h + (-7.9)
            s = s + (-22.2)
            g = g + (-1)
        elif temp1 == 'AT':
            h = h + (-7.2)
            s = s + (-20.4)
            g = g + (-0.88)
        elif temp1 == 'TA':
            h = h + (-7.2)
            s = s + (-21.3)
            g = g + (-0.58)
        elif temp1 == 'CA' or temp1 == 'TG':
            h = h + (-8.5)
            s = s + (-22.7)
            g = g + (-1.45)
        elif temp1 == 'GT' or temp1 == 'AC':
            h = h + (-8.4)
            s = s + (-22.4)
            g = g + (-1.44)
        elif temp1 == 'CT' or temp1 == 'AG':
            h = h + (-7.8)
            s = s + (-21)
            g = g + (-1.28)
        elif temp1 == 'GA' or temp1 == 'TC':
            h = h + (-8.2)
            s = s + (-22.2)
            g = g + (-1.30)
        elif temp1 == 'CG':
            h = h + (-10.6)
            s = s + (-27.2)
            g = g + (-2.17)
        elif temp1 == 'GC':
            h = h + (-9.8)
            s = s + (-24.4)
            g = g + (-2.24)
        elif temp1 == 'GG' or temp1 == 'CC':
            h = h + (-8.0)
            s = s + (-19.9)
            g = g + (-1.84)
        else:
            raise ValueError(f'Unexpected base pair: {temp1}')

        # Moving to next character of DNA
        temp = seq[i]

    # Round away floating-point noise before reporting
    h = round(h, 2)
    s = round(s, 2)
    g = round(g, 2)

    print(h)
    print(s)
    print(g)

    # Returning the values
    return h, s, g

# Calling the function to get values.
calc('AATTGTGTGTCCACGTGCA')
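The same nearest-neighbor sum can also be written as a table lookup, which makes it easy to check against the three expected results quoted in the instructions. This is only a sketch under stated assumptions: the names `NN`, `INIT_AT`, `INIT_GC`, `SYMMETRY` and `nn_thermo` are made up for illustration, the parameter values are assumed to match Table 1 of the attached paper, and the symmetry term is added to every sequence because that is what reproduces the quoted numbers.

```python
# Hypothetical table-driven sketch of the nearest-neighbor calculation.
# Each entry is (deltaH, deltaS, deltaG37), keyed by the 5'->3' dinucleotide.
# Values are assumed to match Table 1 of the attached paper.
NN = {
    'AA': (-7.9, -22.2, -1.00),
    'AT': (-7.2, -20.4, -0.88),
    'TA': (-7.2, -21.3, -0.58),
    'CA': (-8.5, -22.7, -1.45),
    'GT': (-8.4, -22.4, -1.44),
    'CT': (-7.8, -21.0, -1.28),
    'GA': (-8.2, -22.2, -1.30),
    'CG': (-10.6, -27.2, -2.17),
    'GC': (-9.8, -24.4, -2.24),
    'GG': (-8.0, -19.9, -1.84),
}
INIT_AT = (2.3, 4.1, 1.03)    # initiation term for a terminal A or T
INIT_GC = (0.1, -2.8, 0.98)   # initiation term for a terminal G or C
SYMMETRY = (0.0, -1.4, 0.4)   # assumption: applied to every sequence

COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def add(totals, params):
    """Add a (deltaH, deltaS, deltaG37) triple to the running totals."""
    return [t + p for t, p in zip(totals, params)]


def nn_thermo(seq):
    """Return (deltaH, deltaS, deltaG37) for seq, rounded to 2 decimals."""
    seq = seq.upper()
    totals = list(SYMMETRY)

    # One initiation term per terminal base.
    for end in (seq[0], seq[-1]):
        totals = add(totals, INIT_AT if end in 'AT' else INIT_GC)

    # Sum over overlapping dinucleotides; a pair missing from the table
    # is looked up through its reverse complement (e.g. 'TG' -> 'CA').
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        params = NN.get(pair) or NN[reverse_complement(pair)]
        totals = add(totals, params)

    return tuple(round(t, 2) for t in totals)


if __name__ == '__main__':
    for example in ('AATTG', 'AAAAAAA', 'AATTGGCC'):
        print(example, nn_thermo(example))
    # AATTG (-29.1, -87.6, -1.92)
    # AAAAAAA (-42.8, -126.4, -3.54)
    # AATTGGCC (-54.9, -151.8, -7.84)
```

Storing only ten entries and falling back to the reverse complement keeps the table the same size as the one in the paper, and a character outside A, C, G, T surfaces as a `KeyError` instead of being silently skipped.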
