Home
People
Research
Publications
Tutorials
Contact Us
Site Map
Quantum-Mechanical Calculation of Absolute Binding Free Energy of p56lck SH2 domain against phosphopeptide inhibitors 
 

SH2 domain

Protein tyrosine kinase p56lck (lymphoid T cell tyrosine kinase) is a member of the homologous family of Src kinases. It is primarily expressed in T-lymphocytes where it plays important role in triggering immune response via T-cell activation. The protein p56lck structurally consists of several domains including N-terminal domain representing a unique region, catalytic kinase SH1 domain, and two protein-protein binding domains SH2 and SH3. Among these particularly important is SH2 (Src homology 2) domain. It carries a fundamental role in signal transduction and binds with high affinity and specificity to peptide motifs containing the sequence pYEEI.

 

Biological role of SH2 domain is to convey signal transduction by converting an extracellular event into intracellular response. The signal is initiated by a phosphorylation at tyrosine site of receptor proteins attached to cell surface. Binding of SH2 domain to its specific phosphorylated protein partner initiates a downstream cascade of signal transduction events. The ability to control this chain of events has important medicinal implications. Modulating the ability of p56lck SH2 domain to form protein-protein complex with its binding partners has therapeutic utility in treatment of cancer, osteoporosis, and autoimmune diseases. Therefore, SH2 domains are attractive targets for the development of therapeutic agents against various human diseases.

 

SH2 domains have a remarkable degree of selectivity toward specific phosphotyrosyl motifs which helps them recognizing their target protein binding partners.  A peptide library screen is a powerful tool to study structural specificity of SH2 domain and to search for novel ligand structural motifs that can be eventually redesigned into non-peptide therapeutics. In typical wet lab experiments tens to hundreds peptides can be screened against the target. Although providing valuable information about binding affinity these experiments bring no information about structural aspects of the protein-ligand binding. Due to high cost of crystallographic experiments, only few peptides in the series can eventually undergo X-ray structural analysis to reveal the atomistic details of binding. Such structural information is extremely important in order to understand the binding mechanism and subsequently to prepare new rounds of screening experiments in the search of new potent therapeutics.

 

Calculation of Protein-Ligand Binding Free Energy

Complementing the screening and structural experiments are computational studies of binding free energy. The most rigorous methods for calculation of binding affinities involve alchemical or structural transformations such as free energy perturbation (FEP) and thermodynamic integration (TI) which are based on reversible decoupling the ligand from its surrounding. The accuracy of these methods relies on equilibrium sampling of the entire transformation path, from an initial to a final state. However, calculating the absolute binding affinity G of complexes of SH2 domain with peptides is not practical in the perturbative approach because of large magnitude of electrostatic interactions between the highly and oppositely charged phosphotyrosine ligand and SH2 domain and very large hydration free energy of the phosphopeptides. The net charge on Ac-pYEEI, a nano-molar binding peptide, is -5 electron units. The SH2 domain in its turn carries the net charge of +4 electron units. Thus it is apparent that the binding of SH2 domain with phosphopeptides is accompanied by very large electrostatic forces.

 

The limited applicability of TI and FEP methods and their very high computational cost inspire the development of alternative methods. A popular family of free energy calculation methods is represented by end-point MM-PBSA and MM-GBSA methods, which are computationally much more affordable, easy to use and provide reasonable accuracy. In these models only the initial and final states of the system are evaluated, thus leading to significant reduction in computational cost. The end-point methods are typically based on partitioning the free energy into a sum of enthalpic and entropic contributions. In this approach one performs a conventional MD simulation of the complex in a periodic water box with counterions, and the resulting trajectory is then postprocessed by removing solvent and periodicity, and calculating the average free energy over a series of static frames from complex, receptor, and ligand according to equation:

 

                                               (1)

 

Finally, the binding affinity is computed as difference in free energy of the complex and its constituents

 

                                               (2)

 

In these equations angular brackets denote average of thermodynamic entities over dynamics trajectory; T is simulation temperature; E represents internal energy of the molecular systems; Gpolar is electrostatic free energy term due to solvation effects treated via implicit solvent model; Gnonpolar is the non-polar term including dispersion interaction between solute and solvent and energetic penalty for creating cavity in the bulk liquid. The non-polar term is usually approximated via solvent accessible surface area (SASA) in the form

 

                                                              (3)

 

where ξ is a phenomenological surface tension coefficient treated as a parameter of the model.

 

The entropy, S, is computed from rotational and translational loss by the ligand and from vibrational term associated with change in intramolecular motion of the constituents upon binding.

 

                            (4)

 

                          (5)

 

Here Rgas is universal gas constant, kcal/(mol×K); T is simulation temperature, K; x2∆y2∆x2 is  variance of center of mass of ligand from its average position; α2β2γ2 is variance of rotation angle around x,y,z axes from the average orientation of ligand in the protein frame. (J. Swanson, Biophys. J. 2004, 86, 67-74) The vibrational entropy term is obtained from normal mode analysis.

 

Failure of MM-PBSA on SH2 domain

Targeting SH2 domain Woo and Roux reported binding affinity of -82.8 kcal/mol for Ac-pYEEI peptide by using MM-PBSA method and CHARMM force field,(H. Woo, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6825-6830) the value which is an order of magnitude larger than the experimental binding affinity of -9.4 kcal/mol. This is the most serious failure of the MM-PBSA protocol. According to classical electrostatics the implicit Poisson-Boltzman solvent model provides the most accurate solution of the electrostatic problem of a solute placed in dielectric medium. The failure of MM-PBSA and MM-GBSA to meet this expectation on SH2 domain raises concerns about validity of this computational strategy. Specifically a single set of atomic charges seems to be inadequate for the studies that require simultaneous determination of hydration free energy and binding affinity. The electrostatic environment of protein-ligand interactions is presumably different from the condition of solute-solvent interactions, the fact which is typically attributed to electronic polarization effect. Particular sensitivity of MM-PBSA method to the internal inconsistency of electrostatic model of fixed charges originates from manipulation with absolute values of hydration free energy of interacting molecules and from adding and subtracting very large values which makes the resulting binding free energy highly susceptible to errors.

 

In order to improve the end-point free energy methods one should go beyond the fixed-charge approximation  Such computations are routinely affordable based on the semiempirical variational finite localized molecular orbital (VFL) approximation and linear scaling LocalSCF method using implicit solvent model COSMO.

 

Quantum-mechanical free energy calculations (MM-QMSA)

In the conventional end-point free energy methods the conformation sampling is typically performed at nanoseconds time-length by using molecular dynamics. Considering the possibility of formulating the end-point protocol based on QM platform it would be rather impractical to run the conformation sampling based on QM MD. The requirement to sample the conformational space for nanoseconds-length time-duration while retaining the computational efficiency of the end-point methods suggests using biomolecular force fields to generate the molecular dynamics trajectory.

 

Although direct QM rescoring of potential energy of the complex, protein, and ligand taken at their force field generated geometry seems the way to go, it may result in significant errors. The differences in minimum energy geometries generated by force field and semiempirical QM methods, as attributed to differences in their parameterization, are typically small, but lead to sensible energy differences. For example, 100 steps of geometry optimization of p56lck SH2 domain in complex with Ac-pYEEI peptide performed by PM3 COSMO method starting from Amber MD snapshot geometry produce 0.34 Å RMSD between the initial and final structures. Although this change in geometry is rather small it corresponds to a very large energy difference of 2220 kcal/mol representing the net energy gain due to the QM geometry optimization. This energy difference will be the primary source of error when QM energy scoring is performed directly on empirically generated geometry. Indeed, no such large energy strain would be present if a trajectory is generated at the corresponding QM MD level. Ignoring this fact would violate the energy fluctuation range imposed by Boltzmann distribution at the temperature of simulation. This stress if left untreated would ruin the free energy calculations, since those atomic configurations that cannot be sampled in QM MD at experimentally relevant time length cannot contribute to physical observables. To address this problem, QM energy calculations over empirically generated snapshots should always include geometry relaxation at the QM level. The geometry optimization would minimally adjust the position in configuration space from the point on the force field energy hypersurface to the closest point on QM hypersurface.

 

Based on conceptual similarity with MM-PBSA and MM-GBSA protocols, the new method of calculation of binding free energy is termed MM-QMSA method. A result of its application to 5 complexes is presented in Table 1. QM calculations using default COSMO radii are labeled as PM3 std. The PM3 opt section of the table presents binding free energy results derived from the use of optimized COSMO radii. QM calculations were performed on 100 Amber MD snapshots by subjecting complex, protein and ligand to 100-steps geometry optimization under implicit solvent model COSMO.

 

Accuracy Analysis of the End-Point Methods

 

Table 1. Binding free energy from MM-PBSA, MM-GBSA and MM-QMSA calculations, kcal/mol

Ligand

ΔGPBSA

ΔGGBSA

ΔGPM3std

ΔGPM3opt

ΔGexp

Ac-pYEEI

-72.3±1.6

-60.7±1.8

9.4±2.2

-9.1±2.1

-9.4

Ac-pYEEG

-72.2±2.0

-57.7±2.2

11.2±2.7

-6.1±2.4

-7.9

Ac-pYEEA

-80.1±8.8

-61.2±5.9

10.8±2.2

-7.3±2.1

-7.8

Ac-pYEAI

-73.7±12.0

-57.4±8.4

9.8±2.2

-7.9±1.8

-8.2

Ac-pYAEI

-79.8±3.2

-65.3±2.7

7.4±2.5

-9.6±2.2

-8.7

 

As one can see from Table 1, there is a significant improvement over the  MM-PBSA and MM-GBSA data due to QM calculations. Although the PM3 std method with default COSMO radii leads to positive values of ΔGbind, the computed values nevertheless reproduce remarkably well the relative trend in experimental binding affinities. Additional improvement to the theoretical predictions comes from the use of optimized COSMO radii. The corresponding PM3 opt results show the best agreement with the experimental data. The optimized COSMO radii are derived by fitting to experimental hydration free energy of 98 small molecules.

 

This improvement in accuracy due to QM calculations comes at surprisingly low computational cost. A single QM COSMO job of geometry optimization of about 1700 atoms’ system takes about 8 hours of 1-CPU time on 2.2GHz AMD Opteron processor while performing 100 steps of geometry optimization. This computation requires 800MB of computer memory (RAM) per single-processor job. We completed the entire series of 5 complexes in 8 hours by sending each QM job to a dedicated processor, having 1000 jobs executed in parallel. The QMSA part of the protocol takes less time than it is necessary to run the force field MD in order to accumulate 10 ns trajectory. In addition to that the QM part in the MM-QMSA calculation is easier to set up than a standard force field MD simulation. All this makes the MM-QMSA method attractive for virtual screening experiments. The QM protocol can complement MM-PBSA and MM-GBSA in the situations when the latter fail or when the study may benefit from additional validation.

 

Copyright (c) 2010 Victor Anisimov