Database of predicted protein-ligand binding/docking score

Is there a database of computationally determined "binding scores" across protein-ligand pairs in the PDB?

My phrase "binding score" is deliberately generic, since any such score would be fine. In particular, some docking energy, i.e. scoring function, would be great. (Don't worry, I am aware that these docking energies are not perfect :) ).

I am looking for comprehensive coverage and the same score across bound pairs, meaning that experimental measures are obviously impossible; however, I am hoping there are people who ran some docking scoring function across all the protein-ligand pairs in the PDB, or at least some large number of them.

I am aware of the many databases that hold experimentally determined binding data for protein-ligand structural complexes. For instance, Binding Moad, Binding DB, PDB-Bind, and SC_PDB are all good for what they are meant to do. However, they (1) have poor coverage (i.e. not all the complexes have affinity data, or some complexes are not included at all) and (2) use heterogeneous scores/measures (e.g. IC50, KD, etc.).

This is not the same as this question, since I am looking for comprehensive coverage and computationally determined scores, not experimentally determined ones.

How to use X-Score?

The basic function of X-Score is to compute the binding score of a given ligand molecule (or multiple ligand molecules) to a target protein. The protein must be stored in a PDB file, and the ligand molecule(s) in a Sybyl/Mol2 file. All the parameters needed to run X-Score are assembled in an input file, which you edit to suit your own purpose.

To run X-Score, simply use this file as input:

Example: xscore name_of_the_input_file

The parameters specified in the input file are explained in detail below. You can find an example input file under the "example/" directory.

X-Score needs the three-dimensional structure of the protein-ligand complex to calculate its binding constant. The structure can be either experimentally determined or modeled by a docking program. Since most of today's molecular docking programs keep the protein rigid while docking the ligand(s), for the sake of efficient computing, X-Score requires the protein and the ligand(s) to be stored in two separate files.

The three-dimensional structure of the protein must be stored in the PDB format. When preparing this PDB file, please remember:

(1) Add polar hydrogen atoms. Adding all hydrogen atoms will not hurt, but only polar hydrogen atoms are needed in the computation.
(2) Remove any ligand or other organic cofactors.
(3) Remove all the water molecules.
(4) If a metal ion exists inside the binding site and is believed to be important for ligand binding, keep it as part of the protein. X-Score considers such metal ions in its computation. According to the PDB convention, a metal ion should be described by a line starting with "HETATM". Please be cautious, because not every program writes standard PDB files! Some, such as SYBYL v6.8, are even unable to export metals to a PDB file. If this happens, you may want to manually add a line for the metal to the PDB file.
(5) Occasionally there are non-natural residues in the protein. X-Score will neglect them if it cannot find appropriate parameters for them in its parameter libraries.

The parameter RECEPTOR_PDB_FILE in the input file specifies the path and the name of this file.

Sometimes an organic cofactor, such as CoA or NADH, binds together with the ligand molecule inside the binding pocket, and you may want to keep it in place when evaluating the binding affinities of the ligand molecules. X-Score provides such an option: it will treat the cofactor as part of the protein. Since cofactors of this kind are usually not built from standard building blocks, the PDB format is not suitable for representing them. Instead, you can save the cofactor in SYBYL MOL2 format and specify its path and name with the COFACTOR_MOL2_FILE parameter in the input file. Note that: (1) Atom types and bond types in this molecule should be assigned correctly according to SYBYL's definitions. (2) All hydrogen atoms should be added to it. (3) This cofactor molecule should share the same coordinate system with the protein and the ligand. (4) If the cofactor is originally covalently bound to the protein, you need to remove that bond to keep the cofactor as a separate molecule. This modification will not affect X-Score's computation.

The primary thing that should be kept in mind is: the ligand molecules must be pre-docked into the binding pocket of the target protein. X-Score is not a docking program. It can only calculate the binding affinities for given protein-ligand complexes.

X-Score requires the ligands to be stored in the SYBYL Mol2 format. Naturally, we recommend SYBYL for such preparation. Please make sure that the atom type and the bond type assignments are correct according to the Tripos force field conventions. X-Score is able to identify and correct some common errors in atom typing and bond typing but certainly cannot handle all the possible situations. Other molecular modeling software may also support the Mol2 format. There are also some programs, such as Babel, which are specially designed for converting different formats. However, our experience is that format conversion is not always carried out in a flawless way.

All hydrogen atoms need to be added to the ligand molecules. Atomic charges are not necessary for X-Score computation.

If there is more than one ligand molecule, all of them should be packed one after another in this file. This is often referred to as a "multiple" Mol2 file. Since handling a very large file will probably slow down your computer significantly, we do not recommend packing too many molecules into one file. A generally acceptable limit is 100,000 molecules (several hundred MB in file size). If you have to process even more ligands, you may split them into several Mol2 files and run X-Score on each of them separately.
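Splitting a large multiple Mol2 file can be sketched as follows. This is a minimal illustration, not part of X-Score: it relies only on the Mol2 convention that each molecule record begins with an "@<TRIPOS>MOLECULE" line, and the function names iter_mol2_blocks and chunk_mol2 are hypothetical.

```python
from typing import Iterator, List

MOLECULE_TAG = "@<TRIPOS>MOLECULE"  # record header used by the Tripos Mol2 format


def iter_mol2_blocks(text: str) -> Iterator[str]:
    """Yield the text of one molecule record at a time from a multiple Mol2 string."""
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith(MOLECULE_TAG) and current:
            # A new header closes the record collected so far.
            yield "".join(current)
            current = []
        # Lines before the first header are dropped.
        if current or line.startswith(MOLECULE_TAG):
            current.append(line)
    if current:
        yield "".join(current)


def chunk_mol2(text: str, max_per_file: int) -> List[str]:
    """Group molecule records into chunks of at most max_per_file molecules each."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be at least 1")
    chunks: List[str] = []
    batch: List[str] = []
    for block in iter_mol2_blocks(text):
        batch.append(block)
        if len(batch) == max_per_file:
            chunks.append("".join(batch))
            batch = []
    if batch:
        chunks.append("".join(batch))
    return chunks
```

Each returned chunk can be written to its own file and scored in a separate X-Score run. Note that any comment lines a program writes just before a molecule header end up attached to the preceding record in this sketch.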

The parameter LIGAND_MOL2_FILE in the input file specifies the path and the name of this file.

In the input file, all lines starting with "#" are comments and are ignored by the program.

The parameter FUNCTION should be set to "SCORE". This tells the program to perform X-Score computation.

As mentioned in the Introduction section, three scoring functions are implemented in X-Score: HPScore, HMScore, and HSScore. The input file has three corresponding switches: APPLY_HPSCORE, APPLY_HMSCORE, and APPLY_HSSCORE. You may set any of them to "YES" or "NO" to choose the combination you like. If all three scoring functions are switched on, X-Score can typically process 10,000 molecules an hour on an SGI Octane2/R12000/360MHz workstation.
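The parameters introduced so far can be collected into an input file programmatically. The sketch below is only an illustration: the parameter names come from the text above, but the "KEY    VALUE" line layout, the file names, and the helper names render_xscore_input and parse_xscore_input are assumptions, so compare the result with the example input file shipped with X-Score before relying on it.

```python
from typing import Dict


def render_xscore_input(params: Dict[str, str]) -> str:
    """Render parameters as 'KEY    VALUE' lines (assumed layout, one per line)."""
    lines = ["# Lines starting with '#' are comments and are ignored."]
    lines += [f"{key}    {value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


def parse_xscore_input(text: str) -> Dict[str, str]:
    """Read 'KEY VALUE' lines back into a dict, skipping blanks and comments."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


example_params = {
    "FUNCTION": "SCORE",
    "RECEPTOR_PDB_FILE": "protein.pdb",   # hypothetical file name
    "LIGAND_MOL2_FILE": "ligands.mol2",   # hypothetical file name
    "APPLY_HPSCORE": "YES",
    "APPLY_HMSCORE": "YES",
    "APPLY_HSSCORE": "NO",
}

input_text = render_xscore_input(example_params)
```

Writing input_text to a file and passing that file name to xscore corresponds to the standard invocation shown earlier.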

Another feature of X-Score is the option of pre-screening the ligands by molecular properties. Such filters are well known in drug design as "Lipinski rules", which are crude judgments of "drug-likeness". Many studies have suggested that applying such chemical rules can effectively reduce the false positives observed in virtual screening. Nine parameters in the input file set these chemical rules.

A general-purpose set of these chemical rules could be: molecular weight between 200 and 600; LogP between 1 and 5; number of donor atoms below 6; and number of acceptor atoms below 6. Here LogP values are calculated using the XLOGP2 algorithm.
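The general-purpose rule set above amounts to a simple filter over precomputed ligand properties. The following sketch assumes the properties are already available (it does not implement XLOGP2), treats the weight and LogP ranges as inclusive, and uses hypothetical names (LigandProperties, passes_general_rules) and made-up property values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigandProperties:
    name: str
    molecular_weight: float
    logp: float
    hbond_donors: int
    hbond_acceptors: int


def passes_general_rules(lig: LigandProperties) -> bool:
    """Apply the general-purpose rule set quoted in the text.

    Range bounds are treated as inclusive (an assumption); 'below 6'
    is taken literally as strictly fewer than 6 atoms.
    """
    return (
        200.0 <= lig.molecular_weight <= 600.0
        and 1.0 <= lig.logp <= 5.0
        and lig.hbond_donors < 6
        and lig.hbond_acceptors < 6
    )


# Hypothetical ligands with invented property values, for illustration only.
candidates = [
    LigandProperties("lig_a", 350.4, 2.8, 2, 5),
    LigandProperties("lig_b", 150.2, 0.4, 1, 2),   # too small, too polar
    LigandProperties("lig_c", 480.0, 4.1, 6, 3),   # too many donor atoms
]
kept = [lig.name for lig in candidates if passes_general_rules(lig)]
```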

All the results are summarized in a text table, whose file is specified by the OUTPUT_MER_FILE parameter in the input file. The first line of this table is a title line. Every following line describes a single ligand molecule. The meaning of each column is:

  • The 1st column: rank of the ligand. All the ligands are ranked in a decreasing order according to the average predicted binding affinities
  • The 2nd column: molecular formula
  • The 3rd column: molecular weight
  • The 4th column: LogP value
  • The 5th column: docking energy given by DOCK (kcal/mol), if the input ligand Mol2 file is generated by DOCK
  • The 6th column: binding affinity given by HPScore (in pKd units)
  • The 7th column: binding affinity given by HMScore (in pKd units)
  • The 8th column: binding affinity given by HSScore (in pKd units)
  • The 9th column: the average predicted binding affinities (in pKd units), calculated by averaging all the enabled scoring functions
  • The last column: name of the molecule, as extracted from the Mol2 file

This table is organized in the SYBYL MERGE format. If you open a spreadsheet in SYBYL, you can import this table directly. But since this table is a standard comma-separated text file, you can also load it with any other spreadsheet program, such as Excel or Origin.
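Because the table is plain comma-separated text, it can also be read with a short script. The sketch below assumes that every data row carries all ten columns in the order listed above and that the first line is the title line; the title labels and values in the sample are invented. It also shows how a pKd value relates to a dissociation constant (Kd = 10^-pKd mol/L) and to a binding free energy (ΔG = -RT ln(10) × pKd, about -1.364 kcal/mol per pKd unit at 298.15 K).

```python
import csv
import io
import math
from typing import List, NamedTuple


class ScoredLigand(NamedTuple):
    rank: int
    formula: str
    mol_weight: float
    logp: float
    dock_energy: float
    hpscore: float
    hmscore: float
    hsscore: float
    average: float
    name: str


def read_score_table(text: str) -> List[ScoredLigand]:
    """Parse the table by column position, skipping the title line."""
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # the first line is a title line
    ligands = []
    for row in rows:
        if not row:
            continue
        f = [field.strip() for field in row]
        ligands.append(ScoredLigand(
            int(f[0]), f[1], float(f[2]), float(f[3]), float(f[4]),
            float(f[5]), float(f[6]), float(f[7]), float(f[8]), f[-1],
        ))
    return ligands


def pkd_to_kd_molar(pkd: float) -> float:
    """Dissociation constant in mol/L from pKd = -log10(Kd)."""
    return 10.0 ** (-pkd)


def pkd_to_delta_g(pkd: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol: dG = -R * T * ln(10) * pKd."""
    gas_constant = 1.987204e-3  # kcal/(mol*K)
    return -gas_constant * temperature_k * math.log(10.0) * pkd


# Invented sample table, for illustration only.
sample_table = (
    "rank,formula,MW,LogP,DOCK,HPScore,HMScore,HSScore,Average,name\n"
    "1,C10H12N2O,176.2,1.5,-30.1,6.0,6.4,6.2,6.2,lig_a\n"
)
ligands = read_score_table(sample_table)
```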

X-Score also allows you to extract the best-ranked candidates and save each of them in a separate Mol2 file for the convenience of further analysis. The last two parameters in the input file are used for this.

There is one more parameter in the input file: CALCULATE_ATOM_BIND_SCORE. It can be set to "YES" or "NO". If it is set to "YES", the program will calculate the contribution of each individual atom to the overall binding affinity of the ligand molecule. These values will be written in the Mol2 file as the atomic charges when the ligand is saved. Therefore, you can inspect them by displaying atomic charges when you view the molecule. Turning on CALCULATE_ATOM_BIND_SCORE has very little impact on the speed of computation.

Illustration of Atomic Binding Score (in pKd units)

This "atomic binding score" usually gives you a good idea of which portion of the ligand molecule contributes more to the binding affinity. Accordingly, you can optimize the molecule by enhancing the "good" parts or eliminating the "bad" parts. We, as well as many users, have found this concept useful for structure-based drug design.
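Since the atomic binding scores are stored in the charge column of the saved Mol2 file, they can be read back without a molecular viewer. The sketch below assumes the usual Mol2 atom line layout (atom id, atom name, x, y, z, atom type, substructure id, substructure name, charge) and a single molecule per file; the function name read_atom_scores and the sample values are hypothetical.

```python
from typing import List, Tuple


def read_atom_scores(mol2_text: str) -> List[Tuple[str, float]]:
    """Return (atom_name, value) pairs taken from the charge column of the ATOM section."""
    scores: List[Tuple[str, float]] = []
    in_atoms = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Track whether we are inside the ATOM record section.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        if in_atoms:
            fields = line.split()
            if len(fields) >= 9:  # the charge is the ninth field
                scores.append((fields[1], float(fields[8])))
    return scores


# Invented three-atom example, for illustration only.
sample_mol2 = """@<TRIPOS>MOLECULE
demo
 3 2 1
SMALL
USER_CHARGES

@<TRIPOS>ATOM
  1 C1  0.000 0.000 0.000 C.3  1 LIG  0.350
  2 O1  1.400 0.000 0.000 O.3  1 LIG  0.820
  3 H1 -0.500 0.900 0.000 H    1 LIG -0.050
@<TRIPOS>BOND
  1 1 2 1
  2 1 3 1
"""
atom_scores = read_atom_scores(sample_mol2)
best_atom = max(atom_scores, key=lambda pair: pair[1])[0]
```

Sorting atom_scores by value gives a quick ranking of which atoms contribute most or least to the predicted affinity.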

The standard way of running X-Score, described above, is suitable for scoring multiple ligand molecules against a given target. This is typically seen in a virtual database screening application. But sometimes the user just wants to score one particular ligand against its target and get fast feedback. X-Score provides a shortcut for this purpose:

xscore the_protein_PDB_file the_ligand_Mol2_file

xscore the_protein_PDB_file the_cofactor_Mol2_file the_ligand_Mol2_file

In such cases, the following parameters are automatically set by the program as:


Exploring protein-ligand interactions is essential to drug discovery and chemical biology in navigating the space of small molecules and their perturbations on biological networks. Such interactions are essential to developing novel drug leads, predicting side-effects of approved drugs and candidates, and de-orphaning phenotypic hits. Therefore, the accurate and extensive validation of protein-ligand interactions is central to drug development and disease treatment. Experimentally determining and analysing protein-ligand interactions can be challenging 1,2 , often involving complex pull-down experiments and orthogonal validation assays. Therefore, multiple efforts have been dedicated to developing rapid computational strategies to predict protein-ligand interactions for prioritizing experiments and streamlining the experimental deconvolution of the interaction space. For example, docking simulations, in which the 3D-structure of the target is used to evaluate how well individual candidate ligands bind to a structure, have been productively applied to identify novel interactions between clinically relevant targets and small molecules 3,4 . Appreciably, docking simulations are unfeasible when 3D structures of targets (e.g., those derived from crystallization and X-ray diffraction experiments) are not available, as exemplified by many G protein-coupled receptors (GPCRs), which are membrane-spanning proteins that are inherently difficult to crystallize. Conversely, ligand-based methods (e.g., fingerprint similarity searching, pharmacophore models, and machine learning approaches) are increasingly applied in research and development for the prediction of on- and off-target interactions, but often require large amounts of available ligand data to achieve the desired predictive accuracy. Another widely used computational strategy is text mining, which uses databases of scientific literature such as PubMed 5 . 
Text mining relies on keyword searching and is limited in its capability to detect novel bindings. The process can be further complicated by the redundancy of compound or protein names in the literature 6 .

Recently, to circumvent the shortcomings of the ligand- and target-based methods and to benefit from all available information, computational chemogenomics (or proteochemometric modelling) has emerged as an active field of predictive modelling. Here, the study of protein-ligand interactions simultaneously combines the protein target and ligand information with machine learning approaches to provide valuable insights into the interaction space. For example, several methods exist that are capable of predicting target protein families and binding sites based on the known structures of a set of ligands 7,8,9,10 . However, with scant information about the actual proteins, predicted interactions are, at best, only between the known ligands and different protein families. Some approaches, which are target-centric, make full use of the protein features, but fail to predict interactions of orphan ligands as the latter have no known links to any proteins 11 . Several methods have been proposed to consider both the protein sequences and ligand chemical structures simultaneously in prediction 12,13,42 .

We hypothesised that chemogenomic modelling could profit from including not only information on the ligand and protein similarity but also explicitly on the pharmacological interaction space and hence the relationship between the ligands and the proteins (Fig. 1a). The combined information is composed of three sub-spaces, the shape of which resembles a bow tie, hence the name bow-pharmacological space. It covers a protein space that encodes protein sequence features, a ligand space that contains the fingerprints of chemical compounds, and an interaction space, coded by known interactions that connect the protein and ligand. Furthermore, we describe a novel prediction model by applying Bayesian Additive Regression Trees (BART) and other machine learning methods on these combined features from protein, ligand, and interaction information. Feature selection as well as subsampling experiments highlighted the utility of all the available descriptor subspaces and hence of the bow-pharmacological space (BOW space) newly developed here. Compared to other classical machine learning algorithms, the BART algorithm outperformed all tested methods and demonstrated good prediction power (94–99% accuracy on different datasets). Furthermore, BART can provide a quantitative description of the likelihood of predicted interactions and thereby provide an important measure of predictive uncertainty. In addition to retrospective analysis, we also highlight one exemplary prediction for a novel ligand of the KIF11 protein that was successfully validated using a docking simulation and subsequently confirmed by a crystallography study executed by an independent research group.

Bow-pharmacological space. (a) The bow-pharmacological space spans three subspaces: protein space in blue, ligand space in green, and interaction space in pink. Filled circles represent proteins and triangles represent ligands. Protein–ligand pairs of known interactions from published databases are denoted as “known” whereas those not curated in the databases are denoted as “new.” Solid lines indicate known interactions in the interaction space while dashed lines illustrate three kinds of unknown interactions ( ① unknown protein with known ligand, ② known protein with unknown ligand, ③ unknown protein with unknown ligand). (b) Features in bow-pharmacological space.



50 protein-ligand complexes were selected from three different databases to ensure a wide range of systems with good resolution: 20 structures were taken from the Ligand-Protein Database [15], 12 from the training database of X-Score [16], and 18 structures were added from the Protein Data Bank [17]. All but four structures have a resolution of 2.5 Å or better (Table 1, Supplementary Material). The dataset is non-redundant with each protein and ligand represented only once.

Docking and scoring

Ligands extracted from Protein Data Bank files were protonated and transformed into mol2 format using InsightII (Accelrys, Inc., San Diego, CA). To ensure thorough conformational sampling of each ligand, low-energy conformations were generated for each molecule using Omega version 2 (OpenEye Scientific Software, Santa Fe, NM). SLIDE [18] version 3.1 was used to dock the conformers into the binding sites of their corresponding target proteins from the complex structures. Protein side chains and ligands were treated flexibly during the docking. The number of docked ligand orientations obtained with this protocol was between 71 and 1000, with an average of 622 dockings per complex. By default, each docked pose was scored with SLIDE’s recently updated scoring function AffiScore, which is a weighted sum of hydrophobic contacts, hydrogen bonds, salt bridges, metal interactions, unsatisfied and repulsive polar interactions (Zavodszky MI, Tonero ME, He L, Arora S, Namilikonda S, and Kuhn LA, unpublished data). This was followed by rescoring all poses using the original implementation of DrugScore [21] and X-Score [16]. When target side-chains were rotated during ligand docking, the changed protein conformations were used when scoring the docked ligands with all three scoring functions.

Correlation-Based Scoring Enhancement

The correlation-based scoring enhancement method has been described previously [13]. It is summarized briefly in the following: Taking one complex at a time, Pearson correlation coefficients are calculated between score and root mean square deviation (RMSD) for each docked pose used as a reference state. The resulting correlation coefficient is then assigned to the reference pose as the new correlation-based score (CBScore). The calculation of the Pearson correlation coefficient assumes a linear relationship between score and RMSD, yet empirical observation confirms a non-linear relationship (Figure 1). A comparison of the correlation coefficients of scores vs. RMSD with correlation coefficients of scores vs. lnRMSD confirmed that scores correlated better with the logarithm of the RMSD (Figure 2). Consequently, the original method was modified to calculate this score as the correlation between the scores and the natural logarithm of the RMSD in this study. The formula for calculating this modified CBScore for pose i is:

where dij is the RMSD between poses i and j, sj is the original score of pose j, and N is the number of docked poses. It is worth noting that for a given pose i, the correlation-based score CBScorei does not depend on the original score si of that pose.
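The formula itself did not survive in this copy, so the following is a reconstruction from the description rather than the authors' exact expression: for reference pose i, the modified CBScore is taken to be the Pearson correlation coefficient between ln(dij) and sj over all other poses j. Excluding j = i (where the RMSD is zero and its logarithm undefined) is an assumption consistent with the remark that CBScorei does not depend on si. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def cb_scores(scores: Sequence[float],
              rmsd: Sequence[Sequence[float]]) -> List[float]:
    """Correlation between ln(RMSD to pose i) and score, computed for every pose i.

    rmsd[i][j] is the RMSD between poses i and j and must be > 0 for i != j.
    The reference pose itself is left out of the correlation (assumption).
    """
    n = len(scores)
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        log_d = [math.log(rmsd[i][j]) for j in others]
        s = [scores[j] for j in others]
        result.append(pearson(log_d, s))
    return result
```

With at least three poses per complex the correlation is defined for every reference pose, provided neither the scores nor the distances of the remaining poses are all identical.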

Figure 1. An exponential function of the type y=P1+P2*(1-exp(-x/P3)) (solid line) provides a better fit to the DrugScore versus RMSD data than a linear function (dashed line) as illustrated for the complex 1eed (A). As a consequence, a linear correlation exists between the scores and the natural logarithm of the RMSD (B).

Figure 2. Correlation coefficients between score and RMSD vs. correlation coefficients between score and lnRMSD for AffiScore (A), DrugScore (B), and X-Score (C). Data points above the diagonal represent cases in which scores correlate better with the logarithm of the RMSD than with RMSD.

VoteDock: Consensus docking method for prediction of protein–ligand interactions

Molecular recognition plays a fundamental role in all biological processes, which is why great efforts have been made to understand and predict protein–ligand interactions. Finding a molecule that can potentially bind to a target protein is particularly essential in drug discovery and still remains an expensive and time-consuming task. In silico tools are frequently used to screen molecular libraries to identify new lead compounds, and if the protein structure is known, various protein–ligand docking programs can be used. The aim of the docking procedure is to predict correct poses of the ligand in the binding site of the protein and to score them according to the strength of interaction in a reasonable time frame. The purpose of our studies was to present a novel consensus approach to predict both the protein–ligand complex structure and its corresponding binding affinity. Our method used as input the results from seven docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock) that are widely used for docking of ligands. We evaluated it on an extensive benchmark dataset of 1300 protein–ligand pairs from the refined PDBbind database for which structural and affinity data were available. We independently compared its scoring and posing ability with previously proposed methods. In most cases, our method properly docks approximately 20% more pairs than the docking methods do on average, and over 10% more pairs than the best single program. The RMSD value of the predicted complex conformation versus its native one is reduced by a factor of 0.5 Å. Finally, we were able to increase the Pearson correlation of the predicted binding affinity in comparison with the experimental value up to 0.5. © 2010 Wiley Periodicals, Inc. J Comput Chem 32: 568–581, 2011


AutoDock and AutoDock Vina

Two docking methods have been developed in parallel, to respond to two different needs. Development began with AutoDock 2,3,5,21,22 , and it continues to be the platform for experimentation in docking methods. AutoDock Vina was developed more recently to fulfill the need for a turnkey docking method that doesn’t require extensive expert knowledge from users 1 . It is highly optimized to perform docking experiments using well-tested default methods. Both methods are currently freely available. AutoDock Vina is fast and effective for most systems, while AutoDock is available for systems that require additional methodological enhancements.

Both methods are designed to be generic computational docking tools, accepting coordinate files for receptor and ligand, and predicting optimal docked conformations. Typically, users start with receptor coordinates from crystallography or NMR spectroscopy, and ligand coordinates generated from SMILES strings or other methods.

Because the search methods are stochastic, a set of optimal docked conformations is predicted, then typically clustered spatially to analyze consistency of the results. Highly clustered results are an indication that the conformational search procedure is exhaustive enough to ensure coverage of the accessible conformational space. Due to the stochastic nature of the search, the method cannot ensure that a global minimum has been found. For this reason, it is important to use re-docking experiments with known complexes of similar conformational complexity to evaluate the docking protocol being used.
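The spatial clustering step can be illustrated with a simple greedy scheme. This is a sketch under stated assumptions, not the AutoDock implementation: poses are visited from best to worst energy, each pose joins the first cluster whose lowest-energy member lies within an RMSD tolerance, RMSD is computed in place over identically ordered atoms without symmetry correction, and the 2.0 Å tolerance is an assumed, commonly used cutoff.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """In-place RMSD between two poses with identical atom ordering."""
    if len(a) != len(b) or not a:
        raise ValueError("poses must have the same non-zero number of atoms")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def cluster_poses(poses: Sequence[Coords],
                  energies: Sequence[float],
                  tolerance: float = 2.0) -> List[List[int]]:
    """Greedy clustering: lower energy is better; cluster[0] is the cluster seed."""
    order = sorted(range(len(poses)), key=lambda k: energies[k])
    clusters: List[List[int]] = []
    for idx in order:
        for cluster in clusters:
            if rmsd(poses[idx], poses[cluster[0]]) <= tolerance:
                cluster.append(idx)
                break
        else:
            # No existing seed is close enough, so this pose starts a new cluster.
            clusters.append([idx])
    return clusters
```

A result dominated by one large, low-energy cluster corresponds to the "highly clustered" outcome described above, while many small clusters suggest that the search has not converged.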

AutoDock and AutoDock Vina currently employ several simplifications that affect the results that are obtained. The most significant simplification is the use of a rigid receptor. This approximation reduces the size of the conformational space, allowing it to be searched reliably, and reduces the computational effort of scoring each trial conformation. When applying these docking methods to a given receptor it is important to consider the possible effects of this limitation, and if the system includes significant receptor motion, a number of methods may be employed, including:

Using receptor structures taken from receptor-ligand complexes, where there is some expectation that the receptor is in the relevant conformation.

Docking to a collection of different receptor structures, which cover the expected range of flexibility in the receptor. These may be obtained from multiple structural determinations or simulation.

Use of explicit receptor side chain flexibility during docking, if information is available on relevant side chains (described in the protocol).

The scoring methods also employ a variety of simplifications that will affect the results. The AutoDock Vina scoring function is highly approximate, with spherically symmetric hydrogen bond potentials, implicit hydrogens, and no electrostatic contribution. It has been demonstrated to perform well with ligands with typical biological size and composition. The AutoDock force field includes physically based contributions, including a directional hydrogen-bonding term with explicit polar hydrogens, and electrostatics. If these contributions are important in a particular system, AutoDock would be the appropriate tool. In addition, the parameterization of the AutoDock scoring function is available to the user, to allow tuning for particular systems if desired. For instance, methods for incorporating explicit solvents and for predicting conformations of covalent complexes were developed by modifying the AutoDock potentials 23,24.

SFCscore: Scoring functions for affinity prediction of protein–ligand complexes

Current address: Institute of Pharmacy and Food Chemistry, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany.

Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, D-35032 Marburg, Germany

Paul Sanschagrin's current address is Schrodinger, Inc., 120 W 45th St, New York, NY 10036, USA.

Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, D-35032 Marburg, Germany

Current address: Sanofi-Aventis Deutschland GmbH, Chemical Sciences, Drug Design, Industriepark Höchst, D-65926 Frankfurt am Main, Germany.

Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, D-35032 Marburg, Germany

Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany


Empirical scoring functions to calculate binding affinities of protein–ligand complexes have been calibrated based on experimental structure and affinity data collected from public and industrial sources. Public data were taken from the AffinDB database, whereas access to industrial data was gained through the Scoring Function Consortium (SFC), a collaborative effort with various pharmaceutical companies and the Cambridge Crystallographic Data Center. More than 850 complexes were obtained by the data collection procedure and subsequently used to set up different training sets for the parameterization of new scoring functions. Over 60 different descriptors were evaluated for all complexes, including terms accounting for interactions with and among aromatic ring systems as well as many surface-dependent terms. After exploratory correlation and regression analyses, stepwise variable selection procedures and systematic searches, the most suitable descriptors were chosen as variables to calibrate regression functions by means of multiple linear regression or partial least squares analysis. Eight different functions are presented herein. Cross-validated r² (Q²) values of up to 0.72 and standard errors (sPRESS) generally below 1.15 pKi units suggest highly predictive functions. Extensive unbiased validation was carried out by testing the functions on large data sets from the PDBbind database as used by Wang et al. (J Chem Inf Comput Sci 2004;44:2114–2125) in a comparative analysis of other scoring functions. Superior performance of the SFCscore functions is observed in many cases, but the results also illustrate the need for further improvements. Proteins 2008. © 2008 Wiley-Liss, Inc.
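The cross-validation statistics quoted above can be illustrated with a deliberately small example. The sketch below is not the SFCscore procedure: it fits a one-descriptor linear model by least squares, uses leave-one-out cross-validation, and assumes the common conventions Q² = 1 - PRESS/SS_total and sPRESS = sqrt(PRESS/(n - k - 1)) with k descriptors. The function names and data are invented.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for a single descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("descriptor values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Leave-one-out Q2 and sPRESS for the one-descriptor model (k = 1)."""
    x_list: List[float] = list(xs)
    y_list: List[float] = list(ys)
    n = len(x_list)
    press = 0.0
    for i in range(n):
        # Refit without complex i, then predict the held-out affinity.
        slope, intercept = fit_line(x_list[:i] + x_list[i + 1:],
                                    y_list[:i] + y_list[i + 1:])
        press += (y_list[i] - (slope * x_list[i] + intercept)) ** 2
    mean_y = sum(y_list) / n
    ss_total = sum((y - mean_y) ** 2 for y in y_list)
    q2 = 1.0 - press / ss_total
    s_press = math.sqrt(press / (n - 1 - 1))  # n - k - 1 with k = 1
    return q2, s_press
```

Because every prediction is made for a complex that was left out of the fit, Q² is never larger than the ordinary r² of the full model and can even become negative for a model with no predictive value.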




The input to the COACH-D server can be either the amino acid sequence or the three-dimensional (3D) structure of a query protein. In addition, users can submit their own ligand of interest. When the amino acid sequence of the protein is submitted, the I-TASSER Suite (4) is first used to generate one 3D structure model. The structure is then used for the ligand-binding site prediction and the subsequent molecular docking. We would like to point out that, except for ConCavity, the component algorithms in COACH-D were designed for monomer structures. Thus, only the first chain will be extracted when oligomers are submitted. We plan to extend the algorithm so that it can work for oligomers in the future. An option is provided to protect users’ personal data by checking the ‘Keep my results private’ checkbox. A password is then assigned to the user to access the modeling results. In general, it takes 2–5 h to complete the modeling for a structure submission with ∼300 residues.


One predicted 3D structure model for the submission with amino acid sequence.

The top five protein–ligand binding pockets and the binding residues in each pocket.

The top five protein–ligand complex structures with the input ligand.

The top five protein–ligand complex structures with the ligands from the PDB template structures.

A summary of ligands that are possible to bind the protein.

All these modeling results are put together into a single tarball, which can be downloaded to a local computer for use. All ligand-binding poses from AutoDock Vina are also put into the tarball. A confidence score (c-score) in the range of [0, 1] is provided to judge the reliability of each prediction. Please refer to the COACH article for more information about the scoring function of c-score (7).

Figure 2 illustrates the modeling results for an example submission with a protein structure and a ligand. Explanation about the meaning of each column in the table can be viewed by hovering the mouse pointer over the corresponding question sign. For this example, the first prediction is highly confident, as reflected by the high c-score. A total of 12 residues were predicted to be involved in the ligand binding. The total number of templates used for making this prediction is 329 (i.e. the ‘Cluster size’ shown in the figure) and the one with the highest similarity to the query structure is from the PDB template 1lwxA. The representative ligand AZD (3′-Azido-3′-Deoxythymidine-5′-Diphosphate) was docked into the predicted binding pocket. The complex structures are visualized based on the 3Dmol library (20). The default view is for the complex structure built with the input ligand, which can be switched to other complex structures by clicking on the corresponding ‘View’ button under the ‘Pose t’ and ‘Pose u’ columns. All complex structures can be downloaded for further analysis and customized visualization with other molecular graphics systems. The docking energies for the complex structures are listed under the ‘Energy t’ and the ‘Energy u’ columns.

The output page for each submission to the COACH-D server. The visualization of the complex structure is obtained by the 3Dmol library (20). The protein structure is shown in grey surface and orange cartoon. The ligand binding poses are shown in magenta balls and sticks. The consensus binding residues are highlighted in blue sticks.

Computational biotechnology: Prediction of competitive substrate inhibition of enzymes by buffer compounds with protein–ligand docking

In vitro enzymatic activity depends strongly on the reaction medium. One of the most important parameters is the buffer used to keep the pH stable. The buffering compound prevents a severe pH change and therefore a possible denaturation of the enzyme. However, buffer agents can also have negative effects on the enzymatic activity, such as competitive substrate inhibition. We assess this effect with a computational approach based on a protein–ligand docking method and the HYDE scoring function. Our method predicts competitive binding of the buffer compound to the active site of the enzyme. Using data from literature and new experimental data, the procedure is evaluated on nine different enzymatic reactions. The method predicts buffer–enzyme interactions and is able to score these interactions with the correct trend of enzymatic activities. Using the new method, possible buffers can be selected or discarded prior to laboratory experiments.


► Buffers can inhibit enzyme activity by binding competitively at the active site. ► The inhibitory potential of buffers can be predicted with chemoinformatic methods. ► We apply protein–ligand docking, typically used in drug-design, to this problem. ► Evaluation of the method on nine enzymes shows prediction of the correct trend.