A benchmark database for variations

2. Variation datasets affecting protein stability

These benchmark datasets of variations affecting protein stability have been collected from the literature. For these datasets, residue-level mappings from the structure entries in PDB to the sequence entries in UniProt are available for download as Excel tables below.
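As an illustration of how such a residue-level mapping might be used, the following minimal sketch translates a PDB residue number into a UniProt sequence position. It assumes the Excel table has been exported to CSV; the column names (`pdb_id`, `chain`, `pdb_resnum`, `uniprot_acc`, `uniprot_pos`) and the identifiers in the sample rows are hypothetical placeholders, not the layout or content of the actual download files.

```python
import csv
import io

# Hypothetical sample rows. Column names and identifiers are placeholders
# chosen for illustration; the real mapping tables may be laid out differently.
MAPPING_CSV = """pdb_id,chain,pdb_resnum,uniprot_acc,uniprot_pos
1ABC,A,1,P00000,25
1ABC,A,2,P00000,26
1ABC,A,3,P00000,27
"""


def load_mapping(handle):
    """Build {(pdb_id, chain, pdb_resnum): (uniprot_acc, uniprot_pos)}."""
    mapping = {}
    for row in csv.DictReader(handle):
        # PDB residue numbers are kept as strings because they can carry
        # insertion codes (for example "52A").
        key = (row["pdb_id"], row["chain"], row["pdb_resnum"])
        mapping[key] = (row["uniprot_acc"], int(row["uniprot_pos"]))
    return mapping


def pdb_to_uniprot(mapping, pdb_id, chain, resnum):
    """Return (accession, position), or None when the residue is unmapped."""
    return mapping.get((pdb_id, chain, str(resnum)))


mapping = load_mapping(io.StringIO(MAPPING_CSV))
print(pdb_to_uniprot(mapping, "1ABC", "A", 2))
```

Returning `None` for unmapped residues makes gaps explicit, which matters because not every residue in a structure necessarily has a counterpart in the sequence entry.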

Dataset 1

This dataset contains 1,784 mutations from 80 proteins with experimentally determined ΔΔG values in ProTherm (ProTherm update Dec. 19, 2008). It consists of 1,154 positive cases, of which 931 are destabilizing (ΔΔG ≥ 0.5 kcal/mol) and 222 are stabilizing (ΔΔG ≤ -0.5 kcal/mol), and 631 neutral cases (0.5 kcal/mol ≥ ΔΔG ≥ -0.5 kcal/mol).

Download: Dataset 1
References:
Khan S, Vihinen M. Performance of protein stability predictors. Hum Mutat. 2010, 31(6):675-684.
Kumar MD, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res. 2006, 34(Database issue):D204-206.
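
The three classes in Dataset 1 are defined by ΔΔG cutoffs at ±0.5 kcal/mol, with positive values treated as destabilizing. A minimal sketch of that labelling is given below. The function name `classify_ddg` is hypothetical, and assigning values exactly at ±0.5 kcal/mol to the destabilizing and stabilizing classes (rather than to neutral) is an assumption made here, since the stated ranges share their boundaries.

```python
def classify_ddg(ddg, threshold=0.5):
    """Label a ΔΔG value (kcal/mol) using a symmetric cutoff.

    Assumption: values exactly at +threshold or -threshold are assigned
    to the destabilizing or stabilizing class, not to neutral.
    """
    if ddg >= threshold:
        return "destabilizing"
    if ddg <= -threshold:
        return "stabilizing"
    return "neutral"


# Made-up ΔΔG values, for illustration only.
for value in (1.8, -0.9, 0.2):
    print(value, classify_ddg(value))
```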

Dataset 2

This dataset of 2,156 variations was compiled from a list of 964 single mutations (Guerois et al., 2002) and a set of 2,972 single variations obtained from the ProTherm database (Kumar et al., 2006), after filtering out duplicate entries. NMR-determined structures are excluded from this dataset, and when several ΔΔG values were available for a single variation, only their average is given.

Download: Dataset 2

Reference: Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel. 2009, 22(9):553-560.
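
The two processing steps described for Dataset 2, excluding NMR-determined structures and averaging repeated ΔΔG measurements of the same variation, can be sketched roughly as follows. This is not the procedure used by the dataset authors: the record fields (`pdb_id`, `variation`, `method`, `ddg`) and all sample values are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean

# Made-up records; field names and values are placeholders for illustration.
records = [
    {"pdb_id": "1ABC", "variation": "L45A", "method": "X-ray", "ddg": 1.2},
    {"pdb_id": "1ABC", "variation": "L45A", "method": "X-ray", "ddg": 0.8},
    {"pdb_id": "1ABC", "variation": "G10V", "method": "X-ray", "ddg": -0.3},
    {"pdb_id": "2DEF", "variation": "K7E", "method": "NMR", "ddg": 0.4},
]


def average_ddg(records, excluded_methods=("NMR",)):
    """Drop excluded structure types, then average ΔΔG per variation."""
    grouped = defaultdict(list)
    for rec in records:
        if rec["method"] in excluded_methods:
            continue  # skip variations whose structure was solved by NMR
        grouped[(rec["pdb_id"], rec["variation"])].append(rec["ddg"])
    # One averaged ΔΔG value per (structure, variation) pair
    return {key: mean(values) for key, values in grouped.items()}


averaged = average_ddg(records)
for key, value in sorted(averaged.items()):
    print(key, round(value, 2))
```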

Dataset 3

This dataset is composed of two sub-datasets: a training dataset containing 339 mutants experimentally studied in nine proteins, and a blind test dataset containing 625 variants from ProTherm.

Training dataset: 339 variants from 9 proteins. Download: Dataset 3(a)
Blind test dataset: 625 variants from 28 proteins. Download: Dataset 3(b)

Reference: Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol. 2002, 320(2):369-387.

Dataset 4

This dataset is derived from the July 2003 release of ProTherm and contains two sub-datasets. The first, S1615, was used for training and testing the neural network system. The second, S388, was used as the test set and contains 388 variations measured only at physiological conditions. S388 is a subset of S1615. The datasets include only single variations that have ΔΔG values in ProTherm and structures deposited in PDB.

  1. Training dataset: S1615 - 1615 variants from 42 proteins. Download: Dataset 4(a)
  2. Test dataset: S388 (subset of S1615) - 388 variants from 17 proteins. Download: Dataset 4(b)

Reference: Capriotti E, Fariselli P, Casadio R. A neural-network-based method for predicting protein stability changes upon single point mutations. Bioinformatics. 2004, 20 Suppl 1:i63-68.

Dataset 5

This dataset consists of stability-affecting variants taken from the ProTherm database (updated February 22, 2013). The correctness and quality of each variant were checked manually. Several variants from the ProTherm database failed the quality criteria and were excluded. In total, the dataset contains 1,564 variations from 99 proteins, 77% of which came from ProTherm. The remaining variants have been corrected from the versions present in ProTherm or are new additions. This dataset has been used to train and test a novel tool, PON-Tstab, for predicting the effect of variants on stability.

  1. Training dataset: used for training and testing PON-Tstab. Download: PON-Tstab dataset

Reference: A manuscript describing the dataset and the tool, PON-Tstab, has been submitted for publication. In the meantime, please contact the authors for more details and use the following link for citation: http://structure.bmc.lu.se/PON-Tstab/