PON-Tstab

Stability of biomolecules, especially of proteins, is of great interest and significance. Protein stability has been a major target of protein engineering, mainly to increase stability, but sometimes also to destabilize proteins. Effects on stability are among the most common consequences of disease-related variations, so stability is of interest in variation interpretation as a way to explain the effects of harmful variants.

We performed a thorough check of the details in ProTherm and corrected numerous problems. In the end, fewer than 50% of the original variants remained. Of these, 77% came from ProTherm; the rest are either corrected or new variants. With this high-quality dataset, we trained a novel machine learning predictor for the effects of amino acid substitutions on stability and established a new baseline for the performance of variant stability prediction methods.

How to use the predictor

Submission format for PON-Tstab

1. For single variation prediction, the following information should be supplied:
  • Protein name;
  • Variation, e.g. K21A;
  • Temperature (defaults to 25 if not supplied);
  • pH value (defaults to 7 if not supplied);
  • Protein sequence, in FASTA format.
2. For multiple variations prediction, two files are required, formatted as follows:
  • Protein sequences file (in FASTA format), e.g.

    >gi115114
    MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL

  • Protein variations file, with the following fields on each line: variation_number, variation, protein_name, temperature, pH, e.g.

    1 G71S gi115114 72.60 4.60
    2 R36A gi115114 39.00 7.00
    3 P37A gi115114 39.00 7.00
    4 D38A gi115114 39.00 7.00
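Before submitting, it can help to check locally that each variation is consistent with the sequence it refers to. The following is a minimal Python sketch, not part of PON-Tstab itself. The function names are hypothetical, the fields are assumed to be whitespace-separated as in the example above, and applying the single-variation defaults (25 and 7) to the variations file is an assumption.

```python
import re

# Defaults quoted from the single-variation form; reusing them for the
# variations file is an assumption of this sketch.
DEFAULT_TEMPERATURE = 25.0
DEFAULT_PH = 7.0

# Variation notation as in the examples: reference residue, position, new residue.
VARIATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_fasta(text):
    """Return {name: sequence} from FASTA text, skipping blank lines."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def format_variation_line(number, variation, protein,
                          temperature=DEFAULT_TEMPERATURE, ph=DEFAULT_PH):
    """Build one line of the variations file with two-decimal numbers."""
    return f"{number} {variation} {protein} {temperature:.2f} {ph:.2f}"


def parse_variation_line(line):
    """Split one variations-file line into typed fields."""
    number, variation, protein, temperature, ph = line.split()
    match = VARIATION_RE.match(variation)
    if match is None:
        raise ValueError(f"unrecognised variation: {variation!r}")
    ref, position, alt = match.groups()
    return {
        "number": int(number),
        "ref": ref,
        "position": int(position),
        "alt": alt,
        "protein": protein,
        "temperature": float(temperature),
        "ph": float(ph),
    }


def reference_matches(record, sequences):
    """True if the reference residue is found at the stated 1-based position."""
    sequence = sequences.get(record["protein"], "")
    index = record["position"] - 1
    return 0 <= index < len(sequence) and sequence[index] == record["ref"]


# Example data copied from the submission format description.
fasta_text = """>gi115114
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
"""
sequences = parse_fasta(fasta_text)
record = parse_variation_line("1 G71S gi115114 72.60 4.60")
print(reference_matches(record, sequences))
```

A variation whose reference residue does not match the sequence usually points to a numbering offset or a wrong sequence, so it is worth resolving before running a prediction.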

Result file format

The result file contains the following columns:
1. Variation id
2. Variation
3. Protein id
4. Predicted stability change
5. Predicted probability
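The five columns listed above can be read into named fields for downstream filtering. This is a rough Python sketch under several assumptions: the file has no header row, columns are separated by whitespace (pass `sep="\t"` if they are tab-separated), and the probability column is numeric. The `ResultRow` type and `parse_result_line` function are hypothetical names, and the sample line uses made-up placeholder values, not real PON-Tstab output.

```python
from collections import namedtuple

# Field names mirror the five documented columns; the names are this sketch's own.
ResultRow = namedtuple(
    "ResultRow",
    ["variation_id", "variation", "protein_id", "stability_change", "probability"],
)


def parse_result_line(line, sep=None):
    """Parse one result line; sep=None splits on any run of whitespace."""
    fields = line.strip().split(sep)
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    variation_id, variation, protein_id, change, probability = fields
    # The stability change is kept as text because its exact encoding is
    # not specified here; the probability is assumed to parse as a float.
    return ResultRow(variation_id, variation, protein_id, change, float(probability))


# Placeholder line for illustration only (invented values).
row = parse_result_line("1 G71S gi115114 decrease 0.75")
print(row.variation, row.stability_change, row.probability)
```

If the stability change column can contain spaces, whitespace splitting will miscount the columns, and an explicit separator is the safer choice.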

Download PON-Tstab training and test dataset

Dataset used for training and testing PON-Tstab
PON-Tstab training and test data

Alignment of protein sequences present in the training and test data

The zipped file contains multiple sequence alignments of sequences similar to the proteins that contain variants in the training and test data.
Sequence alignment of proteins in PON-Tstab data

How to cite?
We have submitted a manuscript describing the dataset and the tool for publication. In the meantime, please use the following link to cite PON-Tstab: http://structure.bmc.lu.se/PON-Tstab/