A benchmark database for variations


11. Dataset for severe and non-severe disease-causing variations


This dataset consists of amino acid substitutions that lead to severe or non-severe disease phenotypes. The variations were collected from the published literature and several databases. The datasets were used to train and test PON-PS, a method for predicting the severity of disease caused by amino acid substitutions (Niroula and Vihinen, 2017). The training dataset consists of 885 mild, 463 moderate, and 1179 severe disease-causing amino acid substitutions from 83 proteins. The test dataset consists of 143 mild, 38 moderate, and 220 severe disease-causing amino acid substitutions from 8 proteins.
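
The class counts quoted above are uneven, which matters when the data are used to train or evaluate a classifier. The following is a minimal Python sketch that tallies the totals and per-class fractions from those counts. The dictionary layout and the `summarize` helper are illustrative assumptions and are not part of the PON-PS distribution; the downloaded files may be organized differently.

```python
# Class counts copied from the dataset description; nothing here reads
# the actual download, whose file format is not described on this page.
COUNTS = {
    "training": {"mild": 885, "moderate": 463, "severe": 1179},
    "test": {"mild": 143, "moderate": 38, "severe": 220},
}


def summarize(counts):
    """Return (total, per-class fraction) for one split (hypothetical helper)."""
    total = sum(counts.values())
    fractions = {label: n / total for label, n in counts.items()}
    return total, fractions


if __name__ == "__main__":
    for split, counts in COUNTS.items():
        total, fractions = summarize(counts)
        parts = ", ".join(
            f"{label} {frac:.1%}" for label, frac in fractions.items()
        )
        print(f"{split}: {total} variants ({parts})")
```

The moderate class is the smallest in both splits, so per-class performance measures are likely to be more informative than overall accuracy alone.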

  1. Download: PON-PS training and test datasets

References: Niroula A, Vihinen M. 2017. Predicting severity of disease-causing variants. Hum Mutat 38(4):357-364.