DATASET 1
Datasets for residue side chain clashes: 7,796 variations from the PDB in F1 and 350 variations from 5 test datasets in F2.
Reference: Čalyševa, J., & Vihinen, M. (2017). PON-SC - program for identifying steric clashes caused by amino acid substitutions. BMC Bioinformatics, 18(1), 531. doi:10.1186/s12859-017-1947-7. PUBMED
DATASET 2
A semi-automatically derived, hand-curated collection of proteins that contain an amino acid changed by a single-nucleotide variant (SNV) and for which 3D atomic coordinates are available in the PDB. F1 contains a benchmark dataset of 374 unique human variants, each corresponding to a different PDB entry.
Reference: Bhattacharya, R., Rose, P. W., Burley, S. K., & Prlić, A. (2017). Impact of genetic variation on three dimensional structure and function of proteins. PLoS ONE, 12(3), e0171355. doi:10.1371/journal.pone.0171355. PUBMED
DATASET 3
1,965 disease-causing and 2,134 neutral variants (Ittisoponpisan et al., 2019)
Reference: Ittisoponpisan, S., Islam, S. A., Khanna, T., Alhuzimi, E., David, A., & Sternberg, M. J. E. (2019). Can predicted protein 3D structures provide reliable insights into whether missense variants are disease associated? J Mol Biol, 431(11), 2197-2212. doi:10.1016/j.jmb.2019.04.009. PUBMED
DATASET 4
6,025 disease-associated and 4,536 neutral variants (Gao et al., 2015)
Reference: Gao, M., Zhou, H., & Skolnick, J. (2015). Insights into disease-associated mutations in the human proteome through protein structural analysis. Structure, 23(7), 1362-1369. doi:10.1016/j.str.2015.03.028. PUBMED
DATASET 5
Membrane protein datasets with a total of 2,058 variants in F1.
Reference: Orioli, T., & Vihinen, M. (2019). Benchmarking subcellular localization and variant tolerance predictors on membrane proteins. BMC Genomics, 20(Suppl 8), 547. doi:10.1186/s12864-019-5865-0. PUBMED
DATASET 2
Training dataset: 485 variants (347 pathogenic, 138 benign). Test dataset: 54 variants (38 pathogenic, 16 benign).
Reference: Pires, D. E. V., Rodrigues, C. H. M., & Ascher, D. B. (2020). mCSM-membrane: predicting the effects of mutations on transmembrane proteins. Nucleic Acids Res, 48(W1), W147-W153. doi:10.1093/nar/gkaa416. PUBMED
DATASET 3
2,624 pathogenic and 196,705 non-pathogenic variants used to train TMSNP, a transmembrane protein variant predictor (Garcia-Recio et al., 2021).
Reference: Orioli, T., & Vihinen, M. (2019). Benchmarking subcellular localization and variant tolerance predictors on membrane proteins. BMC Genomics, 20(Suppl 8), 547. doi:10.1186/s12864-019-5865-0. PUBMED
DATASET 4
Reference: Ge, F., Zhu, Y.-H., Xu, J., Muhammad, A., Song, J., & Yu, D.-J. (2021). MutTMPredictor: Robust and accurate cascade XGBoost classifier for prediction of mutations in transmembrane proteins. Comput Struct Biotechnol J, 19, 6400-6416. doi:10.1016/j.csbj.2021.11.024. PUBMED
Last updated: 2022-02-25 by Niloofar Shirvanizadeh.