PON-P2

PON-P2 predicts the pathogenicity (harmfulness) of amino acid substitutions. It is a machine learning-based method that uses amino acid features, Gene Ontology (GO) annotations, evolutionary conservation and, if available, annotations of functional sites. Note that PON-P2 is NOT a meta-predictor. PON-P2 estimates the reliability of its predictions and groups the variants into pathogenic, neutral and unknown classes. Read more

The performance of PON-P2 has been extensively tested. For details, see here. The performance of PON-P2 on additional datasets, such as predictSNPSelected and SwissVarSelected, is also available here. PON-P2 has also been shown to work on cancer variants. PON-P2 predictions for amino acid substitutions in COSMIC (v68) and the data published in "Harmful somatic amino acid substitutions affect key pathways in cancers" are publicly available here.

PON-P2 was the best-performing method in a recent comparison and outperformed protein-specific predictors in 85% of the proteins (Riera et al. 2016).

NEWS: PON-P2 predictions for the entire human proteome are available here.


Instructions for submitting queries

PON-P2 allows users to submit queries in three formats.

1) Identifier submission
2) Genomic submission
3) Sequence submission

Identifier submission
Protein or gene identifier(s) and variation(s) are required in a FASTA-like format. Ensembl gene identifiers, NCBI gene IDs and UniProtKB/Swiss-Prot accessions can be used as identifiers. When using Ensembl or NCBI gene identifier(s), the variation(s) have to be mapped to the longest isoform of the gene. Each identifier should be preceded by a greater-than sign (>). Place only one variation on each line. Multiple variations in a single protein or in multiple proteins can be submitted in a single query. Alternatively, a file containing identifier(s) and variation(s) in the same format can be uploaded.
Example:
>ENSG00000165816 #Ensembl gene identifier
I75F #reference amino acid, position in the sequence (1-based), variant amino acid
V366M
>Q16518 #UniProtKB/Swiss-Prot accession identifier
P363T
R44Q
>151194 #NCBI gene identifier
T9N
P111Q


Variations given for a UniProtKB/Swiss-Prot accession or for the longest isoform of an NCBI gene are mapped to the longest isoform of the corresponding Ensembl gene. Variations that cannot be mapped are reported in the error log file, and we recommend that users resubmit them using the Sequence submission service.
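
Before submitting, a query in the identifier format can be checked locally. The following is a minimal Python sketch, not part of PON-P2: the function name parse_identifier_submission is hypothetical, and treating everything after '#' as an annotation to be dropped is an assumption based on the example above. It groups variations under the identifier line that precedes them and rejects lines that do not look like reference amino acid, 1-based position, variant amino acid.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
VARIATION_RE = re.compile(rf"^([{AMINO_ACIDS}])([1-9]\d*)([{AMINO_ACIDS}])$")


def parse_identifier_submission(text):
    """Group variations under the '>' identifier line that precedes them."""
    records = {}
    current = None
    for raw in text.splitlines():
        # Assumption: text after '#' is an annotation, as in the example.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            records.setdefault(current, [])
        elif current is None:
            raise ValueError(f"variation before any identifier: {line!r}")
        elif not VARIATION_RE.match(line):
            raise ValueError(f"malformed variation: {line!r}")
        else:
            records[current].append(line)
    return records


query = """>ENSG00000165816 #Ensembl gene identifier
I75F
V366M
>Q16518 #UniProtKB/Swiss-Prot accession identifier
P363T
R44Q
"""
print(parse_identifier_submission(query))
```

The check is purely syntactic. It cannot tell whether a variation maps to the longest isoform; that is only known once the service has processed the query.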

Genomic submission
This format requires variation(s) at the genomic level. Users are required to submit the chromosome number, chromosome location, strand, reference nucleotide and variant nucleotide in the format shown in the example below. Each line should contain only one variation. Users can either paste the variation(s) in the text box or upload a file containing the variation(s).
Example:
3:49044874,+1,C,T
11:108368994,-1,G,C
16:11363014,-1,A,G


Users can also upload a Variant Call Format (VCF) file directly using the VCF file submission link. PON-P2 extracts the non-synonymous variations from the VCF file and makes predictions for them.

Note: The chromosome location and reference alleles must be given relative to the Genome Reference Consortium human genome build 37 (GRCh37).
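
The genomic line format can also be checked locally before submission. The following is a minimal Python sketch, not part of PON-P2: the names GenomicVariation and parse_genomic_line are hypothetical, and the pattern is inferred from the three example lines above. Accepting X and Y as chromosome names is an assumption, since the examples show only numbered chromosomes.

```python
import re
from typing import NamedTuple

# Pattern inferred from the examples: chromosome:location,strand,ref,variant
# Assumption: chromosomes 1-22 plus X and Y; strand written as +1 or -1.
GENOMIC_RE = re.compile(
    r"^([1-9]|1[0-9]|2[0-2]|X|Y):([1-9]\d*),([+-]1),([ACGT]),([ACGT])$"
)


class GenomicVariation(NamedTuple):
    chromosome: str
    location: int
    strand: str
    reference: str
    variant: str


def parse_genomic_line(line):
    """Parse one genomic submission line or raise ValueError."""
    match = GENOMIC_RE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed genomic variation: {line!r}")
    chromosome, location, strand, reference, variant = match.groups()
    if reference == variant:
        raise ValueError(f"reference and variant are identical: {line!r}")
    return GenomicVariation(chromosome, int(location), strand, reference, variant)


for example in ["3:49044874,+1,C,T", "11:108368994,-1,G,C"]:
    print(parse_genomic_line(example))
```

This validates syntax only. It does not confirm that the reference nucleotide matches GRCh37 at the given location.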

Sequence submission
This format requires users to submit FASTA-format amino acid sequence(s) and the variation(s) corresponding to the sequence(s). Each sequence should have a header line starting with a greater-than sign (>) followed by a description. The sequence, in upper-case characters, follows the header line. Only the 20 standard amino acid codes are accepted in the sequence(s). The variation(s) corresponding to a sequence should be placed under the same header line as the sequence. Variation(s) follow the header line, and only one variation is allowed per line. The sequence(s) and variation(s) can be pasted in the corresponding text boxes, or separate files containing the sequence(s) and variation(s) can be submitted.
Example sequences:
>ADA_HUMAN
MAQTPAFDKPKVELHVHLDGSIKPETILYYGRRRGIALPANTAEGLLNVIGMDKPLTLPD
FLAKFDYYMPAIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVEPIPWNQA
EGDLTPDEVVALVGQGLQEGERDFGVKARSILCCMRHQPNWSPKVVELCKKYQQQTVVAI
DLAGDETIPGSSLLPGHVQAYQEAVKSGIHRTVHAGEVGSAEVVKEAVDILKTERLGHGY
HTLEDQALYNRLRQENMHFEICPWSSYLTGAWKPDTEHAVIRLKNDQANYSLNTDDPLIF
KSTLDTDYQMTKRDMGFTEEEFKRLNINAAKSSFLPEDEKRELLDLLYKAYGMPPSASAG
QNL
>Retinal pigment
MSIQVEHPAGGYKKLFETVEELSSPLTAHVTGRIPLWLTGSLLRCGPGLFEVGSEPFYHL
FDGQALLHKFDFKEGHVTYHRRFIRTDAYVRAMTEKRIVITEFGTCAFPDPCKNIFSRFF
SYFRGVEVTDNALVNVYPVGEDYYACTETNFITKINPETLETIKQVDLCNYVSVNGATAH
PHIENDGTVYNIGNCFGKNFSIAYNIVKIPPLQADKEDPISKSEIVVQFPCSDRFKPSYV
HSFGLTPNYIVFVETPVKINLFKFLSSWSLWGANYMDCFESNETMGVWLHIADKKRKKYL
NNKYRTSPFNLFHHINTYEDNGFLIVDLCCWKGFEFVYNYLYLANLRENWEEVKKNARKA
PQPEVRRYVLPLNIDKADTGKNLVTLPNTTATAILCSDETIWLEPEVLFSGPRQAFEFPQ
INYQKYCGKPYTYAYGLGLNHFVPDRLCKLNVKTKETWVWQEPDSYPSEPIFVSHPDALE
EDDGVVLSVVVSPGAGQKPAYLLILNAKDLSEVARAEVEINIPVTFHGLFKKS


Variation examples:
>ADA_HUMAN
R101H #reference amino acid, position in the sequence (1-based), variant amino acid
R101L
S291L
>Retinal pigment
G75R
R97P
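
A common source of errors in sequence submissions is a variation whose reference amino acid does not match the sequence at the stated position. The following is a minimal Python sketch of a local consistency check, not part of PON-P2: the function names parse_blocks and check_submission are hypothetical, the short sequence under the header "toy" is made up for illustration, and dropping text after '#' is an assumption based on the examples above.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes
VARIATION_RE = re.compile(r"^([A-Z])([1-9]\d*)([A-Z])$")


def parse_blocks(text):
    """Return {header: [lines]} for text made of '>'-headed blocks."""
    blocks, header = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a note
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            blocks.setdefault(header, [])
        elif header is None:
            raise ValueError(f"content before any header: {line!r}")
        else:
            blocks[header].append(line)
    return blocks


def check_submission(sequences_text, variations_text):
    """Return a list of problems; an empty list means the check passed."""
    sequences = {h: "".join(rows) for h, rows in parse_blocks(sequences_text).items()}
    problems = []
    for header, seq in sequences.items():
        invalid = sorted(set(seq) - AMINO_ACIDS)
        if invalid:
            problems.append(f"{header}: invalid sequence characters {invalid}")
    for header, variations in parse_blocks(variations_text).items():
        seq = sequences.get(header)
        if seq is None:
            problems.append(f"{header}: no sequence with this header")
            continue
        for variation in variations:
            match = VARIATION_RE.match(variation)
            if match is None or match.group(3) not in AMINO_ACIDS:
                problems.append(f"{header}: malformed variation {variation}")
                continue
            ref, pos = match.group(1), int(match.group(2))
            if pos > len(seq):
                problems.append(f"{header}: {variation} is beyond the sequence end")
            elif seq[pos - 1] != ref:  # positions are 1-based
                problems.append(f"{header}: {variation} expects {ref}, found {seq[pos - 1]}")
    return problems


toy_sequences = ">toy\nMAQTPAFDKP\n"   # made-up sequence for illustration
toy_variations = ">toy\nM1V\nA3G\n"    # A3G is wrong: position 3 is Q
print(check_submission(toy_sequences, toy_variations))
```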

Email:
Users are required to submit a valid email address where the results will be sent when they are ready.

How to cite?
Niroula A, Urolagin S, Vihinen M (2015) PON-P2: Prediction Method for Fast and Reliable Identification of Harmful Variants. PLoS ONE 10(2): e0117380. doi:10.1371/journal.pone.0117380

Articles citing PON-P2
List of articles citing PON-P2

Performance of PON-P2


Performance of PON-P2 on additional datasets

We estimated the performance of PON-P2 on additional data by predicting the variations in the predictSNPSelected and SwissVarSelected datasets described in Grimm et al. The datasets are available in VariBench.


predictSNPSelected
Variation set                                       TP     TN     FP   FN   Unknown  PPV   NPV   Sensitivity  Specificity  Accuracy  MCC
All variations predicted by PON-P2                  5,124  3,173  345  590  6,445    0.94  0.84  0.90         0.90         0.90      0.79
Variations not in PON-P2 training data              5,116  3,173  341  590  5,575    0.94  0.84  0.90         0.90         0.90      0.79
Variations in proteins not in PON-P2 training data  1,385  1,243  186  210  2,126    0.88  0.86  0.87         0.87         0.87      0.74

SwissVarSelected
Variation set                                       TP     TN     FP   FN   Unknown  PPV   NPV   Sensitivity  Specificity  Accuracy  MCC
All variations predicted by PON-P2                  1,566  3,412  818  773  5,221    0.66  0.82  0.67         0.81         0.74      0.47
Variations not in PON-P2 training data              1,551  3,194  818  773  5,036    0.65  0.81  0.67         0.80         0.74      0.46
Variations in proteins not in PON-P2 training data  737    1,751  417  414  2,596    0.64  0.81  0.64         0.81         0.73      0.45
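
For readers who want to relate the count columns to the score columns, the following is a minimal Python sketch using the standard textbook definitions of each measure. The function name classification_metrics is hypothetical. That the reported values follow exactly these formulas is an assumption: the formulas are not stated here, and some reported values (accuracy in particular) may have been computed with a normalization that this sketch omits. Variations in the Unknown column are not used.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from a confusion matrix."""
    return {
        "PPV": tp / (tp + fp),            # positive predictive value
        "NPV": tn / (tn + fn),            # negative predictive value
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "MCC": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Counts from the first predictSNPSelected row
# (all variations predicted by PON-P2).
metrics = classification_metrics(tp=5124, tn=3173, fp=345, fn=590)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For this row, the sketch gives values that round to the reported scores (0.94, 0.84, 0.90, 0.90, 0.90 and 0.79).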

Note: The datasets were used by Grimm et al. to evaluate the performance of MutationTaster2, PolyPhen-2, Mutation Assessor, CADD, SIFT, LRT, FatHMM-U and FatHMM-W. The performance scores of these methods are presented in Supplementary Table S1. The accuracy and MCC of PON-P2 are higher than those of the compared methods, even for variations in proteins that were not present in the PON-P2 training dataset (circularity-free dataset).

If you have any queries, please feel free to contact us.