PON-P2

PON-P2 predicts the pathogenicity (harmfulness) of amino acid substitutions. It is a machine learning-based method that uses amino acid features, Gene Ontology (GO) annotations, evolutionary conservation and, when available, annotations of functional sites. Note that PON-P2 is NOT a meta-predictor. PON-P2 estimates the reliability of its predictions and classifies variants as pathogenic, neutral or unknown.

The performance of PON-P2 has been extensively tested. For details, see here. Performance results on additional datasets, such as the predictSNPSelected and SwissVarSelected datasets, are also available here. PON-P2 has also been shown to work on cancer variants. PON-P2 predictions for amino acid substitutions in COSMIC (v68) and for the data published in "Harmful somatic amino acid substitutions affect key pathways in cancers" are publicly available here.

PON-P2 was the best-performing method in a recent comparison and outperformed protein-specific predictors for 85% of the proteins (Riera et al. 2016).

NEWS: PON-P2 predictions for the entire human proteome are available here.


PON-P2 API

PON-P2 can be accessed programmatically through an application programming interface (API) or through the Variant Effect Predictor (VEP) with the PON_P2 plugin. The PON-P2 API accepts variants submitted in a Variant Call Format (VCF) file. PON-P2 predicts the pathogenicity of variants on the human genome reference hg19. A newer feature of the API also accepts variants on the hg38 reference; these are mapped to hg19 with the liftover tool before predictions are made.

Note: We are still testing the API service and the VEP plugin. We kindly ask users to try them and to let us know if they encounter any problems.


Requirements for the API:

1. Python

To use the PON-P2 API, you need Python installed on your computer. Python is pre-installed on most Linux distributions and on Mac OS X. For Windows, you can download Python from python.org.


2. Suds client for SOAP

We use the Simple Object Access Protocol (SOAP) to provide the PON-P2 API. To connect to the PON-P2 API, you need to install the Suds web services client, available here.


3. PON-P2 API client script

We have prepared a Python script that submits the variants in a VCF file and saves the PON-P2 predictions to a text file. You can download the script here.

Variation submission

The PON-P2 API accepts variants only in VCF files. At minimum, the file must contain the first line, starting with '##fileformat=VCFv', and the first five columns (CHROM, POS, ID, REF, ALT) of the header line and of the data lines. When a VCF file is submitted, the program preprocesses it and submits only single nucleotide substitutions for prediction. Very large files can take a long time to upload, and the upload may fail. Therefore, the VCF file is split into smaller files of approximately 20 MB, which are submitted to PON-P2 separately.
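The preprocessing described above can be sketched in Python. This is a minimal illustration, not the code of the PON-P2 client script: the helper names (check_vcf_lines, is_snv, split_snvs) are hypothetical, a single nucleotide substitution is assumed to mean that REF and every ALT allele are single bases (A, C, G or T), and each chunk is assumed to repeat the '#' meta and header lines so that it remains a valid VCF file on its own.

```python
CHUNK_BYTES = 20 * 1024 * 1024  # approximately 20 MB, as described above
BASES = {"A", "C", "G", "T"}


def check_vcf_lines(lines):
    """Check the minimal requirements: a '##fileformat=VCFv' first line and
    a header line with at least the first five VCF columns."""
    if not lines or not lines[0].startswith("##fileformat=VCFv"):
        raise ValueError("first line must start with '##fileformat=VCFv'")
    header = next((line for line in lines if line.startswith("#CHROM")), None)
    if header is None or len(header.rstrip("\n").split("\t")) < 5:
        raise ValueError("missing header line with the first five columns")


def is_snv(data_line):
    """Assumption: an SNV has a single-base REF and only single-base ALTs
    that differ from REF."""
    fields = data_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False
    ref, alts = fields[3].upper(), fields[4].upper().split(",")
    return ref in BASES and all(alt in BASES and alt != ref for alt in alts)


def split_snvs(lines, chunk_bytes=CHUNK_BYTES):
    """Keep only SNV data lines and group them into chunks of roughly
    chunk_bytes, repeating the '#' lines at the top of every chunk."""
    check_vcf_lines(lines)
    head = [line for line in lines if line.startswith("#")]
    head_size = sum(len(line.encode()) for line in head)
    chunks, current, size = [], [], head_size
    for line in lines:
        if line.startswith("#") or not is_snv(line):
            continue  # skip meta/header lines and non-SNV variants
        n = len(line.encode())
        if current and size + n > chunk_bytes:
            chunks.append(head + current)  # close the current chunk
            current, size = [], head_size
        current.append(line)
        size += n
    if current:
        chunks.append(head + current)
    return chunks
```

For example, split_snvs(open("inputFile.vcf").readlines()) returns a list of chunks, each of which could be written to its own file and submitted separately.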

To submit variants in a VCF file by using the API, run the following command:


python PON-P2_API_client.py inputFile.vcf hg19 output/directory/path


PON-P2_API_client.py: The Python script downloaded from this page.
inputFile.vcf: Full path to the input VCF file. The file must contain the first VCF line (starting with '##fileformat=VCFv'). Read more about the VCF format here or here.
genome assembly (hg19 in the command above): Either hg19 or hg38.
output/directory/path: Full path to the directory where the results will be saved.
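The arguments above can be checked before the script is started. The sketch below uses a hypothetical helper, build_command, that assembles the same command line with absolute paths and rejects assemblies other than hg19 and hg38; run_client, also hypothetical, then starts the script with Python's subprocess module. Creating the output directory beforehand is an assumption, since this page does not say whether the script creates it.

```python
import os
import subprocess

ASSEMBLIES = ("hg19", "hg38")


def build_command(input_vcf, assembly, output_dir,
                  script="PON-P2_API_client.py", python="python"):
    """Return the argument list for the documented command, with the input
    file and output directory turned into full (absolute) paths."""
    if assembly not in ASSEMBLIES:
        raise ValueError(f"genome assembly must be one of {ASSEMBLIES}, got {assembly!r}")
    return [python, script, os.path.abspath(input_vcf), assembly,
            os.path.abspath(output_dir)]


def run_client(input_vcf, assembly, output_dir, **kwargs):
    """Check that the input file exists, make sure the output directory
    exists (an assumption), then run the client script."""
    if not os.path.isfile(input_vcf):
        raise FileNotFoundError(input_vcf)
    os.makedirs(output_dir, exist_ok=True)
    subprocess.run(build_command(input_vcf, assembly, output_dir, **kwargs), check=True)
```

For example, run_client("inputFile.vcf", "hg38", "results") runs the downloaded script from the current directory; pass python="python2" or another interpreter name if your installation of Suds requires it.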

PON-P2 results file

After the variants in the VCF file are submitted, the jobs are placed in the PON-P2 queue system. When the predictions for the submitted jobs are complete, the results are saved to a file in the output directory. If the input file is large, there will be more than one results file: one for each job ID and one combined results file. The combined results file contains two parts:

1) Information section
This section contains information about PON-P2. Each line in this section starts with #.

2) Prediction section
This section contains the PON-P2 predictions for variants that lead to amino acid substitutions. The first line of this section contains the column names, and each subsequent line contains the prediction for one variant.
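A combined results file could be loaded along these lines. This sketch assumes that the prediction section is tab-separated and that its column-name line does not start with '#'; neither detail is stated above, so check them against an actual results file. parse_results is a hypothetical helper, and it does not assume any particular PON-P2 column names.

```python
import csv


def parse_results(text, delimiter="\t"):
    """Split a combined results file into its information lines (those
    starting with '#') and a list of prediction rows, one dict per variant,
    keyed by the column names from the first non-'#' line."""
    info, body = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            info.append(line)
        elif line.strip():
            body.append(line)
    if not body:
        return info, []
    # csv.DictReader takes the first body line as the column names.
    return info, list(csv.DictReader(body, delimiter=delimiter))
```

For example, info, rows = parse_results(open(path).read()) gives the information lines and the predictions, which can then be filtered by whatever prediction column the file contains.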