PON-Del predictor for short protein deletions


PON-Del is a predictor for short (1-10 amino acid) sequence-retaining deletions. It was trained on an extensive set of variations and showed superior performance compared to other tools. After evaluating multiple frameworks, LightGBM was selected as the final model.

PON-Del Overview

Input deletions

Deletion type
Note: We highly recommend submitting protein-level variants. If you submit transcript-level or genomic variants, TransVar is used to convert them to the protein level, and some cases may be missed. A maximum of 1000 lines is allowed. Deletions starting at position 1 return Pathogenic because deletion of the first amino acid (usually methionine) may prevent normal protein expression.
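The limits in the note above can be checked locally before submitting. The following is a minimal sketch, not part of PON-Del itself: the function name `check_deletions`, the `(start, end)` tuple representation, and the status strings are hypothetical, and the 1-10 residue limit is taken from the description of the predictor at the top of this page.

```python
MAX_LINES = 1000           # submission limit stated in the note
MAX_DELETION_LENGTH = 10   # PON-Del covers deletions of 1-10 amino acids


def check_deletions(deletions):
    """Label each (start, end) protein-level deletion before submission.

    Positions are 1-based and inclusive. The labels are illustrative only
    and are not produced by the PON-Del server.
    """
    if len(deletions) > MAX_LINES:
        raise ValueError(f"at most {MAX_LINES} lines are allowed per submission")

    labels = []
    for start, end in deletions:
        length = end - start + 1
        if start < 1 or length < 1:
            labels.append("invalid")
        elif length > MAX_DELETION_LENGTH:
            labels.append("too long")
        elif start == 1:
            # Reported as Pathogenic without running the model.
            labels.append("fixed pathogenic")
        else:
            labels.append("ok")
    return labels
```

For example, `check_deletions([(1, 1), (5, 7)])` labels the first deletion as `"fixed pathogenic"` and the second as `"ok"`.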
Prediction log

              

Prediction results

Column Descriptions:
  • Input: Original input coordinates in the format you provided
  • RefSeq Protein: Corresponding RefSeq protein identifier
  • Deletion Start/End: Protein positions of the deletion
  • Probability: Prediction score (0-1, higher = more pathogenic)
  • Prediction: P = Pathogenic (probability > 0.5), B = Benign (probability ≤ 0.5), U = Uncertain (bootstrap P > 0.05).
  • P value: Bootstrap P-value (P>0.05 = Uncertain)
Note: Deletions starting at position 1 return Pathogenic because deletion of the first amino acid (usually methionine) may prevent normal protein expression; such deletions are beyond the scope of the current prediction model. If you need a binary prediction, treat probability > 0.5 as pathogenic and anything else as benign, and ignore the P value.
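The labelling rules described above can be summarised in a short sketch. This is an illustration of the column descriptions, not the PON-Del source code: the function name `classify` and its parameters are hypothetical, and it assumes that the Uncertain label takes precedence over P and B whenever the bootstrap P value exceeds 0.05.

```python
def classify(probability, p_value, start, binary=False):
    """Return 'P', 'B' or 'U' following the column descriptions.

    probability: prediction score between 0 and 1
    p_value:     bootstrap P value
    start:       protein position where the deletion starts (1-based)
    binary:      if True, ignore the P value and never return 'U'
    """
    if start == 1:
        # Deletions of the first residue are reported as Pathogenic
        # and are outside the scope of the model.
        return "P"
    if not binary and p_value > 0.05:
        # Assumption: Uncertain overrides the probability-based label.
        return "U"
    return "P" if probability > 0.5 else "B"
```

With `binary=True` the function reproduces the simple 0.5 probability cut-off mentioned in the note, so a call such as `classify(0.9, 0.2, 5, binary=True)` gives `"P"` where the default mode gives `"U"`.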

Single amino acid deletions

This page provides precalculated results for all possible single amino acid deletions in proteins coded by MANE transcripts.

Deletions starting at position 1 return Pathogenic because deletion of the first amino acid (usually methionine) may prevent normal protein expression.

You can search the data in several different ways.

Select identifier

1. Choose identifier type
2. Enter ID

Note:
PON-Del is built on MANE Select RefSeq protein identifiers.

If you select a different identifier type, the corresponding RefSeq protein will be displayed.

The one-to-one mapping is defined by MANE v1.4.

Predicted deletion pathogenicity

Heatmap for predicted pathogenicity

About

Data and Code

The datasets used for training and testing the tool are available here.

The training data and code are available on GitHub.

Citing PON-Del

A manuscript describing the predictor has been submitted. In the meantime, please cite the URL of this website.

Contact

The tool was developed by Haoyang Zhang and Muhammad Kabir, supervised by Mauno Vihinen.

If you have any problems, please contact Haoyang (haoyang.zhang@med.lu.se) or Mauno (mauno.vihinen@med.lu.se).