PON-Del predictor for short protein deletions


PON-Del is a predictor for short (1-10 amino acid) sequence-retaining deletions. It was trained on an extensive set of variations and showed superior performance compared to other tools. After evaluating multiple frameworks, LightGBM was selected as the final model.

PON-Del Overview

Input deletions

Deletion type
Note: We highly recommend submitting protein-level variants. Transcript-level or genomic variants are converted to the protein level with TransVar, and this conversion may miss some cases. A maximum of 1000 lines is allowed. Deletions starting at position 1 are returned as Pathogenic because deleting the first amino acid (usually methionine) may prevent normal protein expression.
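
The limits in the note above can be checked before submission. The following is a minimal sketch, not the site's own validation: it assumes inputs are written as HGVS-style protein-level deletions (for example "NP_000537.3:p.Lys120del"), which this page does not specify, and the function name check_deletions is hypothetical.

```python
import re

MAX_LINES = 1000            # limit stated in the note above
MIN_LEN, MAX_LEN = 1, 10    # deletion lengths PON-Del is described as covering

# Assumed input style (not confirmed by this page): HGVS protein-level
# deletions such as "NP_000537.3:p.Lys120del" or
# "NP_000537.3:p.Lys120_Glu123del".
_DEL = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+(?:\.\d+)?):p\."
    r"(?P<aa1>[A-Z][a-z]{2})(?P<start>\d+)"
    r"(?:_(?P<aa2>[A-Z][a-z]{2})(?P<end>\d+))?del$"
)


def check_deletions(lines):
    """Split input lines into accepted and rejected deletions (hypothetical helper)."""
    entries = [ln.strip() for ln in lines if ln.strip()]
    if len(entries) > MAX_LINES:
        raise ValueError(
            f"{len(entries)} lines given; at most {MAX_LINES} are allowed"
        )
    accepted, rejected = [], []
    for entry in entries:
        m = _DEL.match(entry)
        if m is None:
            rejected.append((entry, "not a protein-level deletion"))
            continue
        start = int(m["start"])
        end = int(m["end"]) if m["end"] else start
        length = end - start + 1
        if not MIN_LEN <= length <= MAX_LEN:
            rejected.append((entry, f"length {length} outside 1-10"))
            continue
        accepted.append({
            "input": entry,
            "protein": m["acc"],
            "start": start,
            "end": end,
            # Position-1 deletions are returned as Pathogenic by the site.
            "starts_at_1": start == 1,
        })
    return accepted, rejected
```

Calling check_deletions on the lines of an input file returns the deletions that fit the stated limits together with the rejected lines and a reason for each, so that transcript-level or over-long entries can be reviewed before upload.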
Prediction log

              

Prediction results

Note: Deletions starting at position 1 are returned as Pathogenic because deleting the first amino acid (usually methionine) may prevent normal protein expression. Such deletions are beyond the scope of the current prediction model.
Column Descriptions:
  • Input: Original input coordinates in the format you provided
  • RefSeq Protein: Corresponding RefSeq protein identifier
  • Deletion Start/End: Protein positions of the deletion
  • Predicted Probability: Two-state model prediction score (0-1, higher = more pathogenic)
  • Predicted Label: Two-state predicted label: P = Pathogenic, B = Benign
  • Predicted Probability (CV): Three-state prediction score (0-1, higher = more pathogenic)
  • Predicted Probability Std (CV): Standard deviation of the three-state prediction score
  • Predicted Probability P (CV): P-value of the three-state prediction (P > 0.05 = Uncertain)
  • Predicted Probability Label (CV): Three-state predicted label: P = Pathogenic, B = Benign, U = Uncertain
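
The relationship between the three-state columns can be illustrated with a short sketch. Only the rule "P > 0.05 = Uncertain" comes from the column descriptions; the 0.5 score cutoff between Benign and Pathogenic and the function name three_state_label are assumptions made for illustration, and the labels reported by PON-Del itself should be treated as authoritative.

```python
def three_state_label(probability, p_value, cutoff=0.5, alpha=0.05):
    """Derive a P/B/U label from a score and its p-value (illustrative only).

    alpha=0.05 follows the column description; cutoff=0.5 is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_value > alpha:
        return "U"  # prediction is not considered reliable
    return "P" if probability >= cutoff else "B"
```

For example, a score of 0.91 with a p-value of 0.01 would be labelled P under these assumptions, while the same score with a p-value of 0.20 would be labelled U.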

Single amino acid deletions

This page provides precalculated results for all possible single amino acid deletions in proteins coded by MANE transcripts.

Deletions starting at position 1 return Pathogenic because deletion of the first amino acid (usually methionine) may prevent normal protein expression.
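
To make "all possible single amino acid deletions" concrete, the sketch below enumerates them for one protein sequence. It is an illustration only and not the pipeline used to build the precalculated data; the function name single_residue_deletions and the example sequence are hypothetical.

```python
def single_residue_deletions(sequence):
    """List every single amino acid deletion of a protein sequence."""
    seq = sequence.strip().upper()
    results = []
    for i, residue in enumerate(seq):
        results.append({
            "position": i + 1,                       # 1-based protein position
            "residue": residue,                      # deleted amino acid
            "variant_sequence": seq[:i] + seq[i + 1:],
            # Position-1 deletions are returned as Pathogenic by the site.
            "starts_at_1": i == 0,
        })
    return results


for deletion in single_residue_deletions("MKKA"):
    print(deletion["position"], deletion["residue"], deletion["variant_sequence"])
```

A protein of length n yields n such deletions. Deleting either of two adjacent identical residues gives the same variant sequence, so distinct positions do not always correspond to distinct protein products.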

You can search the data in several different ways.

Select identifier

1. Choose identifier type
2. Enter ID

Note:
PON-Del was developed using MANE-selected RefSeq protein identifiers.

If you select a different identifier type, the corresponding RefSeq protein will be displayed.

The one-to-one mapping between identifiers is defined by MANE v1.4.

Predicted deletion pathogenicity

Heatmap for predicted pathogenicity

About

Datasets for PON-Del

The datasets used for training and testing the tool are available here: data_pondel.csv

Citing PON-Del

A manuscript describing the predictor has been submitted. In the meantime, please cite the tool by its URL.

Contact

If you have any problems, please contact Haoyang (haoyang.zhang@med.lu.se) or Mauno (mauno.vihinen@med.lu.se).