A benchmark database for variations




Variation datasets affecting mRNA splice sites

This dataset contains 19 variants (13 in MLH1 and 6 in MSH2) identified in patients by DHPLC and sequencing of the MLH1 and MSH2 exonic regions. Where applicable, the variant positions have been mapped to RefSeq mRNA and RefSeq protein accessions.

Download: mlh1 msh2 variants
Reference: Arnold S, Buchanan DD, Barker M, Jaskowski L, Walsh MD, Birney G, Woods MO, Hopper JL, Jenkins MA, Brown MA, et al. Classifying MLH1 and MSH2 variants using bioinformatic prediction, splicing assays, segregation, and tumor characteristics. Hum. Mutat. 2009, 30:757-770. PUBMED


Links to the DBASS3 and DBASS5 databases

DBASS3 is a database of aberrant 3' splice sites induced by disease-causing mutations in human genes. It contains 307 records (152 in exons and 155 in introns). DBASS5 is a similar database covering mutation-induced aberrant 5' splice sites in human disease genes. It contains 577 records (277 in exons and 300 in introns). Both databases are regularly updated and publicly accessible.

http://www.som.soton.ac.uk/research/geneticsdiv/dbass5/
http://www.som.soton.ac.uk/research/geneticsdiv/dbass3/

References

Buratti E, Chivers M, Kralovicova J, Romano M, Baralle M, Krainer AR, Vorechovsky I. Aberrant 5' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res. 2007, 35(13):4250-4263. PUBMED

Vorechovsky I. Aberrant 3' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res. 2006, 34(16):4630-4641. PUBMED


In silico prediction of splice-altering single nucleotide variants in the human genome

In silico tools have been developed to predict variants that may affect pre-mRNA splicing. A major limitation in applying these tools to basic research and clinical practice is the difficulty of interpreting their output. Most tools only predict potential splice sites in a given DNA sequence, without measuring the change in splicing signal caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed the other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of the individual methods. Both models further improved prediction and output directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. The analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole-genome resource for identifying splice-altering scSNVs discovered in large-scale sequencing studies.
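The receiver operating characteristic (ROC) comparison described above ranks tools by how well their scores separate splice-altering from neutral variants. The following is a minimal Python sketch of the area under the ROC curve (AUC) computed through its rank-sum (Mann-Whitney) identity: the AUC equals the probability that a randomly chosen splice-altering variant scores higher than a randomly chosen neutral one, with ties counted as one half. The function name `roc_auc`, the 0/1 labels and the two tools' scores are hypothetical toy values, not data or code from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = splice-altering, 0 = neutral (hypothetical encoding).
    Higher scores are assumed to mean "more likely splice-altering".
    Tied positive/negative pairs count 0.5, matching the trapezoidal AUC.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two tools on six labeled variants (toy data).
labels = [1, 1, 1, 0, 0, 0]
tool_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
tool_b = [0.7, 0.2, 0.6, 0.5, 0.4, 0.3]

print(f"tool A AUC = {roc_auc(tool_a, labels):.3f}")
print(f"tool B AUC = {roc_auc(tool_b, labels):.3f}")
```

The pairwise loop is O(n·m), which is acceptable for a few thousand variants; a sort-based implementation, or a library routine such as scikit-learn's `roc_auc_score`, scales better.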

Download: Supplementary_Table_S1-S6.xlsx
Reference: Jian X, Boerwinkle E, Liu X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res. 2014, 42:13534-13544. PUBMED
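The Position Weight Matrix model that performed well in this comparison scores a splice site by summing, position by position, how over-represented each base is relative to background. Below is a minimal sketch of a generic log-odds PWM, assuming a uniform 0.25 background and a pseudocount of 0.5. It is not the specific PWM implementation evaluated in the study. The training sites, the 9-base window (3 exonic plus 6 intronic bases) and the names `build_pwm` and `score` are hypothetical. A variant's effect is read as the score difference between the alternate and reference sequences.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Build a log-odds PWM: log2(p / 0.25), i.e. against a uniform background."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must be aligned to the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / 0.25)
            for b in BASES
        })
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum the per-position log-odds of an ACGT sequence of the PWM's length."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM length")
    return sum(col[base] for col, base in zip(pwm, seq.upper()))


# Hypothetical aligned 5' splice sites: 3 exonic + 6 intronic bases (toy data).
training = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG", "CAGGTGAGT"]
pwm = build_pwm(training)

ref = "CAGGTAAGT"  # reference allele
alt = "CAGGCAAGT"  # hypothetical T>C at intron position +2 (GT -> GC)
delta = score(pwm, alt) - score(pwm, ref)
print(f"ref={score(pwm, ref):.2f} alt={score(pwm, alt):.2f} delta={delta:.2f}")
```

A strongly negative delta suggests that the variant weakens the site. The pseudocount keeps unseen bases from producing a log of zero.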


Guidelines for splicing analysis in molecular diagnosis derived from a set of 327 combined in silico/in vitro studies on BRCA1 and BRCA2 variants

Assessing the impact of variants of unknown significance (VUS) on splicing is a key issue in molecular diagnosis. This impact can be predicted by in silico tools, but proper evaluation and user guidelines are lacking. To fill this gap, we embarked upon the largest BRCA1 and BRCA2 splice study to date by testing 272 VUSs (327 analyses) within the BRCA splice network of Unicancer. All these VUSs were analyzed by using six tools (splice site prediction by neural network, splice site finder (SSF), MaxEntScan (MES), ESE finder, relative enhancer and silencer classification by unanimous enrichment, and human splicing finder) and the predictions obtained were compared with transcript analysis results. Combining MES and SSF gave 96% sensitivity and 83% specificity for VUSs occurring in the vicinity of consensus splice sites, that is, the surrounding 11 and 14 bases for the 5' and 3' sites, respectively. This study was also an opportunity to define guidelines for transcript analysis along with a tentative classification of splice variants. The guidelines drawn from this large series should be useful for the whole community, particularly in the context of growing sequencing capacities that require robust pipelines for variant interpretation.
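The reported sensitivity and specificity come from comparing tool-based predictions with transcript analysis results. A minimal sketch of that evaluation follows. It assumes that "combining MES and SSF" means flagging a variant only when both tools show a relative score decrease at the natural site above a threshold. The thresholds below (15% for MES, 5% for SSF) are assumptions for illustration and should be checked against the paper's supplementary guidelines. The `Variant` fields, function names and scores are hypothetical toy values.

```python
from dataclasses import dataclass
from typing import List, Tuple

# ASSUMED thresholds (relative score decrease), for illustration only.
MES_DROP = 0.15
SSF_DROP = 0.05


@dataclass
class Variant:
    mes_ref: float  # MaxEntScan score of the natural site, reference allele
    mes_alt: float  # MaxEntScan score, variant allele
    ssf_ref: float  # Splice Site Finder score, reference allele
    ssf_alt: float  # Splice Site Finder score, variant allele
    rna_defect: bool  # ground truth from transcript analysis


def relative_drop(ref: float, alt: float) -> float:
    """Fractional decrease from ref to alt; requires a positive reference score."""
    if ref <= 0:
        raise ValueError("relative drop is undefined for a non-positive reference")
    return (ref - alt) / ref


def predicts_defect(v: Variant) -> bool:
    """Hypothetical combined rule: flag only when BOTH tools show a drop."""
    return (relative_drop(v.mes_ref, v.mes_alt) >= MES_DROP
            and relative_drop(v.ssf_ref, v.ssf_alt) >= SSF_DROP)


def sensitivity_specificity(variants: List[Variant]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for v in variants if v.rna_defect and predicts_defect(v))
    fn = sum(1 for v in variants if v.rna_defect and not predicts_defect(v))
    tn = sum(1 for v in variants if not v.rna_defect and not predicts_defect(v))
    fp = sum(1 for v in variants if not v.rna_defect and predicts_defect(v))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical variants (toy scores, not study data).
variants = [
    Variant(8.0, 4.0, 80.0, 60.0, True),    # both drop, defect confirmed -> TP
    Variant(9.0, 8.8, 85.0, 84.0, False),   # small drops, no defect      -> TN
    Variant(7.0, 6.5, 75.0, 70.0, True),    # MES drop below threshold    -> FN
    Variant(10.0, 8.0, 90.0, 80.0, False),  # both drop, no defect        -> FP
]
sens, spec = sensitivity_specificity(variants)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Requiring both tools to agree trades some sensitivity for specificity. Relaxing the rule to "either tool" would do the opposite.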

Download: humu_22101_sm_SuppInfo.pdf
Reference: Houdayer C, et al. Guidelines for splicing analysis in molecular diagnosis derived from a set of 327 combined in silico/in vitro studies on BRCA1 and BRCA2 variants. Hum. Mutat. 2012, 33:1228-1238. PUBMED