Protein Structure and Bioinformatics Group

Prof. Mauno Vihinen

Lund University

News from PSB

2019-08-02 New publication:

Ekong R, Vihinen M, 2019
Checklist for gene/disease-specific variation database curators to enable ethical data management.
Hum Mutat Accepted Articles. doi: 10.1002/humu.23881

Databases with variant and phenotype information are essential for advancing research and improving the health and welfare of individuals. These resources require data to be collected, curated and shared among relevant specialties to maximise impact. The increasing generation of data which must be shared both nationally and globally for maximal effect presents important ethical and privacy concerns. Database curators need to ensure that their work conforms to acceptable ethical standards. A Working Group of the Human Variome Project had the task of updating and streamlining ethical guidelines for locus-specific/gene variant database curators. In this article, we present practical and achievable steps which should assist database curators in carrying out their responsibilities within acceptable ethical norms.

2018-11-30 New publication:

Schaafsma GCP, Vihinen M, 2018
Representativeness of variation benchmark datasets.
BMC Bioinformatics 2018 19:461. doi: 10.1186/s12859-018-2478-6

Benchmark datasets are essential for both method development and performance assessment. These datasets have numerous requirements, representativeness being one. In the case of variant tolerance/pathogenicity prediction, representativeness means that the dataset covers the space of variations and their effects.
We performed the first analysis of the representativeness of variation benchmark datasets. We used statistical approaches to investigate how proteins in the benchmark datasets were representative for the entire human protein universe. We investigated the distributions of variants in chromosomes, protein structures, CATH domains and classes, Pfam protein families, Enzyme Commission (EC) classifications and Gene Ontology annotations in 24 datasets that have been used for training and testing variant tolerance prediction methods. All the datasets were available in VariBench or VariSNP databases. We tested also whether the pathogenic variant datasets contained neutral variants defined as those that have high minor allele frequency in the ExAC database. The distributions of variants over the chromosomes and proteins varied greatly between the datasets.
None of the datasets was found to be well representative, although many of the tested datasets had quite good coverage of the different protein characteristics. Dataset size correlates with representativeness but only weakly with the performance of methods trained on them. The results imply that dataset representativeness is an important factor and should be taken into account in predictor development and testing.

2018-08-24 New publication:

Teku GN, Vihinen M, 2018
Pan-cancer analysis of neoepitopes.
Sci Rep. 2018 Aug 24;8(1):12735. doi: 10.1038/s41598-018-30724-y

Somatic variations are frequent and important drivers in cancers. Amino acid substitutions can yield neoantigens that are detected by the immune system. Neoantigens can lead to immune response and tumor rejection. Although neoantigen load and occurrence have been widely studied, a detailed pan-cancer analysis of the occurrence and characterization of neoepitopes is missing. We investigated the proteome-wide amino acid substitutions in 8-, 9-, 10-, and 11-mer peptides in 30 cancer types with the NetMHC 4.0 software. 11,316,078 (0.24%) of the predicted 8-, 9-, 10-, and 11-mer peptides were highly likely neoepitope candidates and were derived from 95.44% of human proteins. Binding affinity to MHC molecules is just one of the many epitope features. The most likely epitopes are those which are detected by several MHCs and of several peptide lengths. 9-mer peptides are the most common among the high binding neoantigens. 0.17% of all variants yield more than 100 neoepitopes and are considered as the best candidates for any application. Amino acid distributions indicate that variants at all positions in neoepitopes of any length are, on average, more hydrophobic than the wild-type residues. We characterized properties of neoepitopes in 30 cancer types and estimated the likely numbers of tumor-derived epitopes that could induce an immune response. We found that amino acid distributions, at all positions in neoepitopes of all lengths, contain more hydrophobic residues than the wild-type sequences implying that the hydropathy nature of neoepitopes is an important property. The neoepitope characteristics can be employed for various applications including targeted cancer vaccine development for precision medicine.
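The peptide enumeration step described in the abstract above can be illustrated with a minimal sketch. It only lists the 8- to 11-mer windows that contain a substituted residue; it does not predict MHC binding, which the study did with NetMHC 4.0. The function name and the toy sequence are assumptions made for illustration and do not come from the publication.

```python
def mutant_peptides(protein, position, new_residue, lengths=(8, 9, 10, 11)):
    """Return every k-mer of the substituted protein that contains the variant.

    `position` is 1-based, as in protein-level variant descriptions.
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position outside the protein sequence")
    index = position - 1
    # Apply the amino acid substitution to the wild-type sequence.
    mutant = protein[:index] + new_residue + protein[index + 1:]
    peptides = []
    for k in lengths:
        # Windows must start early enough to cover the variant
        # and late enough to stay inside the sequence.
        first_start = max(0, index - k + 1)
        last_start = min(index, len(mutant) - k)
        for start in range(first_start, last_start + 1):
            peptides.append(mutant[start:start + k])
    return peptides


# Made-up 22-residue sequence, used only to exercise the function.
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
candidates = mutant_peptides(toy_protein, position=12, new_residue="L")
```

Each substitution far enough from the protein termini yields 8 + 9 + 10 + 11 = 38 candidate peptides, which is why proteome-wide analyses of this kind produce very large peptide sets.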

2018-08-02 New publication:

Teku GN, Vihinen M, 2018
Simulation of the Dynamics of Primary Immunodeficiencies in B Cells.
Front Immunol. 2018 Aug 2;9:1785. doi: 10.3389/fimmu.2018.01785. eCollection 2018

Primary immunodeficiencies (PIDs) are a group of over 300 hereditary, heterogeneous, and mainly rare disorders that affect the immune system. Various aspects of immune system and PID proteins and genes have been investigated and facilitate systems biological studies of effects of PIDs on B cell physiology and response. We reconstructed a B cell network model based on data for the core B cell receptor activation and response processes and performed semi-quantitative dynamic simulations for normal and B cell PID failure modes. The results for several knockout simulations correspond to previously reported molecular studies and reveal novel mechanisms for PIDs. The simulations for CD21, CD40, LYN, MS4A1, ORAI1, PLCG2, PTPRC, and STIM1 indicated profound changes to major transcription factor signaling and to the network. Significant effects were also observed in the BCL10, BLNK, BTK, loss-of-function CARD11, IKKB, MALT1, and NEMO simulations, whereas only minor effects were detected for PIDs that are caused by constitutively active proteins (PI3K, gain-of-function CARD11, KRAS, and NFKBIA). This study reveals the underlying dynamics of PID diseases, confirms previous observations, and identifies novel candidates for PID diagnostics and therapy.

2018-01-01 New publication:

Yang Y et al., 2018
NDDVD: an integrated and manually curated Neurodegenerative Diseases Variation Database.
Database (Oxford). 2018 Jan 1;2018. doi: 10.1093/database/bay018

Neurodegenerative diseases (NDDs) are associated with genetic variations including point substitutions, copy number alterations, insertions and deletions. At present, a few genetic variation repositories for some individual NDDs have been created; however, these databases need to be integrated and expanded to all the NDDs for systems biological investigation. Here we build a relational database, termed NDDVD, to integrate all the variations of NDDs using the Leiden Open Variation Database (LOVD) platform. The items in the NDDVD are collected manually from PubMed or extracted from existing variation databases. The cross-disease database includes over 6374 genetic variations of 289 genes associated with 37 different NDDs. The patterns, conservations and biological functions for variations in different NDDs are statistically compared and a user-friendly interface is provided for NDDVD at:

2017-02-16 Update VariSNP datasets:

The VariSNP benchmark datasets were updated using the dbSNP XML datasets from the NCBI FTP website, which were last modified in November 2016 and contain variants from update build 149 (GRCh38p7).

2017-02-06 Update VariO:

Variation Ontology VariO was updated to version 1.05, with some minor changes and corrections. Some terms were removed:
VariO:0166 gene structure variation
VariO:0167 gene fusion
VariO:0168 gene deletion
VariO:0169 complete gene deletion
VariO:0170 partial gene deletion
VariO:0205 uncharacterized chromosomal variation
Some new terms were introduced:
VariO:0166 antigen receptor gene rearrangement
VariO:0167 effect on DNA form
VariO:0168 somatic hypermutation
VariO:0169 class switch recombination
VariO:0170 antigen receptor gene conversion
VariO:0205 dinucleotide expansion
VariO:0378 DNA transposon
VariO:0379 LINE
VariO:0380 SINE
VariO:0388 LTR
VariO:0389 pentanucleotide expansion
VariO:0390 effect on DNA double helix
VariO:0391 plasmid
VariO:0392 insertion sequence
VariO:0393 effect on DNA pseudoknot
VariO:0394 effect on DNA cruciform
VariO:0395 self-cleavage by ribozyme activity
VariO:0403 group I intron
VariO:0404 group II intron
VariO:0405 dicentric translocation
VariO:0406 dicentric isoduplication
VariO:0407 edited DNA
VariO:0408 RNA chimera
VariO:0409 frameshifted RNA
VariO:0410 spliced RNA
VariO:0411 alternatively spliced RNA
VariO:0412 effect on catalytic DNA activity
VariO:0413 effect on A DNA
VariO:0414 effect on B DNA
VariO:0415 effect on C DNA
VariO:0416 effect on L DNA
VariO:0417 effect on S DNA
VariO:0418 effect on D DNA
VariO:0419 effect on H DNA
VariO:0420 effect on four-stranded DNA
VariO:0421 effect on Z DNA
VariO:0422 effect on intramolecular DNA triple helix
VariO:0423 effect on intermolecular DNA triple helix
VariO:0424 effect on DNA-RNA hybrid
VariO:0425 effect on RNA triplex helix
VariO:0426 effect on four-stranded RNA
VariO:0427 type of chromosomal amplification
VariO:0428 genome variation
VariO:0429 complex genomic variation
VariO:0430 nucleotide expansion
VariO:0431 effect on R loop
VariO:0432 effect on T loop
VariO:0433 effect on D loop

2017-01-10 New publication:

Niroula A, Vihinen M, 2017
Predicting severity of disease-causing variants.
Hum Mutat 38: 357-364 doi: 10.1002/humu.23173

Most diseases, including those of genetic origin, express a continuum of severity. Clinical interventions for numerous diseases are based on the severity of the phenotype. Predicting severity due to genetic variants could facilitate diagnosis and choice of therapy. Although computational predictions have been used as evidence for classifying the disease-relevance of genetic variants, special tools for predicting disease severity in large scale are missing. Here, we manually curated a dataset containing variants leading to severe and less severe phenotypes and studied the abilities of variation impact predictors to distinguish between them. We found that these tools cannot separate the two groups of variants. Then, we developed a novel machine learning-based method, PON-PS, for classification of amino acid substitutions associated with benign, severe, and less severe phenotypes. We tested the method using an independent test dataset and variants in four additional proteins. For distinguishing severe and non-severe variants, PON-PS showed an accuracy of 61% in the test dataset which is higher than for existing tolerance prediction methods. PON-PS is the first generic tool developed for this task. The tool can be used together with other evidence for improving diagnosis and prognosis and for prioritization of preventive interventions, clinical monitoring, and molecular tests.

2016-11-10 New publication:

Vihinen M, 2016
How to define pathogenicity, health and disease?
Hum Mutat 38: 129-136 doi: 10.1002/humu.23144

Scientific and clinical communities produce ever increasing amounts of data and details about health and disease. Our ability to understand and utilize this information is limited due to imprecise language and lack of well-defined concepts. This problem also involves the principal concepts of health, disease and pathogenicity. Here, a systematic model is presented for pathogenicity, as well as for health and disease. It has three components: extent, modulation and severity, which jointly define the continuum of pathogenicity. The model is population based, and once implemented can be used for numerous purposes such as diagnosis, patient stratification, prognosis, finding phenotype-genotype correlations or explaining adverse drug reactions. The new model has several benefits including health economy by allowing evidence based personalized/precision medicine.

2017-01-09 New publication:

Viennas E, Komianou A, Mizzi C, Stojiljkovic M, Mitropoulou C, Muilu J, Vihinen M, Grypioti P, Papadaki S, Pavlidis C, Zukic B, Katsila T, van der Spek PJ, Pavlovic S, Tzimas G, Patrinos GP. 2017
Expanded national database collection and data coverage in the FINDbase worldwide database for clinically relevant genomic variation allele frequencies.
Nucleic Acids Res. 45: D846-D853

FINDbase is a comprehensive data repository that records the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants leading mostly to monogenic disorders and pharmacogenomics biomarkers. The database also records the incidence of rare genetic diseases in various populations, all in well-distinct data modules. Here, we report extensive data content updates in all data modules, with direct implications to clinical pharmacogenomics. Also, we report significant new developments in FINDbase, namely (i) the release of a new version of the ETHNOS software that catalyzes development curation of national/ethnic genetic databases, (ii) the migration of all FINDbase data content into 90 distinct national/ethnic mutation databases, all built around Microsoft's PivotViewer software, (iii) new data visualization tools and (iv) the interrelation of FINDbase with DruGeVar database with direct implications in clinical pharmacogenomics. The abovementioned updates further enhance the impact of FINDbase, as a key resource for Genomic Medicine applications.

2017-01-09 New publication:

Hamasy A, Wang Q, Blomberg KE, Mohammad DK, Yu L, Vihinen M, Berglöf A, Smith CI. 2017
Substitution scanning identifies a novel, catalytically active ibrutinib-resistant BTK cysteine 481 to threonine (C481T) variant.
Leukemia 31: 177-185

Irreversible Bruton tyrosine kinase (BTK) inhibitors ibrutinib and acalabrutinib have demonstrated remarkable clinical responses in multiple B-cell malignancies. Acquired resistance has been identified in a sub-population of patients in which mutations affecting BTK predominantly substitute cysteine 481 in the kinase domain for catalytically active serine, thereby ablating covalent binding of inhibitors. Activating substitutions in the BTK substrate phospholipase Cγ2 (PLCγ2) instead confer resistance independent of BTK. Herein, we generated all six possible amino acid substitutions due to single nucleotide alterations for the cysteine 481 codon, in addition to threonine, requiring two nucleotide substitutions, and performed functional analysis. Replacement by arginine, phenylalanine, tryptophan or tyrosine completely inactivated the catalytic activity, whereas substitution with glycine caused severe impairment. BTK with threonine replacement was catalytically active, similar to substitution with serine. We identify three potential ibrutinib resistance scenarios for cysteine 481 replacement: (1) Serine, being catalytically active and therefore predominating among patients. (2) Threonine, also being catalytically active, but predicted to be scarce, because two nucleotide changes are needed. (3) As BTK variants replaced with other residues are catalytically inactive, they presumably need compensatory mutations, therefore being very scarce. Glycine and tryptophan variants were not yet reported but likely also provide resistance.
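The codon arithmetic behind the abstract above (six substitutions reachable from a cysteine codon by one nucleotide change, threonine requiring two) can be checked with a short sketch based on the standard genetic code. The function names are illustrative assumptions, and the sketch says nothing about catalytic activity or drug resistance, which the study determined experimentally.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_substitutions(codon):
    """Amino acids reachable by exactly one nucleotide change.

    Synonymous changes and stop codons are excluded, so the result lists
    only amino acid substitutions.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt == base:
                continue
            aa = CODON_TABLE[codon[:i] + alt + codon[i + 1:]]
            if aa not in (original, "*"):
                reachable.add(aa)
    return reachable


def min_changes_to(codon, target_aa):
    """Smallest number of nucleotide changes turning `codon` into a codon for `target_aa`."""
    return min(
        sum(a != b for a, b in zip(codon, other))
        for other, aa in CODON_TABLE.items()
        if aa == target_aa
    )


# Both cysteine codons give the same six substitutions: F, G, R, S, W, Y.
cys_neighbours = {c: single_nucleotide_substitutions(c) for c in ("TGT", "TGC")}
```

For either cysteine codon the reachable set is serine, arginine, glycine, tyrosine, phenylalanine and tryptophan, and `min_changes_to("TGC", "T")` evaluates to 2, consistent with the abstract.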

2016-11-02 New publication:

Vihinen M, 2016
Establishment of an international database for genetic variants in esophageal cancer.
Ann NY Acad Sci 1381: 45-49

The establishment of a database has been suggested in order to collect, organize, and distribute genetic information about esophageal cancer. The World Organization for Specialized Studies on Diseases of the Esophagus and the Human Variome Project will be in charge of a central database of information about esophageal cancer-related variations from publications, databases, and laboratories; in addition to genetic details, clinical parameters will also be included. The aim will be to get all the central players in research, clinical, and commercial laboratories to contribute. The database will follow established recommendations and guidelines. The database will require a team of dedicated curators with different backgrounds. Numerous layers of systematics will be applied to facilitate computational analyses. The data items will be extensively integrated with other information sources. The database will be distributed as open access to ensure exchange of the data with other databases. Variations will be reported in relation to reference sequences on three levels (DNA, RNA, and protein) whenever applicable. In the first phase, the database will concentrate on genetic variations including both somatic and germline variations for susceptibility genes. Additional types of information can be integrated at a later stage.

2016-09-15 New publication:

Vihinen M, 2016
Both generic and protein-specific tolerance predictors are needed.
Hum Mutat 37: 989

2016-06-27 New publication:

Yang Y, Niroula A, Shen B, Vihinen M, 2016
PON-Sol: prediction of effects of amino acid substitutions on protein solubility.
Bioinformatics 32: 2032-2034 doi: 10.1093/bioinformatics/btw066

Solubility is one of the fundamental protein properties. It is of great interest because of its relevance to protein expression. Reduced solubility and protein aggregation are also associated with many diseases.
We collected from literature the largest experimentally verified solubility affecting amino acid substitution (AAS) dataset and used it to train a predictor called PON-Sol. The predictor can distinguish both solubility decreasing and increasing variants from those not affecting solubility. PON-Sol has a normalized correct prediction ratio of 0.491 on cross-validation and 0.432 for an independent test set. The performance of the method was compared to both solubility and aggregation predictors and found to be superior. PON-Sol can be used for the prediction of effects of disease-related substitutions, effects on heterologous recombinant protein expression and enhanced crystallizability. One application is to investigate effects of all possible AASs in a protein to aid protein engineering.
PON-Sol is freely available at The training and test data are available at

2016-06-09 Update VariSNP datasets:

The VariSNP benchmark datasets were updated using the dbSNP XML datasets from the NCBI FTP website, which were last modified in April 2016 and contain variants from update build 147 (GRCh38p2).

2016-03-21 New publication:

Niroula A, Vihinen M, 2016
PON-mt-tRNA: a multifactorial probability-based method for classification of mitochondrial tRNA variations.
Nucl. Acids Res. 44: 2020-2027 doi: 10.1093/nar/gkw046

Transfer RNAs (tRNAs) are essential for encoding the transcribed genetic information from DNA into proteins. Variations in the human tRNAs are involved in diverse clinical phenotypes. Interestingly, all pathogenic variations in tRNAs are located in mitochondrial tRNAs (mt-tRNAs). Therefore, it is crucial to identify pathogenic variations in mt-tRNAs for disease diagnosis and proper treatment. We collected mt-tRNA variations using a classification based on evidence from several sources and used the data to develop a multifactorial probability-based prediction method, PON-mt-tRNA, for classification of mt-tRNA single nucleotide substitutions. We integrated a machine learning-based predictor and an evidence-based likelihood ratio for pathogenicity using evidence of segregation, biochemistry and histochemistry to predict the posterior probability of pathogenicity of variants. The accuracy and Matthews correlation coefficient (MCC) of PON-mt-tRNA are 1.00 and 0.99, respectively. In the absence of evidence from segregation, biochemistry and histochemistry, PON-mt-tRNA classifies variations based on the machine learning method with an accuracy and MCC of 0.69 and 0.39, respectively. We classified all possible single nucleotide substitutions in all human mt-tRNAs using PON-mt-tRNA. The variations in the loops are more often tolerated compared to the variations in stems. The anticodon loop contains comparatively more predicted pathogenic variations than the other loops. PON-mt-tRNA is available at
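The abstract above combines a prior probability from a machine learning predictor with an evidence-based likelihood ratio to obtain a posterior probability of pathogenicity, and reports performance as accuracy and MCC. The sketch below shows the textbook odds form of Bayes' rule and the standard MCC formula. It is an assumption that PON-mt-tRNA combines its components in exactly this way; the text does not give the formula, and the function names are illustrative.

```python
import math


def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

A likelihood ratio of 1 leaves the prior unchanged, so in the absence of segregation, biochemistry and histochemistry evidence the classification rests on the machine learning component alone, as the abstract describes.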

2016-03-15 New publication:

Niroula A, Vihinen M, 2016
Variation Interpretation Predictors: Principles, Types, Performance and Choice.
Hum Mutat. 37: 579-597 doi: 10.1002/humu.22987

Next-generation sequencing methods have revolutionized the speed of generating variation information. Sequence data have a plethora of applications and will increasingly be used for disease diagnosis. Interpretation of the identified variants is usually not possible with experimental methods. This has caused a bottleneck that many computational methods aim at addressing. Fast and efficient methods for explaining the significance and mechanisms of detected variants are required for efficient precision/personalized medicine. Computational prediction methods have been developed in three areas to address the issue. There are generic tolerance (pathogenicity) predictors for filtering harmful variants. Gene/protein/disease-specific tools are available for some applications. Mechanism and effect-specific computer programs aim at explaining the consequences of variations. Here, we discuss the different types of predictors and their applications. We review available variation databases and prediction methods useful for variation interpretation. We discuss how the performance of methods is assessed and summarize existing assessment studies. A brief introduction is provided to the principles of the methods developed for variation interpretation as well as guidelines for how to choose the optimal tools and where the field is heading in the future.

2016-02-26 New publication:

Vihinen M, Hancock JM, Maglott DR, Landrum MJ, Schaafsma GCP, Taschner PEM, 2016
Human Variome Project Quality Assessment Criteria for Variation Databases.
Hum Mutat. 37: 549-558 doi: 10.1002/humu.22976

Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or groups of genes for a certain disease(s). These databases are widely considered as the most reliable information source for a particular gene/protein/disease, but it should also be made clear they may have widely varying contents, infrastructure, and quality. Quality is very important to evaluate because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. The basic principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The HVP quality evaluation criteria that resulted are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the developed quality criteria and how implementation of the quality scheme can be achieved. Examples are provided for the current status of the quality items in two different databases, BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance.

2016-02-03 Update VariSNP datasets:

The VariSNP benchmark datasets were updated using the dbSNP XML datasets from the NCBI FTP website, which were last modified in January 2016 and contain variants from update build 146 (GRCh38p2).

2016-01-18 New publication:

Schaafsma GCP, Vihinen M, 2016
VariOtator, a software tool for variation annotation with the Variation Ontology.
Hum Mutat. 37: 344-349. doi: 10.1002/humu.22954

The Variation Ontology (VariO) is used for describing and annotating types, effects, consequences and mechanisms of variations. To facilitate easy and consistent annotations, the online application VariOtator was developed. For variation type annotations VariOtator is fully automated, accepting variant descriptions in Human Genome Variation Society (HGVS) format, and generating VariO terms, either with or without full lineage, i.e. all parent terms. When a coding DNA variant description with a reference sequence is provided, VariOtator checks the description first with Mutalyzer and then generates the predicted RNA and protein descriptions with their respective VariO annotations. For the other sublevels - function, structure and property - annotations cannot be automated, and VariOtator generates annotation based on provided details. For VariO terms relating to structure and property, one can use attribute terms as modifiers and Evidence Code (ECO) terms for annotating experimental evidence. There is an online batch version, and stand-alone batch versions to be used with a Leiden Open Variation Database (LOVD) download file. A SOAP web service allows client programs to access VariOtator programmatically. Thus, systematic variation effect and type annotations can be efficiently generated to allow easy use and integration of variations and their consequences.
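As a rough illustration of automated variation type annotation from an HGVS description, the sketch below classifies a few simple coding DNA descriptions with regular expressions. It is not VariOtator's implementation: it performs no Mutalyzer check, handles only plain exonic positions, and returns descriptive labels chosen for this example rather than actual VariO terms or identifiers.

```python
import re

# Position or position range with plain exonic coordinates only, e.g. "76" or "76_78".
_POS = r"\d+(?:_\d+)?"

# Order matters: "delins" must be tried before "del".
_PATTERNS = [
    ("substitution", re.compile(r"^c\.\d+[ACGT]>[ACGT]$")),
    ("indel", re.compile(rf"^c\.{_POS}delins[ACGT]+$")),
    ("deletion", re.compile(rf"^c\.{_POS}del[ACGT]*$")),
    ("duplication", re.compile(rf"^c\.{_POS}dup[ACGT]*$")),
    ("insertion", re.compile(r"^c\.\d+_\d+ins[ACGT]+$")),
]


def classify_hgvs_coding_dna(description):
    """Return a variation type label for a simple HGVS c. description, or None.

    A leading reference sequence ("<reference>:c.76A>T") is ignored.
    """
    variant = description.split(":")[-1].strip()
    for label, pattern in _PATTERNS:
        if pattern.match(variant):
            return label
    return None


examples = ["c.76A>T", "c.76_78del", "c.76_78delinsTT", "c.76_77insG", "c.76dup"]
labels = [classify_hgvs_coding_dna(e) for e in examples]
```

Intronic, UTR and protein-level descriptions fall outside these patterns and return `None`, which is one reason a real annotator validates and normalizes the description before assigning terms.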

2015-12-03 Update VariSNP datasets:

The VariSNP benchmark datasets were updated using the dbSNP XML datasets from the NCBI FTP website, which were last modified in October 2015 and contain variants from update build 144 (GRCh38).

2015-09-09 New publication:

Niroula A, Vihinen M, 2015
Classification of amino acid substitutions in mismatch repair proteins using PON-MMR2.
Hum Mutat. 36: 1128-1134. doi: 10.1002/humu.22900.

2015-08-26 PON-P2 prediction data available:

Prediction data from PON-P2 for amino acid substitutions in COSMIC (v68) are available.

2015-08-20 Update VariO:

Variation Ontology VariO was updated to version 1.04, with some minor changes and corrections. Three new terms were introduced:
VariO:0017 nonsynonymous variation
VariO:0343 synonymous variation
VariO:0363 effect on RNA G-quadruplex

2015-08-19 New publication:

Vihinen M, 2015
Muddled genetic terms miss and mess the message.
Trends Genet. 31:423-425. doi: 10.1016/j.tig.2015.05.008

A critical aspect of science is the clear communication of complicated matters. However, language is often ambiguous, and the message can get lost in the telling. In particular, genetic terms can have different meanings for different people. Here, I discuss this problem and suggest remedies to clarify the message.

2015-08-19 New publication:

Wuttge DM, Carlsen AL, Teku G, Steen SO, Wildt M, Vihinen M, Hesselstrand R, Heegaard NH, 2015
Specific autoantibody profiles and disease subgroups correlate with circulating micro-RNA in systemic sclerosis.
Rheumatology (Oxford). 2015 Jul 10. pii: kev234. [Epub ahead of print]

2015-08-19 New publication:

Niroula A, Vihinen M, 2015
Harmful somatic amino acid substitutions affect key pathways in cancers.
BMC Med Genomics: 8(1):53 doi:10.1186/s12920-015-0125-x

Cancer is characterized by the accumulation of large numbers of genetic variations and alterations of multiple biological phenomena. Cancer genomics has largely focused on the identification of such genetic alterations and the genes containing them, known as 'cancer genes'. However, the non-functional somatic variations out-number functional variations and remain as a major challenge. Recurrent somatic variations are thought to be cancer drivers but they are present in only a small fraction of patients.
We performed an extensive analysis of amino acid substitutions (AASs) from 6,861 cancer samples (whole genome or exome sequences) classified into 30 cancer types and performed pathway enrichment analysis. We also studied the overlap between the cancers based on proteins containing harmful AASs and pathways affected by them.
We found that only a fraction of AASs (39.88%) are harmful even in known cancer genes. In addition, we found that proteins containing harmful AASs in cancers are often centrally located in protein interaction networks. Based on the proteins containing harmful AASs, we identified significantly affected pathways in 28 cancer types and indicate that proteins containing harmful AASs can affect pathways despite the frequency of AASs in them. Our cross-cancer overlap analysis showed that it would be more beneficial to identify affected pathways in cancers rather than individual genes and variations.
Pathways affected by harmful AASs reveal key processes involved in cancer development. Our approach filters out the putative benign AASs thus reducing the list of cancer variations allowing reliable identification of affected pathways. The pathways identified in individual cancer and overlap between cancer types open avenues for further experimental research and for developing targeted therapies and interventions.

2015-05-21 New publication:

Vihinen M, 2015
No more hidden solutions in bioinformatics.
Nature 521: 261 doi:10.1038/521261a

2015-04-23 New publication:

Vihinen M, 2015
The Importance of Proper Testing of Predictor Performance.
Hum Mutat. 36(5): iii-iv

2015-04-22 Update VariO:

Variation Ontology VariO was updated to version 1.03, with some minor changes and corrections.

2015-04-22 Update VariSNP datasets:

Due to the presence of some 'Pathogenic/Likely pathogenic' entries in the VariSNP benchmark datasets, updated 2015-04-09, these sets were updated by taking out these entries. The cds-indel, downstream-variant-500B, frameshift-variant and stop-lost sets were not affected.

2015-04-09 Update VariSNP datasets:

The VariSNP benchmark datasets were updated using the dbSNP XML datasets from the NCBI FTP website, which were last modified in November 2014 and contain variants from update build 142 (GRCh38).

2015-03-31 New publication:

Väliaho J, Faisal I, Ortutay C, Smith CIE, Vihinen M, 2015
Characterization of all possible single nucleotide change-caused amino acid substitutions in the kinase domain of Bruton tyrosine kinase.
Hum Mutat. 36: 638-647

Knowledge about features distinguishing deleterious and neutral variations is crucial for interpretation of novel variants. Among the human protein kinases, Bruton tyrosine kinase (BTK) contains the highest number of unique disease-causing variations, yet these are just 10% of all the possible single nucleotide substitution-caused amino acid variations. Altogether 1495 such variants can appear in the BTK kinase domain (BTK-KD). We investigated them all with bioinformatic and protein structure analysis methods. Most disease-causing variations affect conserved and buried residues, disturbing protein stability. A minority of exposed residues is conserved, but strongly tied to pathogenicity. 67% of the variations are predicted to be harmful. In 39% of the residues, all the variants are likely harmful, while in 10% of sites all the substitutions are tolerated. Results indicate the importance of the entire kinase domain, involvement in numerous interactions, and intricate functional regulation by conformational change. These results can be extended to other protein kinases and organisms.

2015-03-31 New publication:

Smith TD, Vihinen M, 2015
Standard development at the Human Variome Project.
Database (2015) Vol. 2015: article ID bav024; doi: 10.1093/database/bav024

The Human Variome Project (HVP) is a world organization working towards facilitating the collection, curation, interpretation and free and open sharing of genetic variation information. A key component of HVP activities is the development of standards and guidelines. HVP Standards are systems, procedures and technologies that the HVP Consortium has determined must be used by HVP-affiliated data sharing infrastructure and should be used by the broader community. HVP guidelines are considered to be beneficial for HVP affiliated data sharing infrastructure and the broader community to adopt. The HVP also maintains a process for assessing systems, processes and tools that implement HVP Standards and Guidelines. Recommended System Status is an accreditation process designed to encourage the adoption of HVP Standards and Guidelines. Here, we describe the HVP standards development process and discuss the accepted standards, guidelines and recommended systems as well as those under acceptance. Certain HVP Standards and Guidelines are already widely adopted by the community and there are committed users for the others.

Updated 2019-08-02 by Gerard Schaafsma

Standards and guidelines:
HVP Guidelines
Guidelines for prediction tools
Curating gene variant databases
HVP Country Nodes
Recommendations for LSDBs

Group members

News from PSB:

Lund University
Medical Faculty
Department of Experimental Medical Science


ImmunoDeficiency Resource (IDR)
Immunome Knowledge Base (IKB)

Bioinformatics services:
B-Cell Proteome
Bioinformatics benchmarks
PID classification

Lund University, Sweden 2019 ©