ProCon: localization and visualization of Protein Conservation (version 2.0)



 


User guide for ProCon V2.0

ProCon is a tool for locating and visualizing evolutionary conservation in protein sequences. The method can identify three types of conservation, namely identity (type I), physicochemical similarity (type II), and covariant conservation (type III). Conserved sites of types I and II are located by entropy calculation, and the third type is identified by calculating mutual information. The interaction networks formed by covariant pairs can also be identified. All three types of conservation can be visualized in a representative protein structure. The tool performs an exhaustive analysis, the results of which can be used, for example, for identifying different types of conserved residues, studying protein-protein interactions, explaining the consequences of disease-causing mutations, and designing mutants for protein engineering.
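The entropy calculation mentioned above can be illustrated with a minimal sketch. It computes the plain Shannon entropy of a single alignment column; this guide does not give the exact formula, normalization or gap handling that ProCon uses, so the function name `column_entropy` and the choice to ignore gaps are assumptions made for illustration only.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Gap characters ("-") are ignored; this is an assumption of the sketch.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # Sum of p * log2(1/p) over the observed residues.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


identical = column_entropy("AAAA")  # fully conserved column
mixed = column_entropy("AAGG")      # two residues, equally frequent
```

A fully conserved column has an entropy of 0 bits, and the entropy grows as the column becomes more variable, so low-entropy positions are candidates for type I conservation.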

 


1 Configure the Java runtime environment

First download the program (versions are available for Windows and MacOS).
The program is implemented in Java. To run it, please make sure that a Java Runtime Environment (JRE) or Java Development Kit (JDK) with a version later than 1.6 is installed beforehand. (You may download it from here)

The program can be run on both Windows and MacOS. Note that MacOS versions later than 10.6 include a JDK, so users do not need to install it again.

 


2 Run ProCon Software

  1. Extract the downloaded file ProCon-V2.0.rar to a suitable folder.

  2. Open the "\bin" folder, which contains the program file.

  3. Double-click the "ProCon-V2.0.jar" file to start the program.

  4. Use the menus for input and settings, and use the 9 tab pages to check the results.

 


Figure 1 Graphical user interface of ProCon, opening a FASTA-format file

 


3 Import a FASTA-format sequence file

Click the menu "File->open a FASTA file" (or use the shortcut Ctrl + O) to open the file wizard. Then choose a FASTA-format file to open.

An example file called "example-input.txt" is available in the default "examples" folder. The data are from the PH domain (Shen 2004) and contain 161 sequences with 333 residues in each sequence. A dialog shows this information after the input file is opened. The system calculates the results according to the default parameter values. These parameters can also be modified (see section 4).
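For readers who want to check an input file before loading it, the following is a minimal sketch of FASTA parsing that reports the same kind of summary as the information dialog (number of sequences and sequence length). The function names `read_fasta` and `summarize` are hypothetical, and the sketch does not reproduce ProCon's own parser.

```python
def read_fasta(lines):
    """Parse FASTA-format lines into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record, if there is one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def summarize(records):
    """Return the number of sequences and the set of distinct lengths."""
    return len(records), {len(seq) for _, seq in records}


# Small made-up alignment; sequences may span several lines.
demo = [">seq1", "MK-L", "A", ">seq2", "MKVLA"]
records = read_fasta(demo)
count, lengths = summarize(records)
```

To apply the sketch to a file, pass an open file handle, for example `read_fasta(open("example-input.txt"))`. In an aligned input every sequence has the same length, so `lengths` should contain a single value.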

 

 

 

 

Figure 2 Information dialog shown after opening a FASTA-format sequence file

 


4 Configure parameters

Click the menu "settings->Gap Percentage" to set this parameter. The default value is 10%, which means that positions with a gap frequency higher than 10% are not counted in the calculation. Users can change this value and then click the "apply" button to recalculate (Figure 3). The detailed result is shown in the "information content" tab page.
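The effect of the gap percentage can be sketched as follows. The helper names `gap_frequencies` and `positions_to_keep` are hypothetical, positions are 0-based, and treating a position at exactly the threshold as kept is an assumption based on the wording "higher than 10%".

```python
def gap_frequencies(sequences):
    """Fraction of gap characters at each position of an alignment."""
    n = len(sequences)
    length = len(sequences[0])
    return [sum(seq[i] == "-" for seq in sequences) / n for i in range(length)]


def positions_to_keep(sequences, max_gap=0.10):
    """0-based positions whose gap frequency does not exceed max_gap."""
    freqs = gap_frequencies(sequences)
    return [i for i, f in enumerate(freqs) if f <= max_gap]


# Made-up alignment of four sequences with five positions each.
alignment = ["MK-LA", "MKVLA", "M-VLA", "MK-LA"]
kept_default = positions_to_keep(alignment)               # threshold 10%
kept_relaxed = positions_to_keep(alignment, max_gap=0.25)  # threshold 25%
```

In this toy alignment the second and third positions have gap frequencies of 25% and 50%, so both are skipped at the default threshold, while raising the threshold to 25% brings the second position back into the calculation.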

 

 

Figure 3 Set gap percentage parameter

 


Click the menu "settings->p values" to choose the p1 and p2 parameters. The default values are 0.01 and 0.05, which are used as thresholds for dividing the covariant amino acids. Users can choose other p values from the list and then recalculate (Figure 4). The results can be found in the "covariant aa" and "covariance distribution" tab pages.
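One way to picture how two thresholds divide the covariant pairs is the sketch below. It assumes that each pair already carries a p value and that the boundaries are inclusive; neither detail is stated in this guide, and the function name `split_by_p` and the example positions are hypothetical.

```python
def split_by_p(pairs, p1=0.01, p2=0.05):
    """Divide covariant pairs into two lists using p1 and p2 as thresholds.

    pairs is a list of ((position_i, position_j), p_value) tuples.
    Pairs with a p value above p2 are left out of both lists.
    """
    below_p1 = [pair for pair, p in pairs if p <= p1]
    between = [pair for pair, p in pairs if p1 < p <= p2]
    return below_p1, between


# Made-up pairs and p values, for illustration only.
example_pairs = [((12, 45), 0.004), ((45, 78), 0.03), ((3, 9), 0.2)]
strong, moderate = split_by_p(example_pairs)
```

With the default thresholds the first pair falls into the stricter list, the second into the list between p1 and p2, and the third is excluded.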

 

 

Figure 4 Set p value parameters

 


The p values setting is used for calculating the third type of conservation.

Click the menu "settings->aa groups" to reset the amino acid groups. By default, the 20 amino acids are divided into 6 groups according to their physicochemical properties. The tool also supplies 3 other grouping methods (polarity, essentiality and side chain properties, respectively); users can click one directly and then apply it (Figure 5A). Alternatively, users can define their own grouping method by 1) entering the number of groups and a name for each group (Figure 5B); 2) setting the group for each amino acid (Figure 5A); 3) confirming the new grouping method (Figure 5C).
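A reduced alphabet of this kind can be sketched as a simple lookup table. The six groups below are an illustrative physicochemical grouping chosen for this example; they are not taken from ProCon, whose default groups are not listed in this guide. The names `EXAMPLE_GROUPS`, `build_lookup` and `recode` are hypothetical.

```python
# Illustrative 6-group scheme (an assumption, not ProCon's default groups).
EXAMPLE_GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}


def build_lookup(groups):
    """Map each amino acid letter to its group name, rejecting duplicates."""
    lookup = {}
    for name, residues in groups.items():
        for aa in residues:
            if aa in lookup:
                raise ValueError(f"{aa} is assigned to more than one group")
            lookup[aa] = name
    return lookup


def recode(sequence, lookup):
    """Replace each residue by its group name; other characters pass through."""
    return [lookup.get(aa, aa) for aa in sequence]


lookup = build_lookup(EXAMPLE_GROUPS)
recoded = recode("MK-DG", lookup)
```

Checking that every amino acid belongs to exactly one group mirrors step 2 of the custom grouping procedure, where a group is set for each amino acid.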

 

Figure 5 Set aa groups

 


5 View the results

The software window has 9 tab pages. The first page, "alignment", displays the input file with sequence names and contents. The selected sequence is shown, and you can click the "apply current sequence as reference" button to use it as the reference for mutations. The default reference is the first sequence.

 

Attention: If you change the reference sequence, please also change the PDB ID of the corresponding representative protein in the structure page (see Figure 13). Otherwise the results cannot be shown in the structure page.

Figure 6 Alignment tab page

 


The "information content" page displays the information at different positions using column charts. The red columns show the information for the 20-alphabet, while the blue ones show the information for the 6-alphabet by default (or for a different alphabet according to the user's own aa group settings, see section 4). The green columns below show the gap frequency at the corresponding positions. No information is shown at positions whose gap frequency is higher than the set gap percentage (the default value is 10%; it can be changed in the settings menu, see section 4). A scroll bar is used to choose positions. The diagrams on this page can be saved by right-clicking with the mouse.

 

Figure 7 Information content page

 


The "information statistics" page displays statistics on the distribution of information, based on either the 6-alphabet (or the user's own aa group setting) or the 20-alphabet. Some statistical parameters, such as the information threshold and the interval value, can also be set on this page.

 

Figure 8 Information statistics page

 


The "aa distribution" page displays the frequency distribution of amino acids at a specific position using column charts. The top diagram shows results based on the 20 amino acids, and the diagram below shows results based on the default 6-alphabet (or on the user's own aa group setting). A scroll bar is used to choose positions.

 

Figure 9 Amino acid distribution page

 


The "covariant aa" page displays the covariant amino acid pairs with the maximum mutual information values. Users can check the mutual information according to either the 20-alphabet or the 6-alphabet (the aa groups can also be set by users in the settings menu). Note that this selection affects the following three tab pages: "covariance distribution", "triplets" and "structure".
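The mutual information between two alignment columns can be sketched with the textbook definition. This guide does not specify how ProCon handles gaps or normalizes the values, so skipping any row with a gap in either column is an assumption, and the function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Rows with a gap ("-") in either column are skipped; this is an
    assumption of the sketch.
    """
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    count_ab = Counter(pairs)
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


covarying = mutual_information("AAGG", "CCTT")    # columns change together
independent = mutual_information("AAGG", "CTCT")  # no association
```

Two columns that always change together give a high value, while columns that vary independently give a value of zero. The same function can be applied to sequences recoded into a reduced alphabet.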

 

The covariant pairs are listed in two tables because the mutual information values are divided using the p values as thresholds. (Users can set the p values in the "settings" menu, see section 4.) Click a specific covariant pair in the tables to see the mutual information details in both a table and a diagram.

 

Figure 10 Covariant amino acid page

 


The "covariance distribution" page shows the distribution of mutual information using a column diagram. The diagram corresponds to either the 20-alphabet or the 6-alphabet, as chosen in the "covariant aa" page.

 

Figure 11 Covariance distribution page

 


The "triplets" page shows the triplets corresponding to the different p values. The associations between these sites are shown in a diagram generated by an embedded tool called Graphviz, and are also stored as a *.dot file in the "\output" folder.

 

On MacOS, clicking one of the radio buttons calls the Graphviz software directly. (Attention: the default Graphviz file stored in the "\bin" folder is for macOS Lion; for other operating system versions, please download a suitable Graphviz version from here)

 

On Windows, clicking one of the radio buttons shows the generated diagram. You may click the "view the diagram in Windows paint" button to edit or save the diagrams yourself.
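For readers unfamiliar with the *.dot format, the sketch below writes a small undirected network of covariant sites in the Graphviz DOT language. The layout of the files that ProCon itself writes is not described in this guide, so the function name `to_dot`, the graph name and the site numbers are all hypothetical.

```python
def to_dot(pairs, graph_name="covariant_network"):
    """Return DOT-language text for an undirected graph of site pairs."""
    lines = [f"graph {graph_name} {{"]
    for a, b in pairs:
        # Each covariant pair becomes one undirected edge.
        lines.append(f'    "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# A made-up triplet: three sites that are pairwise covariant.
dot_text = to_dot([(12, 45), (45, 78), (12, 78)])
```

Text of this form can be saved to a file and rendered with the Graphviz command-line tool, for example `dot -Tpng network.dot -o network.png`.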

 

Figure 12 Triplets page

 


In the "structure" page, the PDB ID of the representative protein must be entered first. For the example PH domain sequences in "example-input.txt", with the first sequence chosen as the reference in the "alignment" page, the PDB ID of the representative protein is 1B55; for other sequences or another reference, choose the structure of a corresponding protein. The protein structure is then downloaded from the PDB database. The PDB file can also be downloaded manually beforehand and stored in the "\PDB" folder under the name PDBID.pdb, which is useful when there are Internet connection problems.
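The manual download can be scripted along the following lines. This is a sketch under stated assumptions: the RCSB download address shown is one public source of PDB files and may not be the one ProCon contacts, the folder name "PDB" is taken from the text above, and the function names `local_pdb_path` and `fetch_pdb` are hypothetical.

```python
from pathlib import Path
import urllib.request

# Public RCSB download address (an assumption; ProCon's own source of
# PDB files is not stated in this guide).
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def local_pdb_path(pdb_id, pdb_dir="PDB"):
    """Path of the stored structure file, named PDBID.pdb."""
    return Path(pdb_dir) / f"{pdb_id}.pdb"


def fetch_pdb(pdb_id, pdb_dir="PDB"):
    """Return the local PDB file, downloading it only if it is missing."""
    target = local_pdb_path(pdb_id, pdb_dir)
    if target.exists():
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(RCSB_URL.format(pdb_id=pdb_id), str(target))
    return target


example_path = local_pdb_path("1B55")
```

Calling `fetch_pdb("1B55")` from the ProCon folder would place the file for the example data in the "PDB" folder, so that the structure page can be used without an Internet connection afterwards.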

 

The embedded software Jmol is used here to visualize the protein structure. Right-clicking with the mouse allows the visual features to be changed. Figure 13 displays the downloaded protein structure, and Figure 14 highlights the chosen triplet or covariant pair in the structure.

 

Figure 13 Download PDB file and display in Jmol

 


 

 

Figure 14 Check a specific triplet or covariant pair in protein structure

 


© 2005 - 2015   
Center for Systems Biology, Soochow University, Suzhou, China  
and  
Protein Structure and Bioinformatics Group, Lund University, Sweden