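pccx (phylogeny based coverage calculation and extension) is a simple application for target selection for structural genomics. It calculates how well a given set of external nodes of a phylogeny (for example, sequences with known structures) covers that phylogeny, and determines which additional external nodes extend this coverage optimally.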
It is implemented in Java as part of the FORESTER package. Currently, three scoring methods are implemented: branch counting (the default), 1/distance (-d), and -ln(distance) (-ld).
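The three scoring methods can be illustrated as per-path score functions. The following Java sketch is hypothetical: the class, enum and method names are ours, not FORESTER's. It assumes that a path from an external node to a covering node is scored as 1/(number of branches), as 1/(sum of branch lengths) clamped at the 0.0010 minimum branch length that the -d run prints, or as -ln(sum of branch lengths); applying the same clamp to -ln(distance) is also an assumption.

```java
// Hypothetical sketch: the three pccx scoring methods written as per-path score functions.
// ASSUMPTIONS: a path from an external node to a covering node is scored in isolation,
// and the 0.0010 minimum branch length printed by -d is also applied to -ln(distance).
public final class ScoringMethodsSketch {

    /** Default (branch counting), -d (1/distance) and -ld (-ln(distance)). */
    public enum Method { BRANCH_COUNTING, DISTANCE, LOG_DISTANCE }

    /** Minimum branch length, mirroring "[min branch length: 0.0010]" in the -d output. */
    public static final double MIN_BRANCH_LENGTH = 0.001;

    /**
     * @param method   the scoring method
     * @param branches number of branches on the path (used by branch counting)
     * @param distance sum of branch lengths on the path (used by -d and -ld)
     */
    public static double score(Method method, int branches, double distance) {
        switch (method) {
            case BRANCH_COUNTING:
                return 1.0 / branches;
            case DISTANCE:
                return 1.0 / Math.max(distance, MIN_BRANCH_LENGTH);
            case LOG_DISTANCE:
                return -Math.log(Math.max(distance, MIN_BRANCH_LENGTH));
            default:
                throw new IllegalArgumentException("unknown method: " + method);
        }
    }

    public static void main(String[] args) {
        // A path of 4 branches whose lengths sum to 0.5.
        for (Method m : Method.values()) {
            System.out.println(m + ": " + score(m, 4, 0.5));
        }
    }
}
```

Run as is, it prints 0.25, 2.0 and about 0.693 for the three methods, which shows how differently the methods weight the same path.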
forester.jar version 4.0
FORESTER at sourceforge.net
java -cp path/to/forester.jar org.forester.tools.pccx [options] <phylogen(y|ies) infile> [external node name 1] [name 2] ... [name n]
-d: 1/distance based scoring method (instead of branch counting based)
-ld: -ln(distance) based scoring method (instead of branch counting based)
-x[=<n>]: optimally extend coverage by <n> external nodes. Omit <n>, or use 0 or a negative value, for complete coverage extension.
-o=<file>: write output to <file>
-i=<file>: read (newline-separated) external node names from <file>
-p=<file>: write output as annotated phylogeny to <file> (only the first phylogeny in the infile is used)
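The -x option accepts three forms that request complete coverage extension (no value, 0, or a negative value) and one that requests a fixed number of nodes. As a hypothetical illustration, the documented semantics can be expressed as follows; the helper and its COMPLETE sentinel are ours and are not part of FORESTER.

```java
// Hypothetical helper (not part of FORESTER) showing how the documented -x values are read:
// "-x" with no value, "-x=0" or a negative n all request complete coverage extension.
public final class ExtendOptionSketch {

    /** Sentinel meaning "extend by all remaining external nodes". */
    public static final int COMPLETE = -1;

    /** Returns the requested number of nodes, or COMPLETE for complete coverage extension. */
    public static int parseExtend(String arg) {
        if (arg.equals("-x")) {
            return COMPLETE; // no value given
        }
        if (!arg.startsWith("-x=")) {
            throw new IllegalArgumentException("not an -x option: " + arg);
        }
        int n = Integer.parseInt(arg.substring(3));
        return n > 0 ? n : COMPLETE; // 0 or negative: complete extension
    }

    public static void main(String[] args) {
        for (String a : new String[] {"-x=10", "-x", "-x=0", "-x=-5"}) {
            int n = parseExtend(a);
            System.out.println(a + " -> " + (n == COMPLETE ? "complete extension" : n + " nodes"));
        }
    }
}
```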
Annotated phylogenies (branches are colored as follows: green - maximum coverage, red - minimum coverage, black - arithmetic mean of coverage scores) can be viewed with ATV (version 4.00 ALPHA 5 or later). To ensure that the colored branches are displayed, use an appropriate configuration file for ATV (example configuration file).
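The exact color gradient used by pccx and ATV is not documented here. Purely for illustration, the following sketch assumes a linear interpolation from red (minimum) through black (arithmetic mean) to green (maximum); the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch (not pccx's or ATV's actual code): map a coverage score to an RGB color
// so that the minimum is red, the arithmetic mean is black and the maximum is green,
// ASSUMING linear interpolation in between.
public final class CoverageColorSketch {

    private static final int[] RED = {255, 0, 0};
    private static final int[] BLACK = {0, 0, 0};
    private static final int[] GREEN = {0, 255, 0};

    /** Returns {r, g, b} for a score, given the min, mean and max of all coverage scores. */
    public static int[] colorFor(double score, double min, double mean, double max) {
        if (score <= mean) {
            double t = mean > min ? (score - min) / (mean - min) : 1.0; // 0 at min, 1 at mean
            return blend(RED, BLACK, clamp(t));
        }
        double t = max > mean ? (score - mean) / (max - mean) : 1.0; // 0 at mean, 1 at max
        return blend(BLACK, GREEN, clamp(t));
    }

    private static double clamp(double t) {
        return Math.max(0.0, Math.min(1.0, t));
    }

    /** Linear interpolation between two RGB triples, rounded to integers. */
    private static int[] blend(int[] from, int[] to, double t) {
        int[] rgb = new int[3];
        for (int i = 0; i < 3; i++) {
            rgb[i] = (int) Math.round(from[i] + t * (to[i] - from[i]));
        }
        return rgb;
    }

    public static void main(String[] args) {
        double min = 0.2, mean = 0.5, max = 1.0;
        for (double s : new double[] {0.2, 0.5, 0.75, 1.0}) {
            System.out.println(s + " -> " + Arrays.toString(colorFor(s, min, mean, max)));
        }
    }
}
```

With min = 0.2, mean = 0.5 and max = 1.0, a score of 0.75 lies halfway between mean and maximum and maps to [0, 128, 0], a medium green.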
For the examples, a phylogeny based on the Malate/L-lactate dehydrogenase alignment from Pfam 21.0 is used (Ldh_2.nhx).
As of 2007-05-25, the following seven sequences from this family have structures in the PDB: 1s20, 1nxu (DLGD_ECOLI); 1rfm (COMC_METJA); 1v9n (MDH_PYRHO); 1vbi (Q746L8_THET2); 1wtj, 2cwf (Q4U331_PSESM); 1xrh (ALLD_ECOLI); and 1z2i (Q7CRW4_AGRT5).
% java -cp path/to/forester.jar org.forester.tools.pccx Ldh_2.nhx DLGD_ECOLI COMC_METJA MDH_PYRHO Q746L8_THET2 Q4U331_PSESM ALLD_ECOLI Q7CRW4_AGRT5 -p=Ldh_2_b7.nhx
Output:
Options: scoring method: sum of 1/branch-segment-sum
Normalized score: 0.1497663297543091
Raw score : 33.84719052447385
Wrote annotated phylogeny to "Ldh_2_b7.nhx"
In this annotated phylogeny, branches are colored according to coverage: green - maximum coverage, red - minimum coverage, black - arithmetic mean of coverage scores.
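The two scores printed for branch counting appear to be related by a simple division. Dividing the raw score by the normalized score gives a whole number (this is arithmetic on the printed values; reading it as normalization by the number of external nodes is our inference, not documented behavior):

$$
\frac{\text{raw score}}{\text{normalized score}} = \frac{33.84719052447385}{0.1497663297543091} \approx 226.000
$$

This would be consistent with the normalized branch-counting score being the mean per-external-node score over 226 external nodes. The "before" and "after" scores of the -x=10 run give the same ratio (57.803174603174654 / 0.25576625930608254 ≈ 226), whereas for the -d run the ratio is about 59239.45, not a whole number, so the distance-based normalization evidently differs.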
% java -cp path/to/forester.jar org.forester.tools.pccx -d Ldh_2.nhx DLGD_ECOLI COMC_METJA MDH_PYRHO Q746L8_THET2 Q4U331_PSESM ALLD_ECOLI Q7CRW4_AGRT5
Output:
Options: scoring method: sum of 1/branch-length-sum [for self: 1/branch-length] [min branch length: 0.0010]
Normalized score: 0.12868805358848912
Raw score : 7623.40971285036
% java -cp path/to/forester.jar org.forester.tools.pccx -x=10 Ldh_2.nhx DLGD_ECOLI COMC_METJA MDH_PYRHO Q746L8_THET2 Q4U331_PSESM ALLD_ECOLI Q7CRW4_AGRT5 -p=Ldh_2_b7_x10.nhx
Output:
Options: scoring method: sum of 1/branch-segment-sum
Printing 10 names to extend coverage in an optimal manner:
before:
Normalized score: 0.1497663297543091
Raw score : 33.84719052447385
0 Q3PGX6_PARDE 0.16718837131297096
1 Q6D702_ERWCT 0.18360557829584423
2 Q1V2K0_9RICK 0.1942380873796807
3 Q7PI68_ANOGA 0.2046462755533554
4 Q5QTW6_IDILO 0.21426464391066183
5 Q2T3J0_BURTA 0.22353129908439665
6 Q5WAN1_BACSK 0.23244837758112122
7 Q8UIX7_AGRT5 0.2404779463407785
8 Q323Z3_SHIBS 0.2481616097766543
9 Q8YB95_BRUME 0.25576625930608254
after:
Normalized score: 0.25576625930608254
Raw score : 57.803174603174654
Wrote annotated phylogeny to "Ldh_2_b7_x10.nhx"
In this annotated phylogeny, branches are colored according to coverage: green - maximum coverage, red - minimum coverage, black - arithmetic mean of coverage scores.
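The extension listing adds one external node per step and reports the normalized score after each addition; the scores increase monotonically, and the last one equals the "after" score. This is consistent with a greedy strategy, although we have not verified that pccx works exactly this way. The following Java sketch implements such a greedy extension on a toy phylogeny under stated assumptions (all names are hypothetical): an external node scores 1/(number of branches to its nearest covering node), a covering node scores 1 for itself, and the normalized score is the mean over all external nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of greedy coverage extension with branch-counting scores.
// ASSUMPTIONS (not taken from pccx): an external node scores 1/(branches to its nearest
// covering node), a covering node scores 1 for itself, and the normalized score is the
// mean over all external nodes.
public final class GreedyExtensionSketch {

    private static final double EPS = 1e-12; // near-ties go to the earlier external node

    private final int[] parent; // parent[i] == -1 for the root
    private final int[] depth;  // number of branches from the root to node i
    private final List<Integer> leaves = new ArrayList<>();

    public GreedyExtensionSketch(int[] parent) {
        this.parent = parent;
        this.depth = new int[parent.length];
        boolean[] hasChild = new boolean[parent.length];
        for (int p : parent) {
            if (p >= 0) hasChild[p] = true;
        }
        for (int i = 0; i < parent.length; i++) {
            for (int n = i; parent[n] >= 0; n = parent[n]) depth[i]++;
            if (!hasChild[i]) leaves.add(i); // nodes without children are external nodes
        }
    }

    /** Number of branches between nodes a and b: walk the deeper one up until they meet. */
    public int branches(int a, int b) {
        int count = 0;
        while (a != b) {
            if (depth[a] >= depth[b]) a = parent[a]; else b = parent[b];
            count++;
        }
        return count;
    }

    /** Mean over external nodes of 1/(branches to the nearest covering node). */
    public double normalizedScore(List<Integer> covering) {
        double sum = 0.0;
        for (int leaf : leaves) {
            double best = 0.0;
            for (int c : covering) {
                int b = branches(leaf, c);
                best = Math.max(best, b == 0 ? 1.0 : 1.0 / b);
            }
            sum += best;
        }
        return sum / leaves.size();
    }

    /** Adds up to n external nodes, each time choosing the one that raises the score most. */
    public List<Integer> extend(List<Integer> covering, int n) {
        List<Integer> current = new ArrayList<>(covering);
        List<Integer> added = new ArrayList<>();
        for (int step = 0; step < n; step++) {
            int bestLeaf = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int leaf : leaves) {
                if (current.contains(leaf)) continue;
                current.add(leaf);
                double s = normalizedScore(current);
                current.remove(current.size() - 1); // remove by index: undo the trial addition
                if (s > bestScore + EPS) { bestScore = s; bestLeaf = leaf; }
            }
            if (bestLeaf < 0) break; // every external node is already covering
            current.add(bestLeaf);
            added.add(bestLeaf);
        }
        return added;
    }

    public static void main(String[] args) {
        // Toy tree ((A,B),(C,(D,E))): 0 = root, 1 = (A,B), 2 = (C,(D,E)), 6 = (D,E);
        // external nodes: 3 = A, 4 = B, 5 = C, 7 = D, 8 = E.
        GreedyExtensionSketch tree = new GreedyExtensionSketch(new int[] {-1, 0, 0, 1, 1, 2, 2, 6, 6});
        List<Integer> covering = Arrays.asList(3); // A is the only covering node
        List<Integer> added = tree.extend(covering, 2);
        List<Integer> after = new ArrayList<>(covering);
        after.addAll(added);
        System.out.println("before: " + tree.normalizedScore(covering));
        System.out.println("added:  " + added);
        System.out.println("after:  " + tree.normalizedScore(after));
    }
}
```

For this toy tree with A covering, the starting score is (1 + 1/2 + 1/4 + 1/5 + 1/5) / 5, about 0.43; the sketch then adds D (node 7) and C (node 5) and ends at 0.8. A greedy choice is cheap to compute but, for a fixed number of added nodes, is not guaranteed to give the globally best set.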
As in the examples above, a phylogeny based on the Malate/L-lactate dehydrogenase alignment from Pfam 21.0 is used.
The graph was produced with gnuplot.
Christian M Zmasek
Burnham Institute for Medical Research | cmzmasek yahoo com
Copyright © 2007 Christian M Zmasek | Last updated 2007-05-02