SDI is an algorithm to infer gene duplications on a gene tree. For more information see
Zmasek and Eddy (2001). A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics, 17, 821-828.
An algorithm/tool based on SDI is SDI R, which infers duplications on unrooted or erroneously rooted gene trees and at the same time roots them by
minimizing the sum of inferred duplications.
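The core idea behind this kind of duplication inference can be sketched in a few lines of Java. The sketch below is not forester code and is not the efficient algorithm described in the paper. It assumes rooted, binary trees, uses a hypothetical TreeNode class, and finds last common ancestors naively. Each gene tree node is mapped to the last common ancestor, in the species tree, of the species tree nodes its two children map to. A node is called a duplication when its mapping is the same as the mapping of one of its children. Rooting (as done by SDI R) is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal tree node for illustration; not a forester class.
class TreeNode {
    final String species;       // species code on leaves (and labels on species tree nodes)
    TreeNode parent;
    final List<TreeNode> children = new ArrayList<>();
    TreeNode mapped;            // gene tree only: species tree node this node maps to
    boolean duplication;        // gene tree only: set by SdiSketch.infer

    TreeNode(String species, TreeNode... kids) {
        this.species = species;
        for (TreeNode k : kids) {
            k.parent = this;
            children.add(k);
        }
    }
}

class SdiSketch {
    // Number of edges between a node and the root of its tree.
    static int depth(TreeNode n) {
        int d = 0;
        while (n.parent != null) {
            n = n.parent;
            d++;
        }
        return d;
    }

    // Naive last common ancestor of two nodes of the same (species) tree.
    static TreeNode lca(TreeNode a, TreeNode b) {
        int da = depth(a), db = depth(b);
        while (da > db) { a = a.parent; da--; }
        while (db > da) { b = b.parent; db--; }
        while (a != b) { a = a.parent; b = b.parent; }
        return a;
    }

    // Fills a lookup table from species code to species tree leaf.
    static void indexLeaves(TreeNode s, Map<String, TreeNode> byCode) {
        if (s.children.isEmpty()) {
            byCode.put(s.species, s);
        }
        for (TreeNode c : s.children) {
            indexLeaves(c, byCode);
        }
    }

    // Postorder traversal of the gene tree; returns the number of inferred duplications.
    static int infer(TreeNode g, Map<String, TreeNode> speciesLeaves) {
        if (g.children.isEmpty()) {
            g.mapped = speciesLeaves.get(g.species);
            if (g.mapped == null) {
                throw new IllegalArgumentException("No species tree leaf for " + g.species);
            }
            return 0;
        }
        if (g.children.size() != 2) {
            throw new IllegalArgumentException("This sketch handles binary gene trees only");
        }
        TreeNode left = g.children.get(0), right = g.children.get(1);
        int count = infer(left, speciesLeaves) + infer(right, speciesLeaves);
        g.mapped = lca(left.mapped, right.mapped);
        g.duplication = g.mapped == left.mapped || g.mapped == right.mapped;
        return count + (g.duplication ? 1 : 0);
    }
}
```

For a species tree ((A,B),C), the gene tree ((A,B),(A,C)) has a root that maps to the species tree root, as does its child (A,C), so the root is inferred to be a duplication. The gene tree ((A,B),C) contains no duplications.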
SDI has been implemented in Java as part of the
forester libraries.
Source code is available from the forester project at sourceforge.net.
java -cp path\to\forester.jar org.forester.application.sdi [options] <gene tree in phyloXML format> <species tree in phyloXML format> [outfile]
Options:
-s: to strip the species tree prior to duplication inference
-g: to use GSDI algorithm instead of SDIse algorithm (under development, not recommended)
-m: use the most parsimonious duplication model for GSDI:
    assign nodes as speciations which would otherwise be assigned
    as unknown because of polytomies in the species tree
Species tree:
In phyloXML format, with taxonomy data in appropriate fields.
Gene tree:
In phyloXML format, with taxonomy and sequence data in appropriate fields.
!! WARNING: GSDI algorithm is under development (and possibly not correct), please use SDIse instead !!
Download an example species tree ("arthropoda_species_tree.xml") and an
example gene tree ("sample_gene_tree.xml").
Then, run SDI with the following command line:
% java -cp path\to\forester.jar org.forester.application.sdi sample_gene_tree.xml arthropoda_species_tree.xml sdi_out.xml
The important point to keep in mind is that the 'taxonomy' element must have at least one sub-element that makes it possible to
match each sequence in the gene tree with a taxonomy in the species tree. In this example, this sub-element of the 'taxonomy' element
is 'code'.
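Before running SDI, it can help to check that every taxonomy code used in the gene tree also occurs in the species tree. The following is a minimal sketch of such a check. It uses only the standard Java XML parser, not the forester API, and the class and method names (TaxonomyCodeCheck, codes, missing) are made up for this illustration. It assumes that matching is done on 'code' elements nested directly inside 'taxonomy' elements, as in the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of forester.
class TaxonomyCodeCheck {
    // Collects the text of every <code> element whose parent is a <taxonomy> element.
    static Set<String> codes(String phyloXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(phyloXml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches the element in any namespace.
            NodeList list = doc.getElementsByTagNameNS("*", "code");
            Set<String> result = new TreeSet<>();
            for (int i = 0; i < list.getLength(); i++) {
                org.w3c.dom.Node n = list.item(i);
                if ("taxonomy".equals(n.getParentNode().getLocalName())) {
                    result.add(n.getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse phyloXML input", e);
        }
    }

    // Codes that occur in the gene tree but not in the species tree.
    static Set<String> missing(String geneTreeXml, String speciesTreeXml) {
        Set<String> result = codes(geneTreeXml);
        result.removeAll(codes(speciesTreeXml));
        return result;
    }
}
```

The file contents can be read into strings with, for example, java.nio.file.Files.readString (Java 11 or later). An empty result from missing(...) means that every code in the gene tree has a counterpart in the species tree.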
Christian M Zmasek
phylosoft -at- gmail -dot- com
Last updated 2010.03.01