Wednesday, April 17, 2013

When is a tree structure a phylogeny?


I have noted before that many people seem to treat the attributes used in non-biological phylogenetic analyses as being analogous to genotypes, whereas most such data are much more similar to phenotypes (e.g. see the earlier posts "False analogies between anthropology and biology" and "The Music Genome Project is no such thing"). This inappropriate analogy can lead to problems, such as incorrect conclusions regarding familial relationships.

In a similar vein, another problem is the appropriation of the word "phylogeny" to refer to non-evolutionary types of tree. A web search for phylogeny will lead you to many sites where the tree structure being referenced is very unlike an evolutionary history.

Systematists have long dealt with this issue, as manifested in the confusion between classification and phylogeny. Biological classification is usually treated as most informative (e.g. explanatory, predictive) when based on a phylogeny, but a phylogeny is not automatically a classification, and a classification is not automatically a phylogeny.

The best known example is the NCBI Taxonomy, as used by the GenBank database. This is one of the most commonly used classification schemes today, but in bioinformatics it is frequently used as a phylogeny as well as a classification. This is in spite of the fact that NCBI offers the following disclaimer:
"The NCBI taxonomy database is not a primary source for taxonomic or phylogenetic information. Furthermore, the database does not follow a single taxonomic treatise but rather attempts to incorporate phylogenetic and taxonomic knowledge from a variety of sources, including the published literature, web-based databases, and the advice of sequence submitters and outside taxonomy experts. Consequently, the NCBI taxonomy database is not a phylogenetic or taxonomic authority and should not be cited as such."
The issue here is that the classification is hierarchical and can therefore be expressed as a tree, and the same can be said of the nested relationships in a phylogeny. However, not all trees are phylogenies, and the NCBI Taxonomy is a classification that is not necessarily phylogenetic.
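As a rough illustration of this point, here is a minimal Python sketch in which a rank-based classification and a phylogeny are stored in exactly the same child-to-parent structure. The taxon names and groupings are my own simplified choices for illustration (they are not taken from the NCBI Taxonomy), and the function names are hypothetical. Both structures pass the same "is it a hierarchy?" check, so the data structure alone cannot tell you which one is a phylogeny; only the meaning of the edges can.

```python
# Both hierarchies are stored as child -> parent maps.
# Names and groupings are simplified, illustrative assumptions.

# A traditional rank-based classification: an edge means "is a member of".
classification = {
    "Reptilia": "Vertebrata",
    "Aves": "Vertebrata",
    "Crocodylia": "Reptilia",
    "Squamata": "Reptilia",
}

# A phylogeny: an edge means "descends from this inferred ancestor".
phylogeny = {
    "Archosauria": "Sauropsida",
    "Squamata": "Sauropsida",
    "Crocodylia": "Archosauria",
    "Aves": "Archosauria",
}


def lineage(tree, node):
    """Return the path from node up to the root of a child -> parent map."""
    path = [node]
    while path[-1] in tree:
        path.append(tree[path[-1]])
    return path


def is_hierarchy(tree):
    """Check that every node reaches a root without revisiting any node."""
    for node in tree:
        seen = set()
        while node in tree:
            if node in seen:
                return False  # a cycle, so not a hierarchy
            seen.add(node)
            node = tree[node]
    return True


# Structurally, both are equally valid trees ...
print(is_hierarchy(classification), is_hierarchy(phylogeny))
# ... but they place the crocodiles in different nested groups.
print(lineage(classification, "Crocodylia"))
print(lineage(phylogeny, "Crocodylia"))
```

In the classification the crocodiles sit with the lizards and snakes, whereas in the phylogeny they sit with the birds; the tree shape is equally "valid" in both cases.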

More recently, the word phylogeny has been adopted by the computational world to refer to many hierarchical clustering patterns. For example, consider this definition from Freebase:
"The phylogeny pattern is a major pattern within ontology / schema modelling, and is prevalent in many schemas in Freebase. Commonly related are the parent-child pattern and the containment pattern."
In other words, parent-child patterns are phylogenetic, which is literally true as far as it goes, but a two-level hierarchy fits this pattern without being anything more than a trivial phylogeny in the biological sense. An example is the Wikipedia music entries (e.g. Rock music), which have a genre and several subgenres, along with fusion genres; this produces a shallow but broad "tree". Indeed, Freebase has this to say about its own attempt to implement this idea:
"One issue is that the some of the data in the music genre hierarchy in Freebase seems to attempt to show a genealogy of genres, rather than family groupings, which is counter to the way that parent and child Media genres are defined."
This seems to be a rather confused set of analogies involving families and genealogies. The false analogy between a tree and a phylogeny seems to have created this confusion. A genealogy expresses family groups (as does a phylogeny), but not all of those potential groups need be expressed in a classification.
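To make the "shallow but broad" point about the genre hierarchy concrete, here is a minimal Python sketch of a two-level parent-child structure. The subgenre list is an illustrative assumption (it is not copied from Freebase or from any particular Wikipedia entry), and the function names are hypothetical.

```python
# A parent -> children map; the subgenre list is illustrative, not exhaustive.
genres = {
    "Rock music": ["Hard rock", "Punk rock", "Progressive rock", "Folk rock"],
}


def depth(tree, node):
    """Number of edges on the longest path from node down to a leaf."""
    children = tree.get(node, [])
    if not children:
        return 0
    return 1 + max(depth(tree, child) for child in children)


def breadth(tree, node):
    """Number of direct children of node."""
    return len(tree.get(node, []))


# One level of nesting, many siblings: a hierarchy, but only trivially a "tree".
print(depth(genres, "Rock music"), breadth(genres, "Rock music"))
```

A structure like this records only that each subgenre belongs to a genre. It says nothing about the order in which the subgenres arose or which descended from which, which is the information a genealogy or phylogeny would need to carry.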

It seems to me that it would be simpler for the computational world to refer to a hierarchy rather than a phylogeny.
