Thursday, July 3, 2014
Are genotype or phenotype data more tree-like?
I recently wrote a manuscript comparing the tree-likeness of phylogenetic data in biology and anthropology (see "Are phylogenetic patterns the same in anthropology and biology?"). While doing so, I also compared genotype and phenotype data within biology.
The comparison is based on maximum-parsimony analyses of the data, using the (ensemble) Retention Index (RI) as the measure of tree-likeness. If RI = 1 then all of the characters are compatible with the same tree, whereas if RI = 0 then none of them are pairwise compatible. As the graph shows, the genotype data are considerably less tree-like than are the phenotype data (mean RI ≈ 0.5 versus 0.7, respectively).
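For readers who want the arithmetic behind the index, here is a minimal sketch of how an ensemble RI can be computed from per-character step counts. It assumes the usual definition, RI = (G - S) / (G - M), where M, G and S are the minimum possible, maximum possible and observed numbers of steps, each summed over all characters. The function name and the example step counts are hypothetical and purely illustrative; they are not taken from any of the datasets discussed here, and this is not the code used for the analyses.

```python
def ensemble_retention_index(min_steps, max_steps, obs_steps):
    """Ensemble RI = (G - S) / (G - M), with sums taken over all characters.

    min_steps: minimum possible steps for each character on any tree (m)
    max_steps: maximum possible steps for each character on any tree (g)
    obs_steps: steps each character actually requires on the tree (s)
    """
    if not (len(min_steps) == len(max_steps) == len(obs_steps)):
        raise ValueError("all three lists must have one entry per character")

    M = sum(min_steps)
    G = sum(max_steps)
    S = sum(obs_steps)

    if G == M:
        # No character can vary in its step count, so the index is undefined.
        raise ValueError("RI is undefined when G equals M")

    return (G - S) / (G - M)


# Hypothetical step counts for three characters (illustration only).
m = [1, 1, 1]
g = [3, 2, 4]
s = [1, 2, 2]

ri = ensemble_retention_index(m, g, s)   # (9 - 5) / (9 - 3)
print(round(ri, 3))
```

With these made-up numbers the index is 4/6, or about 0.67. Setting the observed steps equal to the minimum steps gives RI = 1, and setting them equal to the maximum steps gives RI = 0, which matches the two extremes described above.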
It would be interesting to know whether other people have observed this pattern. If it is general, then what causes it? Are the phenotype characters being chosen (subconsciously or not) because they show nested grouping patterns (which lend themselves automatically to a tree representation)? Or do the genotype data inherently have more stochastic variation? Does this mean that we should always be using phylogenetic networks for the representation of genotype data?
You can read the manuscript if you want the details of the analyses. Briefly, the initial collections of datasets were taken from Collard et al. (Evolution and Human Behavior 27: 169-184; 2006). The graphed values for those datasets are taken directly from the paper, as I never managed to get the original datasets from the authors. I then supplemented this information with phenotype datasets from TreeBase (giving a total of n=31) and miscellaneous genotype datasets from the literature (n=15). All of the datasets refer to vertebrates and insects (with one phenotype dataset from spiders). My parsimony analyses used the parsimony ratchet and PAUP*.