Wednesday, October 8, 2014

Thoroughbred horses and reticulate pedigrees

I noted recently that the best documented human genealogies are those for the various Anabaptist populations, including the Mennonites, Hutterites and Amish (see the blog post The importance of the Amish for reticulate genealogies). They have mostly closed populations (ie. marriages occur solely within a population), and they are thus inbred; most importantly, they maintain detailed written genealogies. This makes them ideal for genealogical studies involving reticulation, including being a source of "known" reticulate histories for testing network algorithms.

If we move outside of Homo sapiens then a genealogy that is equally well documented (if not better) is that of English Thoroughbred horses. This breed was developed as a result of the enthusiasm of the British aristocracy for racing in the 17th century. Thoroughbred pedigree records are regarded as the most comprehensive records detailing ancestral relationships among domestic animal breeds, and they have been formally catalogued since the appearance of the first edition of the General Stud Book in 1791.

As noted by Binns et al. (2012):
"The Thoroughbred horse breed was established in England in the early 1700s based on crosses between stallions of Arabian origin and indigenous mares. The founder population was small, with all current males tracing back to one of three stallions, the Godolphin Arabian, the Byerley Turk and the Darley Arabian; in contrast, on the female side, about 70 foundation mares have been identified. A stud book for Thoroughbred horses was initiated in 1791, and pedigree records for the breed, which now number about five hundred thousand horses, are maintained by Thoroughbred registries worldwide."
For the males, the story is continued by Bower et al. (2012):
"All living Thoroughbreds trace paternally to just three stallions imported into England in the late 17th and early 18th centuries: Byerley Turk (1680s), Darley Arabian (1704) and Godolphin Arabian (1729). Furthermore, a small number of stallions exerted disproportionate influence on early Classic races resulting in their greater popularity at stud. Therefore, the Thoroughbred gene pool has been restricted by small foundation stock and subsequent limited paternal contributions as a result of sire preference and selection. [Our] historic samples were related largely via the Darley Arabian sire line to which 95% of all living Thoroughbreds can be traced in their paternal lineage."
Actually, 95% of living Thoroughbreds trace their male lineage to Eclipse (1764), a great-great grandson of the Darley Arabian, so that it is Eclipse who appears as the progenitor in most published genealogies (eg. see the one below). Information about these early males is available at this Thoroughbred Heritage page.

Females have been of less interest to horse breeders, and so in many cases we do not know who they were, and in many others we have only a generic name (eg. "Miss Darcy's pet mare", "old Montagu mare", "royal mare", etc). This means that in modern horses there is a high level of mtDNA diversity due to multiple female lineages but there is very little sequence diversity on the Y chromosome (Wallner et al. 2013). Nevertheless, Hill et al. (2002) have tried to trace the influence of the early females on current genotypes, singling out 19 of them as having large influence (on the mitochondrial genealogy), while Bower et al. (2011) provide a broader analysis. Information about these early females is available at this Thoroughbred Heritage page.

The relevance of this information for genealogy studies is that it tells us the Thoroughbred genealogy is effectively closed (little outside breeding), and it is thoroughly documented. This is potentially another source of known reticulate genealogies.

Of particular interest to horse breeders is inbreeding (see Binns et al. 2012). In suitable doses this is seen as a Good Thing, because it can produce the homozygous appearance of desirable racing characteristics. However, inbreeding should not be too recent. For example, if we look at the list of the Blood-Horse Top 100 Thoroughbreds of the 20th Century then none of them has inbreeding in the previous generation and only one has inbreeding in the one before that. However, 54% of the horses have inbreeding in the fourth ancestral generation, and 18% in each of the third and fifth generations. Only 9 horses have no inbreeding during the five previous generations.

For this reason, the standard version of horse genealogies only goes back five generations. This is the stage at which the inbreeding coefficient becomes <1%, so that inbreeding further back than five generations has no practical effect on homozygosity. There are potentially 32 ancestors in the 5th generation, each contributing 1/32 ≈ 3% of the DNA on average. This inbreeding is of interest to us because it creates extensive reticulation in horse genealogies.
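
As a rough check on these numbers, here is a minimal sketch in Python. It assumes Wright's path-counting formula for a single shared ancestor that is not itself inbred, with generations counted as in a printed pedigree (parents = generation 1). The function names are my own, chosen for illustration only.

```python
from fractions import Fraction


def ancestors_in_generation(g: int) -> int:
    """Potential number of distinct ancestors in generation g (parents = 1)."""
    return 2 ** g


def average_contribution(g: int) -> Fraction:
    """Expected share of the genome inherited from one generation-g ancestor."""
    return Fraction(1, 2 ** g)


def inbreeding_from_shared_ancestor(g_sire: int, g_dam: int) -> Fraction:
    """Wright's coefficient for one ancestor shared by both sides.

    g_sire and g_dam are the pedigree generations in which the ancestor
    appears on the sire's side and on the dam's side. Assumption: the
    shared ancestor is not itself inbred.
    """
    return Fraction(1, 2) ** (g_sire + g_dam - 1)


print(ancestors_in_generation(5))          # 32
print(float(average_contribution(5)))      # 0.03125
for pair in [(3, 4), (4, 4), (4, 5), (5, 5)]:
    print(pair, float(inbreeding_from_shared_ancestor(*pair)))
```

Under these assumptions a single ancestor shared in the 3rd+4th generations gives about 1.6%, one shared in the 4th+4th gives about 0.8%, and one shared in the 5th+5th gives about 0.2%. Several shared ancestors would add their contributions together.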

Pedigree data are readily available at sites like Pedigree Online. Pedigrees are usually drawn as treemaps (see the blog post Trees, treemaps and networks) with horses being repeated as often as necessary to be able to draw the network as a tree (see the blog post Reducing networks to trees). Here is a typical example, for the horse Maddox, without recent inbreeding. Males are in blue and females pink, with the parents at the left and their ancestors proceeding to the right.

Here is an example, for the horse Induna Mkubwa, with inbreeding in the 3rd+4th ancestral generations (highlighted in purple) and also in the 4th+5th generations (in green). Note that the horse Be My Chief is also inbred, in his 4th ancestral generation (in green).
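
The repeated horses in such a tree-style pedigree can also be found automatically. The following is a minimal sketch in Python, assuming the pedigree is stored as a dictionary mapping each horse to its (sire, dam) pair. The toy pedigree and all of its names are invented for illustration, and are not taken from any of the horses mentioned here.

```python
from collections import defaultdict

# Hypothetical toy pedigree: horse -> (sire, dam). All names are invented.
PEDIGREE = {
    "Foal": ("Sire", "Dam"),
    "Sire": ("SS", "SD"),
    "Dam": ("DS", "DD"),
    "SS": ("Founder", "SSD"),
    "DS": ("DSS", "DSD"),
    "DSS": ("Founder", "DSSD"),
}


def ancestor_generations(horse, pedigree, max_gen=5):
    """Map each ancestor to the list of generations in which it appears."""
    seen = defaultdict(list)
    frontier = [horse]
    for gen in range(1, max_gen + 1):
        next_frontier = []
        for animal in frontier:
            # Horses with unrecorded parents simply end that branch.
            for parent in pedigree.get(animal, ()):
                seen[parent].append(gen)
                next_frontier.append(parent)
        frontier = next_frontier
    return seen


def repeated_ancestors(horse, pedigree, max_gen=5):
    """Ancestors that occur more than once, ie. the reticulation points."""
    generations = ancestor_generations(horse, pedigree, max_gen)
    return {name: gens for name, gens in generations.items() if len(gens) > 1}


print(repeated_ancestors("Foal", PEDIGREE))  # {'Founder': [3, 4]}
```

In this toy case the invented horse "Founder" appears in the 3rd and 4th ancestral generations, which is the same pattern as the 3rd+4th inbreeding described above. Each repeated name corresponds to a node that would have more than one descendant path in the network version of the pedigree.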

Clearly, this second genealogy should more properly be drawn as a reticulating network. Once this sort of thing is done the reticulations become obvious. Here is an example network for the horse known as Roberto. The horses are numbered in the manner conventional for human pedigrees, with the males on the left of each pair. This is about as complex as it gets for these horses; and you will note that there are only two-thirds of the "expected" number of ancestors.
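
The shortfall in the number of ancestors can be quantified by comparing the distinct ancestors with the number expected if no horse were repeated. Here is a minimal sketch in Python, again assuming a dictionary that maps each horse to its (sire, dam) pair and a pedigree that is fully recorded to the chosen depth. The toy data are an invented, deliberately extreme case (a mating of full siblings), not Roberto's pedigree.

```python
# Hypothetical toy pedigree: horse -> (sire, dam). All names are invented.
TOY = {
    "Foal": ("S", "D"),
    "S": ("G1", "G2"),
    "D": ("G1", "G2"),  # S and D are full siblings
}


def ancestor_counts(horse, pedigree, max_gen):
    """Return (distinct, expected) ancestor counts over max_gen generations.

    Assumes the pedigree is complete to max_gen generations; missing
    parents would lower the distinct count for reasons unrelated to
    reticulation.
    """
    distinct = set()
    frontier = {horse}
    for _ in range(max_gen):
        frontier = {p for animal in frontier for p in pedigree.get(animal, ())}
        distinct |= frontier
    expected = 2 ** (max_gen + 1) - 2  # 2 + 4 + ... + 2**max_gen
    return len(distinct), expected


distinct, expected = ancestor_counts("Foal", TOY, max_gen=2)
print(distinct, expected, distinct / expected)
```

For the toy data this gives 4 distinct ancestors out of an expected 6. For a real five-generation pedigree the expected total would be 62 (2 + 4 + 8 + 16 + 32).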

Finally, here is an example network from the paper by Bower et al. (2012), covering a longer time period but restricted to selected male horses (ie. the female lineages that lead to the reticulation are not named).

Thanks to Induna Mkubwa for the idea for this post.


Binns MM, Boehler DA, Bailey E, Lear TL, Cardwell JM, Lambert DH (2012) Inbreeding in the Thoroughbred horse. Animal Genetics 43: 340-342.

Bower MA, Campana MG, Whitten M, Edwards CJ, Jones H, Barrett E, Cassidy R, Nisbet RE, Hill EW, Howe CJ, Binns M (2011) The cosmopolitan maternal heritage of the Thoroughbred racehorse breed shows a significant contribution from British and Irish native mares. Biology Letters 7: 316-320.

Bower MA, McGivney BA, Campana MG, Gu J, Andersson LS, Barrett E, Davis CR, Mikko S, Stock F, Voronkova V, Bradley DG, Fahey AG, Lindgren G, MacHugh DE, Sulimova G, Hill EW (2012) The genetic origin and history of speed in the Thoroughbred racehorse. Nature Communications 3: 643.

Hill EW, Bradley DG, Al-Barody M, Ertugrul O, Splan RK, Zakharov I, Cunningham EP (2002) History and integrity of thoroughbred dam lines revealed in equine mtDNA variation. Animal Genetics 33: 287-294.

Wallner B, Vogl C, Shukla P, Burgstaller JP, Druml T, Brem G (2013) Identification of genetic variation on the horse Y chromosome and the tracing of male founder lineages in modern breeds. PLoS One 8: e60015.
