From Comparative Phylogenetics in R
Revision as of 17:25, 11 December 2007 by Bls16 (talk)
Jump to: navigation, search

The commands referenced below are all part of special phylogenetic packages in R, not the basic R install. Be sure that you have installed and loaded the packages containing the commands referenced below before continuing. For example:


This loads the package ape and its required packages, gee, lattice and ade4, into your R session.

To execute some of the worked examples below yourself, save the sample Geospiza phylogeny and dataset to your working directory and load them into memory using these commands

  geotree <-"geospiza.nex")
  geodata <- read.table("geospiza.txt")

How do I designate a specific taxon to be the root of my phylogeny?

The general syntax is

  rootedtree <- root(phylogeny, outgroup)

The Geospiza tree is already rooted at taxon "olivacea". This command will reroot the tree at taxon "fusca" and save the rerooted tree as a new phylo object "rerootedgeotree".

  rerootedgeotree <- root(geotree, "fusca")

You can also just modify the existing phylo object.

  geotree <- root(geotree, "fusca")

Note that rerooting produces a basal trichotomy . . . essentially this command roots the tree at the node subtending taxon fusca, not the taxon itself.

The root command will also root the tree at a group of taxa so long as that group is monophyletic. Otherwise, an error is returned.

How can I resolve polytomies in my phylogeny?

The multi2di command will break polytomies in random order.

  dichotomousphylogeny <- multi2di(phylogeny, random = TRUE)

You can also choose to break polytomies in the order that taxa appear in the list of tip names.

  dichotomousphylogeny <- multi2di(phylogeny, random = FALSE)

How can I collapse very short branches into polytomies?

Use the di2multi command. The basic syntax is

  collapsedphylogeny <- di2multi(phylogeny, tolerancevalue)

Branches shorter than the tolerance value will be collapsed into polytomies. If unspecified, the tolerance value defaults to 10E-8. For example:

  collapsedgeotree <- di2multi(geotree, 0.03)

This should produce two polytomies in the Geospiza phylogeny.

How can I see the length of the branches in my phylogeny?

The vector of branch lengths for phylo object "phylogeny" is contained in phylogeny$edge.length. For the Geospiza dataset, you can view this vector by typing:


How can I change the lengths of the branches in my phylogeny?

Use the compute.brlen function. This is the basic syntax

  phylogeny <- compute.brlen(phylogeny, method, power)

Method can equal Grafen (the default), a vector, or a user defined function (for example, to generate random lengths). If a vector shorter than the number of branches is given as the value of the argument method, that vector is iterated over the branches in sequence. If power is provided, it exponentiates the branch lengths by the specified power.

The following syntax will ultrametricize the Geospiza phylogeny using Grafen's method

  geotree <- compute.brlen(geotree, method="Grafen")

While this syntax will set all branch lengths equal to one

  geotree <- compute.brlen(phylogeny, 1)

And this will alternate branch lengths of one and two throughout the phylogeny.

  geotree <- compute.brlen(geotree, c(1, 2))

How can I see the list of taxa represented in my phylogeny?

How can I remove taxa from my phylogeny?

How can I see a plot of my phylogeny?

Is there a shorthand way to refer to a specific list of taxa (for example, all members of a particular clade)?

One approach is to concatenate all the taxon names into a named vector. For example, using the Geospiza dataset.

  cladeA = c("pauper", "psittacula", "parvulus")

Note that you need to enclose the taxon names in quotes, otherwise R will look for objects in memory named pauper, psittacula and parvulus.

How can I identify all the branches belonging to a particular subclade?

The general syntax is:

  branchlist <- which.edge(phylogeny, group)

In the case of the Geospiza example, the branches that unite the species "pauper", "psittacula", and "parvulus" (CladeA defined above) are given by:

  branchlist <- which.edge(geotree, cladeA)

This returns a list of integers, which identify the rows in the edge matrix of the Geospiza phylogeny that belong to the specified clade, which we stored as "Geotree".

You can see what the edge matrix looks like by typing:


By doing so, we can see that the edge matrix is a N by 2 list of the N branches in the phylogeny. Each node in the phylogeny is assigned a numbner, with each branch being defined by the numbers of the nodes bracketing it.

You can extract just the portion of the edge matrix containing the branches in cladeA like this:

  abranches<-geotree$edge[branchlist, ]

Remember that "branchlist" in this worked example is a vector of integers returned by which.edge.

How can I identify the node representing the most recent common ancestor of a pair of taxa?

Credit: Most of the information on this page is paraphrased from the book Analysis of Phylogenetics and Evolution with R (Paradis, 2006).