HowTo/DataTreeManipulation

From Comparative Phylogenetics in R
Revision as of 16:56, 11 December 2007 by Bls16 (talk)

The commands referenced below belong to add-on phylogenetic packages for R, not to the base R installation. Be sure that you have installed and loaded the relevant packages before continuing. For example:

  library(ape)

This loads the package ape, together with the packages it requires (gee, lattice and ade4 in the version this page was written for), into your R session.

To run the worked examples below yourself, save the sample Geospiza phylogeny and dataset to your working directory and load them into memory with these commands:

  geotree <- read.nexus("geospiza.nex")
  geodata <- read.table("geospiza.txt")

How do I designate a specific taxon to be the root of my phylogeny?

The general syntax is

  rootedtree <- root(phylogeny, outgroup)

The Geospiza tree is already rooted at taxon "olivacea". This command will reroot the tree at taxon "fusca" and save the rerooted tree as a new phylo object.

  rerootedgeotree <- root(geotree, "fusca")

You can also just modify the existing phylo object.

  geotree <- root(geotree, "fusca")

Note, however, that rerooting produces a basal trichotomy: in effect, this command roots the tree at the node subtending taxon fusca, not at the taxon itself.
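If a bifurcating root is wanted instead, recent versions of ape accept a resolve.root argument to root(). The sketch below uses a small made-up four-taxon tree (not the Geospiza data) and assumes a version of ape that supports this argument; it compares the default behaviour with the resolved one.

```r
library(ape)

# Hypothetical four-taxon tree, used only for illustration
tr <- read.tree(text = "(((A:1,B:1):1,C:2):1,D:3);")

# Default rerooting leaves a trichotomy at the base of the tree
unresolved <- root(tr, "C")

# resolve.root = TRUE adds a zero-length branch so that the root is bifurcating
resolved <- root(tr, "C", resolve.root = TRUE)

is.rooted(unresolved)   # is the tree considered rooted?
is.rooted(resolved)
```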

How can I resolve polytomies in my phylogeny?
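One possible approach, assuming a reasonably recent version of ape, is the function multi2di(), which replaces each polytomy with a series of dichotomies separated by zero-length branches. The tree below is a made-up example with one three-way polytomy, not the Geospiza data.

```r
library(ape)

# Hypothetical tree in which A, B and C form a polytomy
poly <- read.tree(text = "((A:1,B:1,C:1):1,D:2);")
is.binary(poly)              # check whether the tree is fully dichotomous

# Resolve the polytomy (in random order by default)
resolved <- multi2di(poly)
is.binary(resolved)
resolved$edge.length         # the newly created internal branch has length zero
```

Because the resolution is random by default, repeated calls can return different topologies.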

How can I collapse very short branches into polytomies?
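One possible approach is the ape function di2multi(), which collapses internal branches shorter than a tolerance into polytomies. The sketch below uses a made-up tree with one very short internal branch; the tolerance of 0.01 is an arbitrary value chosen for illustration.

```r
library(ape)

# Hypothetical tree: the branch subtending (A, B) is only 0.001 long
tr <- read.tree(text = "(((A:1,B:1):0.001,C:1):1,D:2);")

# Collapse every internal branch shorter than tol into a polytomy
collapsed <- di2multi(tr, tol = 0.01)

Nnode(tr)          # number of internal nodes before collapsing
Nnode(collapsed)   # number of internal nodes after collapsing
```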

How can I see the length of the branches in my phylogeny?
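In a phylo object the branch lengths are stored in the element edge.length, in the same order as the rows of the edge matrix. A minimal sketch on a made-up three-taxon tree follows; for the Geospiza example the equivalent would be geotree$edge.length, assuming the tree was read with branch lengths.

```r
library(ape)

# Hypothetical three-taxon tree with branch lengths
tr <- read.tree(text = "((A:1,B:2):0.5,C:3);")

# Vector of branch lengths, one per row of tr$edge
tr$edge.length

# Show each branch (ancestor, descendant) next to its length
cbind(tr$edge, length = tr$edge.length)
```

If the tree has no branch lengths, the edge.length element is NULL.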

How can I change the lengths of the branches in my phylogeny?
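Since branch lengths are an ordinary numeric vector inside the phylo object, one simple approach is to assign new values to the edge.length element. The sketch below, on a made-up tree, doubles all branch lengths and, separately, sets them all to one; both transformations are illustrative choices only.

```r
library(ape)

# Hypothetical three-taxon tree with branch lengths
tr <- read.tree(text = "((A:1,B:2):0.5,C:3);")

# Double every branch length
scaled <- tr
scaled$edge.length <- tr$edge.length * 2

# Set every branch length to 1
unit <- tr
unit$edge.length <- rep(1, nrow(tr$edge))

scaled$edge.length
unit$edge.length
```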

How can I see the list of taxa represented in my phylogeny?
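The taxon names are stored in the tip.label element of a phylo object. A minimal sketch on a made-up four-taxon tree follows; for the Geospiza example the equivalent would be geotree$tip.label.

```r
library(ape)

# Hypothetical four-taxon tree
tr <- read.tree(text = "((A,B),(C,D));")

tr$tip.label   # character vector of taxon names
Ntip(tr)       # number of taxa in the tree
```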

How can I remove taxa from my phylogeny?
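One possible approach is the ape function drop.tip(), which takes a tree and a vector of tip names (or tip numbers) to remove. The sketch below prunes two taxa from a made-up five-taxon tree.

```r
library(ape)

# Hypothetical five-taxon tree
tr <- read.tree(text = "(((A:1,B:1):1,C:2):1,(D:1,E:1):2);")

# Remove taxa C and E; the result is a new phylo object
pruned <- drop.tip(tr, c("C", "E"))

pruned$tip.label
```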

How can I see a plot of my phylogeny?
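Calling plot() on a phylo object draws the tree using ape's plotting method. A minimal sketch on a made-up tree follows; for the Geospiza example the equivalent would be plot(geotree).

```r
library(ape)

# Hypothetical four-taxon tree with branch lengths
tr <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

plot(tr)       # draw the tree with its tip labels
axisPhylo()    # add a scale axis based on the branch lengths
```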

Is there a shorthand way to refer to a specific list of taxa (for example, all members of a particular clade)?

One approach is to combine the taxon names into a character vector and assign it to a variable. For example, using the Geospiza dataset:

  cladeA <- c("pauper", "psittacula", "parvulus")

Note that you need to enclose the taxon names in quotes, otherwise R will look for objects in memory named pauper, psittacula and parvulus.

How can I identify all the branches belonging to a particular subclade?

The general syntax is:

  branchlist <- which.edge(phylogeny, group)

In the case of the Geospiza example, the branches that unite the species "pauper", "psittacula", and "parvulus" (cladeA, defined above) are given by:

  branchlist <- which.edge(geotree, cladeA)

This returns a vector of integers identifying the rows of the edge matrix of the Geospiza phylogeny (which we stored as "geotree") that belong to the specified clade.

You can see what the edge matrix looks like by typing:

  geotree$edge

By doing so, we can see that the edge matrix is an N-by-2 matrix with one row for each of the N branches in the phylogeny. Each node in the phylogeny is assigned a number, and each branch is defined by the numbers of the two nodes bracketing it.
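To make the numbering concrete, here is a minimal sketch on a made-up three-taxon tree (not the Geospiza data). It relies on ape's convention that tips are numbered from 1 to the number of tips, in the order of tip.label, and that internal nodes are numbered after the tips.

```r
library(ape)

# Hypothetical three-taxon tree
tr <- read.tree(text = "((A,B),C);")

tr$edge        # one row per branch: ancestor node, then descendant node
tr$tip.label   # tip i in the edge matrix corresponds to tr$tip.label[i]
Ntip(tr) + 1   # in ape's numbering, the root is node number Ntip + 1
```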

You can extract just the portion of the edge matrix containing the branches in cladeA like this:

  abranches <- geotree$edge[branchlist, ]

Remember that "branchlist" in this worked example is a vector of integers returned by which.edge.

How can I identify the node representing the most recent common ancestor of a pair of taxa?
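One possible approach, assuming a version of ape that provides them, is getMRCA(), which returns the node number of the most recent common ancestor of a set of tips, or mrca(), which returns a matrix of ancestor node numbers for every pair of tips. The sketch below uses a made-up four-taxon tree.

```r
library(ape)

# Hypothetical four-taxon tree
tr <- read.tree(text = "((A,B),(C,D));")

# Node number of the most recent common ancestor of two taxa
getMRCA(tr, c("A", "B"))
getMRCA(tr, c("A", "C"))   # A and C only share the root

# Matrix of ancestor node numbers for all pairs of tips
anc <- mrca(tr)
anc["A", "C"]
```

The node numbers returned are the same ones used in the edge matrix of the tree.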


Credit: Most of the information on this page is paraphrased from the book Analysis of Phylogenetics and Evolution with R (Paradis, 2006).