HowTo/Divergence Time Estimation

From Comparative Phylogenetics in R
Revision as of 11:25, 14 December 2007

Many comparative methods require an ultrametric tree. Many programs are currently available for estimating divergence times (e.g., PAUP*, paml, BEAST, r8s, multidivtime, PATHd8), and different methods make different assumptions about how the tree may be parameterized with respect to branching times. Divergence times can also be estimated in R with the ape package. Each of the following examples requires a rooted tree with branch lengths. Your tree must also be dichotomous (i.e., contain no polytomies), so it might need a little massaging. See the page on Tree & Data manipulation.
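As a quick pre-flight check, ape can tell you whether a tree is rooted, and multi2di() can resolve polytomies at random by adding zero-length branches. This is a minimal sketch assuming ape is installed; the toy tree and the helper is_dichotomous() are hypothetical and not part of ape. Random resolution is only one way to do the massaging, so check whether it suits your analysis.

```r
library(ape)

# Hypothetical example tree: (a, b, c) form a three-way polytomy
tr <- read.tree(text = "((a:1,b:1,c:1):1,d:2);")

# Is the tree rooted? (TRUE here: the root has exactly two descendants)
is.rooted(tr)

# Hypothetical helper, not part of ape: a rooted, fully dichotomous tree
# with n tips has exactly n - 1 internal nodes
is_dichotomous <- function(phy) phy$Nnode == length(phy$tip.label) - 1
is_dichotomous(tr)    # FALSE: the polytomy leaves only 2 internal nodes

# Resolve the polytomy at random; the newly created branch has length zero
tr2 <- multi2di(tr)
is_dichotomous(tr2)   # TRUE
```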

How do I estimate divergence times using nonparametric rate smoothing (NPRS)

The first step for each of these methods is to load the functions from the ape package:

library(ape)

Next, read in your rooted tree with branch lengths using the read.tree command:

mytree <- read.tree(file="PATH_TO_FILE")

or

mytree <- read.nexus(file="PATH_TO_FILE")

depending on how your tree is formatted (read.tree reads Newick files; read.nexus reads NEXUS files). The variable mytree is now an object of class phylo and can be used in all of the following examples. The first method transforms your branch lengths using nonparametric rate smoothing (NPRS; see Sanderson, 1997). This is achieved by issuing the command:

chronogram(mytree)


This function takes three optional arguments:

scale -- the age assigned to the root of the tree.

expo -- the exponent of the objective function minimized by NPRS.

minEdgeLength -- the minimum edge length in the phylogram (default value: 1e-06). Any branch shorter than this value is set to it.
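The minEdgeLength rule can be illustrated in base R with pmax(). This is only a sketch of the behaviour described above, not chronogram()'s actual source code, and the branch lengths are made up (a zero-length branch like the second one can arise, for example, when polytomies are resolved).

```r
# Hypothetical branch lengths, including a zero-length branch
edge.length   <- c(0.12, 0, 0.05, 3e-07)
minEdgeLength <- 1e-06

# Branches shorter than the minimum are raised to it; others are unchanged
adjusted <- pmax(edge.length, minEdgeLength)
adjusted
```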

You can then view the tree by passing the chronogram to the plot function:

plot(chronogram(mytree))


Likewise, you can save the tree to a file by passing the chronogram to the write.tree function:

write.tree(chronogram(mytree), file="/Users/cbell/tree")

where "/Users/cbell/tree" is the path where the file should be saved; replace it with your own path.
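Whichever method produced it, a chronogram should be ultrametric, i.e., all tips should be equidistant from the root. One way to check a tree before handing it to a comparative method is ape's is.ultrametric(); the two toy trees below are hypothetical.

```r
library(ape)

# Two hypothetical trees in Newick format
ultra    <- read.tree(text = "((a:1,b:1):1,c:2);")  # every tip is 2 units from the root
nonultra <- read.tree(text = "((a:1,b:3):1,c:2);")  # tip b is 4 units from the root

is.ultrametric(ultra)     # TRUE
is.ultrametric(nonultra)  # FALSE
```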

How do I estimate divergence times using penalized likelihood (PL)

This next function estimates the node ages of a tree using a semi-parametric method based on penalized likelihood (Sanderson, 2002):

chronopl(phy, lambda, node.age = 1, node = "root", CV = FALSE)

The branch lengths of the input tree are interpreted as (mean) numbers of substitutions. The arguments are:

phy -- an object of class "phylo".

lambda -- the value of the smoothing parameter.

node.age -- numeric value(s) giving the fixed node age(s).

node -- the number(s) of the node(s) whose ages are given by node.age; "root" is a shortcut for the number of the root node.

CV -- whether to perform cross-validation (see Sanderson, 2002).
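A minimal sketch of a single penalized-likelihood fit, assuming a reasonably recent ape. The random tree stands in for your own data. The call relies only on the default calibration (root age fixed at 1), because the names of the calibration arguments may differ between ape versions; check ?chronopl for the version you have installed.

```r
library(ape)

set.seed(42)          # reproducible random tree
tr <- rtree(10)       # hypothetical 10-tip rooted tree with random branch lengths

# Penalized likelihood with lambda = 1 and the default calibration
chr <- chronopl(tr, lambda = 1)

# Compare the input phylogram with the resulting chronogram
par(mfrow = c(1, 2))
plot(tr, main = "Input tree")
plot(chr, main = "chronopl, lambda = 1")

# Age of the root, taken from the estimated branching times
max(branching.times(chr))
```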

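Choosing an appropriate value of lambda is the crux of the matter, and this is where the cross-validation procedure comes in. The following loop fits the model for lambda values from 10^-1 to 10^6 and records the summed cross-validation criterion (the "D2" attribute returned when CV = TRUE) for each: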

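l <- 10^(-1:6)              # candidate lambda values
cv <- numeric(length(l))    # cross-validation score for each lambda

for (i in seq_along(l))
    cv[i] <- sum(attr(chronopl(mytree, lambda = l[i], CV = TRUE), "D2"))
plot(l, cv, log = "x")      # lambda spans several orders of magnitude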
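Once the scores are computed, one common choice is to keep the lambda with the smallest cross-validation score and refit without cross-validation. This is a self-contained sketch: the random tree is a hypothetical stand-in for mytree, the scores are computed with sapply() instead of a for loop, and the default root calibration is used.

```r
library(ape)

set.seed(1)
mytree <- rtree(10)   # hypothetical stand-in for your own rooted tree

l  <- 10^(-1:6)       # candidate smoothing values
cv <- sapply(l, function(lam)
    sum(attr(chronopl(mytree, lambda = lam, CV = TRUE), "D2")))

# Smallest cross-validation score wins; refit once with that lambda
best.lambda <- l[which.min(cv)]
chrono <- chronopl(mytree, lambda = best.lambda)
best.lambda
```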

How do I estimate divergence times using mean path length (MPL)

Another method available in ape is the mean path length method of Britton et al. (2002, 2007). This is achieved by issuing the command:

chronoMPL(phy)

This function has two optional arguments:

se -- a logical specifying whether to compute the standard errors of the node ages (TRUE by default).

test -- a logical specifying whether to test the molecular clock at each node (TRUE by default).
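The standard errors and test results come back as attributes of the returned tree. The sketch below assumes they are named "stderr" and "Pval" with one value per internal node, as described in ape's chronoMPL documentation; check ?chronoMPL for your version. The random tree is hypothetical, and chronoMPL() needs a rooted, dichotomous tree.

```r
library(ape)

set.seed(7)
tr <- rtree(8)   # hypothetical rooted, dichotomous tree

mpl <- chronoMPL(tr, se = TRUE, test = TRUE)

# One value per internal node
se   <- attr(mpl, "stderr")   # standard errors of the node ages
pval <- attr(mpl, "Pval")     # P-values of the molecular clock tests

branching.times(mpl)          # estimated node ages
se
pval
```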

The test performed when test = TRUE compares the MPL of the two subtrees originating from a node; the null hypothesis is that the rate of substitution was the same in both subtrees (Britton et al., 2002). Under the null hypothesis, the test statistic follows a standard normal distribution. The returned P-value is the probability of observing a greater absolute value (i.e., a two-sided test). No correction for multiple testing is applied; this is left to the user.
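The two-sided P-value and a user-side correction for multiple testing can be sketched in base R. The z statistics below are made up, and Holm's method is only one possible choice; p.adjust() also offers, for example, "bonferroni" and "BH".

```r
# Hypothetical z statistics from clock tests at four nodes
z <- c(-2.8, 0.4, 1.96, 2.2)

# Two-sided P-value: probability of a |Z| at least this large under N(0, 1)
p <- 2 * pnorm(-abs(z))

# Correct for testing several nodes at once (Holm's method here)
p.holm <- p.adjust(p, method = "holm")
round(cbind(z, p, p.holm), 4)
```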


Credit: Some of the information on this page is paraphrased from the book "Analysis of Phylogenetics and Evolution with R" (Paradis, 2006) or from the documentation within ape itself.