# Difference between revisions of "HowTo/Divergence Time Estimation"

(→How do I estimate divergence times using nonparametric rate smoothing (NPRS)) |
(→References) |
||

(20 intermediate revisions by 7 users not shown) | |||

Line 1: | Line 1: | ||

__TOC__ | __TOC__ | ||

− | Many of the comparative methods require a | + | Many of the comparative methods require a ultrametric tree. Different methods making different assumptions about how the tree may be parameterized with respect to branching times. To estimate divergence times in R you need the [http://pbil.univ-lyon1.fr/R/ape/ ape] package. In each of the following examples you will need a rooted tree with branch lengths. Likewise, your tree will need to be dichotomous (i.e., with no polytomies), therefore it might need a little massaging. See the page on [https://www.nescent.org/wg_phyloinformatics/R_Hackathon/DataTreeManipulation Tree & Data manipulation]. There are also many other programs available that estimate divergence times (e.g., [http://paup.csit.fsu.edu/ PAUP*], [http://abacus.gene.ucl.ac.uk/software/paml.html paml], [http://beast.bio.ed.ac.uk/ BEAST], [http://loco.biosci.arizona.edu/r8s/ r8s][http://statgen.ncsu.edu/thorne/multidivtime.html multidivtime], [http://www.math.su.se/PATHd8/ PATHd8]). |

== How do I estimate divergence times using nonparametric rate smoothing (NPRS) == | == How do I estimate divergence times using nonparametric rate smoothing (NPRS) == | ||

Line 11: | Line 11: | ||

</pre> | </pre> | ||

− | next, you will want to read in your rooted tree with branch lengths with the '''read.tree''' command: | + | next, you will want to read in your rooted tree with branch lengths equal or proportional to number of base pair substitutions with the '''read.tree''' command: |

<pre> | <pre> | ||

Line 26: | Line 26: | ||

<pre> | <pre> | ||

− | chronogram(mytree) | + | chronotree <- chronogram(mytree) |

</pre> | </pre> | ||

Line 36: | Line 36: | ||

expo-- This defines the exponent of the exponential function. | expo-- This defines the exponent of the exponential function. | ||

− | minEdgeLength-- Minimum edge length in the phylogram (default value: 1e-06). If any branches in the tree are shorter then this value, | + | minEdgeLength-- Minimum edge length in the phylogram (default value: 1e-06). If any branches in the tree are shorter then this value, then they will be assigned it. |

It is then possible to view the tree by passing the '''chronogram''' argument to the '''plot''' function: | It is then possible to view the tree by passing the '''chronogram''' argument to the '''plot''' function: | ||

Line 55: | Line 55: | ||

== How do I estimate divergence times using penalized likelihood (PL) == | == How do I estimate divergence times using penalized likelihood (PL) == | ||

− | This next function estimates the node ages of a tree using semi-parametric method based on penalized likelihood (Sanderson 2002). | + | This next function estimates the node ages of a tree using a semi-parametric method based on penalized likelihood (Sanderson 2002). |

<pre> | <pre> | ||

Line 61: | Line 61: | ||

</pre> | </pre> | ||

− | The branch lengths of the input tree are interpreted as (mean) numbers of substitutions where 'phy' is an object of class "phylo" | + | The branch lengths of the input tree are interpreted as (mean) numbers of substitutions where 'phy' is an object of class "phylo." lambda equals a value of the smoothing parameter; node.age is a numeric values specifying the fixed node ages; 'node' is the numbers of the nodes whose ages are given by node.age; "root" is a short-cut for the number of the node; and 'CV' is whether to perform cross-validation (see [http://mbe.oxfordjournals.org/cgi/content/full/19/1/101 Sanderson, 2002]). One thing to keep in mind, however, is that the likelihood function is calculated based on a [http://en.wikipedia.org/wiki/Poisson_distribution Poisson] approximation using the number of substitutions observed to perform the calculations. This is not a problem if parsimony was used to calculate branch lengths, where the branch length represents the inferred number of changes that occurred along any given branch. If maximum likelihood was used to infer branch lengths, the values are in expected substitutions per site. In the program r8s, the user provides the number of sites used to infer the branch lengths to convert branch lengths to the observed number of substitutions. In ape, there is no such conversion/option. The user my want to convert there user tree branch lengths to observed number of substitutions by issuing the following commands in R: |

− | Determining an appropriate 'lambda' value is the crux of the matter. This is where the cross-validation procedure comes in. | + | <pre> |

+ | x <- mytree$edge.length | ||

+ | |||

+ | for (i in x) | ||

+ | y<-x*7999 | ||

+ | |||

+ | mytree$edge.length <-y | ||

+ | |||

+ | </pre> | ||

+ | |||

+ | In the example above, 'mytree' is the tree with branch lengths (edge.length) I read into R. I then converted the branch lengths by a factor of 7999 (the number of sites used in the original dataset to calculate the branch lengths). | ||

+ | |||

+ | Now your tree is ready for the penalized likelihood analysis. However, you need to determine a lambda value (smoothing parameter). Determining an appropriate 'lambda' value is the crux of the matter. This is where the cross-validation procedure comes in. | ||

<pre> | <pre> | ||

Line 75: | Line 87: | ||

== How do I estimate divergence times using mean path length (MPL) == | == How do I estimate divergence times using mean path length (MPL) == | ||

− | Another method available in '''ape''' is the mean path length method of Britton et al. (2002, 2007). This is achieved by issuing the command: | + | Another method available in '''ape''' is the mean path length method of Britton et al. (2002, [http://www.informaworld.com/smpp/content~content=a782130970~db=all~jumptype=rss 2007]). This is achieved by issuing the command: |

<pre> | <pre> | ||

Line 89: | Line 101: | ||

The tests performed if test = TRUE is a comparison of the MPL of the two subtrees originating from a node; the null hypothesis is that the rate of substitution was the same in both subtrees (Britton et al. 2002). The test statistic follows, under the null hypothesis, a standard normal distribution. The returned P-value is the probability of observing a greater absolute value (i.e., a two-sided test). No correction for multiple testing is applied: this is left to the user. | The tests performed if test = TRUE is a comparison of the MPL of the two subtrees originating from a node; the null hypothesis is that the rate of substitution was the same in both subtrees (Britton et al. 2002). The test statistic follows, under the null hypothesis, a standard normal distribution. The returned P-value is the probability of observing a greater absolute value (i.e., a two-sided test). No correction for multiple testing is applied: this is left to the user. | ||

+ | ==References== | ||

+ | |||

+ | Britton, T., Oxelmanb, B., Vinnerstenb, A. and Bremer, K. (2002) Phylogenetic dating with confidence intervals using mean path lengths. ''Molecular Phylogenetics and Evolution'' 24(1), 58-65. | ||

+ | |||

+ | Britton, T., Anderson, C. L., Jacquet, D., Lundqvist, S. & Bremer, K. ( 2007) Estimating Divergence Times in Large Phylogenetic Trees. ''Systematic Biology'' 56(5), 741-752. | ||

+ | |||

+ | Sanderson, M. J. (1997) Sanderson, M. J., Thorne, J.L., Wikström, n. and Bremer, K. (1997) Molecular evidence on plant divergence times. ''American Journal of Biology'' 91, 1656-1665. | ||

+ | |||

+ | Sanderson, M. J. (2002) Estimating Absolute Rates of Molecular Evolution and Divergence Times. A Penalized Likelihood Approach. ''Molecular Biology and Approach'' 19, 101-109. | ||

+ | ---- | ||

Credit: Some of the information on this page is paraphrased from the book ''Analysis of Phylogenetics and Evolution with R" (Paradis, 2006) or from documentation within ape itself. | Credit: Some of the information on this page is paraphrased from the book ''Analysis of Phylogenetics and Evolution with R" (Paradis, 2006) or from documentation within ape itself. | ||

+ | |||

+ | [[HowTo/Table of Contents| Back to the HowTo/Table of Contents]] | ||

+ | |||

+ | [[Category:HowTo]] | ||

+ | [[Category:R Help]] | ||

+ | [[Category:Comparative Methods Help]] |

## Latest revision as of 15:05, 13 March 2008

## Contents

Many of the comparative methods require a ultrametric tree. Different methods making different assumptions about how the tree may be parameterized with respect to branching times. To estimate divergence times in R you need the ape package. In each of the following examples you will need a rooted tree with branch lengths. Likewise, your tree will need to be dichotomous (i.e., with no polytomies), therefore it might need a little massaging. See the page on Tree & Data manipulation. There are also many other programs available that estimate divergence times (e.g., PAUP*, paml, BEAST, r8smultidivtime, PATHd8).

## How do I estimate divergence times using nonparametric rate smoothing (NPRS)

The first step for each of these methods is to load the functions from the **ape** package:

library(ape)

next, you will want to read in your rooted tree with branch lengths equal or proportional to number of base pair substitutions with the **read.tree** command:

mytree <- read.tree(file="PATH_TO_FILE")

or

mytree <- read.nexus(file="PATH_TO_FILE")

depending on how your tree is formatted. The variable *mytree* is now an object of class *phylo*. This tree can be used with all of the following examples. The first is to transform your branch lengths using nonparametric rate smoothing (NPRS; see Sanderson, 1997). This is achieved by issuing the command:

chronotree <- chronogram(mytree)

This command takes three additional subcommand:

scale-- This assigns a age to the root of the tree.

expo-- This defines the exponent of the exponential function.

minEdgeLength-- Minimum edge length in the phylogram (default value: 1e-06). If any branches in the tree are shorter then this value, then they will be assigned it.

It is then possible to view the tree by passing the **chronogram** argument to the **plot** function:

plot(chronogram(mytree))

Likewise, you can save the tree to file by passing the **chronogram** argument to the **write.tree** function:

write.tree(chronogram(mytree), file="/Users/cbell/tree")

where "/Users/cbell/tree" is the path to where I want the file to be saved.

## How do I estimate divergence times using penalized likelihood (PL)

This next function estimates the node ages of a tree using a semi-parametric method based on penalized likelihood (Sanderson 2002).

chronopl(phy, lambda, node.age = 1, node = "root", CV = FALSE)

The branch lengths of the input tree are interpreted as (mean) numbers of substitutions where 'phy' is an object of class "phylo." lambda equals a value of the smoothing parameter; node.age is a numeric values specifying the fixed node ages; 'node' is the numbers of the nodes whose ages are given by node.age; "root" is a short-cut for the number of the node; and 'CV' is whether to perform cross-validation (see Sanderson, 2002). One thing to keep in mind, however, is that the likelihood function is calculated based on a Poisson approximation using the number of substitutions observed to perform the calculations. This is not a problem if parsimony was used to calculate branch lengths, where the branch length represents the inferred number of changes that occurred along any given branch. If maximum likelihood was used to infer branch lengths, the values are in expected substitutions per site. In the program r8s, the user provides the number of sites used to infer the branch lengths to convert branch lengths to the observed number of substitutions. In ape, there is no such conversion/option. The user my want to convert there user tree branch lengths to observed number of substitutions by issuing the following commands in R:

x <- mytree$edge.length for (i in x) y<-x*7999 mytree$edge.length <-y

In the example above, 'mytree' is the tree with branch lengths (edge.length) I read into R. I then converted the branch lengths by a factor of 7999 (the number of sites used in the original dataset to calculate the branch lengths).

Now your tree is ready for the penalized likelihood analysis. However, you need to determine a lambda value (smoothing parameter). Determining an appropriate 'lambda' value is the crux of the matter. This is where the cross-validation procedure comes in.

l <- 10^(-1:6) cv <- numeric(length(l)) for (i in 1:length(l)) cv[i] <- sum(attr(chronopl(mammals, lambda = l[i], CV=TRUE), "D2")) plot(l, cv)

## How do I estimate divergence times using mean path length (MPL)

Another method available in **ape** is the mean path length method of Britton et al. (2002, 2007). This is achieved by issuing the command:

chronoMPL(phy)

This function has two sub commands:

se-- a logical specifying whether to compute the standard-errors of the node ages (TRUE by default).

test--a logical specifying whether to test the molecular clock at each node (TRUE by default).

The tests performed if test = TRUE is a comparison of the MPL of the two subtrees originating from a node; the null hypothesis is that the rate of substitution was the same in both subtrees (Britton et al. 2002). The test statistic follows, under the null hypothesis, a standard normal distribution. The returned P-value is the probability of observing a greater absolute value (i.e., a two-sided test). No correction for multiple testing is applied: this is left to the user.

## References

Britton, T., Oxelmanb, B., Vinnerstenb, A. and Bremer, K. (2002) Phylogenetic dating with confidence intervals using mean path lengths. *Molecular Phylogenetics and Evolution* 24(1), 58-65.

Britton, T., Anderson, C. L., Jacquet, D., Lundqvist, S. & Bremer, K. ( 2007) Estimating Divergence Times in Large Phylogenetic Trees. *Systematic Biology* 56(5), 741-752.

Sanderson, M. J. (1997) Sanderson, M. J., Thorne, J.L., Wikström, n. and Bremer, K. (1997) Molecular evidence on plant divergence times. *American Journal of Biology* 91, 1656-1665.

Sanderson, M. J. (2002) Estimating Absolute Rates of Molecular Evolution and Divergence Times. A Penalized Likelihood Approach. *Molecular Biology and Approach* 19, 101-109.

Credit: Some of the information on this page is paraphrased from the book *Analysis of Phylogenetics and Evolution with R" (Paradis, 2006) or from documentation within ape itself.*