Modeling Discrete Character Evolution in R
The evolution of a discrete character along a branch of a phylogeny can be model as a Markovian process, where the probability of moving the current state to a different state is governed by a rate matrix. We can learn about discrete character evolution by calculating and comparing the likelihoods under models which make different assumptions about this matrix. For example, we might compare a Jukes-Cantor model (where the probability of going from state 0 to state 1 is equal to the probability in the reverse direction) to an asymetrical model where the two rates are allowed to differ. If we found the latter model was a significantly better fit, we might conclude that the trait in question evolves in a directional fashion.
Estimating transition rates and calculating the likelihood of different models for discrete character evolution can be accomplished with both the geiger and the ape packages in R.
Estimating a Jukes-Cantor model with the geiger package (load the package before you begin)
We will use the Geospiza dataset for this example. Geiger requires that the input data be in the form of a factor, a data structure similar to a vector but with "levels", i.e. values that the factor can take. In these case of discrete data, these are likely to be numerically-coded character states (0, 1, 2..).
So first let's create a factor with character states for "Geospiza". The dataset has 14 species. We can see a list of the species with the following commands:
data(geospiza) attach(geospiza) geospiza.tree$tip.label
The data command loads the data set, and the attach command allows us to access objects within the data set, such as the tree and the morphological data. The last command will print out the tip.labels for the taxa in the geospiza.tree, so you should now see a list of 14 taxa.
Now we can assign states for our discrete character to these 14 species with the commands:
x<-c(1,1,1,1,1,0,1,0,0,1,0,1,1,1) char1<-as.factor(x) names(char1)<-geospiza.tree$tip.label char1
The first line binds the 14 character states into a single vector called x. Then we ask R to encode this vector as a factor called char1, and we assign the names (tip labels) from the tree to these 14 values. So now when you type the last command, char1, R will show you the content of that object, a list of the species with their values (0's and 1's). You can see that char1 is a factor because it has two levels, 0 and 1.
Now you can fit a Jukes-Cantor model to this data and the tree by typing:
Geiger will return to you the log likelihood (lnL) for this model and the estimated transition rate (q).