Scalable Bayesian divergence time estimation with ratio transformations
Ji, X.; Fisher, A.A.; Su, S.; Thorne, J.L.; Potter, B.; Lemey, P.; Baele, G.; Suchard, M.A. (2023). Scalable Bayesian divergence time estimation with ratio transformations. Syst. Biol. 72(5): 1136-1153. https://dx.doi.org/10.1093/sysbio/syad039
In: Systematic Biology. Oxford University Press: Washington, D.C. ISSN 1063-5157; e-ISSN 1076-836X
Author keywords
Bayesian inference; divergence time estimation; effective sample size; Hamiltonian Monte Carlo; pathogens; phylogenetics; ratio transformation
Authors
- Ji, X.
- Fisher, A.A.
- Su, S.
- Thorne, J.L.
- Potter, B.
- Lemey, P.
- Baele, G.
- Suchard, M.A.
Abstract
Divergence time estimation is crucial for providing the temporal signal needed to date biologically important events, from species divergence to viral transmissions in space and time. With the advent of high-throughput sequencing, recent Bayesian phylogenetic studies have analyzed hundreds to thousands of sequences. Such large-scale analyses challenge divergence time reconstruction because they require inference on highly correlated internal node heights, which often becomes computationally infeasible. To overcome this limitation, we explore a ratio transformation that maps the original N-1 internal node heights into a space of one height parameter and N-2 ratio parameters. To make the analyses scalable, we develop a collection of linear-time algorithms to compute the gradient and Jacobian-associated terms of the log-likelihood with respect to these ratios. We then apply Hamiltonian Monte Carlo sampling with the ratio transform in a Bayesian framework to learn the divergence times in 4 pathogenic viruses (West Nile virus, rabies virus, Lassa virus, and Ebola virus) and the coralline red algae. Our method both resolves a mixing issue in the West Nile virus example and improves inference efficiency by at least 5-fold for the Lassa and rabies virus examples as well as for the algae example. Our method now also makes it computationally feasible to incorporate mixed-effects molecular clock models for the Ebola virus example, confirms the findings from the original study, and reveals clearer multimodal distributions of the divergence times of some clades of interest.
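To make the ratio transformation and its linear-time gradient concrete, here is a minimal Python sketch. It is illustrative only, not the authors' implementation. Assumptions (not stated in the record above): internal nodes are indexed in preorder so every parent index is smaller than its children's, index 0 is the root, and each non-root ratio is defined as r_i = (t_i - eps_i) / (t_parent(i) - eps_i), where eps_i is the height of the oldest sampled tip below node i (eps_i = 0 when all tips are sampled at the same time). All function names and the tree encoding are hypothetical. Each function makes a single pass over the nodes, so each costs O(N).

```python
import math
from typing import List, Sequence

# Hypothetical tree encoding (assumption): internal nodes 0..n-1 in preorder,
# node 0 is the root, parent[0] == -1, and eps[i] is the height of the oldest
# sampled tip below internal node i (eps[0] is unused because the root has no ratio).


def heights_to_params(heights: Sequence[float], parent: Sequence[int],
                      eps: Sequence[float]) -> List[float]:
    """Map N-1 internal heights to [root height, r_1, ..., r_{N-2}]."""
    params = [heights[0]]
    for i in range(1, len(heights)):
        p = parent[i]
        params.append((heights[i] - eps[i]) / (heights[p] - eps[i]))
    return params


def params_to_heights(params: Sequence[float], parent: Sequence[int],
                      eps: Sequence[float]) -> List[float]:
    """Inverse map; preorder guarantees the parent height is already known."""
    heights = [params[0]]
    for i in range(1, len(params)):
        p = parent[i]
        heights.append(eps[i] + params[i] * (heights[p] - eps[i]))
    return heights


def log_jacobian(heights: Sequence[float], parent: Sequence[int],
                 eps: Sequence[float]) -> float:
    """log|det d(heights)/d(params)|. In preorder the Jacobian is triangular,
    with diagonal 1 for the root and (t_parent - eps_i) for each ratio r_i."""
    return sum(math.log(heights[parent[i]] - eps[i]) for i in range(1, len(heights)))


def log_jacobian_grad_heights(heights: Sequence[float], parent: Sequence[int],
                              eps: Sequence[float]) -> List[float]:
    """Partial derivatives of log_jacobian with respect to each height."""
    g = [0.0] * len(heights)
    for i in range(1, len(heights)):
        p = parent[i]
        g[p] += 1.0 / (heights[p] - eps[i])
    return g


def grad_params(grad_heights: Sequence[float], params: Sequence[float],
                heights: Sequence[float], parent: Sequence[int],
                eps: Sequence[float]) -> List[float]:
    """Chain rule from d/d(heights) to d/d(params) in O(N).

    total[i] accumulates the total derivative with respect to t_i, including
    the effect of t_i on every descendant height (dt_child/dt_parent = r_child).
    """
    total = list(grad_heights)
    for i in range(len(heights) - 1, 0, -1):  # reverse preorder: children first
        total[parent[i]] += total[i] * params[i]
    grad = [total[0]]  # derivative with respect to the root height
    for i in range(1, len(heights)):
        grad.append(total[i] * (heights[parent[i]] - eps[i]))  # times dt_i/dr_i
    return grad


if __name__ == "__main__":
    # Toy caterpillar tree with 4 serially sampled tips (3 internal nodes).
    parent = [-1, 0, 1]
    eps = [0.0, 1.0, 0.5]
    heights = [4.0, 2.0, 1.5]
    params = heights_to_params(heights, parent, eps)
    print(params, params_to_heights(params, parent, eps))
```

The same `grad_params` pass serves both the log-likelihood gradient (given its derivatives with respect to heights) and the log-Jacobian gradient, so a sampler can sum the two in ratio space. Because Hamiltonian Monte Carlo needs unconstrained parameters, the ratios in (0, 1) and the root height would still need a further transform (for example, a logit for the ratios), which this sketch deliberately omits.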