Abstract
In this article we discuss the ancestry of sequences sampled from the coalescent with recombination with constant population size 2N. We have studied a number of variables based on simulations of sample histories, and some analytical results are derived. Consider the leftmost nucleotide in the sequences. We show that the number of nucleotides sharing a most recent common ancestor (MRCA) with the leftmost nucleotide is ≃log(1 + 4N Lr)/4Nr when two sequences are compared, where L denotes sequence length in nucleotides, and r the recombination rate between any two neighboring nucleotides per generation. For larger samples, the number of nucleotides sharing MRCA with the leftmost nucleotide decreases and becomes almost independent of 4N Lr. Further, we show that a segment of the sequences sharing a MRCA consists in mean of 3/8Nr nucleotides, when two sequences are compared, and that this decreases toward 1/4Nr nucleotides when the whole population is sampled. A measure of the correlation between the genealogies of two nucleotides on two sequences is introduced. We show analytically that even when the nucleotides are separated by a large genetic distance, but share MRCA, the genealogies will show only little correlation. This is surprising, because the time until the two nucleotides shared MRCA is reciprocal to the genetic distance. Using simulations, the mean time until all positions in the sample have found a MRCA increases logarithmically with increasing sequence length and is considerably lower than a theoretically predicted upper bound. On the basis of simulations, it turns out that important properties of the coalescent with recombinations of the whole population are reflected in the properties of a sample of low size.
Originalsprog | Engelsk |
---|---|
Tidsskrift | Genetics |
Vol/bind | 151 |
Udgave nummer | 3 |
Sider (fra-til) | 1217-1228 |
Antal sider | 12 |
ISSN | 0016-6731 |
Status | Udgivet - 1 mar. 1999 |
Udgivet eksternt | Ja |