Abstract
Inference about population history from DNA sequence data has become increasingly popular. For human populations, questions about whether a population has been expanding and when expansion began are often the focus of attention. For viral populations, questions about the epidemiological history of a virus, e.g., HIV-1 and Hepatitis C, are often of interest. In this paper I address the following question: Can population history be accurately inferred from single-locus DNA data? An idealised world is considered in which the tree relating a sample of n non-recombining and selectively neutral DNA sequences is observed, rather than just the sequences themselves. This approach provides an upper limit on the information that can possibly be extracted from a sample. It is shown, based on Kingman's (1982a) coalescent process, that consistent estimation of parameters describing population history (e.g., a growth rate) cannot be achieved for increasing sample size, n. This is worse than what is often found for estimators of genetic parameters; for example, estimators of the mutation rate typically converge at rate √log(n) under the assumption that all historical mutations can be observed in the sample. In addition, various results for the distribution of maximum likelihood estimators are presented.
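As a rough illustration of the √log(n) rate quoted for the mutation rate, and not the estimators analysed in the paper, the sketch below uses the classical Watterson estimator under the infinite-sites model. Its variance is θ/a_n + θ²·b_n/a_n², where a_n = Σ 1/i and b_n = Σ 1/i² for i = 1, …, n−1. Since a_n grows like log(n), the standard error shrinks only like 1/√log(n). The function names `harmonic_sums` and `watterson_variance` and the value θ = 1 are assumptions made for this example.

```python
import math


def harmonic_sums(n):
    """Return (a_n, b_n): sums of 1/i and 1/i^2 for i = 1, ..., n-1."""
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i**2 for i in range(1, n))
    return a_n, b_n


def watterson_variance(theta, n):
    """Variance of Watterson's estimator for a sample of n sequences.

    Assumes the standard neutral coalescent with infinite-sites mutation
    (an illustrative choice, not necessarily the setting of the paper).
    """
    a_n, b_n = harmonic_sums(n)
    return theta / a_n + theta**2 * b_n / a_n**2


if __name__ == "__main__":
    theta = 1.0  # hypothetical scaled mutation rate
    for n in (10, 100, 1000, 10000):
        var = watterson_variance(theta, n)
        # var * log(n) changes slowly with n, reflecting the 1/log(n) decay.
        print(n, var, var * math.log(n))
```

Each tenfold increase in n adds only about 2.3 to a_n, so precision improves slowly even for this well-behaved parameter; the abstract's point is that parameters describing population history fare worse still.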
Original language | English |
---|---|
Journal | Journal of Mathematical Biology |
Volume | 46 |
Issue number | 3 |
Pages (from-to) | 241-264 |
Number of pages | 24 |
ISSN | 0303-6812 |
Publication status | Published - 1 Mar 2003 |
Externally published | Yes |
Keywords
- Coalescent process
- Genealogy
- Maximum likelihood inference
- Population history