INTERFACULTY LABORATORY OF STATISTICAL METHODS
M. B. MALYUTOV, V. P. PASSEKOV
ON THE RECONSTRUCTION OF THE GENEALOGICAL
TREE OF ISOLATED POPULATIONS
Preprint No. 19
Moscow University Press
1971
SUMMARY. 1. Let $p_i$, $i = 1, \dots, q$, denote the frequencies of
the various types (alleles) $A_i$ of the gene $A$. Under the
hypothesis of random mating the genotype of any progeny
is determined by a random sample of two genes from the
practically infinite assembly of alleles $A_i$ with
probabilities $p_i$. As a result the gene frequencies $p_i'$ of
the alleles $A_i$ in the population of $N$ progeny have the multinomial
distribution
$$P\left\{p_1' = \frac{k_1}{2N}, \dots, p_q' = \frac{k_q}{2N}\right\} = \frac{(2N)!}{k_1! \cdots k_q!}\, p_1^{k_1} \cdots p_q^{k_q}. \qquad (1)$$
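The multinomial sampling step described above can be sketched in code. This is a minimal illustration, assuming that $N$ diploid progeny carry $2N$ genes drawn independently with the parental frequencies; the function names `next_generation` and `drift` and the numerical values are hypothetical and not taken from the preprint.

```python
import random
from collections import Counter

def next_generation(p, n, rng):
    """Sample allele frequencies among n diploid progeny (2n genes).

    p   -- parental allele frequencies, summing to 1
    n   -- number of progeny (assumed constant population size)
    rng -- a random.Random instance, passed in for reproducibility
    """
    alleles = range(len(p))
    # Each of the 2n genes is an independent draw with probabilities p,
    # so the allele counts are multinomially distributed.
    draws = rng.choices(alleles, weights=p, k=2 * n)
    counts = Counter(draws)
    return [counts[i] / (2 * n) for i in alleles]

def drift(p0, n, generations, rng):
    """Apply the one-generation sampling step repeatedly."""
    p = list(p0)
    for _ in range(generations):
        p = next_generation(p, n, rng)
    return p

rng = random.Random(0)
p1 = next_generation([0.5, 0.3, 0.2], n=100, rng=rng)
p_later = drift([0.5, 0.3, 0.2], n=100, generations=20, rng=rng)
```

Each frequency produced this way is a multiple of $1/(2N)$, and the frequencies always sum to one.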
If we restrict ourselves to the model of non-overlapping
generations with constant population size $N$,
then the vector of gene frequencies is supposed to change as a
Markov chain with the transition probabilities (1) (in real
calculations one takes the so-called effective size $N$
instead of the total population number; for examples of its
evaluation see [2Ц ]). This Markov chain is called "genetic drift". It is
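also the name of the diffusion process $p(t)$ describing the evolution of the gene
frequencies. Its probability density $f(p, t)$ at
time $t$ at the point $p$ changes according to the
forward equation of A. N. Kolmogorov:
$$\frac{\partial f(p, t)}{\partial t} = \frac{1}{4N} \sum_{i,j=1}^{q-1} \frac{\partial^2}{\partial p_i\, \partial p_j} \bigl[ p_i (\delta_{ij} - p_j)\, f(p, t) \bigr]$$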
(time is measured in generations). It is known
that on the reduced time scale $T = t(2N)^{-1}$ the continuous
process approximates the discrete one well.
Theorem 1. If $tN^{-1} \to 0$, then the distribution of
the random variable
$$\frac{8N}{t}\, \theta^2, \qquad \theta = \arccos \sum_{i=1}^{q} \sqrt{p_i(0)\, p_i(t)},$$
tends to $\chi^2$ with $q - 1$ degrees of freedom ($p_i(0) \neq 0$, $\sum_i p_i = 1$).
The essential point is that the distribution of this random variable
asymptotically does not depend on the initial frequencies $p_i(0)$.
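A heuristic moment computation suggests why a chi-square limit with $q - 1$ degrees of freedom, free of the initial frequencies, is plausible. The sketch below is not the authors' proof. It assumes that one generation of sampling acts on $2N$ genes, writes $\Delta$ for a one-generation increment, uses the square-root coordinates $x_i = \sqrt{p_i}$, and takes $\theta$ to be the angle between $x(0)$ and $x(t)$.

```latex
\begin{align*}
\operatorname{E} \Delta p_i &= 0, \qquad
\operatorname{Cov}(\Delta p_i, \Delta p_j) = \frac{p_i(\delta_{ij} - p_j)}{2N}, \\
x_i = \sqrt{p_i}: \quad
\operatorname{Cov}(\Delta x_i, \Delta x_j)
&\approx \frac{\operatorname{Cov}(\Delta p_i, \Delta p_j)}{4\sqrt{p_i p_j}}
 = \frac{\delta_{ij} - x_i x_j}{8N}, \\
\theta^2 \approx \lvert x(t) - x(0) \rvert^2
&\;\Longrightarrow\;
\frac{8N}{t}\, \theta^2 \approx \chi^2_{q-1}
\qquad (tN^{-1} \to 0).
\end{align*}
```

The matrix $(\delta_{ij} - x_i x_j)$ is the orthogonal projector onto the tangent plane of the unit sphere at $x$, so to first order the increments are isotropic in that $(q-1)$-dimensional plane with variance $(8N)^{-1}$ per generation, whatever the starting point.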
The main idea of the proof is the transformation $x_k = \sqrt{p_k}$, $k = 1, \dots, q$,
which transforms genetic drift into a spherical Brownian motion
with a trend that is negligible when $tN^{-1} \to 0$. Then we
approximate the spherical Brownian motion by a Brownian motion on
the tangent plane to the sphere $\sum_k x_k^2 = 1$ at the initial
point of the process. Let the population be split into two isolated
populations. Assume that the effective sizes of the
populations have remained equal to $N_1$ and $N_2$ respectively. We know
the present-day frequencies $p_{ik}$ and $q_{ik}$ of the $i$-th allele
of the $k$-th of the independently inherited loci, $k = 1, \dots, m$,
$i = 1, \dots, q_k$. Let $\theta_k = \arccos \sum_i \sqrt{p_{ik}\, q_{ik}}$ and $\nu = \sum_k (q_k - 1)$.
An immediate consequence of Theorem 1 is
Theorem 2. The distribution of the random variable
$$\frac{8 N_1 N_2}{(N_1 + N_2)\, t} \sum_{k=1}^{m} \theta_k^2$$
tends to the $\chi^2$ distribution (where $\nu$ is the number
of degrees of freedom) as $tN^{-1} \to 0$. Thus $\frac{8 N_1 N_2}{\nu (N_1 + N_2)} \sum_{k=1}^{m} \theta_k^2$ is a consistent estimate of $t$ when
$tN^{-1}$ is small and $m \to \infty$, according to the law of large
numbers.
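The estimate of the split time can be illustrated numerically. The sketch below assumes the normalization $8 N_1 N_2 / (N_1 + N_2)$, which follows from adding the variances $t/(8N_1)$ and $t/(8N_2)$ accumulated independently by the two populations in square-root coordinates. The function names and the allele frequencies are hypothetical and serve only as an example.

```python
import math

def angle(p, q):
    """Angle between the square-root frequency vectors of one locus."""
    c = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Rounding can push the cosine slightly above 1; clamp before acos.
    return math.acos(min(1.0, c))

def estimate_split_time(loci_p, loci_q, n1, n2):
    """Estimated number of generations since the split (hypothetical helper).

    loci_p, loci_q -- per-locus lists of allele frequencies in the two populations
    n1, n2         -- effective sizes, assumed constant since the split
    """
    nu = sum(len(p) - 1 for p in loci_p)  # total degrees of freedom
    theta_sq = sum(angle(p, q) ** 2 for p, q in zip(loci_p, loci_q))
    return 8.0 * n1 * n2 / (n1 + n2) * theta_sq / nu

# Made-up frequencies for two loci with 2 and 3 alleles.
loci_p = [[0.5, 0.5], [0.2, 0.3, 0.5]]
loci_q = [[0.6, 0.4], [0.25, 0.25, 0.5]]
t_hat = estimate_split_time(loci_p, loci_q, n1=1000, n2=1000)
```

The approximation is only meaningful when the resulting estimate is small compared with the effective sizes, in line with the condition of the theorem.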
3. Suppose we are given the present-day frequencies of
alleles in some populations as well as the effective sizes of the
present-day and earlier populations. To estimate the
genealogical tree we reconstruct the nodes of the tree successively
according to their remoteness from the present time. Suppose the
estimates of the nearest nodes are known. We take as the
descendants of the next split the pair of nodes of the
reconstructed tree for which
кт к
«* *«--
is minimal. Here $t_k$, $t_m$ (for definiteness $t_k > t_m$)
are the estimates of the times and coordinates of the nodes, the variance matrices
of the vectors $x$.
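The first step of such a successive reconstruction can be sketched for present-day populations only. The sketch does not implement the minimization criterion of the text. It substitutes a simpler rule, stated here as an assumption: among all pairs of populations, choose the pair with the smallest estimated split time, using the angular estimate with the factor $8 N_a N_b / (N_a + N_b)$. All names and frequencies are hypothetical.

```python
import math
from itertools import combinations

def split_time(loci_a, loci_b, n_a, n_b):
    """Angular estimate of the generations since two populations split."""
    nu = sum(len(p) - 1 for p in loci_a)
    theta_sq = 0.0
    for p, q in zip(loci_a, loci_b):
        c = min(1.0, sum(math.sqrt(a * b) for a, b in zip(p, q)))
        theta_sq += math.acos(c) ** 2
    return 8.0 * n_a * n_b / (n_a + n_b) * theta_sq / nu

def most_recent_pair(populations):
    """Return (estimated time, name, name) for the closest pair.

    populations -- dict: name -> (per-locus frequency lists, effective size)
    Assumption: the pair with the smallest estimated split time is taken
    as the most recent split; this stands in for the criterion of the text.
    """
    best = None
    for a, b in combinations(sorted(populations), 2):
        loci_a, n_a = populations[a]
        loci_b, n_b = populations[b]
        t = split_time(loci_a, loci_b, n_a, n_b)
        if best is None or t < best[0]:
            best = (t, a, b)
    return best

# Made-up data: one locus with two alleles in three populations.
pops = {
    "A": ([[0.50, 0.50]], 500),
    "B": ([[0.52, 0.48]], 500),
    "C": ([[0.80, 0.20]], 500),
}
t_hat, first, second = most_recent_pair(pops)
```

Later steps would need estimates of the coordinates and variance matrices of the reconstructed internal nodes, which this sketch does not attempt.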