An EM Exercise
The Expectation-Maximization (EM) algorithm is a popular method to obtain the Maximum Likelihood Estimate (MLE) when some of the data may be missing. See [Roche, 2012] for a nice tutorial on EM. The following problem is a nice exercise in working out the algorithm details to reinforce the concepts.
Let be i.i.d. from a normal distribution with unknown mean and variance . Suppose that negative values of are truncated at , so that instead of , we actually observe
from which we would like to estimate . By reordering, assume that and . Use the EM algorithm to estimate from .
Let and denote the probability density function (pdf) and the cumulative distribution function (cdf), respectively, of the standard normal distribution . Let denote the unobserved data and denote the observed data. Then the joint pdf of and the complete data likelihood is given by
We may obtain the pdf of the incomplete data, and hence the incomplete data likelihood, by integrating out . Thus,
From the above expression, we may obtain the conditional pdf of given and as
In the E-step, we compute the expected complete data log likelihood given the current estimate of the mean (say ) and the observed data, with the expectation taken over the unobserved data. In the M-step we shall find the value of the mean that maximizes this expected likelihood, denoted by , thereby forming a sequence that converges to the MLE.
Since the expectation in the E-step is used only to compute the value of that maximizes it, we may ignore the terms that do not contain . Thus
To simplify the above expression, use the identity
and a bit of algebra to yield
The estimate of that maximizes the above expression can be computed in multiple ways. One easy method is to notice that the functional form of the log likelihood has the same structure as that of the log likelihood of , with each element of the vector replaced by . Thus, using the expression for the MLE estimate of a normal distribution, we have
This recursive formula allows us to compute the sequence for some initial value .
Let denote the MLE estimate of the incomplete data likelihood. In other words,
Since maximizes the log likelihood, the first derivative of the log likelihood evaluated at is zero. Writing this down and rearranging the terms, we see that satisfies the equation
We can use the above expression to verify that the MLE estimate is a fixed point of the recursion obtain in our M-step. Specifically, substituting in the aforementioned recursion, we get
which proves that is a fixed point of the recursion.
See [Casella and Berger, 2002] for a proof that that the EM sequence converges in general to the incomplete data MLE. However, it is instructive to analyse the convergence of the EM sequence in this problem.
From the recursion for the EM sequence and the equation for the MLE, we see that
The ratio of the pdf and cdf in the above expression is known as the inverse Mills ratio. Define
The Mills ratio is fairly well studied in the literature. From [Sampord, 1953] we see that the inequality
holds for , the derivative of .
Assume that . Then using the mean value theorem, there exists some such that
This allows us the write the difference between the MLE estimate and the EM sequence as
Thus, provided that at least one of the observations is not truncated, we obtain
which shows that , for any starting point . By a similar analysis it can be shown that
for any starting point . Thus, for any starting point , thereby establishing that the EM sequence converges to the MLE estimate.