Statistics and Algebra. An Example.

Written by : Matt

There is a developing field called algebraic statistics which explores probability and statistics problems involving discrete random variables using methods coming from commutative algebra and algebraic geometry. The basic point is that the parameters for such statistical models are often constrained by polynomial relationships – and these are exactly the subject of commutative algebra and algebraic geometry. I would like to learn something more about this relationship, so in this post I’ll describe one example that I worked through – it comes from a book on the subject written by Bernd Sturmfels. Disclaimer : the rest of this post is technical.

Suppose that you have three independent exponential random variables X_1, X_2, X_3 with parameters \lambda_1, \lambda_2, \lambda_3 respectively. Recall that this means X_i has the density function

f(t) = \lambda_i e^{ -\lambda_i t} when t > 0 and 0 otherwise.

Such a random variable is often interpreted in terms of waiting time to first occurrence for a Poisson process.

Suppose that rather than observing the time of occurrence X_i directly, we only get to observe whether X_1 and X_2 occur before X_3 or not. (I’ll leave you to think about how this could come up in practice, but it seems like it could). So what we observe is a discrete random variable Y with four possible states (0,0), (1,0), (0,1), (1,1) indicating whether or not variable i occurs before variable three. Now for an exercise in working with probability distributions (something the students in the class I taught could hopefully do!):

P(Y = (0,0)) = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}
P(Y = (1,0)) = \frac{\lambda_1}{\lambda_1 + \lambda_2 + \lambda_3} \cdot \frac{\lambda_3}{\lambda_2 + \lambda_3}
P(Y = (0,1)) = \frac{\lambda_2}{\lambda_1 + \lambda_2 + \lambda_3} \cdot \frac{\lambda_3}{\lambda_1 + \lambda_3}
P(Y = (1,1)) = \frac{\lambda_1}{\lambda_1 + \lambda_3} \cdot \frac{\lambda_2}{\lambda_2 + \lambda_3} \cdot \frac{\lambda_1 + \lambda_2 + 2\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}

Great. Now suppose that we sample the variable Y some large number of times, and record the number of times (a, b, c, d) that Y takes on the four values above. Given these observances, we would like to predict the three \lambda values. One way of doing this is, given a choice of values for the \lambdas, to compute P(a,b,c,d | \lambda_i). Then we choose the \lambda values to maximize this probability (this is known as the method of maximum likelihood estimation). Rather than maximize this function, we’ll maximize the log of it (which amounts to the same thing) to take advantage of the multiplicative structure of the function. You can compute this function:

p(\lambda) = (b + d) \log (\lambda_1) + (c + d) \log (\lambda_2) + (a + b + d) \log(\lambda_3) + d \log(\lambda_1 + \lambda_2 + 2\lambda_3) - (c + d) \log(\lambda 1 + \lambda_3) - (b + d) \log(\lambda_2 + \lambda_3) - (a + b + c + d) \log(\lambda_1 + \lambda_2 + \lambda_3).

Since the probability function involves equations of degree 0 in the \lambda_i, there’s no harm in setting \lambda_3 = 1 and solving for \lambda_1 and \lambda_2. Any other set of solutions will be the same as these up to scaling all the \lambda values. After doing this, the maximizing equations (i.e. the gradient components of p) are

\frac{b + d}{\lambda_1} + \frac{d}{\lambda_1 + \lambda_2 + 2} - \frac{c + d}{\lambda_1 + 1} - \frac{a + b + c + d}{\lambda_1 + \lambda_2 + 1} = 0.
\frac{c + d}{\lambda_2} + \frac{d}{\lambda_1 + \lambda_2 + 2} - \frac{b + d}{\lambda_2 + 1} - \frac{a + b + c + d}{\lambda_1 + \lambda_2 + 1} = 0.

Things are starting to look more like polynomials! If someone asked you to solve this, you would probably try to clear denominators, use one equation to solve for \lambda_1 and plug it into the second equation to get one equation with one unknown. The problem is that when you clear denominators, you will introduce extra solutions into your equation. That is, you will find solutions to your new equations which are not solutions to the equations you are trying to solve. In this example, you will find (0, -1) is a solution to the problem with cleared denominators, but not to the original equation.

How do you know which solutions are extra and which are not? In commutative algebra, there is exactly a process for this, and it’s called the saturation of an ideal. The ideal we must use in our example is generated by the polynomials consisting of the equations above with cleared denominators, and the common denominator of all terms appearing in the two equations. (Our ideal is an ideal in the ring \mathbb{Q}(a,b,c,d)[\lambda_1, \lambda_2]).

Saturation of an ideal is a process which is cumbersome to carry out by hand for all but the simplest examples, but has been implemented by computer algebra packages like Macaulay, Sage, and Singular. In general it’s very computationally intensive, so it may be slow for ‘large’ ideals. One strategy that will speed it up in this example is to use actual numbers in place of (a,b,c,d). For example, when (a,b,c,d) = (1,2,3,5) then

\lambda_1 = (5\lambda_2^2 + 35\lambda_2 - 16) / 56 and \lambda_2 is given by a solution to
5\lambda_2^3 + 75\lambda_2^2 - 128\lambda_2 - 128 = 0

Then I calculate that \lambda_2 = 2.17559 (there is only one positive root) and \lambda_1 = 1.49664 (full disclosure: used Mathematica). In general the result of the saturation will allow you to solve for \lambda_1 in terms of \lambda_2 and then you will have that \lambda_2 is given by the zeros of some polynomial which is usually of degree 3:

(a + 2b)(c - d)\lambda_2^3 + (ab + b^2 - 3ac - 6bc + c^2 - 2ad - 5bd + cd) \lambda_2^2 + (2a + 3b - 3c - 3d)(c + d)\lambda_2 + 2(c + d)^2 = 0


2 thoughts on “Statistics and Algebra. An Example.

  1. GT says:

    It seems like computers can already work this out. Why bother with the analytic solution?

  2. Matt DeLand says:

    GT – are you referring to this specific example, or to the whole field of algebraic statistics? I’m not sure what your point is. This was an example to show how commutative algebra might be used effectively to solve problems in the realm of probability.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: