Estimating the Mean from Poisson Distributed Count Data
Assume we want to estimate the mean for Poisson count data: is it better to keep all the data, or are the sum and the size of the dataset enough? Let me do the calculations here. I’ll also give the posterior distribution for three different choices of prior.
The Problem
Assume we have a set of size $N$ of observed counts, $D = \{k_1, \dots, k_N\}$, where each $k_i$ is an independent draw from a Poisson distribution with mean $\lambda$:

$$P(k_i \mid \lambda) = \frac{\lambda^{k_i} e^{-\lambda}}{k_i!}$$

I am interested in estimating the mean $\lambda$ from the sum of all observed counts, $s = \sum_{i=1}^{N} k_i$. We will get back to estimating the mean from the complete set of results, but first let’s look at estimating it from the sum of the results.
Estimating the Mean from the Sum
We are interested in the posterior probability of the mean $\lambda$ given the observed sum $s$. By Bayes’ theorem,

$$P(\lambda \mid s, N) = \frac{P(s \mid \lambda, N)\, p(\lambda)}{P(s \mid N)}$$

All probabilities are conditioned on the number of trials $N$. The likelihood follows from the fact that the sum of $N$ i.i.d. Poisson variables with mean $\lambda$ is itself Poisson distributed with mean $N\lambda$:

$$P(s \mid \lambda, N) = \frac{(N\lambda)^{s} e^{-N\lambda}}{s!}$$

The prior probability of the mean, $p(\lambda)$, encodes what we believe about $\lambda$ before seeing any data. Since the evidence term $P(s \mid N)$ also depends on the prior, I will calculate it in the later sections. If we were just interested in the maximum of the posterior distribution we need not calculate the evidence term, because it does not depend on $\lambda$.
The Posterior Distribution for the Sum
In the following I will calculate the posterior distribution for $\lambda$ under three different choices of prior: a flat prior, Jeffreys’ prior, and the conjugate prior.
Flat Priors: Maximum Likelihood
A flat prior means that we consider every value of $\lambda$ equally likely before seeing any data: $p(\lambda) \propto 1$. The posterior is then proportional to the likelihood,

$$P(\lambda \mid s, N) \propto \lambda^{s} e^{-N\lambda}$$

and normalizing gives

$$P(\lambda \mid s, N) = \frac{N^{s+1}}{\Gamma(s+1)}\, \lambda^{s} e^{-N\lambda}$$

Note that this is a Gamma distribution with shape $s + 1$ and rate $N$, and that its maximum is the maximum likelihood estimate. To normalize I evaluated

$$\int_0^\infty \lambda^{s} e^{-N\lambda}\, \mathrm{d}\lambda = \frac{\Gamma(s+1)}{N^{s+1}}$$

where I have used $\Gamma(s+1) = s!$ for integer $s$.
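To convince ourselves of the normalization, here is a quick numerical sketch of the evidence integral; the values of $s$ and $N$ are arbitrary example numbers.

```python
import math

# Numerical check of the evidence integral for the flat prior:
#   integral_0^inf of lam^s * exp(-N*lam) d(lam)  ==  Gamma(s + 1) / N^(s + 1)
# s and N below are arbitrary example values.
s, N = 7, 5

def integrand(lam):
    return lam ** s * math.exp(-N * lam)

# Midpoint Riemann sum on [0, 20]; the integrand is negligible beyond that.
width = 1e-4
numeric = sum(integrand((i + 0.5) * width) * width for i in range(int(20 / width)))
exact = math.gamma(s + 1) / N ** (s + 1)
print(numeric, exact)  # the two values agree to high precision
```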
Jeffreys’ Prior
Harold Jeffreys himself recommended the prior $p(\lambda) \propto 1/\lambda$ for a parameter like ours. The posterior is then proportional to

$$P(\lambda \mid s, N) \propto \lambda^{s-1} e^{-N\lambda}$$

Using some help from Wolfram Alpha we can see that the integral in the denominator is

$$\int_0^\infty \lambda^{s-1} e^{-N\lambda}\, \mathrm{d}\lambda = \frac{\Gamma(s)}{N^{s}}$$

so that

$$P(\lambda \mid s, N) = \frac{N^{s}}{\Gamma(s)}\, \lambda^{s-1} e^{-N\lambda}$$

This is a Gamma distribution, as was the case for the flat prior, but with shape $s$ instead of $s + 1$; the rate is again $N$.
Conjugate Prior
The conjugate prior for the Poisson distribution is a Gamma distribution:

$$p(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta\lambda}$$

where we use $\alpha$ for the shape and $\beta$ for the rate. Multiplying with the likelihood gives

$$P(\lambda \mid s, N) \propto \lambda^{s + \alpha - 1} e^{-(N + \beta)\lambda}$$

which is, again, a Gamma distribution but with shape $s + \alpha$ and rate $N + \beta$.
All Roads Lead to the Gamma
We see that the posterior distribution for the sum is a Gamma distribution for all priors that I considered:

$$P(\lambda \mid s, N) = \frac{b^{a}}{\Gamma(a)}\, \lambda^{a - 1} e^{-b\lambda}$$

where the shape $a$ and the rate $b$ depend on the choice of prior:

- Flat prior: $a = s + 1$, $b = N$
- Jeffreys’ prior: $a = s$, $b = N$
- Conjugate prior: $a = s + \alpha$, $b = N + \beta$

For large $s$ and $N$ the differences between the priors become negligible and all three posteriors approach the same distribution.
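As a small sanity check, the following sketch computes the posterior shape and rate for each prior and the corresponding posterior mode; the values of $s$, $N$ and the hyperparameters $\alpha$, $\beta$ are made up.

```python
# Posterior (shape, rate) for each prior, given the sum s of N Poisson counts.
# The hyperparameters alpha, beta of the conjugate Gamma prior are made up.
def posterior_params(s, N, prior, alpha=2.0, beta=1.0):
    if prior == "flat":
        return s + 1, N
    if prior == "jeffreys":   # p(lambda) proportional to 1/lambda
        return s, N
    if prior == "conjugate":  # p(lambda) = Gamma(lambda; alpha, beta)
        return s + alpha, N + beta
    raise ValueError(prior)

s, N = 5230, 1000  # e.g. 1000 trials with sample mean 5.23 (made-up numbers)
modes = {}
for prior in ("flat", "jeffreys", "conjugate"):
    a, b = posterior_params(s, N, prior)
    modes[prior] = (a - 1) / b  # mode of a Gamma(a, b) distribution, for a >= 1
print(modes)  # all three modes lie close to the sample mean s/N = 5.23
```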
Estimating the Mean from the Complete Dataset
Until now I have derived the posterior probabilities from the sum of the i.i.d. variables. This means we have reduced our set of results $D = \{k_1, \dots, k_N\}$ to the two numbers $s$ and $N$. Did we lose information in doing so? The likelihood of the complete dataset is

$$P(D \mid \lambda) = \prod_{i=1}^{N} \frac{\lambda^{k_i} e^{-\lambda}}{k_i!} = \frac{\lambda^{s}\, e^{-N\lambda}}{\prod_{i} k_i!}$$

So we see that the likelihood term for obtaining the set is identical to the likelihood term of obtaining the sum multiplied by a constant that does not depend on $\lambda$. That constant cancels in Bayes’ theorem because it appears in the evidence as well, so the posterior for $\lambda$ is exactly the same.

To me this result is somewhat remarkable. It means, for this particular case, that we can reduce the dataset to two numbers, $s$ and $N$, without losing any information about the mean; in statistical terms, $(s, N)$ is a sufficient statistic for $\lambda$.
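The equality of the two likelihoods up to a constant can also be checked numerically. The following sketch, using a made-up simulated dataset, compares the two log-likelihoods at several values of $\lambda$ and finds a difference that does not depend on $\lambda$:

```python
import math
import random

random.seed(0)
lam_true = 4.0

def poisson_draw(lam):
    # Knuth's inversion-by-multiplication sampler (fine for small lam)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

data = [poisson_draw(lam_true) for _ in range(200)]  # made-up dataset
s, N = sum(data), len(data)

def loglik_full(lam):
    # log-likelihood of the complete dataset
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)

def loglik_sum(lam):
    # log-likelihood of the sum: s ~ Poisson(N * lam)
    return s * math.log(N * lam) - N * lam - math.lgamma(s + 1)

# The difference is the same at every lambda, so both yield the same posterior.
diffs = [loglik_full(lam) - loglik_sum(lam) for lam in (0.5, 2.0, 4.0, 8.0)]
print(diffs)
```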
Estimators for the Mean
To estimate the most likely a posteriori mean we take the mode of the Gamma posterior, which for shape $a \geq 1$ and rate $b$ is

$$\hat{\lambda} = \frac{a - 1}{b}$$

So for the flat prior we arrive at

$$\hat{\lambda}_{\text{flat}} = \frac{s}{N}$$

which is the maximum likelihood estimator: the sample mean. For Jeffreys’ prior we get

$$\hat{\lambda}_{\text{Jeffreys}} = \frac{s - 1}{N}$$

and for the conjugate prior, which is a $\mathrm{Gamma}(\alpha, \beta)$ distribution,

$$\hat{\lambda}_{\text{conj}} = \frac{s + \alpha - 1}{N + \beta}$$

As $N$ (and with it $s$) grows, all three estimators converge to the sample mean $s/N$.
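A tiny numerical illustration of this convergence, with a fixed sample mean and arbitrarily chosen conjugate hyperparameters, shows the spread between the three estimators shrinking as $N$ grows:

```python
# Spread between the three MAP estimators for a fixed sample mean of 3.0.
# The conjugate hyperparameters alpha, beta are chosen arbitrarily.
alpha, beta = 2.0, 1.0

def spread(N):
    s = 3.0 * N  # sum implied by a sample mean of 3.0
    maps = (s / N,                        # flat prior
            (s - 1) / N,                  # Jeffreys' prior
            (s + alpha - 1) / (N + beta)) # conjugate prior
    return max(maps) - min(maps)

spreads = [spread(N) for N in (10, 100, 1000)]
print(spreads)  # the spread shrinks as N grows
```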
Quantifying Our Confidence
Now we know how to estimate the mean from a collection of Poisson distributed random variables. All we need is the sum of the variables and the number of trials. However, we have not looked into credible intervals. Since we always end up with a Gamma distribution there should be information out there. Maybe here is a good place to start. Another time…
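As a small preview, and assuming SciPy is available, an equal-tailed credible interval can be read directly off the quantiles of the Gamma posterior; the numbers below are made up, using the flat-prior shape $s + 1$ and rate $N$:

```python
from scipy.stats import gamma

# Equal-tailed 95% credible interval from the Gamma posterior.
# Flat prior: shape s + 1, rate N; SciPy parameterizes by scale = 1/rate.
s, N = 5230, 1000  # made-up numbers
lo, hi = gamma.interval(0.95, s + 1, scale=1.0 / N)
print(lo, hi)  # a narrow interval around the sample mean s/N = 5.23
```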
Endnotes

1. The astute reader will have noticed that Jeffreys’ prior is meant for a Poisson distribution with mean $\lambda$, and not one with mean $N\lambda$ like we have here for the sum. However, transforming between these parameterizations would only introduce a constant for this particular prior, if I am not mistaken. The constant would cancel out because it appears in the integral in the denominator as well as the numerator.
2. This is different from the prior that results when applying Jeffreys’ Rule. See here for an explanation of the confusing nomenclature.
3. This prior cannot be normalized, as is the case for the flat prior. Both are called improper priors.