Working with Probability Distributions
In this post we will be exploring the idea of Discrete and Continuous probability distributions. One of the best ways to really understand an idea in mathematics is to interrogate it! Finding out what different distributions can and cannot tell us helps us understand how to use them, and leads to some interesting holes in our knowledge.
Discrete Probability Distributions
When most people start learning about probability, the first type of distribution they encounter is the kind you get when flipping coins. If you flip 30 coins you can have anywhere from 0 to 30 heads, but you can't get 1.5 heads. The distribution that describes the number of heads after 30 tosses is a Discrete Probability Distribution. In particular, the distribution just described is the Binomial Distribution. Given that we have \(n\) trials, we want \(k\) heads, and the probability of getting a head is \(p\), we calculate the probability as:
$$\binom{n}{k} p^k(1-p)^{n-k}$$
The \(\binom{n}{k}\) tells us "How many ways are there to get \(k\) heads?" and the \(p^k(1-p)^{n-k}\) answers the question "How likely is any particular sequence of \(k\) heads and \(n-k\) tails?"; multiply them together and we get the probability of getting exactly \(k\) heads. When we use the Binomial Distribution we know \(p\) and \(n\), and the distribution describes the probabilities for all the \(k\)s we could have. Our fair coin example looks like this:
Because this is a discrete probability distribution we refer to the function defining it as the Probability Mass Function. The real thing we want to know is: What kind of questions can our Probability Mass Function answer? Since our distribution needs to know \(p\) and \(n\) we'll assume \(p = 0.5\) and \(n = 30 \) as we did for our example.
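To make this concrete, here is a minimal sketch in Python of the Probability Mass Function above, evaluated for our fair coin. The post itself doesn't include code, so the function name `binom_pmf` is just an illustrative choice; only the standard library is needed.

```python
# A direct translation of the Binomial PMF above, with n = 30 tosses and p = 0.5.
from math import comb

def binom_pmf(k, n=30, p=0.5):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The full distribution: one probability for every possible number of heads.
distribution = [binom_pmf(k) for k in range(31)]
print(sum(distribution))  # ~1.0, as a Probability Mass Function must sum to 1
```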
Questions we can ask of probability distributions
We know that we can ask questions like "What is the probability of getting exactly 10 heads?"
$$P(\text{heads} = 10) = \binom{30}{10} 0.5^{10}(1-0.5)^{30-10} = 0.028$$
Can we also ask about ranges of values? "What is the probability of getting 18 to 24 heads?"
$$P(\text{heads} \geq 18 \text{ and } \text{heads} \leq 24) = \sum_{k=18}^{24} \binom{30}{k} 0.5^k(1-0.5)^{30-k} = 0.181$$
We can even slice up our distribution and ask something like "What is the probability that the number of heads ends in 3?"
$$P(\text{number of heads ends in 3}) = P(\text{heads = 3}) + P(\text{heads = 13}) + P(\text{heads = 23}) = 0.113$$
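If you want to check these three numbers yourself, here is one way to do it, sketched with SciPy's binomial distribution (the hand-rolled PMF above would give the same answers):

```python
from scipy.stats import binom

n, p = 30, 0.5

# P(heads = 10)
print(binom.pmf(10, n, p))                             # ~0.028

# P(18 <= heads <= 24)
print(sum(binom.pmf(k, n, p) for k in range(18, 25)))  # ~0.181

# P(number of heads ends in 3)
print(sum(binom.pmf(k, n, p) for k in (3, 13, 23)))    # ~0.113
```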
The most obvious question that we can't ask is "What is the probability of getting 1.5 heads?"
It is tempting to say that the probability of such an event is 0, but if we think about it carefully, the better answer is "That question doesn't even make sense, you can't get 1.5 heads!" Asking about 1.5 heads is like asking for the probability of rolling a six-sided die and getting a 7 or a picture of a cow; it is a nonsensical question.
Continuous Probability Distributions
For our Discrete Probability Distribution, we knew the number of tosses \(n\) and the probability of getting heads \(p\), and wanted to describe the possible number of heads \(k\). But suppose we have a coin and we don't know whether it is fair? We want to flip the coin repeatedly and make a good guess about \(p\). That is, we know \(n\) and \(k\) but not \(p\). After 30 tosses, we have 11 heads; what's our best guess for \(p\)? One thing we could do is look at the Binomial Distribution. If we assume that the \(p\) for the coin is 0.5, then the Binomial Distribution says that there's only a 0.05 chance of getting 11 heads in 30 tosses. Given a Binomial Distribution where \(p = 0.5\), this particular outcome doesn't sound super likely. What we can do next is look at a bunch of different Binomial Distributions defined by different \(p\)s and compare how well each of them explains the data. If \(p = 0.3\), then the probability of getting exactly 11 heads in 30 tosses is 0.11! Because the data we observed is more likely to occur, a distribution defined by \(p = 0.3\) explains the data better than one defined by \(p=0.5\). Maybe we can do even better; let's plot out the likelihoods for a bunch of different values of \(p\).
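As a sketch, here is what that comparison might look like in code, scanning a coarse grid of \(p\) values in increments of 0.1 (again using SciPy, though any binomial PMF would do):

```python
from scipy.stats import binom

# Likelihood of the data we saw (11 heads in 30 tosses) under different p's.
for p in [i / 10 for i in range(1, 10)]:
    print(f"p = {p:.1f}: likelihood = {binom.pmf(11, 30, p):.3f}")
# p = 0.3 gives ~0.110, p = 0.4 gives ~0.140, p = 0.5 gives only ~0.051.
```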
Now we can see that \(p=0.4\) is the most likely! But it's only the most likely among this small set of possible values for \(p\). Let's look closer; this time we'll increment by 0.01 rather than 0.1.
Now we can see that the peak is actually a little less than 0.4. If we had not been so focused on probability distributions, our first guess for the most likely \(p\) would have been \(\frac{11}{30} = 0.3\overline{6}\). We could continue to break our increments down smaller and smaller, but we'll never arrive at a distribution that contains \(\frac{11}{30}\), since it's a repeating decimal! So we need a distribution that is truly continuous to model even all the rational numbers we might be interested in. There's another, less obvious issue with our distribution: our discrete probabilities don't add up to 1! This just means that we need some sort of normalizing constant to force our values to sum correctly.
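A quick numerical check of that last point (a sketch with SciPy): the likelihoods on either grid don't sum to 1, so they aren't yet a proper probability distribution over \(p\).

```python
from scipy.stats import binom

coarse = sum(binom.pmf(11, 30, i / 10) for i in range(1, 10))   # 0.1 grid
fine = sum(binom.pmf(11, 30, i / 100) for i in range(1, 100))   # 0.01 grid
print(coarse, fine)  # roughly 0.32 and 3.2 -- neither adds up to 1
```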
The distribution which solves this problem is the Beta Distribution, and it is our example of a Continuous Probability Distribution. The Beta Distribution solves a very similar problem to the Binomial Distribution, only for the case where we know \(n\) and \(k\) but not \(p\). The parameters for the Beta Distribution are slightly different: \(\alpha\), which is the same as \(k\), and \(\beta\), which is actually \(n-k\). The Beta Distribution is defined as: $$Beta(\alpha,\beta) = \frac{x^{\alpha - 1}(1-x)^{\beta -1}}{B(\alpha ,\beta)}$$ where \(B(\alpha, \beta)\) is the Beta function, which computes the normalizing constant we need. We refer to this function as a Probability Density Function rather than a Probability Mass Function. \(Beta(11,19)\) looks just as we would expect based on our discrete approximations:
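Here is a sketch of that density using SciPy, which exposes the Beta Distribution directly; `scipy.stats.beta` takes \(\alpha\) and \(\beta\) as its first two arguments.

```python
import numpy as np
from scipy.stats import beta

xs = np.linspace(0, 1, 101)       # candidate values for p
density = beta.pdf(xs, 11, 19)    # Beta(11, 19) density at each point

# The density peaks just under 0.4, in line with our discrete approximations.
print(xs[np.argmax(density)])     # ~0.36
```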
Questions for Continuous Probability Distributions
Now let's ask questions of our new distribution. We'll try to mirror our questions from before and see where that leads us.
For starters "What is the probability that \(p\) is exactly 0.4?"
While we can see that the distribution is very dense near \(p = 0.4\), the answer to this question is actually 0. Intuitively this makes sense: it is very likely that the true probability of heads is around 0.4, but there's pretty much no way it is exactly 0.4 and not 0.400000001 or 0.399998.
And mathematically it makes sense as well. If we think of dividing our discrete probability distribution into smaller and smaller pieces (for example, we initially divided 0-1 into sections of size 0.1), each \(x\) can be seen as taking up \(\frac{1}{n}\) of the space, which shrinks as \(n\) grows. The probability of any given \(x\) is \(f(x)\cdot \frac{1}{n}\), where \(f\) is just our Probability Density Function. As we approach a continuous distribution, \(n \to \infty\) and $$\lim_{n \to \infty} \frac{1}{n} = 0$$ so no matter how big \(f(x)\) gets, it's always going to have a probability of 0.
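Here is that limiting argument sketched numerically: the density at 0.4 stays fixed while the width of one grid cell, \(\frac{1}{n}\), shrinks, so their product heads to 0.

```python
from scipy.stats import beta

density_at_04 = beta.pdf(0.4, 11, 19)   # roughly 4, and it never changes
for n in (10, 100, 1000, 10000, 100000):
    print(n, density_at_04 * (1 / n))   # f(x) * (1/n) shrinks toward 0
```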
The next question is "What is the probability that \(p\) is between 0.2 and 0.5?"
This is what we use Continuous Probability Distributions for all the time. The probability of a range of values is simply the definite integral of our density function over that range: $$\int_{0.2}^{0.5} Beta(11,19)\,dp = 0.91$$
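In code, this is just a difference of the Beta Distribution's cumulative distribution function; SciPy handles the integration for us.

```python
from scipy.stats import beta

# P(0.2 <= p <= 0.5) for Beta(11, 19)
print(beta.cdf(0.5, 11, 19) - beta.cdf(0.2, 11, 19))  # ~0.91
```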
And now for "What is the probability that \(p\) ends in 3?"
This is an interesting problem. We have to count up all the numbers like 0.3, 0.333, 0.13, 0.1333. That's quite a few numbers; we know this set is a subset of the rational numbers (since these values all terminate they can't be irrational, but \(\frac{1}{3}\) is not included since its decimal representation is non-terminating), so it's countably infinite. This might seem like a lot until we realize that the probability of any given number in this set is 0! A countably infinite sum of 0s is still 0.
So the probability of a number ending in 3 is 0? That sounds odd. By this logic, we can also conclude that the probability of the set of all terminating decimals is 0! Since the rational numbers are also countably infinite, we can see that the probability of getting any rational number at all is 0! And since our probabilities must sum to 1, does this mean that the probability of all the irrational numbers is 1? This last bit seems to be the most confusing: the probability of all the irrational numbers is 1, but the probability of any rational number is 0.
This sounds absurd! But is it? Maybe some of these questions are just like asking for the probability of 1.5 heads: nonsense questions? The answer is that, at this point in our exploration, we simply don't have the tools to know!
A suggestion of Measure Theory
It is often the case when studying math that what you are studying is kept in a carefully controlled box. When learning calculus, it is not usually mentioned that the set of nowhere differentiable functions is larger than the set of functions we can differentiate. When learning Discrete and then Continuous probability distributions, students are often carefully steered away from asking these simple questions that lead to really interesting problems. There is even a third type of probability distribution we haven't touched on!
The good news is that math does have an answer to these questions. It involves the subject of Measure Theory. Measure Theory is often left to difficult textbooks and graduate-level courses on rigorous probability. It is my personal belief that this need not be the case. Measure Theory, as applied to Probability Theory, can be viewed as a formalization of the question "What kind of questions can I ask?"
This post is one of a few I have planned that show how, even at an elementary level, if you ask the right questions you bump into surprisingly hard problems.