Probabilistic Models of Evidence

In what follows, we’ll consider three approaches to representing evidential relationships between propositions: the standard Bayesian definition of evidence, and two versions of another probabilistic representation of evidence called the likelihood principle. All of these approaches employ probabilistic models—that is, they use the mathematics of probability to represent how one proposition can provide evidence for another. Each model has its own advantages in different applications, so it is worthwhile to study all three approaches.

Approach 1: The Bayesian Definition of Evidence

As explained in the previous chapter, Bayesian confirmation theory defines confirmation in terms of the conditionalization rule, which says that upon learning E, your unconditional credence in H should be updated to match your prior conditional credence in H given E. If conditionalizing on E increases your credence in H, we say that E confirms H. The word ‘confirm’ in this context just means that E provides evidence for H. Thus, we already have a Bayesian definition for the concept of evidence:

The Bayesian Definition of Evidence: A person regards E as evidence for hypothesis H if and only if she thought, prior to learning E, that H is more likely to be true on the assumption that E is true than it is unconditionally: pr(H|E) > pr(H).

All of the probabilities shown on this page are prior credences. To simplify notation, we’ll drop the subscript ‘1’ from the probability functions. For instance, instead of writing ‘pr₁(H)’ for the prior probability of H, we’ll just write ‘pr(H).’

In other words, you regard some fact E as evidence for H if and only if learning E raises your credence in H. (Advocates of Bayesianism sometimes express this idea with the slogan “evidence is probability raising.”) Equivalently, we can say that you regard E as evidence for H if and only if the Bayesian multiplier, pr(E|H)/pr(E), is greater than 1. Remember, the Bayesian multiplier is the factor by which your credence in hypothesis H increases or decreases when you learn E. So, if the Bayesian multiplier is greater than 1, your credence in H will increase.
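To make the arithmetic concrete, here is a minimal Python sketch of the multiplier calculation. The credences below are made-up numbers standing in for a real example, not values from the text:

```python
# A minimal sketch of the Bayesian definition of evidence.
# All numbers below are illustrative assumptions, not from the text.

def bayesian_multiplier(pr_E_given_H, pr_E):
    """Factor by which credence in H changes upon learning E."""
    return pr_E_given_H / pr_E

pr_H = 0.30          # hypothetical prior credence in H
pr_E_given_H = 0.80  # hypothetical credence in E, assuming H
pr_E = 0.50          # hypothetical prior credence in E

multiplier = bayesian_multiplier(pr_E_given_H, pr_E)
pr_H_given_E = pr_H * multiplier  # conditionalization: posterior = prior * multiplier

print(f"Bayesian multiplier = {multiplier:.2f}")               # 1.60 > 1
print(f"pr(H|E) = {pr_H_given_E:.2f} vs. pr(H) = {pr_H:.2f}")  # 0.48 > 0.30
print("E is evidence for H:", pr_H_given_E > pr_H)             # True
```

Since the multiplier (1.6) exceeds 1, conditionalizing on E raises the credence in H from 0.30 to 0.48, so E counts as evidence for H on the Bayesian definition.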

Notice that the Bayesian multiplier is greater than 1 if and only if pr(E|H) > pr(E). Moreover, this inequality looks almost like the one in the Bayesian definition of evidence, except that E and H have switched places! This means that if E is evidence for H according to the Bayesian definition, then H can also be regarded as evidence for E. The evidential relationship between E and H goes both ways. (However, it is not symmetrical in the sense of being equally strong in both directions: E may provide stronger evidence for H than H does for E, or vice versa. On the next page, we’ll consider how to measure the strength of evidence.)
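To see why the relationship goes both ways, here is the short derivation, just standard probability algebra, assuming pr(E) and pr(H) are both positive so the conditional probabilities are defined:

```latex
\begin{align*}
\mathrm{pr}(H \mid E) > \mathrm{pr}(H)
  &\iff \mathrm{pr}(H \mid E)\,\mathrm{pr}(E) > \mathrm{pr}(H)\,\mathrm{pr}(E) \\
  &\iff \mathrm{pr}(E \mid H)\,\mathrm{pr}(H) > \mathrm{pr}(H)\,\mathrm{pr}(E)
      \quad \text{(since } \mathrm{pr}(H \mid E)\,\mathrm{pr}(E) = \mathrm{pr}(E \mid H)\,\mathrm{pr}(H)\text{)} \\
  &\iff \mathrm{pr}(E \mid H) > \mathrm{pr}(E).
\end{align*}
```

The final inequality, pr(E|H) > pr(E), is just the Bayesian definition’s condition with E and H trading roles.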

Approach 2: The Likelihood Principle (contrastive version)

Another model of evidential relationships is called the likelihood principle or the law of likelihood. It provides a helpful way to understand the role of evidence when we want to judge between two competing hypotheses rather than considering only a single hypothesis. The likelihood principle says that evidence E favors whichever hypothesis makes E more likely:

The Likelihood Principle: Evidence E favors hypothesis H1 over H2 if and only if pr(E|H1) > pr(E|H2), provided the two hypotheses are mutually exclusive.

Some philosophers defend a version of the Likelihood Principle without the proviso that the two hypotheses must be mutually exclusive. However, if H1 and H2 are not mutually exclusive—for example, if one hypothesis logically entails the other—then the Likelihood Principle can yield results inconsistent with Bayesianism.

In other words, if E is more likely (more expected, or less surprising) if H1 is true than if H2 is true, then E is evidence in favor of H1 in contrast to H2. Notice that this is a contrastive notion of evidence: the likelihood principle does not indicate whether E is evidence for H1 tout court, but only whether evidence E “favors” hypothesis H1 over an alternative hypothesis H2. To say that E favors H1 over H2 means that the ratio of your credences in these two propositions should shift in favor of H1. That is, the ratio pr(H1)/pr(H2) should increase upon learning E. This could happen even while both credences decrease! In other words, even if E is evidence against both H1 and H2, it could still favor H1 over H2.
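The claim that the ratio shifts in favor of H1 exactly when pr(E|H1) > pr(E|H2) follows from the ratio (“odds”) form of Bayes’ theorem, sketched here on the assumption that all the probabilities involved are positive:

```latex
\frac{\mathrm{pr}(H_1 \mid E)}{\mathrm{pr}(H_2 \mid E)}
  \;=\; \underbrace{\frac{\mathrm{pr}(E \mid H_1)}{\mathrm{pr}(E \mid H_2)}}_{\text{likelihood ratio}}
  \;\times\; \underbrace{\frac{\mathrm{pr}(H_1)}{\mathrm{pr}(H_2)}}_{\text{prior ratio}}
```

The posterior ratio is the prior ratio multiplied by the likelihood ratio, so learning E shifts the ratio toward H1 just in case the likelihood ratio exceeds 1, that is, just in case pr(E|H1) > pr(E|H2).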

The contrastive likelihood principle is a reformulation of the Bayesian model, not an alternative to Bayesianism, but it provides a more illuminating view of the role evidence plays in many situations. (However, as mentioned in the “Fine Print” section above, the Likelihood Principle becomes inconsistent with Bayesianism when hypotheses H1 and H2 aren’t mutually exclusive. Some philosophers have defended this incompatible version of the likelihood principle as an alternative to Bayesianism. See, for example, Elliott Sober, “Bayesianism—its Scope and Limits,” in Richard Swinburne (ed.), Bayes’s Theorem (Oxford: Oxford University Press, 2002), 21–38.) We can illustrate the contrastive likelihood principle using Bayes bars:

Let’s consider a scenario similar to the police detective example discussed in the previous chapter. This time, you are investigating the mysterious disappearance of your own car keys. They’re not on the table where you thought you left them. You can think of only three plausible explanations:

M: You’ve misplaced the keys (or misremembered where you left them).

B: The keys were borrowed by a family member.

S: The keys were stolen.

You regard the last hypothesis as less likely than the other two:
[Bayes bar with segments labeled Misplaced, Borrowed, Stolen]
Before undertaking a thorough search for the missing keys, you decide to check whether your car is still in the driveway. Assuming you merely misplaced the keys, the car certainly will be there: pr(C|M) = 1, where C is the proposition that the car is in the driveway. So, C is true in the entire M segment (blue part) of the Bayes bar. Assuming a family member borrowed your keys, you’re only 25% confident that the car will be in the driveway: pr(C|B) = ¼, so C is true in a quarter of the B segment (yellow part). And if the keys were stolen, the car very likely will be gone too. So, pr(C|S) is very low—just a tiny sliver (unshaded red part) of the segment where S is true:
[Prior Bayes bar: each of the M, B, and S segments divided into a Car region and a No Car region]
Strolling to the window, you are relieved to see your car in the driveway. This eliminates the shaded yellow and shaded red segments of the Bayes bar. After renormalization, your posterior credences look like this:
[Posterior Bayes bar with segments labeled M, B, S]
Obviously, the car’s presence in the driveway favors the misplaced hypothesis over either of the alternatives. However, it also favors borrowed over stolen: the ratio of pr(B) to pr(S) has increased. The B segment was roughly twice the length of S in your prior Bayes bar, but now it is many times longer than S! Thus, C favors B over S, even though it is evidence against both of those hypotheses. Although your credences in the borrowed and stolen hypotheses both decreased, the car’s presence favors the former over the latter because pr(C|B) > pr(C|S).
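For readers who want to check the renormalization step, here is a short Python sketch of the same calculation. The text fixes only pr(C|M) = 1 and pr(C|B) = ¼; the priors and pr(C|S) below are illustrative stand-ins read off the bars, not exact values:

```python
# Bayes-bar arithmetic for the missing-keys example. The text fixes
# pr(C|M) = 1 and pr(C|B) = 1/4; the priors and pr(C|S) below are
# illustrative assumptions, not values given in the text.

priors = {"M": 0.50, "B": 0.35, "S": 0.15}      # misplaced, borrowed, stolen
pr_C_given = {"M": 1.00, "B": 0.25, "S": 0.05}  # pr(C | hypothesis)

# Length of each segment where C (car in driveway) is true:
joint = {h: priors[h] * pr_C_given[h] for h in priors}
pr_C = sum(joint.values())

# Renormalize the surviving segments to get posterior credences:
posteriors = {h: joint[h] / pr_C for h in joint}

print({h: round(p, 3) for h, p in posteriors.items()})
print("prior B/S ratio:    ", round(priors["B"] / priors["S"], 1))          # 2.3
print("posterior B/S ratio:", round(posteriors["B"] / posteriors["S"], 1))  # 11.7
```

With these stand-in numbers, the credences in B and S both fall (from 0.35 to about 0.15, and from 0.15 to about 0.01), yet the B/S ratio grows from roughly 2.3 to roughly 11.7, exactly the pattern the Bayes bars display.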

Approach 3: The Non-contrastive Likelihood Principle

The contrastive version of the likelihood principle, defined above, does not tell us whether E is evidence for H1 by itself. (In fact, as just explained, it could be evidence against H1 while still favoring H1 over H2.) However, we can get a non-contrastive version of the principle by comparing a hypothesis with its own negation. In other words, we can use H and ~H as the two hypotheses in the likelihood principle. This yields the following special case, which I’ll call the non-contrastive likelihood principle:

Non-contrastive Likelihood Principle: E is evidence for H if and only if pr(E|H) > pr(E|~H).

This version of the likelihood principle, unlike the contrastive version, does tell us whether E is evidence for H. The resulting definition of evidence agrees with the standard Bayesian definition given above: any proposition that counts as evidence for hypothesis H according to one definition also counts as evidence according to the other. In fact, the non-contrastive likelihood principle can be derived mathematically from the Bayesian definition of evidence.
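The derivation runs through the law of total probability, assuming 0 < pr(H) < 1 so that conditioning on both H and ~H makes sense:

```latex
\mathrm{pr}(E) \;=\; \mathrm{pr}(E \mid H)\,\mathrm{pr}(H)
               \;+\; \mathrm{pr}(E \mid \lnot H)\,\mathrm{pr}(\lnot H)
```

This equation says that pr(E) is a weighted average of pr(E|H) and pr(E|~H), so it lies strictly between them whenever they differ. Hence pr(E|H) > pr(E) if and only if pr(E|H) > pr(E|~H); and we already saw that pr(E|H) > pr(E) is equivalent to the Bayesian definition’s condition pr(H|E) > pr(H).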

Although these two definitions of ‘evidence’ are essentially equivalent, each definition has its own advantages. When applying the standard Bayesian definition of evidence, we have to ask “does the evidence make the hypothesis more likely?” However, to apply the likelihood principle, it’s the other way around: “does the hypothesis make the evidence more likely?” Depending on the situation, one of these questions may be easier to answer than the other.

For an illustration of the non-contrastive likelihood principle using Bayes bars, recall the weather example from the previous chapter. In that illustration, we examined how the evidence of wet walkways could bear on the hypothesis that it recently rained. Without specifying exact probabilities, we assumed that your conditional credence in wet walkways given rain was significantly higher than your credence in wet walkways given no rain:

pr(W|R) > pr(W|~R)

This means that wet walkways are evidence that it recently rained. Conversely, dry walkways are evidence that it didn’t rain, as demonstrated with Bayes bars in that example.
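As a closing illustration, here is a small Python check of both claims with made-up numbers (the original example deliberately leaves the probabilities unspecified):

```python
# A numeric check of the weather example; the probabilities are
# illustrative assumptions, since the text leaves them unspecified.

pr_R = 0.30             # hypothetical prior credence that it rained
pr_W_given_R = 0.95     # wet walkways, assuming rain
pr_W_given_notR = 0.20  # wet walkways, assuming no rain (pr(W|R) > pr(W|~R))

# Law of total probability:
pr_W = pr_W_given_R * pr_R + pr_W_given_notR * (1 - pr_R)

pr_R_given_W = pr_W_given_R * pr_R / pr_W                 # Bayes' theorem
pr_R_given_notW = (1 - pr_W_given_R) * pr_R / (1 - pr_W)  # conditioning on ~W

print(f"pr(R|W)  = {pr_R_given_W:.2f} > pr(R) = {pr_R:.2f}")     # wet walkways confirm rain
print(f"pr(R|~W) = {pr_R_given_notW:.2f} < pr(R) = {pr_R:.2f}")  # dry walkways disconfirm it
```

With these numbers, wet walkways raise the credence in rain from 0.30 to about 0.67, while dry walkways lower it to about 0.03, matching what the Bayes bars in the earlier example showed.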