Solving the Classical Monty Hall Problem through Bayes’ Theorem and Causal Inference

Analysis With Anh
6 min read · Apr 12, 2023

The Monty Hall problem, now a classic probability puzzle, was introduced and solved by Steve Selvin in a letter to The American Statistician in 1975. In this game-show scenario, the contestant faces three doors: one hides a car and the other two hide goats. After the contestant chooses door D1, the host, who knows what is behind each door, opens door D2 to reveal a goat. The contestant may then stick with the original choice or switch to door D3. The key question is which option gives the contestant the higher chance of winning the car.

Although many people are inclined to believe that the car is equally likely to be behind either of the 2 remaining doors (a 50–50 split), the mathematics favors switching. The reason is as follows:

First of all, we define the variables:

  • Door: D1, D2, D3. Each of the doors D1, D2, D3 has 2 possible outcomes: {Car, Goat}
  • The door that the Gamer chooses: G. There are 3 possible outcomes: the Gamer chooses one of {D1, D2, D3}
  • The door that the Host opens: H. There are 3 possible outcomes: after the Gamer chooses a door, the Host opens one of {D1, D2, D3}

What we are going to prove is that, given that the Gamer chooses D1 and the Host opens D2, the chance that the car is behind door D1 (the chance of winning by keeping the door) is smaller than the chance that the car is behind door D3 (the chance of winning by switching). In other words:

P(D1=Car|G=D1,H=D2) < P(D3=Car|G=D1,H=D2)
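
Before working through the proof, the inequality can be sanity-checked empirically. The snippet below is a minimal simulation sketch, not part of the original argument. It assumes the car and the Gamer's pick are both uniformly random, and that the Host picks at random when two goat doors are available; the function name `simulate` and its parameters are illustrative. Only rounds with G=D1 and H=D2 are kept, to match the conditional probabilities above.

```python
import random


def simulate(trials=200_000, seed=0):
    """Estimate P(D1=Car | G=D1, H=D2) and P(D3=Car | G=D1, H=D2)."""
    rng = random.Random(seed)
    doors = [1, 2, 3]
    kept = stay_wins = switch_wins = 0
    for _ in range(trials):
        car = rng.choice(doors)
        pick = rng.choice(doors)
        # The Host opens a door that is neither the Gamer's pick nor the car
        # (chosen at random when two such doors exist: an assumption).
        opened = rng.choice([d for d in doors if d != pick and d != car])
        if pick != 1 or opened != 2:
            continue  # keep only rounds matching G=D1, H=D2
        kept += 1
        stay_wins += car == 1
        switch_wins += car == 3
    return stay_wins / kept, switch_wins / kept


stay, switch = simulate()
print(f"stay: {stay:.3f}, switch: {switch:.3f}")
```

With enough trials, the two estimates should land near 1/3 and 2/3 respectively.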

Methodology 1: Bayes’ Theorem

Bayes’ Theorem is a fundamental concept in probability theory that provides a systematic approach for updating the probability of a hypothesis based on new evidence. It is named after Thomas Bayes, an 18th-century English statistician and minister, and states:

P(H | E) = P(H) * P(E | H) / P(E) (*1*)

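where:

  • P(H | E) is the posterior probability of H given E
  • P(H) is the prior probability of H
  • P(E | H) is the likelihood of E given H
  • P(E) is the marginal probability of E

Note that in (*1*) and (*2*), H stands for a generic hypothesis and E for evidence; elsewhere in this article, H denotes the door that the Host opens.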

Bayes’ Theorem can be extended to three events, where we are interested in the probability of a hypothesis H given evidence E1 and E2. In this case, the theorem can be expressed as:

P(H | E1, E2) = P(H) * P(E1 | H) * P(E2 | H, E1) / P(E1, E2) (*2*)

where:

  • P(H | E1, E2) is the posterior probability of H given E1 and E2
  • P(H) is the prior probability of H
  • P(E1 | H) is the likelihood of E1 given H
  • P(E2 | H, E1) is the likelihood of E2 given H and E1
  • P(E1, E2) is the joint probability of E1 and E2

This formula can be further extended to include additional events, by adding more terms to the numerator that represent the likelihood of each event given the previous events and the hypothesis.
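
Formula (*2*) is the chain rule for the joint probability P(H, E1, E2) divided by P(E1, E2). As a quick illustrative check, the sketch below compares both sides on a small joint distribution over three binary events. The distribution is hypothetical (arbitrary weights 1 to 8), and the helper names `prob` and `check` are made up for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary events (h, e1, e2).
# The weights 1..8 are arbitrary; they sum to 36.
outcomes = list(product((0, 1), repeat=3))
joint = {o: Fraction(w, 36) for o, w in zip(outcomes, range(1, 9))}


def prob(pred):
    """Total probability of the outcomes satisfying pred(h, e1, e2)."""
    return sum((p for o, p in joint.items() if pred(*o)), Fraction(0))


def check(h, e1, e2):
    """Return (direct posterior, right-hand side of formula (*2*))."""
    p_h = prob(lambda a, b, c: a == h)
    p_h_e1 = prob(lambda a, b, c: a == h and b == e1)
    p_e1_given_h = p_h_e1 / p_h
    p_e2_given_h_e1 = joint[(h, e1, e2)] / p_h_e1
    p_e1_e2 = prob(lambda a, b, c: b == e1 and c == e2)
    lhs = joint[(h, e1, e2)] / p_e1_e2
    rhs = p_h * p_e1_given_h * p_e2_given_h_e1 / p_e1_e2
    return lhs, rhs


for o in outcomes:
    lhs, rhs = check(*o)
    print(o, lhs, rhs, lhs == rhs)
```

Because exact fractions are used, the two sides agree exactly rather than up to rounding.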

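We can now apply Bayes’ Theorem for 3 events (*2*) to the Monty Hall problem. Take the hypothesis to be D1=Car, E1 to be G=D1, and E2 to be H=D2. Dividing the numerator and the denominator of (*2*) by P(G=D1), we have:

P(D1=Car|H=D2,G=D1) = P(H=D2|D1=Car,G=D1) * P(D1=Car|G=D1) / P(H=D2|G=D1)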

Now, calculate each element of the right side of the equation above:

  • P(H=D2|D1=Car,G=D1) = 1/2

Explanation: If the Gamer has already chosen the Door with the Car, it does not matter which Door (D2 or D3) the Host opens, as both hide a Goat. Assuming the Host picks between them at random, the probability that the Host opens D2 in this situation is 1/2. (P(H=D2|D1=Car,G=D1) = P(H=D3|D1=Car,G=D1) = 1/2)

  • P(D1=Car|G=D1) = 1/3

Explanation: What is behind each Door is pre-determined and independent of which Door the Gamer decides to choose. Therefore P(D1=Car|G=D1) = P(D1=Car) = 1/3.

  • P(H=D2|G=D1) =

P(H=D2|G=D1,D1=Car) * P(D1=Car)

+ P(H=D2|G=D1,D2=Car) * P(D2=Car)

+ P(H=D2|G=D1,D3=Car) * P(D3=Car)

= 1/2 * 1/3 + 0 * 1/3 + 1 * 1/3 = 1/2

where:

  • P(H=D2|G=D1,D1=Car) = 1/2
  • P(H=D2|G=D1,D2=Car) = 0 (The Host will never open the Door with Car)
  • P(H=D2|G=D1,D3=Car) = 1 (The Host has to open Door 2 since opening Door 3 would reveal the Car)
  • P(D1=Car) = P(D2=Car) = P(D3=Car) = 1/3
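
The marginal P(H=D2|G=D1) is a weighted sum over the three possible car locations, which is easy to reproduce with exact fractions. This is a small sketch; the dictionary names `prior` and `likelihood_open_d2` are illustrative, and the 1/2 entry assumes the Host picks at random between two goat doors.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door.
prior = {"D1": Fraction(1, 3), "D2": Fraction(1, 3), "D3": Fraction(1, 3)}

# P(H=D2 | G=D1, car location) for each possible car location.
likelihood_open_d2 = {
    "D1": Fraction(1, 2),  # Host picks D2 or D3 at random (assumption)
    "D2": Fraction(0),     # Host never opens the door hiding the car
    "D3": Fraction(1),     # D2 is the only door the Host may open
}

# Sum the likelihood-weighted priors over all car locations.
p_open_d2 = sum(likelihood_open_d2[d] * prior[d] for d in prior)
print(p_open_d2)  # 1/2
```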

Therefore P(D1=Car|H=D2,G=D1) = (1/2 * 1/3) / (1/2) = 1/3

Similarly, applying the same method with P(H=D2|D3=Car,G=D1) = 1 and P(D3=Car|G=D1) = 1/3, we find that P(D3=Car|G=D1,H=D2) = P(H=D2|D3=Car,G=D1) * P(D3=Car|G=D1) / P(H=D2|G=D1) = (1 * 1/3) / (1/2) = 2/3

Hence, P(D1=Car|G=D1,H=D2) < P(D3=Car|G=D1,H=D2)

In other words, switching doors doubles the chance of winning the car.
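
The whole Bayes calculation can also be written as a short function that works for any pick and any opened door. The following is a sketch under the standard host rules (the Host never opens the Gamer's door or the car's door, and chooses uniformly when two doors qualify); the names `host_likelihood` and `posterior` are illustrative.

```python
from fractions import Fraction

DOORS = ("D1", "D2", "D3")


def host_likelihood(opened, car, pick):
    """P(H=opened | car location, G=pick) under the assumed host rules."""
    allowed = [d for d in DOORS if d != pick and d != car]
    return Fraction(1, len(allowed)) if opened in allowed else Fraction(0)


def posterior(pick, opened):
    """P(car location | G=pick, H=opened); requires opened != pick."""
    prior = Fraction(1, 3)  # car location is independent of the pick
    joint = {car: host_likelihood(opened, car, pick) * prior for car in DOORS}
    evidence = sum(joint.values())  # P(H=opened | G=pick)
    return {car: p / evidence for car, p in joint.items()}


result = posterior(pick="D1", opened="D2")
for door, p in result.items():
    print(door, p)
# D1 1/3
# D2 0
# D3 2/3
```

Calling `posterior` with other valid combinations, such as `pick="D2", opened="D3"`, gives the same 1/3 versus 2/3 split by symmetry.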

Methodology 2: Causal Inference

Causal Inference is a field of statistics that aims to identify the causal relationships between variables in a given system or phenomenon. It involves analyzing data to determine the extent to which one variable influences or causes changes in another variable. Three types of structures that are commonly used in Causal Inference to identify causal relationships between variables are Collider, Fork, and Chain.

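A Fork is a structure where two variables share a common cause (X ← Z → Y). The common cause induces an association between the two variables even though neither causes the other; conditioning on the common cause blocks this path and removes the association.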

A Chain is a structure where one variable causally affects another, which then causally affects a third variable (X → M → Y). The first variable influences the third through the intermediate variable; conditioning on the intermediate variable blocks this path.

A Collider is a variable that is affected by two or more variables, which are called its causes (X → Z ← Y). On its own, a collider does not make its causes correlated. Unlike in the Fork and Chain structures, conditioning on a collider opens a path between its causes, which can create a spurious association between them. This can lead to biased estimates of the causal effect if the spurious association is not taken into account.
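
The collider effect can be seen in a toy example that is unrelated to Monty Hall. The sketch below assumes two independent fair coin flips X and Y, with Z = X OR Y as their common effect; the setup and the helper names `prob` and `cond` are illustrative. Probabilities are computed exactly by enumerating the four outcomes.

```python
from fractions import Fraction
from itertools import product

# Two independent fair coin flips X and Y; Z = X OR Y is their common effect.
joint = {(x, y, x | y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}


def prob(pred):
    """Total probability of the outcomes satisfying pred(x, y, z)."""
    return sum((p for o, p in joint.items() if pred(*o)), Fraction(0))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda x, y, z: event(x, y, z) and given(x, y, z)) / prob(given)


def x_is_1(x, y, z):
    return x == 1


# Without conditioning on Z, knowing Y tells us nothing about X.
print(prob(x_is_1))                               # 1/2
print(cond(x_is_1, lambda x, y, z: y == 1))       # 1/2

# After conditioning on Z=1, X depends on Y.
print(cond(x_is_1, lambda x, y, z: z == 1 and y == 1))  # 1/2
print(cond(x_is_1, lambda x, y, z: z == 1 and y == 0))  # 1
```

Given Z=1, learning that Y=0 forces X=1, even though X and Y were generated independently.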

To solve the Monty Hall problem using Causal Inference, we need to establish the order in which the variables are determined, and which variables are causes and which are effects.

Following that framework, we have 3 variables: D (the door hiding the car), G, and H. So 3 relationship pairs are in consideration: D&G, H&D, H&G

The contestant’s choice does not cause the prize to be behind a particular door, as the prize is randomly assigned before the contestant makes their choice. Instead, the causal relationship is between the Host’s choice of which Door to open and the prize.

Graph 1: The correct relationship is shown on the right side of the picture. There is no arrow between D and G, since the prize is randomly assigned before the contestant makes their choice.

Next, the Host does not decide which items are behind the doors. On the contrary, what is behind the doors determines the Host’s action, as he has to open a Door with a Goat. Hence, the Door causes the action of the Host.

Graph 2: The correct relationship is shown on the right side of the picture. The Host’s selection of a Door is influenced by the fact that he already knows what is behind it.

Lastly, the Host’s response is impacted by the door selected by the Gamer since he is restricted from revealing the Car’s location and is forced to reveal a goat.

Graph 3: The correct relationship is shown on the right side of the picture. The door selected by the Gamer constrains which door the Host can open.

By merging the three diagrams above, the situation can be depicted as D→H←G. When the Host reveals a goat, we are conditioning on H, which establishes a correlation between D and G. Hence, once we know which door the Host opened, G is no longer independent of D: the Gamer’s choice, combined with the Host’s action, carries information about where the car is.

Graph 4: The Monty Hall problem can be visualized as a Collider. Unlike in the Fork and Chain structures, conditioning on a collider variable opens a path between its causes, which can create a spurious association between them.
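
The collider structure D→H←G can be checked numerically for the Monty Hall setup itself. The sketch below builds the exact joint distribution over (D, G, H), assuming D and G are uniform and independent and that the Host picks uniformly among the doors he is allowed to open; the helper names are illustrative. It compares the probability that the car is behind door 1 before and after conditioning on H.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

# Joint distribution over (D, G, H): car location, Gamer's pick, Host's door.
# Assumptions: D and G are uniform and independent; the Host picks uniformly
# among the doors that are neither the Gamer's pick nor the car.
joint = {}
for d, g in product(DOORS, repeat=2):
    allowed = [h for h in DOORS if h != g and h != d]
    for h in allowed:
        joint[(d, g, h)] = Fraction(1, 9) / len(allowed)


def prob(pred):
    """Total probability of the outcomes satisfying pred(d, g, h)."""
    return sum((p for o, p in joint.items() if pred(*o)), Fraction(0))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda d, g, h: event(d, g, h) and given(d, g, h)) / prob(given)


def car_behind_1(d, g, h):
    return d == 1


# Without conditioning on H, the Gamer's pick says nothing about D.
print(cond(car_behind_1, lambda d, g, h: g == 1))  # 1/3
print(cond(car_behind_1, lambda d, g, h: g == 3))  # 1/3

# Conditioning on the collider H=2 makes D depend on G.
print(cond(car_behind_1, lambda d, g, h: h == 2 and g == 1))  # 1/3
print(cond(car_behind_1, lambda d, g, h: h == 2 and g == 3))  # 2/3
```

Once H=2 is known, the probability that the car is behind door 1 changes with the Gamer's pick, which is exactly the dependence that conditioning on a collider creates.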

To put it differently, because the Host’s choice depends on both the prize location and the contestant’s pick, the door he opens carries information about where the car is. This is why switching is advantageous when the Host has to reveal a door without the prize. By comprehending this causal relationship, we can employ causal inference to reason about the Monty Hall problem and identify the most effective approach for the contestant.

Conclusion

In conclusion, both Bayes’ Theorem and Causal Inference show that switching doors is advantageous for the participant. If you happen to be on a game show in the future, you now know what course of action to take.
