Shared birthdays among party guests
Programming Snapshot – Probability
At a party with 23 guests, the chance that at least two of them share a birthday is more than 50 percent, which may sound fairly unlikely to amateur mathematicians. Armed with statistical methods, party animal Mike Schilli sets out to prove this claim.
The problem depends on the exact wording. Nobody can expect to go to a party with 23 people and, with 50 percent probability, meet someone who shares their own date of birth. The unexpected result comes about because all n guests are compared with each other (i.e., each with (n – 1) other guests). It is much more likely that some pair of guests was born in the same month and on the same day (the year is not considered) than that one of the other (n – 1) guests shares your own birthday [1].
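To put numbers on this distinction, the following sketch counts the distinct guest pairs at a 23-person party and computes the chance that at least one of the other 22 guests shares one particular guest's birthday. It assumes the same simplified 365-day year used below; the function names are hypothetical and not part of the article's listings.

```python
from math import comb

def pair_count(n):
    # Every guest is compared with every other guest: n * (n - 1) / 2 pairs
    return comb(n, 2)

def p_shares_my_birthday(n, days=365):
    # Chance that at least one of the other n - 1 guests was born on *my* day,
    # assuming all days are equally likely and birthdays are independent
    return 1 - ((days - 1) / days) ** (n - 1)

print("pairs among 23 guests:", pair_count(23))
print("someone shares my birthday: %.4f" % p_shares_my_birthday(23))
```

With 253 pairs in play but only 22 comparisons involving you personally, the chance of meeting your own birthday twin stays at roughly 6 percent.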
Bottom Up
At a party with only two guests, what is the probability that both celebrate their birthday on the same day? Assuming, for simplicity, a year of 365 days and ignoring seasonal fluctuations in births and special cases such as twins at the party, this happens in one out of 365 cases. Conversely, the probability that the two guests have birthdays on different days is 364 in 365.
If a third guest joins the pair, the probability that no two people in the room share a birthday is the probability that two independent events both occur: the first event, which we just calculated to occur with probability 364/365, and a second event, in which the new guest shares a birthday with neither the first nor the second person and therefore has only 363 of 365 days available.
A statistician determines the total probability of three different birthdays by multiplying the probabilities of these two independent events: 364/365 × 363/365, or around 0.991796. The reverse event, namely the case where two or more people celebrate their birthdays on the same day, thus has a probability of 1 – 0.991796, or around 0.008204.
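The three-guest arithmetic can be double-checked with exact fractions. This is a minimal sketch using Python's standard fractions module, not part of the article's listings:

```python
from fractions import Fraction

# Guest 2 must avoid guest 1's day; guest 3 must avoid both earlier days
p_distinct = Fraction(364, 365) * Fraction(363, 365)
p_shared = 1 - p_distinct

print(p_distinct, "=", "%.6f" % float(p_distinct))
print(p_shared, "=", "%.6f" % float(p_shared))
```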
The sequence continues with guests number four, five, and so on. In each round, the number of remaining days, and thus the numerator of the new fraction, drops by one, and the fraction is multiplied by the probability from the previous round.
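Written out as a formula (again assuming 365 equally likely days), this round-by-round multiplication gives the probability that all n guests have different birthdays, and its complement gives the probability of at least one shared birthday:

```latex
\begin{aligned}
P_{\text{distinct}}(1) &= 1, \qquad
P_{\text{distinct}}(n) = P_{\text{distinct}}(n-1)\cdot\frac{365-(n-1)}{365} \\
P_{\text{distinct}}(n) &= \prod_{k=0}^{n-1}\frac{365-k}{365} = \frac{365!}{(365-n)!\,365^{n}} \\
P_{\text{shared}}(n)   &= 1 - P_{\text{distinct}}(n)
\end{aligned}
```

For n = 3, the product reduces to 364/365 × 363/365, the three-guest value from the previous paragraph.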
Listing 1 [2] shows a lean Python implementation of the calculation. Its output (Figure 1) shows that the probability of a shared birthday exceeds 50 percent with the 23rd guest, reaching 50.73 percent. The script sets the number of days still available in the calendar to 365 at the beginning and subtracts 1 from it after each round, because each new guest without a shared birthday uses up one more day. The variable prob holds the probability that no one in the room shares a birthday with anyone else: For a room with only one person, this is obviously 1; for two, it is 0.9973.
Listing 1
birthdayparadox
01 #!/usr/bin/env python3
02
03 dates = 365
04 dates_left = dates
05 prob = 1
06
07 for person in range(1,24):
08     prob=prob*dates_left/dates
09     print("%2d: %.4f" % (person, 1-prob))
10     dates_left -= 1
In each iteration of the loop, Listing 1 multiplies the previous round's probability in prob by the new event's probability and assigns the result back to prob. However, the probability of having no shared birthdays is not what we are looking for; we want the opposite: the chance of a collision. Therefore, line 9 prints the probability of the complementary event, that is, the chance that two or more people at the party share a birthday, which is 1-prob.
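The same loop also answers the reverse question: How many guests does it take to push the collision probability past a given threshold? The following sketch wraps the calculation in two hypothetical helper functions, p_shared() and min_guests(), which are not part of the article's listings:

```python
def p_shared(n, days=365):
    # Probability that at least two of n guests share a birthday,
    # built up round by round just like Listing 1
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1 - p_distinct

def min_guests(threshold, days=365):
    # Smallest party size whose collision probability exceeds threshold
    n = 1
    while p_shared(n, days) <= threshold:
        n += 1
    return n

for threshold in (0.5, 0.9, 0.99):
    print(threshold, min_guests(threshold))
```

For the 50 percent threshold, the search stops at 23 guests, in agreement with Listing 1.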
Simulator
A simulation script (Listing 2) checks whether the formula is correct. In each round, it invites 23 guests to a party, stores a random birthday for each of them in the guest_bdays list, and then uses the bday_match() function to decide whether guest_bdays contains any duplicate integers. The randint() function from the random module returns birthdays as integers between 1 and 365 (both inclusive).
Listing 2
bpsim
01 #!/usr/bin/env python3
02
03 import random
04
05 def bday_match(bdays):
06     seen = set()
07     for bday in bdays:
08         if bday in seen:
09             return True
10         seen.add(bday)
11     return False
12
13 for epoch in range(10):
14     parties = 100000
15     matches = 0
16     nof_days = 365
17     nof_guests = 23
18
19     for party in range(parties):
20         guest_bdays=[]
21         for _ in range(nof_guests):
22             bday = random.randint(1,nof_days)
23             guest_bdays.append(bday)
24
25         if bday_match(guest_bdays):
26             matches += 1
27
28     print(matches/parties)
The for loop starting in line 13 performs a total of 10 test runs with 100,000 parties each. For each party with at least one shared birthday, it increments the matches counter by one. At the end of each run, the script prints the number of parties with shared birthdays as a fraction of the total number of parties; Figure 2 shows that the value settles at about 50.7 percent, matching the calculated result.
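Why do the runs scatter slightly instead of all landing exactly on 50.73 percent? Each run estimates a proportion from N independent parties, so its standard error is about sqrt(p(1 – p)/N). The sketch below compares a smaller, seeded simulation with that yardstick. The helper names are hypothetical, the fixed seed only serves reproducibility, and a set-length test stands in for bday_match():

```python
import math
import random

def p_shared(n, days=365):
    # Exact collision probability, same product as Listing 1
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1 - p_distinct

def simulate(parties, nof_guests=23, nof_days=365, seed=1):
    # Monte Carlo estimate in the style of Listing 2, with a fixed seed
    rng = random.Random(seed)
    matches = 0
    for _ in range(parties):
        bdays = [rng.randint(1, nof_days) for _ in range(nof_guests)]
        if len(set(bdays)) < nof_guests:  # a duplicate shrinks the set
            matches += 1
    return matches / parties

parties = 20000
p = p_shared(23)
stderr = math.sqrt(p * (1 - p) / parties)
estimate = simulate(parties)
print("exact %.4f, simulated %.4f, std. error %.4f" % (p, estimate, stderr))
print("deviation in standard errors: %.2f" % ((estimate - p) / stderr))
```

For N = 100,000, as in Listing 2, the standard error comes to roughly 0.0016, so individual runs can be expected to differ from each other in the third decimal place.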
The bday_match() function from line 5 expects a Python list of integers and checks whether it contains one or more duplicates. The test is efficient because it stores previously seen values in the seen set, which Python implements as a hash table, so it can quickly check for each newly examined value whether it is already in the set. If you have ever had this task in a recruitment test, you will know that this procedure brings the compute time for the duplicate check down to O(n) on average for n list elements, whereas a less clever solution with two nested loops takes O(n^2).
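For comparison, here is a sketch of the less clever two-loop approach next to the set-based check from Listing 2, plus a compact variant based on the set's length. The function names are hypothetical; a short random cross-check confirms that all three agree:

```python
import random

def bday_match_quadratic(bdays):
    # Naive two-loop check: compares every pair, O(n^2) comparisons
    for i in range(len(bdays)):
        for j in range(i + 1, len(bdays)):
            if bdays[i] == bdays[j]:
                return True
    return False

def bday_match_set(bdays):
    # Same hash-set idea as bday_match() in Listing 2, O(n) on average
    seen = set()
    for bday in bdays:
        if bday in seen:
            return True
        seen.add(bday)
    return False

def bday_match_len(bdays):
    # Compact alternative: duplicates make the set smaller than the list
    return len(set(bdays)) < len(bdays)

# Cross-check all three variants on random 23-guest parties
for _ in range(1000):
    guests = [random.randint(1, 365) for _ in range(23)]
    assert bday_match_quadratic(guests) == bday_match_set(guests) == bday_match_len(guests)
```

The set-based loop stops at the first duplicate it finds, while the len(set()) variant always hashes the entire list; for 23 guests, either one is fast.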
Black on White
How does the probability of a birthday collision develop as the number of guests grows? Thanks to the matplotlib Python library, which is easily installed with

pip3 install --user matplotlib

Listing 3 produces the graph in Figure 3 from the output data of Listing 1:
Listing 3
bdplot
01 #!/usr/bin/env python3
02
03 import matplotlib.pyplot as plt
04 import sys
05
06 x=[]
07 y=[]
08 for line in sys.stdin:
09     (guests,prob)=line.split(': ')
10     x.append(int(guests))
11     y.append(float(prob))
12
13 plt.plot(x,y)
14 plt.xlabel('Guests')
15 plt.ylabel('Probability')
16
17 plt.savefig('bdcollision.png')
Listing 3 reads the output lines of Listing 1 from the special file handle sys.stdin and, in line 9, uses the split() method to split each line at the colon, separating the number of guests from the probability. For each value pair, it appends the guest count to the x list and the probability to the y list with the append() method. The plot() function then accepts all x and y values at once and draws the graph, which the savefig() function writes to a PNG image file in line 17. It can hardly be done with less effort, and the graph looks quite appealing.
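Note that split() hands back strings such as " 1" and "0.0000\n", and recent matplotlib versions treat string values as categorical labels rather than numbers. A minimal parsing sketch that converts the values first, with a hypothetical parse_output() helper and io.StringIO standing in for sys.stdin:

```python
import io

def parse_output(stream):
    # Turn "NN: P.PPPP" lines (the format Listing 1 prints) into numbers
    x, y = [], []
    for line in stream:
        guests, prob = line.split(': ')
        x.append(int(guests))   # int() ignores the padding blank from %2d
        y.append(float(prob))   # float() ignores the trailing newline
    return x, y

sample = io.StringIO(" 1: 0.0000\n 2: 0.0027\n23: 0.5073\n")
x, y = parse_output(sample)
print(x, y)
```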
./birthdayparadox | ./bdplot
What happens at a party with 100 participants can also be determined with these scripts (after extending the loop range in Listing 1 and printing more decimal places): The probability of two guests sharing a birthday climbs to 99.99996928 percent; the chance that all guests have different birthdays at this mega party is less than 1 in 3 million.
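With output limited to four decimal places, Listing 1 would simply print 1.0000 for 100 guests. A sketch using exact fractions, with a hypothetical p_all_distinct() helper, shows the tiny remaining probability:

```python
from fractions import Fraction

def p_all_distinct(n, days=365):
    # Exact probability that n guests all have different birthdays
    p = Fraction(1)
    for k in range(n):
        p *= Fraction(days - k, days)
    return p

p = p_all_distinct(100)
print("shared birthday: %.8f percent" % (100 * float(1 - p)))
print("all different:   about 1 in %d" % round(1 / p))
```

Exact rational arithmetic avoids any rounding along the 100 multiplications; converting to float only at the end is enough for printing.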
Infos
[1] The Birthday Problem/Paradox: https://www.youtube.com/watch?v=QrwV6fJKBi8
[2] Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/linux-magazine.com/211/