In statistical computations, intuition can be very misleading

Infected Script

Listing 2 simulates the experiment in Perl. Line 12 initializes an array of 1,000 women with a health status of 0 (i.e., "no findings"). The infect() function in line 50 then randomly sets eight elements to 1, simulating the 0.8 percent of women in the population who have breast cancer. To do so, line 55 uses the shuffle() function from the CPAN Algorithm::Numerical::Shuffle module, which implements the Fisher-Yates method [6], to shuffle an array containing the index numbers of the patients array. The function then takes eight of the shuffled indices and sets the corresponding elements of the patients array to 1.
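The shuffle-and-pick step can be sketched in a few lines. This is a minimal illustration and not the article's Listing 2: it assumes the core List::Util module's shuffle() as a stand-in for Algorithm::Numerical::Shuffle, and the signature of infect() as well as the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);   # core module, swapped in for the CPAN module

# Hypothetical helper: mark $count randomly chosen patients as ill (1).
sub infect {
    my ($patients, $count) = @_;

    # Shuffle all index numbers, then use the first $count of them.
    my @indices = shuffle(0 .. $#$patients);
    $patients->[$_] = 1 for @indices[0 .. $count - 1];
    return;
}

my @patients = (0) x 1000;    # 0 means "no findings"
infect(\@patients, 8);        # 8 of 1,000 = 0.8 percent

# grep in scalar context counts the matching elements.
my $ill = grep { $_ == 1 } @patients;
print "Ill patients: $ill\n";
```

Because the indices are shuffled first and then sliced, no patient can be picked twice, so the array always ends up with exactly eight 1s.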

Listing 2



The while loop in line 16 runs until the "mammography" has returned 100,000 positive results. Each examination calls the examine() function in lines 34-47, passing in the patient's actual health status, which is known in advance for test purposes. The diagnosis takes errors of the first kind (10 percent) and the second kind (7 percent) into account. The function returns a true value to the main program for a positive finding and a false value for a negative one. The output of the script is in line with the previous mathematical projections:

$ ./base-rate
Test score: 9.26%
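The examination loop described above might look roughly like the following sketch, which is not the original listing. Two things are assumptions: the 10 percent figure is treated as the rate of missed cancers and the 7 percent figure as the rate of false alarms (this assignment is chosen because it produces a result close to the output shown), and each examined patient is picked at random from the array. All names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Assumed error rates; which figure belongs to which error is a guess.
my $FALSE_NEGATIVE = 0.10;    # ill patient, but the test stays negative
my $FALSE_POSITIVE = 0.07;    # healthy patient, but the test is positive

# Returns a true value for a positive finding, a false value otherwise.
sub examine {
    my ($is_ill) = @_;
    return $is_ill
        ? rand() >= $FALSE_NEGATIVE    # cancer detected in 90% of cases
        : rand() <  $FALSE_POSITIVE;   # false alarm in 7% of cases
}

# 1,000 patients, eight of them marked as ill at random positions.
my @patients = (0) x 1000;
$patients[$_] = 1 for (shuffle(0 .. $#patients))[0 .. 7];

my ($positives, $true_positives) = (0, 0);

# Keep examining random patients until 100,000 tests came back positive.
while ($positives < 100_000) {
    my $is_ill = $patients[ int rand @patients ];
    next unless examine($is_ill);
    $positives++;
    $true_positives++ if $is_ill;
}

my $score = 100 * $true_positives / $positives;
printf "Test score: %.2f%%\n", $score;
```

The result varies slightly from run to run. Under the assumptions above, it should land somewhere around 9 percent, in the same range as the 9.26% reported by the article's script.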

The Perl script confirms that even a test with a seemingly modest false positive rate is unreliable when applied to a group in which only a small fraction of people actually has the condition: The false alarms from the many healthy patients far outnumber the correct positive findings. Rather than giving a positive test result more credit than it is worth, it might be advisable to seek a second opinion.
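The arithmetic behind this conclusion can be checked without any simulation by counting expected findings per 1,000 women. The sketch below again assumes that 10 percent of cancers are missed and 7 percent of healthy women receive a false alarm; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $women          = 1000;
my $ill            = 8;       # 0.8 percent of 1,000
my $false_negative = 0.10;    # assumed rate of missed cancers
my $false_positive = 0.07;    # assumed rate of false alarms

# Expected number of positive findings in each group.
my $true_alarms  = $ill * (1 - $false_negative);          # about 7.2
my $false_alarms = ($women - $ill) * $false_positive;     # about 69.4

# Share of positive findings that belong to women who are actually ill.
my $share = 100 * $true_alarms / ($true_alarms + $false_alarms);
printf "Expected share of true positives: %.2f%%\n", $share;
```

Roughly seven correct findings stand against almost 70 false alarms, which is why fewer than one in ten positive results points to an actual case.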


  1. Ludo:
  2. p-Value:
  3. Reinhart, Alex. Statistics Done Wrong: The Woefully Complete Guide. No Starch Press, 2015.
  4. Listings for this article:
  5. Base rate fallacy:
  6. Fisher-Yates shuffle:

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area.
