In statistical computations, intuition can be very misleading

Infected Script

Listing 2 simulates the experiment in Perl. Line 12 initializes an array of 1,000 women, setting each health status to 0 (i.e., "no findings"). The infect() function in line 50 then randomly introduces eight 1s, simulating the 0.8 percent of women in the population who have breast cancer. To do so, line 55 uses the shuffle() function from the CPAN Algorithm::Numerical::Shuffle module, which implements the Fisher-Yates method [6], to shuffle an array holding the index numbers of the patient array. The function then takes eight of these shuffled indices and sets the corresponding entries in the patient array to 1.
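The index-shuffling idea can be sketched in a few lines. This is not the code from Listing 2: it assumes the core List::Util module's shuffle() as a stand-in for Algorithm::Numerical::Shuffle, and the signature of infect() (array reference plus count) is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);    # core module, used here instead of the CPAN one

# Hypothetical re-creation of infect(): mark $count randomly
# chosen entries of the patient array as sick (1).
sub infect {
    my ($patients, $count) = @_;

    # Shuffle the index numbers, then keep the first $count of them.
    # Because every index occurs exactly once, no patient is picked twice.
    my @indices = shuffle(0 .. $#$patients);
    my @picked  = @indices[0 .. $count - 1];

    $patients->[$_] = 1 for @picked;
    return;
}

my @patients = (0) x 1000;    # everyone starts with "no findings"
infect(\@patients, 8);        # 0.8 percent of 1,000 women

my $sick = grep { $_ == 1 } @patients;
print "Sick patients: $sick\n";
```

Shuffling the indices rather than drawing eight random numbers avoids accidentally hitting the same patient twice, so the array always ends up with exactly eight 1s.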

Listing 2



The while loop in line 16 runs until the "mammography" has returned 100,000 positive results. Each test is performed by the examine() function in lines 34-47, which receives the patient's actual health status (known here for test purposes). The diagnosis takes both kinds of error into consideration: errors of the first kind (false positives, 7 percent of healthy patients) and errors of the second kind (false negatives, 10 percent of sick patients). The function returns a true value to the main program for a positive finding and a false value for a negative one. The output of the script is in line with the previous mathematical projections:

$ ./base-rate
Test score: 9.26%
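The loop and the examine() function described above might look roughly like the following sketch. It is a reconstruction under stated assumptions, not the original listing: the variable names are invented, patients are assumed to be drawn at random from the array for each test, and List::Util's shuffle() again stands in for the CPAN module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $FALSE_NEGATIVE = 0.10;    # sick patient, but the test finds nothing
my $FALSE_POSITIVE = 0.07;    # healthy patient, but the test raises an alarm

# Returns true for a positive finding, given the true health status.
sub examine {
    my ($is_sick) = @_;
    return $is_sick
        ? rand() >= $FALSE_NEGATIVE    # 90 percent of sick patients test positive
        : rand() <  $FALSE_POSITIVE;   # 7 percent of healthy patients test positive
}

# 1,000 patients, eight of them sick (see the description of infect()).
my @patients = (0) x 1000;
my @indices  = shuffle(0 .. $#patients);
$patients[$_] = 1 for @indices[0 .. 7];

# Assumption: each round tests one randomly chosen patient.
my ($positives, $true_positives) = (0, 0);
while ($positives < 100_000) {
    my $status = $patients[ int(rand(scalar @patients)) ];
    next unless examine($status);
    $positives++;
    $true_positives++ if $status;
}

my $score = 100 * $true_positives / $positives;
printf "Test score: %.2f%%\n", $score;
```

Because the simulation is random, the percentage varies slightly from run to run, but it should stay in the neighborhood of the value shown above.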

The Perl script confirms that a positive result from a test with a significant false positive rate is inherently unreliable when only a small fraction of the people tested actually have the condition. Rather than giving a test result more credit than it deserves, it might be advisable in this case to seek a second opinion.
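As a cross-check on the simulated figure, the same number follows from Bayes' formula. The calculation below assumes the rates used in the simulation (0.8 percent prevalence, 10 percent false negatives, 7 percent false positives); the symbols C for "has cancer" and + for "positive test" are chosen here for illustration.

```latex
P(C \mid +)
  = \frac{P(+ \mid C)\,P(C)}
         {P(+ \mid C)\,P(C) + P(+ \mid \lnot C)\,P(\lnot C)}
  = \frac{0.90 \cdot 0.008}{0.90 \cdot 0.008 + 0.07 \cdot 0.992}
  = \frac{0.0072}{0.07664}
  \approx 0.094
```

The healthy majority contributes almost ten times as many positive results (0.06944) as the sick minority (0.0072), which is why fewer than one in ten positive findings points to an actual case.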


  1. Ludo:
  2. p-Value:
  3. Reinhart, Alex. Statistics Done Wrong: The Woefully Complete Guide. No Starch Press, 2015.
  4. Listings for this article:
  5. Base rate fallacy:
  6. Fisher-Yates shuffle:

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area.
