In statistical computations, intuition can be very misleading

Infected Script

Listing 2 simulates the experiment in Perl. It starts with an array of 1,000 women whose health status line 12 initially sets to 0 ("no findings"). The infect() function in line 50 then randomly plants eight 1s, simulating the 0.8 percent of women in the population who have breast cancer. To do this, line 55 uses the shuffle() function from the CPAN module Algorithm::Numerical::Shuffle, which implements the Fisher-Yates method [6], to shuffle an array holding the index numbers of the patients array. The function then picks eight of the shuffled indices and sets the patients array to 1 at those positions.
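For readers without the CPAN module installed, the following minimal sketch shows how the infect() step could work with a hand-written Fisher-Yates shuffle. It is not the code from Listing 2: the helper fisher_yates_shuffle() is a hypothetical stand-in for Algorithm::Numerical::Shuffle's shuffle(), and the signature of infect() here is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the CPAN module's shuffle(): a Fisher-Yates
# shuffle that returns a shuffled copy of its arguments.
sub fisher_yates_shuffle {
    my @items = @_;
    for ( my $i = $#items; $i > 0; $i-- ) {
        # Pick a random position j with 0 <= j <= i ...
        my $j = int rand( $i + 1 );
        # ... and swap the elements at positions i and j
        @items[ $i, $j ] = @items[ $j, $i ];
    }
    return @items;
}

# Mark $count randomly chosen patients as sick (1); all others stay 0.
# Assumed signature: array reference plus number of sick patients.
sub infect {
    my ( $patients, $count ) = @_;
    my @indices = fisher_yates_shuffle( 0 .. $#$patients );
    $patients->[$_] = 1 for @indices[ 0 .. $count - 1 ];
}

my @patients = (0) x 1_000;    # 1,000 women, no findings yet
infect( \@patients, 8 );       # 0.8 percent base rate

my $sick = grep { $_ } @patients;
print "Sick patients: $sick of ", scalar @patients, "\n";
```

Because the shuffled index list contains every index exactly once, taking eight of its entries always marks eight distinct patients, so the script prints "Sick patients: 8 of 1000" on every run; only which women are affected changes.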

Listing 2: base-rate

The experiment runs the while loop in line 16 until the simulated "mammography" has returned 100,000 positive results. Each examination is carried out by the examine() function in lines 34-47, which receives the patient's health status, known in advance for test purposes. The diagnosis factors in a 10 percent rate of missed cancers (false negatives) and a 7 percent rate of false alarms (false positives). For a positive finding, the function returns a true value to the main program; for a negative finding, it returns a false value. The script's output agrees with the mathematical prediction calculated earlier:

$ ./base-rate
Test score: 9.26%
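Since the listing itself is not reproduced here, the following condensed sketch shows how examine() and the main loop might fit together. It is an approximation, not Listing 2: it uses shuffle() from the core module List::Util instead of Algorithm::Numerical::Shuffle, assumes each loop iteration examines a randomly chosen patient, and computes the test score as the percentage of positive results that belong to women who actually have cancer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Rates from the article's scenario
my $FALSE_NEGATIVE = 0.10;      # sick patient, test reports "no findings"
my $FALSE_POSITIVE = 0.07;      # healthy patient, test reports "positive"
my $TARGET         = 100_000;   # stop after this many positive results

# 1,000 patients, eight of them sick (0.8 percent)
my @patients = (0) x 1_000;
$patients[$_] = 1 for ( shuffle 0 .. $#patients )[ 0 .. 7 ];

# Simulated mammography: receives the known health status and
# returns true for a positive finding, false for a negative one
sub examine {
    my ($sick) = @_;
    return $sick
        ? rand() >= $FALSE_NEGATIVE    # detected with 90% probability
        : rand() <  $FALSE_POSITIVE;   # false alarm with 7% probability
}

my ( $positives, $true_positives ) = ( 0, 0 );
while ( $positives < $TARGET ) {
    # Assumption: each round examines a randomly chosen patient
    my $sick = $patients[ int rand @patients ];
    next unless examine($sick);
    $positives++;
    $true_positives++ if $sick;
}

my $score = 100 * $true_positives / $positives;
printf "Test score: %.2f%%\n", $score;
```

Because the simulation is random, each run prints a slightly different value; with these rates it should land in the neighborhood of the result shown above.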

The Perl script confirms that a test with a significant false positive rate, applied to a population in which only a small fraction actually has the disease, is inherently unreliable: fewer than one in ten positive results points to an actual cancer. Rather than giving a single test result more credit than it deserves, it is advisable in such cases to seek a second opinion.
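As a cross-check, Bayes' theorem gives the exact probability that the simulation approximates, assuming a 0.8 percent base rate, a 90 percent detection rate (10 percent false negatives), and a 7 percent false positive rate:

```latex
P(\text{cancer} \mid +)
  = \frac{P(+ \mid \text{cancer})\,P(\text{cancer})}
         {P(+ \mid \text{cancer})\,P(\text{cancer}) + P(+ \mid \text{healthy})\,P(\text{healthy})}
  = \frac{0.9 \cdot 0.008}{0.9 \cdot 0.008 + 0.07 \cdot 0.992}
  = \frac{0.0072}{0.0072 + 0.06944}
  \approx 0.094
```

Put in terms of the 1,000 women: about 7.2 of the 8 sick women test positive, as do about 69.4 of the 992 healthy ones, so only about 7.2 of roughly 76.6 positive results are correct. The simulated 9.26% lands close to this value of about 9.4%.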

Infos

  1. Ludo: https://en.wikipedia.org/wiki/Ludo_(board_game)
  2. p-Value: https://en.wikipedia.org/wiki/P-value
  3. Reinhart, Alex. Statistics Done Wrong: The Woefully Complete Guide. No Starch Press, 2015.
  4. Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/magazine/178
  5. Base rate fallacy: https://en.wikipedia.org/wiki/Base_rate_fallacy
  6. Fisher-Yates shuffle: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area. He can be contacted at mschilli@perlmeister.com. Mike's homepage can be found at http://perlmeister.com.
