A brief discussion on the confusion matrix – As the COVID-19 pandemic continues to ravage the world, India has done remarkably well. Even as global infections have crossed 1.2 million, India’s count of around 4,000 seems modest, given that we are home to one-sixth of the world’s population. In per capita terms, only 3 in a million people in India are infected by COVID-19, vs 156 in a million people globally (as of 5th April 2020).

A number of reasons have been suggested, including a smaller number of tests, BCG vaccine usage in India, the Indian strain being less virulent, and Indians being more resistant to infections.

In this context, I seek to use the confusion matrix (also called an error matrix, or a matching matrix in unsupervised learning) to discuss the impact of higher testing.

Students of my Quantitative Methods class would be familiar with the chart below.  The confusion matrix tends to confuse a lot of the students. Its implications are underappreciated by many practitioners of statistics.

In brief, for any test, in medicine or otherwise, the results may have some error. These are classified as:

1. True Positive: A true positive is the correct affirmation of the presence of a condition

For example, concluding a pregnant lady is pregnant would be “True Positive”

2. False Negative: A false negative is an error in which a test result improperly indicates no presence of a condition (the result is negative), when in reality, the condition is present.

For example, concluding a pregnant lady is not pregnant would be “False Negative”

3. False Positive: A false positive is an error in which a test result improperly indicates the presence of a condition (the result is positive), when in reality, the condition is not present.

For example, concluding a man is pregnant would be “False Positive”

4. True Negative: A true negative is the correct affirmation that the condition is not present 

For example, concluding a man is not pregnant would be “True Negative”
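
For readers who think in code, the four outcomes above can be sketched as a small Python function. This is only an illustration: the function name `classify` and its two arguments are my own hypothetical choices, not part of any library.

```python
def classify(has_condition: bool, test_positive: bool) -> str:
    """Name the confusion-matrix cell for a single test result."""
    if has_condition and test_positive:
        return "True Positive"
    if has_condition and not test_positive:
        return "False Negative"
    if not has_condition and test_positive:
        return "False Positive"
    return "True Negative"


# The four pregnancy-test examples from the list above:
# (actually pregnant?, test says pregnant?)
examples = [(True, True), (True, False), (False, True), (False, False)]
for has_condition, test_positive in examples:
    print(has_condition, test_positive, "->", classify(has_condition, test_positive))
```

The first argument is the reality and the second is what the test reports; the word "True" or "False" in the label only says whether the two agree.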

For any test, some error will occur. Some infected people will be assumed to be uninfected (false negatives) and some uninfected people will show up as infected (false positives). What are the true positive rates (also called sensitivity) for the COVID-19 tests? We don’t know yet.

Let’s assume (this is hypothetical and not meant to be a forecast) that:
  • The reported number of infections in India (3 per million population) understates the true figure, and the actual infection rate is 10 people per million population (or 13,000 actual infections vs 4,000 reported)
  • The tests are very accurate and the sensitivity is 99.9% (or 99.9% of the infections are correctly reported)
  • There is a small chance of false positives, 0.01% (or 1 in 10,000 uninfected people will be shown to have an infection). This implies a true negative rate (also called specificity) of 99.99%. This is higher than the true positive rate of 99.9%
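
The two error rates follow directly from the two accuracy figures assumed above. A minimal Python sketch, using exact fractions to avoid rounding noise (the variable names are mine and purely illustrative):

```python
from fractions import Fraction

# Hypothetical assumptions from the list above, not measured values.
sensitivity = Fraction(999, 1000)          # 99.9% of infections are detected
false_positive_rate = Fraction(1, 10_000)  # 0.01% of uninfected people test positive

# Each rate has a complement within its own group of people.
false_negative_rate = 1 - sensitivity      # infected people the test misses
specificity = 1 - false_positive_rate      # uninfected people correctly cleared

print(f"False negative rate: {float(false_negative_rate):.2%}")
print(f"Specificity:         {float(specificity):.2%}")
```

Note that sensitivity is measured over infected people only, and specificity over uninfected people only. That difference in the size of the two groups is what drives the result further down.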

Now let’s assume India has the money, resources, time and effort to get everyone in India tested. Yes, all 1.3 billion Indians. What would that mean? Think about this before you peek at the solution.

We assume 10 in a million or 13,000 Indians are infected.  This is the true number of infections:

1. True Positive:

99.9% will be correctly diagnosed, or 12,987 infected Indians will be confirmed to have been infected

2. False Negative: 

0.1%  (or 13 Indians) will be incorrectly diagnosed as being uninfected while they are infected

3. False Positive:

1,299,987,000 are uninfected. However, the false positive rate is 0.01%. As a result, 129,999 people who do not have the infection would be diagnosed as being infected. This is around 10 times the actual number of infected. So if the medical test produces even a small share of false positives, testing everyone may overwhelm the medical system and make it difficult for the actual patients to get correct treatment.

4. True Negative:

99.99% of the 1,299,987,000 uninfected (or 1,299,857,001) will be true negatives.
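
The whole calculation above can be reproduced in a few lines of Python. This is a sketch under the same hypothetical assumptions (1.3 billion people, 10 infections per million, 99.9% sensitivity, 0.01% false positive rate); the function name `confusion_counts` is my own.

```python
from fractions import Fraction


def confusion_counts(population, infected, sensitivity, false_positive_rate):
    """Return (TP, FN, FP, TN) head counts if everyone is tested once."""
    uninfected = population - infected
    true_pos = round(infected * sensitivity)             # infected and flagged
    false_neg = infected - true_pos                      # infected but missed
    false_pos = round(uninfected * false_positive_rate)  # healthy but flagged
    true_neg = uninfected - false_pos                    # healthy and cleared
    return true_pos, false_neg, false_pos, true_neg


# Hypothetical inputs from the text, not a forecast.
population = 1_300_000_000
infected = population * 10 // 1_000_000  # 10 per million
tp, fn, fp, tn = confusion_counts(
    population, infected, Fraction(999, 1000), Fraction(1, 10_000)
)

print(f"True positives:  {tp:,}")
print(f"False negatives: {fn:,}")
print(f"False positives: {fp:,}")
print(f"True negatives:  {tn:,}")
print(f"False positives per actual infection: {fp / infected:.1f}")
```

Changing the false positive rate or the infection rate in this sketch shows how sensitive the false positive count is to those two assumptions.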

This shows how testing everyone, combined with even a small false positive rate, can affect us in ways we do not anticipate. Currently, 3% of COVID-19 tests in India lead to confirmation of an infection. In other words, we run about 33 tests for every confirmed infection, which is better than most nations and in line with South Korea, among the best at testing its citizens. Put differently, 97% of the people India tests turn out not to be infected. We may infer that we are already testing a wider pool of likely and possible cases than most western nations.

Are we testing enough? Should we test more? I am not a medical practitioner, and will let more capable minds decide on the testing rates.

What do you think? Would love to hear your thoughts and comments.

Disclaimer: I offer my views with the knowledge that medicine and health are not my areas of expertise. Also, given that many discussions on the topic have been polarized by political leanings and viewpoints, I would like to stress that these views are not meant to promote any ideology.