Apple Watch Warning: False Positives Ahead!

In this post, I claim that the Apple Watch’s new atrial fibrillation (AFib) detection feature will very likely produce some unintended adverse results. Here are my predictions:

  • After the first 18 months of sales, the AFib diagnosis will result in the death of between 5 and 20 people under age 55.

  • There will be several front-page lawsuits.

  • I expect Apple will disable the feature in all watches.

  • Apple will discontinue this model of the watch as a result.

Let me break down the story for you and show you my reasoning.

DISCLAIMER: I AM NOT A BIOSTATISTICIAN, NOR DO I HAVE A DEGREE FROM ANY INSTITUTE OF STATISTICS. THIS IS AN OPINION PIECE.

Apple’s newest watch can detect AFib - atrial fibrillation. That’s a particular kind of irregular heartbeat that has a tendency to shorten life. At age 48, I was diagnosed with full-time AFib while getting a routine EKG before a knee operation. I had my heart jump-started twice, went on some very dangerous medicine, and finally had not one but two ablation operations. I am now out of AFib and have a normal heart rhythm. I’ve learned enough to put this new development in perspective. In particular, I’m going to show how to use Bayesian statistics to understand the implications of this new Apple device.


There are two kinds of AFib: continuous and intermittent. I had continuous - it was always present. But more people have an intermittent kind, where they are sometimes in a normal (“sinus”) heart rhythm and sometimes are not. It could be once a week, once a day, every few hours, when exercising, etc. This is called “paroxysmal AF.” The way you find out is to wear a heart monitor 24/7 and bring the recording to the doctor to analyze it.

The latest Apple watch can do that for you. As you wear the watch, it can pretty well detect AFib and give you a record of it you can show to your doctor. On the face of it, this sounds great. Since AFib is bad, if the watch tells you early enough you may be able to minimize its bad effects over your lifetime. In fact, Apple went into the AFib-diagnostic business seriously, conducting their own large-scale study with Stanford hospital.

But wait. We’re now in uncharted territory. In the past, most people with AFib were discovered when they had symptoms - which is admittedly fairly late. They probably had AFib for years before noticing any symptoms (usually shortness of breath). A few of us were diagnosed early just by chance. And the device isn’t 100 percent accurate, either - there are going to be some false positives and false negatives. How many false positives and false negatives?

We want to answer the question: “If the watch tells me I have AFib, what are the chances that I have AFib?” This is exactly the kind of question Bayes’ Theorem answers. Using Bayes’ theorem, we can calculate the chances of actually having a condition given a positive test for that condition, taking the base rate into account. Here’s a quick review of the basics for those interested …

Bayesian Basics
In case your Bayes’ Theorem skills are a bit rusty, there is a refresher in my Bayesian Reasoning playlist on YouTube.
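For those who prefer a formula, here is Bayes’ theorem written out for this problem. The notation is my own shorthand, not anything from Apple or the studies: "+" means the watch raises an AFib alert, P(+ | AFib) is the sensitivity, and P(+ | no AFib) is one minus the specificity.

```latex
P(\text{AFib} \mid +) =
  \frac{P(+ \mid \text{AFib})\, P(\text{AFib})}
       {P(+ \mid \text{AFib})\, P(\text{AFib}) + P(+ \mid \text{no AFib})\, P(\text{no AFib})}
```

The term P(AFib) is the base rate. When it is tiny, the second term in the denominator (the false positives) can dominate even for a very accurate test.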

 
 

Daniel Yazdi, a doctor who looked at this problem, concluded in his article:

I did some calculations to answer the question, “If my watch tells me I have atrial fibrillation, what are the odds it is correct?” The answer depends on the watch wearer’s age.

Is he right? Yes, he is very right. The age of the wearer is the key to understanding the accuracy of the test.

AFib in the Population
The point of using Bayesian reasoning is to take into account what we already know about the prevalence of the disease in the population. This is also called the background rate, or the base rate. We can look, as Dr Yazdi did, at the ATRIA study, which broke down AFib patients by age in the United States:

[Figure: AFib patients by age group, from the ATRIA study]

Using the published Apple Watch numbers for sensitivity and specificity, Yazdi plugged these three numbers for people under age 55 into this Bayesian calculator:

Prevalence of the disease: 0.001

Sensitivity: 0.98

Specificity: 0.996

He concludes:

In people younger than 55, Apple Watch’s positive predictive value is just 19.6 percent. That means in this group — which constitutes more than 90 percent of users of wearable devices like the Apple Watch — the app incorrectly diagnoses atrial fibrillation 79.4 percent of the time.

But there’s a yellow flag here: Yazdi shows a positive predictive value (the probability that the person has the disease given a positive test) of 19.6 percent, which is overprecise for this kind of calculation. Yazdi is saying that if someone under age 55 has a watch telling her she has AFib, chances are only about 1 out of 5 that she actually has AFib. Most people would assume the alert is almost certainly right because the test is so accurate, ignoring how rare the condition is in their age group. This is what we call base-rate neglect, and it happens all the time, not just in medicine but in many fields. We’ll come back to the base rate and the yellow flag in a minute.
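If you would rather check the arithmetic yourself than trust an online calculator, here is a minimal Python sketch of the same calculation. The function name is my own, and the three inputs are simply the numbers quoted above for people under age 55; this is an illustration of Bayes’ theorem, not the calculator Yazdi used.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Fraction of the whole population that has the disease AND tests positive.
    true_positives = prevalence * sensitivity
    # Fraction of the whole population that is healthy but still tests positive.
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Inputs quoted above for people under age 55.
ppv_under_55 = positive_predictive_value(
    prevalence=0.001, sensitivity=0.98, specificity=0.996
)
print(f"Positive predictive value under 55: {ppv_under_55:.1%}")
```

With these inputs the result lands at roughly 1 in 5, within rounding of the 19.6 percent quoted above.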

What Comes Next?
Now that these watches are gaining market share, doctors will start to see many new patients come in with smart-watch recordings of AFib. Here is what Rob Siegel, a cardiologist in New York (and my brother), has to say about them:

We know that atrial fibrillation (AF) comes and goes in some AF patients--this is called "Paroxysmal Atrial Fibrillation." We still recommend AF treatments for these people, even though sometimes their EKG will show AF and sometimes their EKG will not show AF. There will be people who previously did not suspect they had AF who will receive Apple Watch alerts that they might have AF. We hope to take great care of these people, but we still have lots of questions about what will help them most. Obviously we'll generally offer follow-up testing to people who receive these alerts, and these tests will generate two new groups of patients:

People with AF confirmed on follow-up testing (group 1). Should these people receive different treatments from what we give to people whose AF we had discovered for other reasons? Those treatments have major side effects, and we'd like to have better data to evaluate whether the benefits of those treatments outweigh their risks in people whose AF was discovered by the (probably extremely sensitive) Apple Watch.

People whose follow-up testing does NOT find AF (group 2):

A. Some people in this group never had AF. They had false-positive (inaccurate) Apple Watch test results, and should not receive AF treatments. 

B. Others in this group have paroxysmal AF, and the subsequent tests were performed at times when the patient's AF was dormant. 

Before they invented the Apple Watch, this dilemma did not exist. Now we anticipate large numbers of people in group 2 will start showing up in cardiology clinics. What should they do? By definition, they won't know if they are in category A or B. We will probably recommend more tests to look even harder for AF. Which tests should we recommend? If they keep failing to detect episodes of AF, should we give them one of the most accurate tests of all, the surgically-implanted loop recorder, even though that test is painful and has some dangerous side effects? Even if our follow-up tests don't detect AF, should we determine that enough of these people are in category B that they might experience a net benefit from some AF treatments? We don’t have enough data to properly answer these questions.

Cure Worse than Disease?
According to Yazdi, if you are under age 55 and have a positive AFib notice on your watch, you only have a 1 in 5 chance of actually having AFib! Overdiagnosis and overtreatment are common in the western world. Once you start down the slippery slope of measurement and treatment, your chances of lower quality of life or death go up, because side effects of treatment can last a lifetime. As Larry Husten says in his piece, “Beware the hype over the Apple Watch heart app. The device could do more harm than good”:

When evaluating a new drug or device, it is a cardinal rule that the benefits must be weighed against the risks. With some drugs and devices, the risks are obvious. In others, such as with something as apparently benign as the Apple Watch, the risks may be less immediately apparent. Nevertheless, they can be real and potentially significant.


What problem are we really trying to solve? The US Preventive Services Task Force has looked at screening asymptomatic people for AFib and recommended against ECG screening in asymptomatic adults at low risk for heart disease. In 2018, the group published a statement saying:

The USPSTF recommends against screening for cardiovascular disease with resting or exercise ECG in adults at low risk of cardiovascular disease events.

Here are my calculations, which I encourage you to understand (and critique).

Looking at the Denominator
Back to the yellow flag. In Bayesian calculations, the result is always sensitive to the base rate in the population, also known as the prior. Did you breeze by the ATRIA study I mentioned a few sections back? That’s the study Dr Yazdi used to determine the background rate for AFib in the population and the study that produced the graph above. That study followed some 9,000 residents of Northern California who had AFib, to see if their drug, warfarin, was helping prevent dangerous hemorrhages. These Northern Californians had already been diagnosed with AFib and therefore most had symptoms. Most of them were over 65. It’s not surprising that only a handful were under age 55, and if you only measure a few people, that could skew the result a lot. That should lead us to question the 0.1 percent base rate (a prevalence of 0.001) Yazdi used above.

This German study compares data in Germany with that of other studies. Here is what they show:

[Figure: AFib rates by age, comparing the German data with other studies]

This is better. Data from many studies show that the prevalence of AFib in the general population (again, we don’t really know how they got this data) is probably less than 2 percent in people under age 60. I plugged 2 percent into the formula and got a positive predictive value of 83 percent. That’s better than what Yazdi got with the Kaiser data, but this exposes our ignorance much more than our knowledge.

Let’s look at a table of the outcomes given different base-rate assumptions:

[Table: positive predictive value under different base-rate assumptions]
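To see how strongly the answer depends on the assumed base rate, here is a small Python sketch that recomputes the positive predictive value for several prevalence values. The sensitivity and specificity are the figures quoted earlier; the list of base rates is my own illustrative choice, not data from any study, so treat the output as a what-if exercise.

```python
SENSITIVITY = 0.98   # figure quoted earlier in this post
SPECIFICITY = 0.996  # figure quoted earlier in this post

# Illustrative base rates only (assumed for this sketch, not measured values).
BASE_RATES = [0.001, 0.005, 0.01, 0.02, 0.05]


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Map each assumed base rate to the resulting positive predictive value.
ppv_by_base_rate = {
    rate: positive_predictive_value(rate, SENSITIVITY, SPECIFICITY)
    for rate in BASE_RATES
}

print(f"{'Base rate':>10}  {'PPV':>6}")
for rate, ppv in ppv_by_base_rate.items():
    print(f"{rate:>10.1%}  {ppv:>6.1%}")
```

The two ends of the list correspond to the cases discussed above: a base rate of 0.1 percent gives a result near 1 in 5, while 2 percent gives about 83 percent.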

The denominator matters. We don’t know what the denominator really is. The Apple watch may tell us!

As I explain when I teach this, the problem is the accuracy of the test. Because it’s 99 percent accurate, people will assume they have AFib if their watch says they do. Wouldn’t you? But we now know that this assumption is wrong when the base rate is low, and we can expect many false positives as a result.

How Many Deaths?
I’m saying here that the Apple watch is almost certainly going to kill someone, through misdiagnosis and mistreatment. Let’s make a few back-of-the-napkin assumptions and try to calculate how many:

Apple should sell around 20 million watches in the next 12 months. Combined with about 10 million from the six months since the watch came out, that gives us 30 million. Let’s assume that 25 million of those buyers are under 50 years old, where we have almost no data on AFib. Most of these people live in advanced countries with plenty of doctors and test equipment waiting to be put to good use. Once you show up at the doctor’s office, she has really no choice but to “do something” about your situation.

Let’s calculate what happens next. Looking at the graph above, I’ll guesstimate that 1 percent of the under-60 population has some amount of AFib. So we expect that about 250,000 Apple Watch owners over the next few years will show some degree of AFib (mostly very intermittent) on their device. The test is 99 percent accurate on the positive side, so we can expect 225,000 doctor visits and perhaps as many as 200,000 follow-up tests! I’m sure the cost of just these initial tests is far more than the amount paid for all those watches.

Possibly a very few of these people will benefit from continued monitoring as they age. But let’s assume that 10 percent are treated. That’s 20,000 treatments.

The number-one thing to do is prescribe blood thinners. Some percentage of people on blood thinners have major bleeding events, and some of those end in death. On the one hand, AFib generally increases the chance of stroke or cardiac arrest, so we don’t want that. On the other hand, the use of blood thinners also increases those same risks. I’m not qualified to go into the analysis here, but I think if you give 20,000 healthy young people blood thinners, you are likely to see some adverse events, including some deaths. From trials on patients in their 70s, we know that around 1 out of 2,000 patients is killed by drug side effects, which would point to roughly 10 deaths from treatment out of 20,000. But young people are not 70 years old, so we have to assume there would be fewer deaths. There could be quite a few young people with bleeding issues that are not fatal as well.

Another first-line treatment is called cardioversion, a simple procedure where they put the patient to sleep and shock the heart, to try to restore sinus rhythm. Assuming that there are 16 deaths per 10,000 from going under general anesthesia alone, not to mention the treatments, we can expect on the order of 30 deaths from this new watch. Most patients won’t undergo cardioversion, but some will.
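The back-of-the-napkin arithmetic from this section can be laid out in a few lines of Python. Every input below is one of my guesses from the text above, not a measured value, and the variable names are just labels I made up. The anesthesia figure is an upper bound that pretends all 20,000 treated patients undergo cardioversion, which most will not.

```python
# All inputs are rough assumptions taken from the text above.
owners_under_50 = 25_000_000     # assumed watch owners under age 50
assumed_base_rate = 0.01         # guesstimated share with some amount of AFib
treated_patients = 20_000        # assumed 10 percent of 200,000 follow-up tests
blood_thinner_nnh = 2_000        # about 1 death per 2,000 treated (patients in their 70s)
anesthesia_deaths_per_10k = 16   # assumed deaths per 10,000 under general anesthesia

# Owners expected to show some degree of AFib on their device.
expected_afib_owners = owners_under_50 * assumed_base_rate

# Deaths if every treated patient received blood thinners at the 70s-era risk.
blood_thinner_deaths = treated_patients / blood_thinner_nnh

# Upper bound: deaths if every treated patient went under general anesthesia.
anesthesia_deaths_upper = treated_patients * anesthesia_deaths_per_10k / 10_000

print(f"Owners showing AFib:         {expected_afib_owners:,.0f}")
print(f"Blood-thinner deaths:        {blood_thinner_deaths:,.0f}")
print(f"Anesthesia deaths (ceiling): {anesthesia_deaths_upper:,.0f}")
```

Changing any single assumption, especially the base rate or the share of people treated, moves the totals a lot, which is exactly why the estimate comes with such a wide range.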

The Number Needed to Harm
The NNH is the number of people you need to treat to (in this case) kill someone by side effects. Using blood-thinner data, we see that the number is around 2,000. But that data is from people in their 70s, so we can’t expect that number for younger people.

So I’m going to cut my numbers in half and estimate 5 to 20 deaths out of 20,000 people treated, with 95 percent confidence. I might not be right - smart people can make other assumptions and come up with a different range, but I think my range is reasonable. Not bad, Apple - you sold 30 million watches and only killed a handful of people, plus hundreds of patients with non-lethal side effects from unnecessary treatment. Did you save anyone? Maybe the watch helped a couple dozen sexagenarians live a few years longer.

Let’s hope I’m wrong, but it’s hard to imagine that with over 200,000 Apple watch alerts no one will be killed, if only driving to the hospital for tests!

A Prediction
My prediction is that, after the first couple of lawsuits, Apple will rewrite the iOS to stop diagnosing AFib. Now that will be interesting, because that could deprive tens of thousands of people who probably should be monitored of a helpful tool. If I were Apple, I would find a way to make sure that only people 65 or older have this feature available to them.

Summary
Here’s what I believe:

  • Apple is doing a great job coming up with cool ideas for new watch features.

  • The vast majority of Apple watch owners are under age 60.

  • Hundreds of thousands of Apple customers will now be going to doctors presenting with atrial fibrillation alerts, at a cost to society far higher than the value of the watches.

  • Those under 60 are at risk from over-diagnosis and over-treatment. Possibly a lot of people. Some will be helped. Most may not be (we don’t know).

  • I expect between 5 and 20 people will be killed by the medical system (not by the watch itself).

  • People 60 - 75 may benefit. After diagnosis, they have a chance for treatment to help them. So let’s re-run the numbers in the Bayesian calculator! If you are 65, your positive predictive value is 90 percent, which is reassuring. 90 percent of those people should probably see their doctors to discuss options and risks. Quite a few of these people may well live longer lives as a result. Perhaps ten percent of them will be treated needlessly (some may die prematurely from needless treatment).

  • Those over 75 who are asymptomatic probably won’t benefit much from a diagnosis of AFib - they probably have other things that are more likely to end their lives. Treating them may lead to more problems. I have low confidence in this statement - it could be beneficial.

  • I predict Apple will discontinue this feature at the sign of the first lawsuit. That will remove the functionality many people bought the watch for in the first place. Will there be lawsuits over that?

  • We are about to learn much more about AFib. It will take decades to understand how it benefits people who get an early diagnosis from a wearable. The Apple watch’s main benefit is in gathering data for future study. It would probably be better if the data were anonymous and the wearers were not notified.

  • This is going to be interesting.

  • It would be helpful if some real biostatisticians looked into this. I am not one.

Either way, Apple has opened a can of worms with this feature, and the simplistic (and Stanford-endorsed) Apple Heart Study is just another effective marketing tool.

It would be cool if each wearer would consent to giving this data anonymously to be collected and researched. If you have an Apple watch, please participate.
