Human dimension presents new hurdles for AI in medicine

When Google Health researchers conducted a study in Thailand that looked at the effectiveness of their company’s medical imaging technology, the most interesting results had little to do with how the algorithms worked.

Instead, they discovered the human dimension that so often undermines the potential of such technology — real-world problems that can hamper the use of AI.

The study, into the use of a deep learning system to identify a diabetic eye disease, was the first “human-centred observational study” of its kind, according to Google Health. The research team had good reason to be hopeful about the effectiveness of the underlying technology: when examining images of patients in a lab setting, the software missed far fewer cases than specialists (it reduced the rate of false negatives by 23 per cent), at the cost of increasing false positives by 2 per cent.
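
To make those figures concrete, here is a minimal sketch of how false-negative and false-positive rates are typically computed from screening outcomes. The counts are invented for illustration and are not taken from the Google Health study.

```python
# Hypothetical illustration: deriving false-negative and false-positive
# rates from screening outcomes. All counts below are invented and do not
# come from the Google Health study.

def screening_rates(tp, fp, tn, fn):
    """Return (false_negative_rate, false_positive_rate)."""
    fnr = fn / (fn + tp)  # share of diseased patients the screen misses
    fpr = fp / (fp + tn)  # share of healthy patients wrongly flagged
    return fnr, fpr

# Invented example: 1,000 screened patients, 100 of whom have the disease.
specialist_fnr, specialist_fpr = screening_rates(tp=85, fp=45, tn=855, fn=15)
model_fnr, model_fpr = screening_rates(tp=88, fp=63, tn=837, fn=12)

print(f"Specialists: FNR {specialist_fnr:.1%}, FPR {specialist_fpr:.1%}")
print(f"Model:       FNR {model_fnr:.1%}, FPR {model_fpr:.1%}")
```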

In the real world, however, things went awry. In some clinics, the lighting was not right, or internet speeds were too slow to upload the images. Some clinics did not use eye drops on patients, exacerbating problems with the images. The result was a large number of images that could not be analysed by the system.

This does not mean that the technology cannot bring benefits to large populations that lack adequate access to healthcare. Due to scarcity of resources, half of diabetes sufferers are never examined for diabetic retinopathy, an eye disease, says Eric Topol, a professor of molecular medicine at Scripps Research Institute. “It’s still good enough to make a difference,” he says.

The gulf between the technical brilliance claimed for Google’s deep learning model and its real-world application points to a common problem that has hindered the use of AI in medical settings.

“Accuracy is not enough,” says Emma Beede, lead researcher on the paper. “In a lab setting, researchers can miss out on those socio-environmental factors that surround use of a system.”

But the study highlighted how far below its potential the technology may be operating. With such inconsistencies in its application, AI cannot be relied upon to support critical decisions.

One limitation is that most studies used to validate the clinical use of AI are retrospective, meaning they are based on historical data sets that have been cleaned up and properly labelled for the purpose. Such studies do not reflect the problems algorithms encounter when presented with data collected in messy, real-world situations.

Despite the big claims that many tech companies involved in medical imaging make for their products, medical-grade research is scarce. A study in the British Medical Journal earlier this year found only two completed randomised clinical trials involving the use of deep learning systems in medical imaging, with another eight ongoing.

But that did not stop the companies behind 81 other, less stringent trials from claiming success: three-quarters asserted that their systems had performed as well as, or better than, humans.

Most of these trials are “at high risk of bias, and deviate from existing reporting standards”, the researchers warn. As a result, they present “a risk for patient safety and population health at the societal level, with AI algorithms applied in some cases to millions of patients”.

These “pseudo validations”, which are used to give medical AI systems the patina of reliability, often lead to highly inconsistent outcomes, says Prof Topol. “We know that the algorithms vary very considerably in their performance by the population that’s studied,” he says. “If you test in one hospital it might not work in another.”
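
As a rough sketch of what that variation can look like, the hypothetical code below evaluates one model’s sensitivity separately for each hospital in a validation set; the sites and records are invented for illustration and are not drawn from any study cited here.

```python
# Hypothetical sketch: how one model's sensitivity can differ between
# hospitals. The records below are invented for illustration only.
from collections import defaultdict

# Each record: (hospital, true_label, model_prediction); 1 = disease present.
records = [
    ("hospital_a", 1, 1), ("hospital_a", 1, 1), ("hospital_a", 1, 0),
    ("hospital_a", 0, 0), ("hospital_b", 1, 0), ("hospital_b", 1, 0),
    ("hospital_b", 1, 1), ("hospital_b", 0, 1),
]

per_site = defaultdict(lambda: {"tp": 0, "fn": 0})
for site, truth, prediction in records:
    if truth == 1:  # only diseased cases count towards sensitivity
        per_site[site]["tp" if prediction == 1 else "fn"] += 1

for site, counts in per_site.items():
    sensitivity = counts["tp"] / (counts["tp"] + counts["fn"])
    print(f"{site}: sensitivity {sensitivity:.0%}")
```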

A mammogram taking place at the Paoli-Calmettes Institute, Marseille. Research by Google suggests that using AI does not generally increase the level of accuracy from screenings © Christine Poujoulat/AFP via Getty Images

One frequent challenge comes from fitting the new technology into the workflow of medical practitioners. Google, for instance, cites research showing that using medical imaging AI in mammography can increase, rather than reduce, the workload of radiographers and does not generally raise the level of accuracy from screenings.

According to Ms Beede, Google Health’s study in Thailand showed that it takes painstaking work to fit the technology to the clinical environment. Changing the protocols governing how its AI was used led to greatly improved results, she says.

Some issues holding back the development of effective AI for medical imaging, however, are likely to be harder to solve. Many of the most challenging revolve around the collection and use of data needed to train the systems.

Privacy rules limit the availability of useful data sets. Data is usually cleaned up so that it can be used to train a system, but such cleaned data often does not replicate the messy conditions in which the technology must draw inferences in a clinical setting.

There is also the problem of trust. A lack of transparency about how the algorithms have been developed and validated presents an obstacle to their wider adoption, researchers warn.

The BMJ study counted “at least 16” algorithms that had been approved for medical imaging by US regulators, but only one randomised trial registered in the US to look at their results.

“The medical community doesn’t even know what the data are that the algorithm works on,” Prof Topol says. Without more transparency, he adds, it is unsurprising that many medical practitioners — who are naturally conservative when it comes to adopting new technology — have yet to be convinced of the value of AI.