AI in health: safety a joint responsibility between users and developers, say researchers

Advances in artificial intelligence (AI) have attracted substantial interest for potential healthcare applications, but how can we know that the technology is performing safely and effectively? A new report, authored by experts at Birmingham Health Partners (BHP) member organisations, recommends a ‘medical algorithmic audit’ framework in which users and developers collaborate to ensure patient safety and the correct performance of the tool in question.

AI – the use of computers to do tasks that would usually require human thought – is now part of our everyday life, from image recognition in our smartphones to complex robotics in high-risk areas. One of the most exciting potential applications of AI is in health, where it has the potential to provide faster, more accurate diagnoses, treatment decisions that are personalised to the patient, and improved predictions of disease. Examples include detecting skin cancer from photographs and predicting outcomes for patients with COVID-19 based on chest scans.

However, before AI can be implemented in healthcare, a number of safety and reliability concerns must be resolved. A specific area of enquiry is whether AI technologies work reliably when introduced to a new setting, such as a new hospital with new devices. There are also concerns about whether AI works equally well across demographic groups, particularly as many of these technologies have been developed and validated on non-representative datasets.

Previous work by experts at University Hospitals Birmingham and the University of Birmingham – both founding members of BHP – has led to the development of key tools that help researchers evaluate AI in clinical trials, and that help ensure regulators – such as the MHRA in the UK – have the best evidence to decide whether AI technologies should be approved for use in patients.

The team’s latest report introduces the medical algorithmic audit – a novel framework which brings people and technologies together to assess these AI technologies within the real-life health and care pathway in which they are used.

Dr Xiao Liu, Senior Researcher in Digital Healthcare and AI at University Hospitals Birmingham and the BHP Centre for Regulatory Science and Innovation, explained: “AI is highly sensitive to changes in context, and it can produce incorrect results that wouldn’t be made by human clinicians. Because many of these errors are not obvious, unless we proactively look for them, they can easily go undetected until they result in patient harm.

“The results of the audit can be used by health staff to improve the safety of the technology or even withdraw the AI technology completely if needed. We suggest this can be done most successfully if clinicians and AI device manufacturers work collaboratively.”

The report brought together a team of clinicians, computer scientists and ethicists from the UK, together with colleagues at The Hospital for Sick Children (SickKids) in Toronto, the Massachusetts Institute of Technology and the University of Adelaide. It was overseen by joint senior authors Dr Lauren Oakden-Rayner of the Australian Institute for Machine Learning and Professor Alastair Denniston, Deputy Director of the BHP Centre for Regulatory Science and Innovation, both of whom are senior doctors overseeing the introduction of AI health technologies and are committed to ensuring that this is done safely.

Dr Oakden-Rayner explained: “As a radiologist, I am excited about the potential for AI technology in healthcare, but there is now a wealth of evidence that the standard practices for testing AI systems do not adequately predict how these technologies will behave in the real world. More comprehensive assessments of model behaviour, such as algorithmic auditing, allow us to predict risks and more importantly take steps to mitigate harm to patients before it happens.”

Professor Denniston commented: “One of the most important aspects of the algorithmic audit is that it contains specific requirements to assess the AI’s performance in different groups of people, and to do the detective work to make sure we understand why it fails and to reduce the chance of it happening again. The algorithmic audit is an important step in making sure that AI is not just ‘safe on average’ but is ‘safe for all’.”

‘The medical algorithmic audit’ appears in The Lancet Digital Health on Tuesday 5 April 2022 and can be accessed at https://doi.org/10.1016/S2589-7500(22)00003-6.