ML-AIM Machine Learning and Artificial Intelligence for Medicine

Research Laboratory led by Prof. Mihaela van der Schaar

April 03, 2020

Progress using COVID-19 patient data to train machine learning models for healthcare

One short week ago, I called on governments to use existing data and proven machine learning and AI techniques to help healthcare systems combat the COVID-19 pandemic.

The response was amazing. My team and I received encouragement, ideas, and proposals for collaboration.

We also received, courtesy of Public Health England, a set of (depersonalized) data on existing COVID-19 cases. Along with my team at the Cambridge Centre for AI in Medicine, I’ve spent the last few days training our models on this data. The results so far are extremely encouraging.

Among other things, we wanted to demonstrate that machine learning techniques can accurately predict how COVID-19 will impact resource needs (ventilators, ICU beds, etc.) at both the individual patient level and the hospital level. Such predictions give a reliable picture of future resource usage and enable healthcare professionals to make well-informed decisions about how to use these scarce resources to achieve the maximum benefit.

Based on the data we received from Public Health England, we now have a proof-of-concept demonstrator showing that this can be done, in the form of a new system we call Adjutorium.

Isn’t flattening the curve enough?

Social policies can certainly help take the strain off healthcare systems around the world. But there’s no guarantee that certain individual hospitals won’t still be stretched well beyond capacity. Additionally, these measures themselves may not be properly observed by everyone, or may be relaxed slowly over time. It’s important to ensure that hospitals remain armed with information that will help them manage peaks in demand for resources like ICU beds or ventilators.

As I touched upon last week, life-or-death choices will be made regarding the use of scarce resources like ventilators and ICU beds. If you are managing or working in a hospital, it would be incredibly helpful (but it’s currently not possible) to have a highly reliable picture of the likely usage status of these resources over time.

This is what too many healthcare professionals around the world are currently worrying about:

We can help answer these questions by being smart about how we use existing data on hospital admissions, ICU admissions, use of ventilators, patient outcomes (e.g. discharge, mortality), and more. If we have access to high-quality datasets containing such information, we can use machine learning to answer questions such as:

- Which patients are most likely to need ventilators within a week?

- How many free ICU beds is this hospital likely to have in a week?

- Which of these two patients will get the most benefit from going on a ventilator today?

While these questions can reliably be answered using the machine learning techniques we’ve developed, I cannot emphasize enough that the decisions themselves will, of course, still be made by healthcare professionals on the basis of their organization’s priorities and policies.

Here’s how a machine learning model can help answer questions in a way that’s useful to healthcare professionals:

As you can see, patients are given risk scores based on their likelihood of ICU admission or ventilator usage. These are then aggregated across the hospital to give a picture of future demand on resources.
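The aggregation step can be sketched in a few lines. This is only an illustration under stated assumptions, not Adjutorium's actual code: the function name `expected_demand` and the example scores are hypothetical, each risk score is treated as a probability, and patients are assumed to be independent of one another.

```python
from math import sqrt


def expected_demand(risk_scores):
    """Return (mean, standard deviation) of the number of patients needing a resource.

    Assumption: each score is the probability that one patient needs the
    resource, and patients are independent, so the total follows a
    Poisson-binomial distribution.
    """
    mean = sum(risk_scores)
    variance = sum(p * (1.0 - p) for p in risk_scores)
    return mean, sqrt(variance)


# Hypothetical 7-day ventilator risk scores for five patients on one ward.
ward_scores = [0.9, 0.5, 0.5, 0.1, 0.0]
mean, std = expected_demand(ward_scores)
print(f"Expected ventilators needed: {mean:.1f} (standard deviation {std:.2f})")
```

Summing the scores gives the expected number of ventilators needed, and the standard deviation indicates how far actual demand might plausibly stray from that figure.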

Using Public Health England data

Last week, I shared a firm belief that existing and proven machine learning techniques can already tackle these kinds of challenges and can deliver essential insights, even using existing (possibly quite noisy) data sources. Thanks to the data we received from Public Health England, I feel more confident than ever.

We received data for nearly 1,700 patients, and that number continues to increase because the dataset is updated daily. While the data was depersonalized, it includes basic information, lab results, hospitalization details, risk factors and outcomes.

We fed this data to AutoPrognosis, a state-of-the-art automated machine learning framework that our team developed in 2018 (initially for cardiovascular issues, but subsequently also for cystic fibrosis and breast cancer, among others).

To predict mortality, we used data from 850 patients to train our model, and then verified the accuracy of the model using results from 197 other patients from the same dataset. For ICU admission prediction, we trained with data from 950 patients and verified with data from 285 patients. To predict need for ventilation, we trained with 810 patients and verified with 276 patients.

We called the new system we created “Adjutorium,” meaning help, assistance or support.

What we learned

So, how did Adjutorium perform?

Simply put: it did really, really well.

Once trained with patient data, Adjutorium was able to make highly accurate predictions about the patients whose data we used for verification. Crucially, we managed to do so much more accurately than existing and widely-used survival analysis techniques such as Cox regression or well-known indexes such as the Charlson comorbidity index.


Adjutorium accuracy

Cox regression accuracy

Charlson index accuracy


0.871 ± 0.002

0.773 ± 0.003

0.596 ± 0.002

ICU admission

0.835 ± 0.001

0.771 ± 0.002

0.556 ± 0.013


0.771 ± 0.002

0.690 ± 0.002

0.618 ± 0.002

Accuracy is measured using AUC-ROC. Higher is better.
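For readers unfamiliar with AUC-ROC, here is a minimal sketch of how it can be computed from scratch. It uses the pairwise-ranking definition: the fraction of (positive, negative) patient pairs in which the positive patient received the higher risk score, with ties counted as half. The function name `auc_roc` and the toy labels and scores are hypothetical and are not taken from the Public Health England data.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outscores a random negative.

    labels: 1 for patients who experienced the outcome, 0 otherwise.
    scores: the model's risk score for each patient, in the same order.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: six hypothetical patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(auc_roc(labels, scores), 3))  # 7 of the 9 pairs are ranked correctly
```

A score of 1.0 means every positive patient was ranked above every negative one, while 0.5 is what a model assigning the same score to everyone would achieve.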

It’s also worth bearing in mind that Adjutorium achieved these results with a relatively small proportion of the data that could be gathered from COVID-19 cases globally. The more data we have access to, the better we can train our models and improve their accuracy, and the more useful Adjutorium becomes.

Next steps

The progress we’ve made so far is extremely encouraging: we now have a functioning proof of concept that demonstrates the potential use of machine learning in helping to manage scarce resources like ICU beds and ventilators. There’s still work to be done, though, and much of this will rely on continuing to receive new and high-quality data.

Our immediate priority is to continue to validate the models we’ve developed. Doing so will bring us closer to finalizing the system for usage by healthcare professionals.

We also need to get our hands on new types of data that will make our existing models even more accurate. Specifically, we require longitudinal data that enables us to gain a deeper understanding of the progression of patients while they’re hospitalized (rather than irregularly-recorded “snapshots” that show the state of affairs at specific times). Given how little is known about COVID-19, such data would provide valuable insights. Additionally, we’re hoping for clearer data regarding the timing and effects of ventilators when used to treat patients. This would let us tell, for example, how long individual patients could or should have waited before ventilation in order to achieve the best possible outcomes.

We will also be working with the NHS and Public Health England to transform our tools into a system that can easily be used and understood by healthcare professionals. In this sense, interpretability is key: we want to ensure that decision-makers can debug and analyze the information generated by our system.

If you’d like to know more…

On Wednesday, I gave a presentation summarizing our progress at a COVID-19 workshop hosted by ELLIS (European Laboratory for Learning and Intelligent Systems). You can find the slide deck here and a video of my presentation embedded below.

March 27, 2020

Responding to COVID-19 with AI and machine learning

This perspective paper, published on March 27, 2020, is authored by Professor Mihaela van der Schaar alongside members of the Cambridge Centre for AI in Medicine. The paper calls on governments and healthcare authorities to use proven AI and machine learning techniques and existing data to coordinate a response to the global COVID-19 pandemic.

A summary is provided below, and the full paper is linked at the bottom of this page.


Both the UK and the international community are still in the early stages of a crisis that will see an unbelievable amount of pressure put on social and healthcare infrastructure. Ventilators and ICU beds will be in short supply, and the time of clinical professionals will be stretched across too many patients to cover. This will lead to unfortunate but necessary decisions. Life-and-death choices will be made, and often.

AI and machine learning can use data to make objective and informed recommendations, and can help ensure that scarce resources are allocated as efficiently as possible. Doing so will save lives and can help reduce the burden on healthcare systems and professionals.

Our paper goes into detail about specific challenges faced by healthcare systems, and how AI and machine learning can improve decision-making to ensure the best outcomes possible. I’ll avoid going into too much detail (the paper is linked at the end of this post), but here’s a summary.

1. Managing limited resources

AI and machine learning can help us identify people who are at highest risk of being infected by the novel coronavirus. This can be done by integrating electronic health record data with a multitude of “big data” pertaining to human-to-human interactions (from cellular operators, traffic, airlines, social media, etc.). This will make allocation of resources like testing kits more efficient, as well as informing how we, as a society, respond to this crisis over time.

AI and machine learning can also help us work out which infected patients are more likely to suffer more severely from COVID-19. We can provide more accurate patient risk scores that will help clinical professionals decide who needs urgent treatment (and resources), and when.

2. Developing a personalized treatment course for each patient

As mentioned above, COVID-19 symptoms and disease evolution vary widely from patient to patient in terms of severity and characteristics. A one-size-fits-all approach to treatment doesn’t work. We are also a long way off from mass-producing a vaccine.

Machine learning techniques can help determine the most efficient course of treatment for each individual patient on the basis of observational data about previous patients, including their characteristics and treatments administered. We can use machine learning to answer key “what-if” questions about each patient, such as “What if we wait a couple of hours before putting them on a ventilator?” or “Would the outcome for this patient be better if we switched them from supportive care to an experimental treatment earlier?”

3. Informing policies and improving collaboration

We have seen a huge variety of approaches taken by decision-makers when deciding on policies to respond to COVID-19. This is true from the individual level (i.e. practitioners) all the way up to the government level. For example, differences in triaging protocols used by medical institutions and practitioners could mean that two patients with similar profiles will end up receiving different types of treatment depending on where they happen to live.

It’s hard to get a clear sense of which decisions result in the best outcomes. In such a stressful situation, it’s also hard for decision-makers to be aware of the outcomes of decisions being made by their counterparts elsewhere.

Once again, data-driven AI and machine learning can provide objective and usable insights that far exceed the capabilities of existing methods. We can gain valuable insight into what the differences between policies are, why policies are different, which policies work better, and how to design and adopt improved policies.

This information can be shared between decision-makers at all levels, improving consistency and efficiency across the board. The result is that routine decisions can be made in a more coordinated and timely way, freeing up valuable medical attention for the cases that demand real-time expertise.

4. Managing uncertainty

We still know very little about the COVID-19 pandemic, and the virus itself may continue to change over time. We may not be able to rely on data about decisions taken and outcomes observed in other countries (China, Iran, South Korea, Italy, etc.), as those may generalize poorly to countries like the UK or the US. In the meantime, unproven hypotheses about the disease are likely to propagate online, impacting individual behaviour and causing systemic risks.

We can use an area of machine learning called transfer learning to account for differences between populations, substantially eliminating bias while still extracting usable data that can be applied from one population to another.

We can also use methods to make us aware of the degree of uncertainty of any given conclusion or recommendation generated from machine learning. This means that decision-makers can be provided with confidence estimates that tell them how confident they can be about a recommended course of action.
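One simple and widely used way to attach an uncertainty estimate to a figure is the percentile bootstrap, sketched below. This is an illustration only, and not necessarily the method our systems would use: the function name `bootstrap_interval`, its parameters, and the outcome data are all hypothetical.

```python
import random


def bootstrap_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the mean of `values`.

    Resamples the data with replacement many times and reports the range
    that contains the central (1 - alpha) share of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical outcomes: 15 of 50 patients needed ventilation (1 = yes).
outcomes = [1] * 15 + [0] * 35
lower, upper = bootstrap_interval(outcomes)
print(f"Observed rate 0.30, 95% interval roughly {lower:.2f} to {upper:.2f}")
```

A wide interval tells a decision-maker that the estimate rests on little data and should be treated with caution; a narrow one suggests it is more stable.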

5. Expediting clinical trials

Randomized clinical trials (RCTs) are generally used to judge the relative effectiveness of a new treatment. However, these trials can be slow and costly, and may fail to uncover specific subgroups for which a treatment may be most effective. A specific problem posed by COVID-19 is that subjects selected for RCTs tend not to be elderly or to have other conditions, yet COVID-19 has a particularly severe impact on both of those patient groups.

Rather than recruiting and assigning subjects at random, machine learning methods can recruit subjects from identifiable subgroups, and assign them to treatment or control groups in a way that speeds up learning. These methods have been shown to significantly reduce error and achieve a prescribed level of confidence in findings, while also requiring fewer subjects. We can also use machine learning to target particular treatments to specific subgroups and to understand what treatments are suitable for the population as a whole.
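To see why non-uniform assignment can reduce error for a fixed number of subjects, consider a classical statistical rule known as Neyman allocation, which assigns more subjects to the arm whose outcomes vary more. This is a deliberately simple stand-in, not the machine learning method described above. The function names and response rates are hypothetical, and in practice the standard deviations would have to be estimated as trial data accumulates.

```python
from math import sqrt


def neyman_allocation(n_subjects, sd_treatment, sd_control):
    """Split subjects between arms in proportion to each arm's outcome standard deviation."""
    share = sd_treatment / (sd_treatment + sd_control)
    n_treatment = round(n_subjects * share)
    return n_treatment, n_subjects - n_treatment


def effect_variance(n_treatment, n_control, sd_treatment, sd_control):
    """Variance of the estimated difference in mean outcome between the two arms."""
    return sd_treatment ** 2 / n_treatment + sd_control ** 2 / n_control


# Hypothetical response rates: 50% under treatment, 10% under control.
sd_t = sqrt(0.5 * 0.5)
sd_c = sqrt(0.1 * 0.9)

n_t, n_c = neyman_allocation(200, sd_t, sd_c)
var_adaptive = effect_variance(n_t, n_c, sd_t, sd_c)
var_equal = effect_variance(100, 100, sd_t, sd_c)
print(f"Allocation {n_t}/{n_c}: variance {var_adaptive:.4f} vs {var_equal:.4f} for a 100/100 split")
```

With the same 200 subjects, the uneven split yields a lower-variance estimate of the treatment effect than an even split, which is the sense in which smarter assignment can require fewer subjects for a given level of confidence.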

These techniques are proven, and should be implemented without delay

The AI and machine learning techniques I’ve mentioned above do not require further peer review or further testing. Many have already been implemented on a smaller scale in real-world settings. They are essentially ready to go, with only slight adaptations required.

The data to support these techniques already exists in the UK and many other countries. There is a wealth of information we can get from electronic health records and emergency call databases, as well as “big data” for human-to-human interactions. We simply need to be able to integrate this information on a national, hospital and individual level.

My fellow authors and I call upon the governments of the UK and other nations to implement the above techniques as soon as possible. We also extend our support in the form of technologies, resources and knowledge to assist with their implementation. If we act now, we may be able to have these systems in place before our healthcare infrastructure is overwhelmed. Doing so will save lives.

You can read the full paper here.