Google Health, the company's health arm, and DeepMind, its AI division, teamed up with researchers at Imperial College London on a paper in the journal Nature. It describes how an ensemble of three deep learning networks can in some cases outperform human radiologists at diagnosing cancer from mammograms. But the results also suggest the technology is nowhere near ready to replace radiologists outright.
Pictured: Google's health team, DeepMind and Imperial College London used three different deep learning neural networks: Facebook AI's RetinaNet, ResNet-v1-50 and Google's MobileNetV2. Each network picks out suspicious-looking areas of a mammogram in its own way, and their findings are then combined into a single probability judgment of cancer or no cancer.
If your work were rated merely "mostly correct", more right than wrong, you might not find the result very satisfying. But if you're an artificial intelligence (AI) algorithm, that earns you a great deal of credit. After all, AI programs don't have to give a definitive answer, only a probabilistic one: the probability (as a percentage) that an answer is correct, whether the task is natural language translation or cancer diagnosis.
The latest example of AI's achievements appears in this week's issue of Nature: an international evaluation of an AI system for breast cancer screening, written by 31 authors from Google Health, DeepMind and Imperial College London, including Scott Mayer McKinney, Marcin T. Sieniek, Varun Godbole, Jonathan Godwin and DeepMind CEO Demis Hassabis.
The headline finding was that Google's system beat radiologists in the UK and the US at predicting, from mammograms, whether cancer would later be confirmed, showing a "significant reduction in false positives and false negatives." The AI even beat a panel of six human radiologists who were commissioned to read 500 mammograms and deliver diagnoses.
The results are an important contribution to the development of AI tools that may well prove useful to doctors. But that doesn't mean AI can replace a human diagnosis. It's worth looking closely at the data, because there is a lot of easily overlooked detail hidden inside.
Let's start with the background: the scientists collected data on women screened for breast cancer between 2012 and 2015 at three different hospitals in the UK; a total of 13,918 women met certain criteria, such as age and screening history. That is what the researchers used to train the neural networks. Once the system was trained, data from a further 26,000 women was used to test it. The researchers ran the same process on data collected at a US hospital, Northwestern Memorial Hospital, from 2001 to 2018, though the sample size was much smaller.
The scientists trained three different neural networks, each of which examines mammograms at a different level of detail. This detail of the setup is fascinating and may represent the state of the art in assembling machine learning networks. One network is ResNet-v1-50, a classic image recognition architecture developed by Kaiming He and his Microsoft colleagues in 2015.
The second is RetinaNet, developed by Facebook AI researchers in 2017. The third is MobileNetV2, released by Google scientists in 2018. This is an elegant hybrid approach, and it shows how code sharing and open scientific publication enrich everyone's work. The details are in the supplementary material accompanying the main paper in Nature.
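The combining step can be pictured with a minimal sketch. This is not the paper's actual code (those details live in the Nature supplement); it is an illustration of the ensemble idea, with hypothetical model names and scores: each network produces its own suspicion score, and the scores are merged into one continuous cancer-likelihood value.

```python
# Illustrative sketch of the ensemble idea, not the paper's implementation.
# Each of the three networks scores a mammogram independently; the scores
# are averaged into a single continuous cancer-likelihood score.

def ensemble_score(scores):
    """Average per-model suspicion scores (each in [0, 1]) into one score."""
    if not scores:
        raise ValueError("need at least one model score")
    return sum(scores) / len(scores)

# Hypothetical per-model outputs for one mammogram (placeholder numbers):
per_model = {"retinanet": 0.82, "resnet_v1_50": 0.74, "mobilenet_v2": 0.91}
combined = ensemble_score(list(per_model.values()))  # a value in [0, 1]
```

The key design point is that the output is still a continuous score, not a diagnosis; turning it into a yes/no screening decision is a separate step, discussed below.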
Now for the tricky part: every case of breast cancer in the training data was confirmed as "ground truth" by a subsequent biopsy of live tissue. In other words, the diagnosis rests not just on what the image looks like, but on the conclusion of later medical tests that actually extracted a piece of tissue. In these cases, the answer to whether cancer was present is unambiguously yes or no.
But the combination of the three deep learning networks described above does not give a clear yes-or-no answer. It produces only a score from 0 to 1, a "continuous value" rather than a binary judgment. In other words, an AI diagnosis can be nearly right or badly wrong depending on how close its score lands to the correct value, 0 or 1, in any given case.
To compare the probability score with the binary judgments humans make, McKinney and his colleagues had to convert the AI's probability into a binary value. They did this by selecting a single cut-off point using a separate validation data set. Comparisons of "superiority" over human judgment thus come down to how well the AI separates the relatively accurate answers from the broader set of answers it produces.
As the authors explain: "The AI system natively produces a continuous score representing the likelihood of cancer, so to support comparison with human predictions we applied thresholds to produce analogous binary screening decisions. For each clinical benchmark, we used the validation set to select a different operating point, equivalent to a score threshold that separates positive from negative decisions."
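In code, the thresholding step the authors describe looks roughly like the sketch below. The data, function names and the target value are invented for illustration; the idea is simply to scan candidate thresholds on a held-out validation set until a desired operating characteristic (here, a specificity target) is met.

```python
# Minimal sketch of picking an operating point (score threshold) on a
# validation set, so a continuous score becomes a binary screening call.
# All names and numbers are illustrative, not taken from the paper.

def specificity_at(scores, labels, threshold):
    """Fraction of non-cancer cases (label 0) scored below the threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for s in negatives if s < threshold) / len(negatives)

def pick_operating_point(scores, labels, target_specificity):
    """Lowest candidate threshold whose specificity meets the target."""
    for threshold in sorted(set(scores)):
        if specificity_at(scores, labels, threshold) >= target_specificity:
            return threshold
    return 1.0  # no candidate reached the target; flag nothing

# Validation-set scores from the model and biopsy-confirmed labels:
val_scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
val_labels = [0,   0,   0,   1,   0,   1]
threshold = pick_operating_point(val_scores, val_labels, 0.75)
decisions = [s >= threshold for s in val_scores]  # binary screening calls
```

Choosing a different operating point per clinical benchmark, as the authors do, just means repeating this selection against each human comparison group.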
Measured against the UK data, the AI was almost as good as humans at predicting whether something was cancer; the report's term is "non-inferior", meaning no worse than human judgment. Where the AI networks clearly did better is so-called "specificity", a statistical term meaning the networks were better at avoiding false positives, that is, predicting disease where there is none. This matters, of course, because being misdiagnosed with cancer means enormous stress and anxiety for a woman.
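To make the two statistical terms concrete, here is a tiny worked example. The counts are invented for illustration and are not the paper's numbers: sensitivity is the fraction of real cancers that get flagged, and specificity is the fraction of healthy women correctly cleared, so every false positive eats into specificity.

```python
# Hypothetical screening outcomes for 1,000 women, 50 of whom have cancer.
# The counts are made up to illustrate the definitions.

def sensitivity(tp, fn):
    """Fraction of actual cancers that were flagged (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of healthy cases correctly cleared (true negative rate)."""
    return tn / (tn + fp)

tp, fn = 42, 8     # cancers flagged vs. missed (false negatives)
tn, fp = 893, 57   # healthy cleared vs. wrongly flagged (false positives)

sens = sensitivity(tp, fn)   # 42 / 50  = 0.84
spec = specificity(tn, fp)   # 893 / 950 = 0.94
```

A reader improving on "specificity" is exactly a reader producing fewer of the 57 false positives in this toy table.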
It is worth noting, however, that in this case the human scores came from doctors who had to decide, on the basis of a mammogram, whether further tests such as a biopsy were required. It is conceivable that, at this early stage of diagnosis, doctors deliberately err on the broad side in order to push patients toward further testing and avoid the risk of missing a cancer. That is the fundamental difference between a doctor deciding what should happen next and a machine estimating the probability of an outcome years later.
In other words, a doctor sitting in front of a patient is not usually trying to guess the probability of an outcome over the next few years, but to determine the key next steps the patient should take. For example, even if the AI judges cancer unlikely on the basis of a mammogram, a patient may still expect the doctor to err on the side of caution and order a biopsy, safe rather than sorry. Patients are likely to appreciate that caution.
In their concluding section, the scientists write that although the AI found cases doctors had missed, several cancers diagnosed by doctors were overlooked by the AI. This was particularly evident in the additional "reader study", in which six human radiologists examined 500 cancer screenings. The researchers found that the AI system correctly identified cancers missed by all six radiologists, and that all six radiologists caught cancers the AI system missed.
Somewhat troublingly, the authors write, it is not entirely clear why the AI succeeds or fails in each case. "While we were unable to establish a clear pattern among these cases, the existence of such edge cases suggests that AI systems and human doctors may complement each other in reaching accurate conclusions," they say.
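That complementarity suggests a simple combined policy, sketched hypothetically below (this is my illustration, not a protocol from the paper): recall a case for follow-up if either the AI or the radiologist flags it. Because the two miss different cancers, such an "either flags it" rule can only raise sensitivity relative to either reader alone, at the cost of potentially more false positives.

```python
# Hypothetical "double reading" rule combining an AI flag and a human flag.
# Illustrative only; the paper does not propose this specific policy.

def recall(ai_flag, human_flag):
    """Recall the case for follow-up if either reader flags it."""
    return ai_flag or human_flag

# (ai_flag, human_flag, cancer_present) for four illustrative cases:
cases = [
    (True,  False, True),   # cancer the AI caught but the human missed
    (False, True,  True),   # cancer the human caught but the AI missed
    (True,  True,  True),   # cancer both caught
    (False, False, False),  # healthy case both correctly cleared
]
caught = sum(1 for ai, human, cancer in cases if cancer and recall(ai, human))
cancers = sum(1 for _, _, cancer in cases if cancer)
```

In this toy table the combined rule catches all three cancers, while either reader alone catches only two, which is precisely the kind of complementary gain the authors gesture at.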
To be sure, one would like to know much more about how these three deep learning networks arrive at their probabilistic guesses. What, for example, do they see? That question, what the neural networks actually represent, is not addressed in the study, but it is a critical one for AI in such a sensitive application.
To sum up, one of the big questions is how much effort should go into developing a system that can predict the probability of future cancer more accurately than the many doctors who must make a first assessment. The value of AI that helps doctors will be very high, even if at this point AI cannot truly replace them.
Incidentally, because the study examined both UK and US data, it produced a puzzling finding about the relative quality of the two health-care systems. Overall, in the initial reading of screening tests, British doctors appeared significantly more accurate than their US counterparts at correctly concluding that something would prove to be cancer.
Given the differences in the data sets, 13,918 women in the UK drawn from three hospitals versus 3,097 from a single US hospital, it's hard to know what to make of these differing results. Clearly, as interesting as the AI is, there are also real differences in the relative abilities of doctors in these two medical systems.