One of the biggest, and most lucrative, applications of artificial intelligence (AI) is in health care. And the capacity of AI to diagnose or predict disease risk is developing rapidly. In recent weeks, researchers have unveiled AI models that scan retinal images to predict eye- and cardiovascular-disease risk, and that analyse mammograms to detect breast cancer. Some AI tools have already found their way into clinical practice.
AI diagnostics have the potential to improve the delivery and effectiveness of health care. Many are a triumph for science, representing years of improvements in computing power and the neural networks that underlie deep learning. In this form of AI, computers process hundreds of thousands of labelled disease images until they can classify the images unaided. In reports, researchers conclude that an algorithm is successful if it can identify a particular condition from such images as effectively as pathologists and radiologists can.
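For readers unfamiliar with that training loop, here is a minimal sketch in Python using PyTorch. The data, architecture and labels are all hypothetical stand-ins (random tensors rather than real medical images); genuine diagnostic models are far larger and train on vastly more expert-labelled images.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: 256 grayscale "scans", each labelled 0 (healthy) or 1 (disease).
images = torch.randn(256, 1, 64, 64)
labels = torch.randint(0, 2, (256,))

# A tiny convolutional classifier; real diagnostic networks are much deeper.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 2),          # two output classes
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Repeatedly show the model labelled images and nudge its weights
# towards the correct answers.
for epoch in range(5):                  # real training runs far longer
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

# After training, the model classifies images "unaided".
with torch.no_grad():
    predictions = model(images).argmax(dim=1)
```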
But that alone does not mean the AI diagnostic is ready for the clinic. Many reports are best viewed as analogous to studies showing that a drug kills a pathogen in a Petri dish. Such studies are exciting, but the scientific process demands that the methods and materials be described in detail, and that the study be replicated and the drug tested in a progression of studies culminating in large clinical trials. This does not seem to be happening enough in AI diagnostics. Many in the field complain that too many developers are not taking the studies far enough. They are not applying the evidence-based approaches that are established in mature fields, such as drug development.
These details matter. For instance, one investigation published last year found that an AI model detected breast cancer in whole-slide images better than did 11 pathologists who were allowed assessment times of about one minute per image. However, a pathologist given unlimited time performed as well as the AI, and found difficult-to-detect cases more often than the computers did.

Some issues might not appear until the tool is applied. For example, a diagnostic algorithm might incorrectly associate images produced using a particular device with a disease, but only because, during the training process, the clinic using that device saw more people with the disease than did another clinic using a different device.
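A toy simulation makes this confounding risk concrete. In the sketch below (pure NumPy, with made-up prevalence figures), a "model" that knows only which device produced an image scores roughly 80% accuracy, despite never examining any pathology at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
device = rng.integers(0, 2, n)               # 0 = device A, 1 = device B
# Assumed prevalences: the clinic using device A sees disease in 70% of
# patients; the clinic using device B in only 10%.
p_disease = np.where(device == 0, 0.7, 0.1)
disease = rng.random(n) < p_disease

# Spurious "classifier": predict disease whenever the image came from device A.
pred = device == 0
accuracy = (pred == disease).mean()
print(f"accuracy from device identity alone: {accuracy:.2f}")   # ~0.80
```

A model that latches onto such device signatures would look impressive in retrospective evaluation yet fail as soon as it is deployed at a clinic with a different case mix.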
These problems can be overcome. One way is for doctors who deploy AI diagnostic tools in the clinic to track results and report them, so that retrospective studies expose any deficiencies. Better yet, such tools should be developed rigorously: trained on extensive data and validated in controlled studies that undergo peer review. This is slow and difficult, in part because privacy concerns can make it hard for researchers to access the massive amounts of medical data needed. A News story in Nature discusses one possible answer: researchers are building blockchain-based systems to encourage patients to share information securely. At present, human oversight will probably prevent weaknesses in AI diagnosis from being a matter of life or death. That is why regulatory bodies, such as the US Food and Drug Administration, allow doctors to pilot technologies classified as low risk.
But a lack of rigour does carry immediate risks: the hype-fail cycle could discourage others from investing in similar techniques that might be better. Sometimes, in a competitive field such as AI, a well-publicized set of results can be enough to stop rivals from entering the same field.
Slow and careful research is a better approach. Backed by reliable data and robust methods, it may take longer, and will not churn out as many crowd-pleasing announcements. But it could prevent deaths and change lives.