Model architecture: The visual encoder processes sequences of CMR images and the text encoder processes the text from the “impression” section of the corresponding reports.

Image source: Nakashima M, Qiu J, Huang P et al., Nature Communications 2026 (CC BY-NC-ND 4.0)

News • Contrastive language image pretraining

AI system to interpret cardiac MRI scans with enhanced accuracy

Trained on more than 13,000 patient studies, novel system significantly outperforms existing models by up to 35%

The novel system, called CMR-CLIP, is designed to interpret cardiac MRI scans by connecting moving images of the heart with corresponding clinical radiology reports. In testing, it significantly outperformed general-purpose AI models, in some cases by more than 35%. The system also showed strong potential for improving cardiac imaging analysis, case retrieval and clinical decision support.

This research was published in Nature Communications. The CMR-CLIP codebase is publicly available on GitHub.

“This work demonstrates that domain-specific foundation models can significantly outperform general-purpose AI systems in specialized clinical applications,” said Ding Zhao, associate professor in Carnegie Mellon University’s Department of Mechanical Engineering and co-principal investigator on the study. “By designing models that reflect the structure and complexity of cardiac MRI data, rather than adapting generic image models, we can unlock new levels of performance and clinical utility.”

David Chen, Ph.D., of Cleveland Clinic, a co-principal investigator on the project, emphasized the clinical implications of the work. “Cardiac MRI interpretation is highly specialized and time-intensive. Systems like CMR-CLIP have the potential to support clinicians through automated screening and interpretation support, particularly in settings where expert readers are limited. Such reader assistant tools are critical to improving patient access to this powerful diagnostic technology.”

Cardiac MRI is widely regarded as the gold standard for evaluating heart structure, function and tissue health. A single scan can provide a comprehensive view of the heart, including pumping performance, muscle damage, blood flow and structural abnormalities. However, each study can contain hundreds to thousands of images across multiple views and time points. Even for trained specialists, interpreting a single exam can take 40 minutes or more. Because the technology is expensive and concentrated in major medical centers, there is a limited supply of experts available to meet growing clinical demand.

This combination of complexity and scarce expert labeling has also made cardiac MRI one of the most challenging domains for AI. Most machine learning systems rely on large, carefully labeled datasets, but in cardiac imaging, expert annotations are scarce, time-consuming to produce and costly to scale.

To overcome this barrier, the research team leveraged a resource already embedded in routine clinical workflows: radiology reports. Every cardiac MRI exam is paired with a written summary in which clinicians document key findings in an “impression” section. Instead of relying on manual labels, the team trained CMR-CLIP to align MRI image sequences with these natural language clinical summaries, enabling the model to learn directly from how physicians describe and interpret scans in practice.
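The alignment step described above can be illustrated with a small sketch. It assumes the standard CLIP-style symmetric contrastive objective, which the article does not spell out; the function names, the temperature value and the toy embeddings are all hypothetical and are not taken from the CMR-CLIP codebase. The idea is that each study embedding should score highest against the embedding of its own report, and vice versa.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(study_embs, report_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    study_embs[i] and report_embs[i] are assumed to come from the same exam.
    The temperature of 0.07 is an illustrative default, not a reported value.
    """
    studies = [l2_normalize(v) for v in study_embs]
    reports = [l2_normalize(v) for v in report_embs]
    # Similarity of every study against every report in the batch.
    logits = [
        [sum(a * b for a, b in zip(s, r)) / temperature for r in reports]
        for s in studies
    ]
    logits_t = [list(col) for col in zip(*logits)]  # report-to-study direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch: two exams with 2-dimensional embeddings (made-up numbers).
matched = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < mismatched)
```

In this toy batch the correctly paired embeddings produce a much lower loss than the swapped ones, which is the signal that pulls matching studies and reports together during training.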

Rather than treating cardiac MRI as a collection of static images, CMR-CLIP represents each study as a video of the beating heart. The model processes multiple standard views of the heart alongside time-resolved sequences that capture motion and tissue behavior. This lets the model capture both structure and movement, much like a cardiologist does when reviewing a scan.

Trained on more than 13,000 de-identified real patient studies from Cleveland Clinic, the system learned from over a million images and hundreds of thousands of motion sequences collected over more than a decade. When tested, CMR-CLIP was able to identify cardiac conditions in a “zero-shot” setting, meaning it had never been directly trained on those specific labels, simply by matching images to descriptive prompts like “enlarged left ventricle.”
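Zero-shot matching of this kind can be sketched in a few lines. The example below is illustrative only: the embeddings are made-up numbers standing in for encoder outputs, the second prompt is a hypothetical contrast to the one quoted above, and the function names are not from the CMR-CLIP codebase. It simply picks the prompt whose embedding is most similar to the study embedding.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(study_emb, prompt_embs):
    """Return the best-matching prompt and the score for every prompt."""
    scores = {prompt: cosine(study_emb, emb) for prompt, emb in prompt_embs.items()}
    best_prompt = max(scores, key=scores.get)
    return best_prompt, scores


# Toy embeddings (hypothetical); in practice these would come from the
# text encoder and the visual encoder respectively.
prompt_embs = {
    "enlarged left ventricle": [0.9, 0.1, 0.0],
    "normal left ventricle size": [0.1, 0.9, 0.0],
}
study_emb = [0.8, 0.2, 0.1]

best_prompt, scores = zero_shot_classify(study_emb, prompt_embs)
print(best_prompt)
```

No condition-specific classifier is trained here: the label comes entirely from comparing the study against text descriptions in the shared embedding space.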

Example of the input data, where the number in parentheses indicates how many frames each modality–view pairing contributes.

Image source: Nakashima M, Qiu J, Huang P et al., Nature Communications 2026 (CC BY-NC-ND 4.0)

Even more striking, with just a single example of a condition, CMR-CLIP could often match the performance of other systems that required dozens of labeled cases. In more specialized diagnostic tasks, the model reached near-clinical levels of performance, including accuracy rates as high as 99% for certain heart conditions. It also demonstrated the ability to search through large databases of scans using natural language, retrieving similar cases in a way that could one day help clinicians quickly compare patients with rare or complex presentations.
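Natural-language case retrieval can be sketched as a nearest-neighbor search over stored study embeddings. This is a minimal illustration under stated assumptions: the study identifiers, the embeddings and the function names are hypothetical, and a real system would likely use an approximate nearest-neighbor index rather than a full scan.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_similar(query_emb, study_index, k=2):
    """Return the ids of the k studies most similar to the query embedding."""
    scored = (
        (cosine(query_emb, emb), study_id) for study_id, emb in study_index.items()
    )
    return [study_id for _, study_id in heapq.nlargest(k, scored)]


# Hypothetical index of precomputed study embeddings.
study_index = {
    "study_A": [1.0, 0.0],
    "study_B": [0.7, 0.7],
    "study_C": [0.0, 1.0],
}
# Stand-in for the text encoder's output on a free-text query.
query_emb = [0.9, 0.1]

top_matches = retrieve_similar(query_emb, study_index, k=2)
print(top_matches)
```

Because the query is embedded with the text encoder and the studies with the visual encoder, the same shared space that supports zero-shot matching also supports search.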

A key test of whether the system was truly learning meaningful representations came when it was evaluated outside the institution where it was trained. The model still performed strongly on two entirely separate datasets (one collected in France, one at Cleveland Clinic Florida), suggesting it could generalize beyond a single hospital system.

“This work highlights a new direction for medical AI by showing how large-scale clinical data can be used to train models without requiring time-consuming manual labeling,” said Deborah Kwon, M.D., Director of Cardiac MRI at Cleveland Clinic, clinical lead and co-author of this study. “This technology has the potential to not only improve efficiency but also quality of reporting to support more consistent and clinically meaningful interpretations, as well as serve as an important teaching tool in a highly specialized and complex imaging field.”

Looking ahead, the research team plans to extend the model to additional cardiac imaging sequences, including perfusion imaging, T2-weighted imaging and parametric mapping. The team also plans to explore applications in automated report generation and interactive clinical decision support, particularly in resource-limited settings.

Source: Cleveland Clinic

23.05.2026