Anatomy-matched real and GPT-4o-generated radiographs: (A) real and (B) GPT-4o-generated posteroanterior chest radiographs, (C) real and (D) GPT-4o-generated lateral cervical spine radiographs, (E) real and (F) GPT-4o-generated posteroanterior hand radiographs, and (G) real and (H) GPT-4o-generated lateral lumbar spine radiographs. The pairs demonstrate that GPT-4o can produce radiographically plausible images across different anatomic regions.

Image credit: Radiological Society of North America (RSNA)

News • The rise of deepfake medical imaging

Real or fake? AI-generated X-rays have become surprisingly convincing (Quiz yourself!)

Findings highlight potential risks and offer strategies to identify synthetic images

The results were published in Radiology, a journal of the Radiological Society of North America (RSNA). The findings highlight the potential risks associated with AI-generated X-ray images, along with the need for tools and training to protect the integrity of medical images and prepare health care professionals to detect deepfakes. 

The term “deepfake” refers to a video, photo, image or audio recording that appears real but has been created or manipulated using AI. 


“Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present,” said lead study author Mickael Tordjman, M.D., post-doctoral fellow, Icahn School of Medicine at Mount Sinai, New York. “This creates a high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one. There is also a significant cybersecurity risk if hackers were to gain access to a hospital’s network and inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of the digital medical record.” 

Seventeen radiologists from 12 centers in six countries (United States, France, Germany, Turkey, United Kingdom and United Arab Emirates) participated in the retrospective study. Their professional experience ranged from 0 to 40 years. Half of the 264 X-ray images in the study were authentic, and the other half were generated by AI. Radiologists were evaluated on two distinct image sets, with no overlap between the datasets. The first dataset included real and ChatGPT-generated images of multiple anatomical regions. The second dataset included chest X-ray images—half authentic and the other half created by RoentGen, an open-source generative AI diffusion model developed by Stanford Medicine researchers.

When radiologist readers were unaware of the study’s true purpose, yet asked after ranking the technical quality of each ChatGPT image if they noticed anything unusual, only 41% spontaneously identified AI-generated images. After being informed that the dataset contained synthetic images, the radiologists’ mean accuracy in differentiating the real and synthetic X-rays was 75%. 

Examples of GPT-4o-generated radiographs of fractures: (A) posteroanterior radiograph of the hand, (B) posteroanterior radiograph of the lower leg, and (C) medial oblique radiograph of the foot. The images show fracture lines (arrow) that are unusually smooth, clean, and consistent and, in the case of B, unicortical. The presence of these idealized fracture lines, characterized by unnatural smoothness and incomplete cortical disruption, could serve as a primary diagnostic cue for identifying artificial intelligence–generated trauma images.

Image credit: Radiological Society of North America (RSNA)

Individual radiologist performance in detecting the ChatGPT-generated images ranged from 58% to 92%. Similarly, the accuracy of four multimodal LLMs—GPT-4o (OpenAI), GPT-5 (OpenAI), Gemini 2.5 Pro (Google), and Llama 4 Maverick (Meta)—ranged from 57% to 85%. Even GPT-4o, the model used to create the deepfakes, was unable to detect all of them, though it identified considerably more than the Google and Meta models.

Radiologist accuracy in detecting the RoentGen synthetic chest X-rays ranged from 62% to 78%, and the LLMs' performance ranged from 52% to 89%. There was no correlation between a radiologist's years of experience and their accuracy in detecting synthetic X-ray images. However, musculoskeletal radiologists demonstrated significantly higher accuracy than other radiology subspecialists.

The study identified common features of synthetic X-rays. "Deepfake medical images often look too perfect,” Dr. Tordjman said. “Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone." 

Recommended safeguards to distinguish real from fake images and prevent tampering include invisible watermarks that embed ownership or identity data directly into the images, and technologist-linked cryptographic signatures attached automatically at the moment the images are captured.
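The cryptographic-signature idea can be illustrated with a minimal sketch. This is not the workflow proposed in the study or any DICOM-standard mechanism; it is a hypothetical example assuming a per-technologist secret key and raw image bytes, showing how an HMAC computed at capture time would let any later pixel tampering be detected at verification time.

```python
import hashlib
import hmac
import os

def sign_image(image_bytes: bytes, technologist_key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the raw image data.

    In a real deployment the key would come from the technologist's
    credential system, not be generated locally (hypothetical sketch).
    """
    return hmac.new(technologist_key, image_bytes, hashlib.sha256).hexdigest()

def verify_image(image_bytes: bytes, technologist_key: bytes, signature: str) -> bool:
    """Recompute the HMAC and compare in constant time."""
    expected = sign_image(image_bytes, technologist_key)
    return hmac.compare_digest(expected, signature)

# Illustrative usage with stand-in pixel data
key = os.urandom(32)                 # per-technologist secret (assumed)
original = b"\x00\x01\x02\x03"       # stand-in for radiograph pixel bytes
sig = sign_image(original, key)

print(verify_image(original, key, sig))            # untouched image verifies
print(verify_image(original + b"\xff", key, sig))  # any edit breaks the signature
```

Key management, DICOM header integration, and invisible watermarking are deliberately out of scope here; the point is only that a signature bound to the capture event makes silent pixel substitution detectable.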

“We are potentially only seeing the tip of the iceberg,” Dr. Tordjman said. “The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI. Establishing educational datasets and detection tools now is critical.” 

The study’s authors have published a curated deepfake dataset with interactive quizzes for educational purposes. 


Source: Radiological Society of North America 

24.03.2026
