News • Machine learning analysis

Cancer genomics: new ML tool weeds out false positives

SAVANA uses a machine learning algorithm to identify cancer-specific structural variations and copy number aberrations in long-read DNA sequencing data.

Long-read sequencing technologies analyse long, continuous stretches of DNA. These methods have the potential to improve researchers’ ability to detect complex genetic alterations in cancer genomes. However, the complex structure of cancer genomes means that standard analysis tools, including existing methods specifically developed to analyse long-read sequencing data, often fall short, leading to false-positive results and unreliable interpretations of the data. These misleading results can compromise our understanding of how tumours evolve and respond to treatment, and can ultimately affect how patients are diagnosed and treated.

To address this challenge, researchers developed SAVANA, a new algorithm, which they recently described in the journal Nature Methods. SAVANA uses machine learning to accurately identify structural variants (large genomic alterations such as insertions, deletions, duplications, or rearrangements) and the resulting copy number aberrations in cancer genomes from long-read sequencing data.

It is important to have the right tool for the job. SAVANA is tailored for the task and designed to efficiently deliver reliable results, the researchers state. This algorithm was developed and tested across 99 human tumour samples by researchers at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the R&D laboratory of Genomics England, in collaboration with clinical partners at University College London (UCL), the Royal National Orthopaedic Hospital (RNOH), Instituto de Medicina Molecular João Lobo Antunes, and Boston Children’s Hospital.

“Because other analysis tools are not developed to account for the particularities of cancer genomics data, they often pick up false positives that could lead to incorrect clinical and biological interpretations,” said Isidro Cortes-Ciriano, Group Leader at EMBL-EBI. “SAVANA changes this. By training the algorithm directly on long-read sequencing data from cancer samples, we created a new method that can tell the difference between true cancer-related genomic alterations and sequencing artefacts, thereby enabling us to elucidate the mutational processes underlying cancer using long-read sequencing with unprecedented resolution.”

“When we developed SAVANA, our focus was clear: create a tool sophisticated enough to characterise complex cancer genomes but practical enough for clinical use,” explained Hillary Elrick, former Predoctoral Fellow at EMBL-EBI and Postdoctoral Fellow at the Francis Crick Institute. “As a result, SAVANA can accurately distinguish somatic structural variants, copy number aberrations, tumour purity, and ploidy – all key to understanding tumour biology and guiding clinical treatment decisions,” added Carolin Sauer, Postdoctoral Fellow at EMBL-EBI.

Its rapid analysis and robust error correction make SAVANA well suited for clinical use. The method was recently applied to study osteosarcoma, a rare and aggressive bone cancer that mostly affects young people. There, it helped researchers uncover new genomic rearrangements, providing novel insights into how osteosarcoma evolves and progresses. The team also compared SAVANA’s results from long-read data with Illumina sequencing data from the same samples, analysed with a whole-genome sequencing pipeline that is used to deliver clinical reports. The findings were highly consistent across technologies, demonstrating that SAVANA performs on par with current clinical standards while revealing additional cancer-relevant alterations.

“The capability to accurately detect structural variants is transformative for clinical diagnostics,” said Adrienne Flanagan, Professor at UCL and Consultant Histopathologist at RNOH. “SAVANA could help us confidently identify genomic alterations relevant for diagnosis and prognosis. Ultimately, this means we would be better placed to deliver personalised treatments for cancer patients.”

The UK is investing significantly in genomic sequencing technologies as part of the NHS Genomic Medicine Service. This initiative is the first in the world to offer whole genome sequencing as part of routine care. By embedding genomics into everyday clinical practice, it aims to improve diagnostic accuracy and support personalised cancer treatments. However, investments in clinical genomics will only achieve their intended impact if genomic data are interpreted accurately, and this relies on specialised analytical tools. Genomics England explored SAVANA’s use as part of its work looking at the clinical potential of long-read sequencing technology to support earlier, faster diagnosis of cancer. “Using SAVANA will ensure clinicians receive accurate and reliable genomic data, enabling them to confidently integrate advanced genomic sequencing methods such as long-read sequencing into routine patient care,” said Greg Elgar, Director of Sequencing R&D at Genomics England. 

SAVANA is also being deployed as part of nationwide initiatives, such as the UK Stratified Medicine Paediatrics project, which is funded by Cancer Research UK and Children With Cancer UK and co-led by Cortes-Ciriano. This project is focused on developing more effective and less toxic treatments for childhood cancers, using advanced sequencing technologies to better understand tumour biology and monitor disease recurrence. Additionally, SAVANA is being used in Societal, Ancestry, Molecular and Biological Analyses of Inequalities (SAMBAI), a project funded by Cancer Grand Challenges that aims to address cancer disparities in populations of recent African heritage.


Source: European Bioinformatics Institute at the European Molecular Biology Laboratory

30.05.2025
