The imec platform encompasses the full analysis pipeline from data preparation to variant calling on a similar hardware infrastructure, opening new opportunities and efficiency gains for hospitals and medical practitioners.
Image source: imec
“This is the breakthrough we have been anticipating for years. Finally, we can run the entire DNA analysis pipeline with a single software platform solution, and faster than ever,” said imec researcher Dr. Charlotte Herzeel. “Because variant calling is the most complex step, gathering results up to 16 times faster than the previous method has resulted in a four- to nine-fold reduction in time, all while retaining GATK4-identical results. For the medical sector, this allows massive efficiency gains because the time between sampling and diagnosis dramatically decreases and doctors can run analyses overnight. Moreover, since many hospitals run their analyses via rented cloud solutions, the reduced throughput times can immediately result in a cost reduction per analysis.”
After a DNA sample is sequenced, the result is hundreds of gigabytes of data representing the genetic information of the original sample, which was cut into a multitude of smaller fragments during the sequencing process. These fragments have to be reconstructed into a representation of the original DNA sample. Afterwards, an analysis is performed to, for example, detect genetic variants in comparison to a known reference model. elPrep5 is specifically designed to optimize this variant calling analysis.
Performing this analysis is computationally demanding. Despite substantial cost reductions for DNA analysis over the past decade, runtimes of up to two to three days for a whole genome still left room for improvement. Now, imec’s elPrep5 can perform a whole-genome analysis within a few hours without compromising the quality of the output. Extensive validations show outputs identical to those of its industry counterparts GATK, SAMtools and Picard.
By taking advantage of its parallel execution framework, elPrep5 performs the complete analysis in a single pass through the data. This architecture avoids repeatedly reading and writing fragments of data in and out of memory. elPrep5 is written in Go, an open-source programming language developed by Google, and can be run on standard servers that most hospitals have locally or in the cloud. elPrep5 extends and improves on the functionality and performance of elPrep4 by including variant calling as the final step, so that it covers the whole DNA analysis pipeline, and by realizing additional efficiency gains in the process.
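To make the single-pass idea concrete, the Go sketch below applies several preparation steps to each read while it is in memory, with the reads split across parallel goroutines, instead of running one step per pass and writing intermediate results in between. It is an illustration only, not elPrep5's code or API: the `Read` type, the `Step` signature, the two example steps and the `SinglePass` function are all hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Read is a hypothetical, heavily simplified stand-in for one sequencing read.
type Read struct {
	Name     string
	Pos      int
	Unmapped bool
}

// Step is one in-memory preparation step. It returns false to drop the read.
type Step func(r *Read) bool

// dropUnmapped is an illustrative filter that discards unmapped reads.
func dropUnmapped(r *Read) bool { return !r.Unmapped }

// upperName is an illustrative transformation that normalizes the read name.
func upperName(r *Read) bool {
	r.Name = strings.ToUpper(r.Name)
	return true
}

// SinglePass applies every step to every read in one traversal of the data.
// The reads are split into contiguous chunks, one goroutine per chunk, and
// each goroutine writes only to its own indices, so no locking is needed.
func SinglePass(reads []Read, steps []Step, workers int) []Read {
	if workers < 1 {
		workers = 1
	}
	processed := make([]Read, len(reads))
	keep := make([]bool, len(reads))
	chunk := (len(reads) + workers - 1) / workers

	var wg sync.WaitGroup
	for lo := 0; lo < len(reads); lo += chunk {
		hi := lo + chunk
		if hi > len(reads) {
			hi = len(reads)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				r := reads[i] // work on a copy; the input stays untouched
				ok := true
				for _, s := range steps {
					if ok = s(&r); !ok {
						break // read was filtered out; skip remaining steps
					}
				}
				processed[i], keep[i] = r, ok
			}
		}(lo, hi)
	}
	wg.Wait()

	// Collect surviving reads in their original order.
	out := make([]Read, 0, len(reads))
	for i, r := range processed {
		if keep[i] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	reads := []Read{
		{Name: "r1", Pos: 10},
		{Name: "r2", Pos: 20, Unmapped: true},
		{Name: "r3", Pos: 30},
	}
	out := SinglePass(reads, []Step{dropUnmapped, upperName}, 2)
	fmt.Println(len(out), out[0].Name, out[1].Name)
}
```

In this toy version, adding a further step means appending one more function to the `steps` slice; the data is still traversed only once, which is the property the paragraph above attributes to elPrep5's architecture.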
elPrep5 targets users in the pharmaceutical industry, scientific research, medical laboratories, sequencing service providers, sequencing vendors and hospitals. The speedups brought by elPrep5 enable these users to move from research runs into clinical practice and to further scale their operations. Several industrial partners have already expressed interest in integrating elPrep5 into their daily operations.