Benefits from The Cancer Genome Atlas

Report: Lisa Chamoff

Last year, scientists at the University of California San Francisco (UCSF) revealed that by measuring the proportion of both immune and cancerous cells in tumours, or ‘tumour purity,’ clinicians could more precisely predict the success of certain precision therapies. A key aspect of the discovery was access to more than 10,000 samples representing 21 different cancer types.

This wealth of data came from The Cancer Genome Atlas, a vast public resource and a joint initiative of the USA’s National Cancer Institute and National Human Genome Research Institute. The database contains genomic profiles of over 10,000 cancer patients, including mutations, copy number variations, gene expression, DNA methylation and protein profiling, as well as pathology slides and full clinical information.

‘There’s no equivalent to this resource with so many “omics” characterisations for so many patients,’ says Dvir Aran, a research scientist in the laboratory of Atul Butte at UCSF’s Institute for Computational Health Sciences. Aran will discuss the team’s use of the database in his talk, ‘Studying the Tumour Microenvironment with Big Data’, at the Cambridge Healthtech Institute’s Liquid Biopsy Summit (June, San Francisco).

For the study, published in the 4/11/2015 issue of Nature Communications, the team combined four previously developed algorithms to estimate tumour purity. They found ‘immense differences’ in tumour purity levels among cancer types and among patients with the same type of cancer, which may be a major factor when analysing gene expression, Aran says.

‘For example, we measure an expression of a gene that’s expressed only from the cancer cells, but one of the samples is 90 percent pure and another is only 50 percent. A naive interpretation might suggest that the first sample is activating this gene, while this is only a result of the difference in purity,’ Aran explains. ‘Our analysis showed that this biases interpretations and leads to false conclusions of many bioinformatics analyses, such as constructing co-expression networks, clustering tumours to molecular subtypes and finding genes that are (differentially) expressed in tumours compared to normal samples.’
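
Aran’s example can be worked through with a simple mixing model. The sketch below assumes that the measured signal is a purity-weighted average of the expression in cancer cells and in all other cells; the function names and the per-cell value of 100 are illustrative and do not come from the study.

```python
def observed_expression(cancer_expr, purity, other_expr=0.0):
    """Bulk signal under an assumed linear mixing model.

    purity is the fraction of the sample made up of cancer cells (0 to 1).
    """
    return purity * cancer_expr + (1.0 - purity) * other_expr


def purity_adjusted(observed, purity):
    """Estimate per-cancer-cell expression for a gene that is assumed
    to be expressed only in cancer cells."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return observed / purity


# Hypothetical gene with identical activity (100 units) in both tumours.
same_activity = 100.0
sample_a = observed_expression(same_activity, purity=0.9)
sample_b = observed_expression(same_activity, purity=0.5)

# The raw values differ although the cancer cells behave identically.
print(sample_a, sample_b)

# Dividing by purity brings both samples back to roughly the same value.
print(purity_adjusted(sample_a, 0.9), purity_adjusted(sample_b, 0.5))
```

Under these assumptions, the gap between the two raw measurements reflects only the difference in purity, which is the naive misreading Aran describes.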

The purity estimate also affects tests used to predict the effectiveness of checkpoint inhibitor drugs, a popular cancer immunotherapy, the researchers found. When immune cell infiltration was taken into account, predictions of the likely success of this expensive treatment were much more accurate.

‘In this study we showed that the influence of tumour purity on the results of genomic analyses is much stronger than previously appreciated, and ought to be included as a covariate in any future analysis,’ Aran says. ‘Tumour purity differences resulting from sampling variation exceed intrinsic individual differences. Lower purity samples, by influencing genomic data, may make precision medicine efforts more challenging. We urge cancer researchers and clinicians to take tumour purity into account when analysing genomic data from patient samples.’
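
One way to read the recommendation to include purity as a covariate is as a regression adjustment. The sketch below uses invented, noise-free data in which expression depends only on purity, and compares a naive subtype comparison with one adjusted for purity. The data, the function names and the residual-based adjustment are assumptions for illustration, not the method used in the study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def naive_group_effect(group, expr):
    """For a 0/1 group label this slope equals the difference in group means."""
    return fit_line(group, expr)[0]


def adjusted_group_effect(group, purity, expr):
    """Group effect with purity as a covariate: remove purity from both the
    expression and the group label, then regress one residual on the other."""
    r_expr = residuals(purity, expr)
    r_group = residuals(purity, group)
    return sum(g * e for g, e in zip(r_group, r_expr)) / sum(g * g for g in r_group)


# Invented data: subtype 1 happens to have purer samples than subtype 0,
# and expression is driven by purity alone (100 units per unit of purity).
group = [1, 1, 1, 1, 0, 0, 0, 0]
purity = [0.90, 0.85, 0.80, 0.75, 0.60, 0.55, 0.50, 0.45]
expr = [100.0 * p for p in purity]

naive = naive_group_effect(group, expr)
adjusted = adjusted_group_effect(group, purity, expr)
print(f"naive difference: {naive:.2f}, purity-adjusted: {adjusted:.2f}")
```

In this toy setting the naive comparison reports a large difference between subtypes, while the adjusted estimate is essentially zero, because purity accounts for all of the variation.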

The Cancer Genome Atlas (TCGA) is a project to catalogue genetic mutations responsible for cancer, using genome sequencing and bioinformatics.

Studying the tumour as a whole, including the non-cancerous cells that are part of it, was not possible before TCGA was available, Aran points out. Researchers previously used genomic data from cell lines, which characterise only the cancer cells, and not tissue from real patients.

Announcing the study, Butte said datasets like TCGA are ‘the ultimate commodity. Unlike oil or water that can only be used once, data continually generates new insights.’

The data, which is completely free, can be downloaded from the TCGA data portal. For the study, Aran and team combined DNA, RNA, epigenetic and pathology analyses to create a consensus measurement of tumour purity, which is the percentage of cancer DNA in the sample, and studied the measurement in the context of the patient’s clinical information.
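
The article does not spell out how the separate estimates were combined. As a rough illustration only, the sketch below takes the median of whichever estimates are available for a sample; the choice of the median, the method labels and the values are assumptions.

```python
from statistics import median


def consensus_purity(estimates):
    """Combine per-method purity estimates (fractions between 0 and 1).

    estimates maps a method label to a value, or to None when that
    method produced no estimate for the sample.
    """
    available = [v for v in estimates.values() if v is not None]
    if not available:
        raise ValueError("no purity estimate available for this sample")
    return median(available)


# Hypothetical sample with one missing estimate.
sample = {"dna": 0.80, "rna": 0.70, "methylation": None, "pathology": 0.90}
print(consensus_purity(sample))
```

A median is used here because it is less sensitive than a mean to a single method that disagrees strongly with the others.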

‘For bioinformaticians like me, this is a goldmine,’ Aran says.

27.04.2016