Tapesh Santra is a Junior Group Leader and represents Systems Biology Ireland, which made it to the DatSci Awards finals for the Best Use of Data Science in an Academic Research Body.

In his article, Tapesh describes the achievements of Systems Biology Ireland and where data science fits into the treatment of cancer.

The curse of cancer

Cancer is a major cause of death: around one in five of all deaths are now attributed to cancer. Available drugs for treating cancer are very expensive and often do not work. A typical cancer drug costs between 30,000 and 200,000 euros per patient per annum and has a response rate between 10% and 50%, which means that not all drugs work on all patients. Many patients initially respond to treatment but develop resistance at a later stage. Currently there is no way of predicting whether a drug will work on a patient without treating the patient with the drug and waiting to see what happens. This not only endangers the lives of patients, but also puts an enormous financial burden on patients and/or their healthcare providers. One way to resolve this problem is to develop better ways of determining which drugs will be effective for a given patient.

How do doctors decide how to treat a cancer patient?

Let’s have a look at how a treatment regimen is determined in a typical cancer hospital. Once a patient is diagnosed with cancer, he/she undergoes a series of scans and tests. The scans provide a glimpse inside the patient’s body to determine the stage and/or spread of the tumour. The tests measure whether the patient has certain mutations (some nasty ones are in BRAF, KRAS, P53 etc.) and the levels of some hormone receptors (sensors that detect hormone levels in the blood). All in all, about 10-15 tests and scans are performed for each patient. A doctor then examines the results and determines a treatment regimen based on his/her experience.

Why do cancer treatments fail? What’s missing?

Research has shown that the progression and treatment response of cancer depend on many factors. For instance, the age, gender, ethnic background, geographic location, addictions, stress, diet, heredity, clinical history and molecular features (such as the genomic, epigenetic and proteomic makeup) of a patient may all contribute to his/her disease. But most of these aspects are currently not taken into account when determining a treatment regimen. It is now possible to measure between 100,000 and 1,000,000 molecular features in a patient’s tumour. This, along with the socio-economic and clinical profile of a patient, provides a significantly more comprehensive characterization of the state of his/her disease than the current diagnostic scans and tests. It is simply not feasible for a human doctor to sift through and interpret all this information in order to determine a treatment option. This is where we come in. We develop statistical machine learning algorithms and mathematical models to analyse this data in order to predict the best course of treatment for individual patients.

The promise of data science

The most straightforward approach is to gather molecular profiles, clinical profiles and treatment histories for as many patients as possible who have already undergone cancer treatment. Then, using feature selection algorithms, we identify a few molecular or clinical features (mutations, methylations, gene expressions etc.), out of hundreds of thousands of measurements, that are associated with the response of cancer patients to a certain drug. The selected features are then used to build and train classifiers which can predict the response of a new patient to the same drug.
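
As a rough illustration of this two-step idea, here is a minimal sketch in plain Python. It is not our actual pipeline: the scoring rule (a simple t-like statistic), the nearest-centroid classifier, the synthetic cohort and every function name are assumptions chosen to keep the example short.

```python
import random
import statistics


def score_feature(values, labels):
    """t-like score: gap between class means divided by the summed class spreads."""
    responders = [v for v, y in zip(values, labels) if y == 1]
    non_responders = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.pstdev(responders) + statistics.pstdev(non_responders) + 1e-9
    return abs(statistics.mean(responders) - statistics.mean(non_responders)) / spread


def select_features(X, y, k):
    """Return the indices of the k highest-scoring features."""
    n_features = len(X[0])
    scores = [score_feature([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def train_centroids(X, y, features):
    """Nearest-centroid classifier: one mean vector per class, selected features only."""
    centroids = {}
    for label in set(y):
        rows = [[row[j] for j in features] for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, features, row):
    """Assign a new patient to the class whose centroid is closest."""
    point = [row[j] for j in features]
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(point, centroids[label])),
    )


def make_synthetic_cohort(n_patients=60, n_features=200, informative=(3, 17), seed=0):
    """Invented data: only the 'informative' features differ between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_patients):
        label = i % 2  # 1 = responder, 0 = non-responder
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic_cohort()
selected = select_features(X, y, k=2)
model = train_centroids(X, y, selected)

new_patient = [0.0] * 200
new_patient[3] = new_patient[17] = 3.0
predicted_response = predict(model, selected, new_patient)
```

With real data the number of features is far larger than the number of patients, so the selection step must be validated on held-out patients to avoid picking up features that only look informative by chance.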

Pushing the boundaries, reverse engineering tumour cells

However, it is rarely the case that we have such comprehensive datasets for enough patients to build a classifier. Typically we have the molecular profiles for a large number of patients but their treatment histories are sporadic, incomplete and sometimes missing. In such cases, we first group together patients with similar molecular profiles using specially designed clustering algorithms. Then we try to predict the optimal treatment for each group of patients. To determine whether a drug will work on a group of patients, we need to have a clear understanding of the internal machinery of their tumour cells. At a very abstract level, a cell is like a computer chip such as a microprocessor, which is made of millions of tiny electronic components like diodes, transistors, resistors etc. These components are connected to each other via microscopically thin wires in very specific ways that make the whole chip work.
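
The grouping step can be pictured with a small sketch. The code below uses plain k-means with a farthest-point initialisation as a stand-in for the specially designed clustering algorithms mentioned above; the toy profiles, the choice of two groups and all names are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(profiles, k):
    """Deterministic start: first profile, then repeatedly the profile farthest from all chosen."""
    centroids = [profiles[0]]
    while len(centroids) < k:
        farthest = max(
            profiles,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return [list(c) for c in centroids]


def k_means(profiles, k, iterations=20):
    """Plain k-means: alternate between assigning patients and updating group centres."""
    centroids = farthest_point_init(profiles, k)
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # keep the old centre if a group ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Invented molecular profiles: 10 patients around level 0 and 10 around level 5.
rng = random.Random(1)
profiles = [[rng.gauss(0.0, 0.5) for _ in range(5)] for _ in range(10)]
profiles += [[rng.gauss(5.0, 0.5) for _ in range(5)] for _ in range(10)]

groups = k_means(profiles, k=2)
```

Real molecular profiles are noisy and incomplete, which is one reason general-purpose methods like this one are usually not sufficient on their own.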

Likewise, cells in our body are made of ~40000 components (genes, proteins, microRNAs etc.) which interact with each other. The trouble is, the wiring diagram of these interactions is not known, and in tumour cells the wiring is different from that of normal cells. Therefore, we first reverse engineer the wiring diagram of the tumour cells using the molecular profiles of patients who have similar types of tumour. These molecular profiles are fairly noisy and sometimes incomplete. Trying to reconstruct the molecular wiring using such data is like solving a ~40000 piece jigsaw puzzle in a dimly lit room without knowing what the complete picture looks like. In our laboratory, we developed a series of algorithms which can solve such problems in reasonable time. The reconstructed wiring diagrams are then used to build mathematical models (using ordinary, stochastic and partial differential equations) that work as virtual tumour cells. These models are then used to simulate how the effects of drugs propagate through the internal machinery of the tumour cells and to what end. Once we find a drug that kills these virtual cells, we test it in biochemical experiments using real tumour cells.
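
To give a feel for what a virtual tumour cell is, here is a deliberately tiny ordinary differential equation model, integrated with the forward Euler method. Everything in it is an assumption made for illustration: a single pro-growth signal, a drug that blocks the activation of that signal, and invented rate constants. A real model is built from the reconstructed wiring diagram and has many more components.

```python
def simulate_virtual_cell(dose, ic50=1.0, k_on=1.0, k_off=0.5,
                          growth=0.3, death=0.1, t_end=50.0, dt=0.01):
    """Return the relative tumour cell count at t_end (1.0 = starting size).

    signal: active fraction of a hypothetical pro-growth protein (0 to 1)
    cells:  tumour cell population relative to its starting size
    """
    # Fraction of signal activation blocked by the drug (simple saturation curve).
    inhibition = dose / (dose + ic50)
    signal, cells = 0.5, 1.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # d(signal)/dt = activation (reduced by the drug) - deactivation
        d_signal = k_on * (1.0 - inhibition) * (1.0 - signal) - k_off * signal
        # d(cells)/dt = (signal-driven growth - baseline death) * cells
        d_cells = (growth * signal - death) * cells
        signal += dt * d_signal
        cells += dt * d_cells
    return cells


untreated = simulate_virtual_cell(dose=0.0)  # signal stays high, population grows
treated = simulate_virtual_cell(dose=9.0)    # signal is suppressed, population shrinks
```

In this toy model the drug works because it pushes the growth term below the death term; with a weaker dose, or with a second pathway that bypasses the blocked signal, the virtual population would keep growing.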

Sometimes we get the opportunity to study individual cancer patients who are undergoing treatment. Tumour samples from these patients are cultured in the lab and a number of experiments are performed on these cells. It is not possible to experimentally test all existing cancer drugs and their combinations on the cells, since these experiments are time-consuming and expensive. Therefore, the cells are typically treated with a manageable number of drugs and their molecular profiles are subsequently screened. This data is then used to reverse engineer the tumour cells of an individual patient and develop computer models of these cells. These models allow us to simulate the effects of different drugs and drug combinations on the patient.
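
The advantage of a computer model is that every combination can be scored cheaply before anything is tested in the lab. The sketch below shows the bookkeeping only. In place of a mechanistic patient model it assumes Bliss independence (each drug acts on the cells that survive the others), and the drug names and single-drug effects are invented.

```python
from itertools import combinations

# Hypothetical fraction of tumour growth blocked by each drug on its own.
single_drug_inhibition = {"drug_A": 0.30, "drug_B": 0.50, "drug_C": 0.20}


def bliss_combined_inhibition(fractions):
    """Combined inhibition if the drugs act independently (Bliss independence)."""
    surviving = 1.0
    for fraction in fractions:
        surviving *= 1.0 - fraction
    return 1.0 - surviving


def rank_combinations(inhibition, size=2):
    """Score every combination of the given size and sort from best to worst."""
    scored = []
    for combo in combinations(sorted(inhibition), size):
        effect = bliss_combined_inhibition(inhibition[drug] for drug in combo)
        scored.append((combo, effect))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_combinations(single_drug_inhibition)
best_combo, best_effect = ranking[0]
```

A mechanistic model can go further than this reference assumption, because it can predict combinations whose effect is stronger or weaker than independence would suggest.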

Figure 1: Our approach of combining heterogeneous data to develop patient-specific (personalized) treatment regimens for cancer patients.


What have we achieved? 

In the last few years, we developed a series of algorithms to solve emerging problems encountered in modern molecular biology, medical, clinical and healthcare research. We applied these algorithms to understand the causes of and find potential cures for several diseases, especially cancer. We discovered new molecular insights into why some cancers are difficult to treat and spread to other organs, which genes are responsible for tumour growth, which genes can stop tumour growth, why some cancers develop resistance to treatments etc.

We also predicted new treatment options for patients who do not respond to many of the existing treatments. For instance, in a recent study we successfully predicted that a cocktail of two drugs (BIBX and a p70S6K inhibitor) can efficiently kill and stop the spread of a very aggressive strain of drug-resistant colon cancer. In another study, we successfully predicted that some drug-resistant breast cancer patients may respond to treatment if it is combined with high doses of vitamin D.

Our predictions were initially validated in the lab and we are now planning to apply for clinical trials to test whether our discoveries can be used to help patients who are undergoing treatment. We are also bringing machine learning and data science closer to the clinic than we could have imagined even a year ago. In a new collaboration with the Mainz Paediatric Oncology Hospital, Germany, we are helping to determine the next line of treatment for paediatric brain tumour patients who are currently under palliative care, by analysing their whole genome screens.

How do we see our work transforming modern healthcare?

Our primary weapons against disease are drugs. But here is a scary statistic: the top 10 selling drugs in the U.S. have response rates between 4% and 25%, i.e. the vast majority of people who take these drugs do not benefit from them. Cancer drugs are no exception. The only difference is that when a cancer patient fails to respond to a drug, the stakes are astronomically higher. It would save us vast sums of money and countless lives if we could simply figure out, a priori, which drug will work on whom.

A more precise term for this is “personalized medicine”, i.e. developing treatment regimens tailored to individual patients. This is the direction in which our healthcare systems are gradually moving. Systems Biology Ireland, a research centre at University College Dublin, Ireland, is playing a leading role in developing technologies that are essential for personalized medicine to reach its full potential.

The centre consists of several research groups which specialize in different aspects of personalized medicine. In our group, we develop statistical machine learning algorithms which form the core predictive platform that analyses data from various sources and predicts patient-specific treatment options. The research performed in our group, and at SBI in general, is part of a worldwide effort to lay the groundwork for future medicine that will shape the healthcare system for generations to come.

You can be part of the DatSci Awards as well on the 21st of September in Croke Park, Dublin, Ireland. Be sure to get your ticket for a great opportunity to talk to and learn from over 400 leading professionals in the data science community!