The Data Science Programme

A forward-looking medical-scientific programme to centralise, structure and analyse data in order to speed up knowledge acquisition in the patient’s best interest.

Managers:

Marc Deloger
Sergey Nikolaev


Cancer is a complex disease. It is vital to cross-reference the opinions of professionals from the various care and research disciplines (oncologists, biologists, geneticists, pathologists, radiologists, etc.) in order to gain the most comprehensive overview and offer each patient the best therapeutic option or direct them towards an appropriate clinical trial. The aim of the "DATA SCIENCE" programme is to draw on a large volume of data on the diagnosis and management of previously treated patients to facilitate comparison. The new Data Analysis Centre will make it possible to apply the "twin patient" concept, leading to the creation of a digital avatar of each patient in order to direct patients to the optimum treatment from diagnosis onwards.
The "DATA SCIENCE" medical-scientific programme was developed as part of Gustave Roussy's institutional strategic plan by Marc Deloger and Sergey Nikolaev, research scientists in bioinformatics. It relies on a dedicated infrastructure, the Data Analysis Centre, and on a new approach to research and care that draws on past experience in order to better understand and treat patients, both now and in the future.

To create a unique cancer data pool

Each year, nearly 50,000 patients attend the Institute for a consultation. This generates a vast amount of information that needs to be centralised and organised to create a database and knowledge base that is useful for research and recognised internationally. "DATA SCIENCE" aims to make these data readily accessible to researchers and clinicians for research purposes.
To meet this challenge, "DATA SCIENCE" is structured around the Data Analysis Centre, which is responsible for extracting, preparing, using and enriching all of the data generated to date (blood tests, genetic sequencing, pathology slides, MRI images, scans, medical or administrative reports, etc.), particularly for patient cohorts participating in clinical trials (adult, paediatric and personalised medicine). The multidisciplinary team is structuring and leveraging all of the data and knowledge accumulated at the Institute over the last 20 years to build a catalogue of data useful to research scientists and clinicians alike.

To become the reference centre for analysing oncology data via the Data Analysis Centre

Creating this catalogue allows information to be shared, fosters collaboration and training, and contributes to a secure, isolated IT environment that guarantees the security and traceability of data flows in line with French and European legal requirements.
Advances in cancer diagnosis and treatment, together with the growth of innovative disciplines such as artificial intelligence (including machine learning and deep learning methods), bioinformatics and applied mathematics, are key assets for this project, which spans all of the medical-scientific programmes at Gustave Roussy. The programme should help identify new specific biomarkers and factors predictive of treatment efficacy, in the patient's best interest.

From twin patient to the creation of a digital patient avatar

This data catalogue will make the "twin patient" concept possible: searching for previous patients with the same genomic, radiological, pathological and clinical characteristics as a new patient. Observing the outcomes of these previous patients can improve treatment for today's patients and facilitate the selection of the optimum therapeutic strategy from diagnosis onwards.
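To illustrate the idea of searching for a twin patient, the sketch below ranks previous patients by how similar they are to a new patient. It is not the Institute's method: the Patient record, the find_twins helper, the encoding of each patient's genomic, radiological, pathological and clinical characteristics as a short numeric vector, the use of cosine similarity and the example cohort are all hypothetical, chosen only to show the principle of a similarity search.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    # Hypothetical numeric features already extracted from genomic,
    # radiological, pathological and clinical data, in a fixed order.
    features: list[float]
    # Hypothetical summary of the treatment received and its outcome.
    outcome: str


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information to compare
    return dot / (norm_a * norm_b)


def find_twins(new_patient: list[float], cohort: list[Patient], k: int = 3) -> list[tuple[Patient, float]]:
    """Return the k previous patients most similar to the new patient, best first."""
    scored = [(p, cosine_similarity(new_patient, p.features)) for p in cohort]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Illustrative, made-up cohort (not real patient data).
cohort = [
    Patient("P001", [1.0, 0.0, 0.8, 0.2], "responded to therapy A"),
    Patient("P002", [0.0, 1.0, 0.1, 0.9], "responded to therapy B"),
    Patient("P003", [0.9, 0.1, 0.7, 0.3], "responded to therapy A"),
]

twins = find_twins([1.0, 0.0, 0.75, 0.25], cohort, k=2)
for patient, score in twins:
    print(patient.patient_id, round(score, 3), patient.outcome)
```

In practice, features measured on very different scales would need to be normalised before any such comparison, and the choice of similarity measure is itself a research question.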
Research scientists can use this database to integrate all of the data needed to create a digital avatar of each patient. Studying the avatar and comparing it with a twin patient should help determine the likely clinical course of a patient's disease. This approach is particularly useful in the management of rare diseases or anomalies. The aim is to create digital twins of cancer patients and compare them with matching cases. Gustave Roussy will be able to draw on the expertise of international research scientists to develop this strategy. Three initial projects are already underway:
  • The first, led by Elsa Bernard, seeks to understand how tumour heterogeneity influences the fate of digital twins. The aim is to develop a digital tumour heterogeneity test for each patient. This tool should guide clinicians towards the most effective treatment.
  • The second, led by Sergey Nikolaev, uses advanced bioinformatics methods to identify, in digital form, the drivers of cancer progression.
  • The third seeks to characterise the tumour's immune environment in digital form in order to identify the mechanisms tumour cells use to evade the immune system and thereby promote their growth.
Digital cancer cohorts of patients sharing the same tumour heterogeneity, cancer drivers and immune environment will be a valuable resource, providing oncologists with new information and improving patient management.