Data Science in healthcare

Thanks to the abundance of data available, Data Science is revolutionising the health sector. Find out how data analytics and AI are transforming healthcare, and how to become a Healthcare Data Scientist…

The health sector generates immense amounts of data. According to a study conducted by the Ponemon Institute, this domain alone accounts for 30% of global data.

Medical records, clinical trials, genetic information, invoices, connected objects, databases and scientific articles are just some of the many data sources available to the medical community.

With the rise of tele-consultations and health-related internet searches, the volume of data is growing rapidly. For industry professionals, patient data is now centralized and more accessible than ever before.

The term “quantified health” now designates the integration into medical records, via smartphones, of data from connected devices such as fitness bracelets, glucometers and scales.

This is what platforms like Apple HealthKit and Google Fit offer. Thanks to these resources, it is now possible to quickly detect alarming signals and carefully monitor changes in behavior and vital indicators.

All this data can be used by health professionals, and opens up a multitude of possibilities. Find out how Data Science is changing the medical field.

Drug discovery

On average, it takes $2.6 billion and 12 years to create a drug and bring it to market. However, Data Science makes it possible to drastically reduce the cost and the time.

Using this data, scientists can now simulate how a drug interacts with the body's proteins and different cell types. According to Mark Ramsey, Chief Data Officer of pharmaceutical giant GSK, the process could be reduced to less than two years thanks to this simulation method.

Several startups are also exploring this avenue. For example, London-based BenevolentAI has raised $115 million to launch more than 20 drug creation programs and develop an artificial brain capable of creating new drugs and treatments.

Disease prevention

Prevention is better than cure, as the saying goes. Thanks to connected objects and other tracking devices, combined with the patient's history and genetic information, it is possible to detect a problem before it becomes uncontrollable.

The company Omada Health, for example, uses connected accessories to create personalized behaviour plans and online coaching to help prevent chronic conditions such as diabetes, hypertension and high cholesterol.

For its part, Propeller Health has created an inhaler usage tracker using GPS to link data from individuals at risk with environmental data from the US CDC. The aim is to provide interventions for asthmatics.

The Canadian startup Awake Labs, meanwhile, collects data from autistic children through connected accessories. Parents can thus be alerted in the event of a risk of crisis.

Artificial intelligence has already made it possible, on several occasions, to detect diseases at an early stage. Researchers at the University of Campinas, Brazil, have developed an AI platform to diagnose the Zika virus using metabolic markers.

Diagnosis of Diseases

At present, doctors' diagnoses are unfortunately still often incorrect. According to the National Academies of Sciences, Engineering, and Medicine, approximately 12 million Americans are misdiagnosed.

The consequences can sometimes be fatal. According to a BBC survey, misdiagnoses cause between 40,000 and 80,000 deaths a year.

However, Data Science makes it possible to greatly improve the accuracy of diagnoses. This is particularly the case for medical imaging analysis.

Computers can learn to interpret MRIs, X-rays, mammograms and other types of medical images. The machine learns to identify patterns in this visual data, and can then detect tumors, arterial stenosis and other abnormalities with an accuracy that often surpasses that of human experts.

Even without going as far as the automated analysis of medical imaging, Data Science makes it possible to increase the size of an image or improve its definition, which makes interpretation easier for human experts.
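
To give an idea of what "increasing the size of an image" involves, here is a minimal sketch of bilinear interpolation, one of the classic upscaling techniques. The article does not say which method is used in practice, and modern tools often rely on learned super-resolution models instead. The function name upscale_bilinear and the tiny 2x2 image are hypothetical and serve only as an illustration.

```python
def upscale_bilinear(image, factor):
    """Upscale a 2D grayscale image (a list of rows) by an integer factor."""
    h, w = len(image), len(image[0])
    new_h, new_w = h * factor, w * factor
    out = []
    for i in range(new_h):
        # Map the output row back to a fractional row in the source image.
        y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            # Same mapping for the column.
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Blend the four neighbouring source pixels.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

# Hypothetical 2x2 grayscale image, invented for illustration.
small = [[0, 100],
         [100, 200]]
large = upscale_bilinear(small, 2)  # 4x4 image
```

The corner pixels keep their original values, and the pixels in between are weighted averages of their neighbours, which is why the enlarged image looks smooth rather than blocky.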

Additionally, Stanford University researchers have developed data-driven models for detecting heart rhythm irregularities from electrocardiograms faster than a cardiologist. Other models are able to distinguish benign marks on the skin from malignant lesions.

The company Iquity, which develops a predictive analysis platform for the health sector, carried out a study by analyzing four million data points on 20 million New Yorkers.

By combining data from patients who have received a diagnosis of multiple sclerosis, erroneous or not, Iquity has managed to predict with 90% accuracy the onset of the disease eight months before it can be detected with traditional tools.

Meanwhile, Microsoft researchers analyzed web search data from 6.4 million Bing users whose search results suggested they had pancreatic cancer.

They then reviewed keywords from these users' previous searches, such as weight loss or blood clots. The results suggest that search engine data could help anticipate a diagnosis of pancreatic cancer.

Personalization of treatments

Thanks to Data Science, it is also possible to offer better targeted and personalized treatments. Taking into account the subtle differences between individuals makes the care delivered more effective.

For example, the National Institutes of Health's 1000 Genomes Project is an open study of regions of the genome associated with common diseases like diabetes or coronary heart disease. This study allows scientists to better understand the complexity of human genes and to determine which treatment will best suit an individual.

For their part, Emory University and the Aflac Cancer Center have entered into a partnership with NextBio to study medulloblastoma, a malignant brain tumor. While radiation therapy was once the only treatment for this cancer, the analysis of a patient's genetic and clinical data now makes it possible to discover specific biomarkers and offer personalized treatment.

The MapReduce programming model can be used to process genetic sequences in parallel, which reduces the time required for data processing. The SQL language is used to retrieve genomic data, manipulate “BAM” files and process data.
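
The MapReduce idea can be sketched in a few lines of plain Python. The example below counts short subsequences ("k-mers") in DNA reads by following the three classic phases: map, shuffle and reduce. It is a simplified, single-machine illustration; a real deployment would distribute these phases across a cluster. The function names and the two toy reads are assumptions made up for this sketch.

```python
from collections import defaultdict
from itertools import chain

def map_kmers(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every length-k window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def count_kmers(reads, k=3):
    """Run the map, shuffle and reduce phases over a collection of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))

# Toy reads, invented for illustration only.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads)
print(counts)  # each of ACG, CGT, GTA and TAC appears twice
```

Because each read is mapped independently and each key is reduced independently, both phases can run in parallel, which is where the time savings come from.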

Follow-up of patients after returning home

Every operation or treatment can lead to side effects, complications or recurring pain. It can be difficult to track and monitor these phenomena after a patient has left the hospital.

Data Science allows doctors to continue monitoring patients remotely in real time after they return home. For example, the Cloudera software can predict a patient's chances of readmission within 30 days based on their medical data and the socioeconomic status of the region where the hospital is located.
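
To illustrate the principle of readmission prediction, here is a minimal sketch of a logistic regression trained by gradient descent. It is not Cloudera's method, which the article does not describe. The two features (number of prior admissions and a regional deprivation index), the training rows and all function names are assumptions invented for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, features):
    """Predicted probability of readmission for one patient."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        errors = [score(weights, bias, x) - y for x, y in zip(rows, labels)]
        weights = [
            w - lr * sum(e * x[j] for e, x in zip(errors, rows)) / n
            for j, w in enumerate(weights)
        ]
        bias -= lr * sum(errors) / n
    return weights, bias

# Synthetic training rows, invented for illustration:
# (prior admissions in the last year, regional deprivation index from 0 to 1)
rows = [(0, 0.1), (1, 0.2), (0, 0.3), (1, 0.1),
        (3, 0.8), (4, 0.7), (2, 0.9), (5, 0.9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = readmitted within 30 days

weights, bias = train_logistic(rows, labels)
low_risk = score(weights, bias, (0, 0.2))
high_risk = score(weights, bias, (4, 0.8))
print(f"low-risk patient:  {low_risk:.2f}")
print(f"high-risk patient: {high_risk:.2f}")
```

A real model would be trained on thousands of records with many more variables, and would need to be validated on patients it has never seen before being used in care.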

For its part, SeamlessMD is developing a platform for post-operative care. This platform has enabled Saint Peter's Healthcare System in New Jersey to reduce the average length of stay post-surgery by one day.

This represents a savings of $1,500 for each patient, who only has to indicate their level of pain in the app each day and let caregivers monitor the evolution over time. In the event of a potential problem, the application also issues alerts.
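
As an illustration of how such alerts might work, here is a minimal sketch of a rule applied to daily pain scores. The article does not describe SeamlessMD's actual logic, so the two rules (a score above a threshold, or a score that rises several days in a row), the threshold values and the function name pain_alerts are all assumptions.

```python
def pain_alerts(scores, high=8, rising_days=3):
    """Return (day, reason) alerts for a list of daily pain scores from 0 to 10.

    Hypothetical rules: alert when a score reaches `high`, or when the
    score has increased for `rising_days` consecutive days.
    """
    alerts = []
    streak = 0  # number of consecutive day-over-day increases
    for day, value in enumerate(scores, start=1):
        if day > 1 and value > scores[day - 2]:
            streak += 1
        else:
            streak = 0
        if value >= high:
            alerts.append((day, "high pain score"))
        elif streak >= rising_days:
            alerts.append((day, "pain rising"))
    return alerts

# Daily scores entered by a fictional patient after surgery.
daily_scores = [6, 5, 4, 4, 5, 6, 7, 9, 3]
alerts = pain_alerts(daily_scores)
print(alerts)  # [(7, 'pain rising'), (8, 'high pain score')]
```

The trend rule fires on day 7, one day before the score itself becomes high, which shows why monitoring the evolution over time is more useful than looking at single values.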

AI-enabled mobile apps can also help patients. Chatbots, or virtual voice assistants, can communicate with them. The patient can describe their symptoms or ask questions, and receive valuable information drawn from a vast network linking symptoms to diseases.

These applications can also remind the patient to take their medication on time, and arrange an appointment with a doctor if necessary. Among the most popular are the Woebot chatbot, developed by Stanford University to help patients with depression, and the virtual assistant from the Berlin startup Ada, which predicts illnesses based on symptoms.

Hospital Management

Hospitals are complex and difficult institutions to manage. Data analytics helps determine precisely how many caregivers need to be on duty at each hour of the day to maximize efficiency.

It also ensures that enough beds are available to meet demand, and much more. Predictive analysis also makes it possible to optimize schedules and streamline emergency services.
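
To make the staffing idea concrete, here is a deliberately naive sketch: it averages historical patient arrivals for each hour of the day and converts that average into a number of caregivers. Real predictive systems use far richer models. The ratio of four patients per caregiver, the sample history and the function name staff_needed_by_hour are assumptions chosen for illustration.

```python
import math
from collections import defaultdict

def staff_needed_by_hour(arrivals, patients_per_caregiver=4):
    """Estimate caregivers needed per hour from historical arrival records.

    arrivals: list of (day, hour, patient_count) tuples.
    The forecast for each hour is simply the average count over the days seen.
    """
    totals = defaultdict(int)
    days = defaultdict(set)
    for day, hour, count in arrivals:
        totals[hour] += count
        days[hour].add(day)
    return {
        hour: math.ceil(totals[hour] / len(days[hour]) / patients_per_caregiver)
        for hour in sorted(totals)
    }

# Fictional arrival history: (day, hour of day, number of patients).
history = [("mon", 9, 10), ("tue", 9, 14), ("mon", 14, 5), ("tue", 14, 3)]
staffing = staff_needed_by_hour(history)
print(staffing)  # {9: 3, 14: 1}
```

Rounding up with math.ceil errs on the side of having one caregiver too many rather than one too few, a choice that a hospital would adjust to its own constraints.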

At Emory University Hospital, Data Science is used to predict the demand for laboratory tests. This reduces the waiting time by up to 75%.

It is also possible to use Business Intelligence to improve the billing system and identify patients at risk of having difficulty paying. These analyses can be coordinated with insurance and financial departments. For example, the Centers for Medicare & Medicaid Services saved $210.7 million through Big Data-based fraud prevention.

Future of Data Science in the medical field

The healthcare industry is undergoing transformation through data science. Pharmaceutical giants, biotech startups, research centers, investors and healthcare establishments are investing heavily in this revolution.

There are still many challenges to overcome. For example, data is often scattered across multiple regions, administrative units, and hospitals. It is therefore difficult to consolidate it into a single system.

In addition, many patients are concerned about the protection and privacy of their personal data. Some private companies are interested in exploiting this valuable data for advertising targeting purposes. Google in particular has been the subject of legal proceedings over such practices.

Finally, some worry about the disappearance of the relationship between doctors and patients in favor of interactions with machines and algorithms. It is true that human contact is essential in the field of health.

Despite these challenges, Data Science holds great promise for the future of medicine. As technology develops, new possibilities will emerge…

How to become a Healthcare Data Scientist

The medical field is therefore an ideal fit for Data Science. The term “Health Data Science” designates the development of data-driven solutions to health problems. It is an emerging discipline at the crossroads of statistics, computer science and medicine.

“Health Data Scientists” are increasingly sought after in every country, in both the public and private sectors. However, only 3% of American Data Scientists currently work in the medical field.

The role of a Healthcare Data Scientist is to design studies and evaluations, to carry out complex data analyses, or to advise healthcare establishments and caregivers based on the results of their analyses.

They must rely on data to predict the effects of drugs and to understand the diseases that affect humans. Their role is also to harness the power of artificial intelligence and to enrich public health data sets.

This professional can work for government health departments, hospitals, universities and research institutes, pharmaceutical companies, health insurers or other private companies.

Becoming a Healthcare Data Scientist requires the same skills as a traditional Data Scientist. However, these skills must be coupled with a solid knowledge of the health field.

A Healthcare Data Scientist must have skills in mathematics, quantitative analysis and statistics. They must also be able to communicate with the various players in the medical community. Of course, it is important that they understand the concepts specific to this sector, through knowledge of medicine, epidemiology or virology.

Some institutions offer specialized programs. For example, Harvard University has developed a Master's in Health Data Science. This 18-month program focuses on the analysis and use of health data to meet the greatest challenges in this field.
