Healthcare and healthcare management is one of the most significant and most demanding areas of the public sector. Working with data and AI in this sector means handling, managing, and using the private and sensitive information of millions of people while developing new technologies and solutions at the same time. Data sharing and data-driven collaboration are crucial for advancing research and improving outcomes, yet this is exactly where healthcare encounters numerous challenges and restrictions.
The main data challenges in healthcare
One of the main challenges is data privacy. Healthcare data contains personal information that can reveal identities, diagnoses, treatments, and other confidential details. Sharing this data across different institutions or organizations can pose serious risks of data breaches, identity theft, discrimination, or misuse. Moreover, healthcare data is subject to strict regulations and ethical standards that limit its usage and distribution.
Another challenge is data availability. Healthcare data is often fragmented and siloed across different sources, such as hospitals, clinics, laboratories, pharmacies, or electronic health records (EHRs). This makes it difficult to access and integrate data from different locations and domains. Furthermore, healthcare data is often incomplete or inconsistent due to human errors or system failures.
These challenges hinder the potential of using artificial intelligence (AI) and machine learning (ML) in healthcare applications. AI and ML are powerful tools that can help analyze large amounts of data, discover patterns and insights, make predictions and recommendations, and automate tasks. However, AI and ML require access to sufficient and diverse data sets to train accurate and robust models that can generalize well to new situations.
Real World Federated Learning Examples
Federated learning (FL) is an emerging paradigm that aims to address these challenges by enabling collaborative learning without sharing raw data. FL allows multiple parties (e.g., hospitals) to jointly train a shared ML model by exchanging only model updates (e.g., gradients or parameters) instead of raw data. This way, FL preserves data privacy by keeping the data local at each party while still benefiting from the collective knowledge of all parties. Federated Learning has many advantages for healthcare applications in the public sector:
- Improves the quality and diversity of the data a model learns from by combining knowledge from different sources without compromising privacy or security.
- Reduces the cost and complexity of data management by avoiding centralized storage or processing of large volumes of sensitive data.
- Enhances the scalability and efficiency of learning by distributing computation across multiple devices or nodes instead of relying on a single server or cloud.
- Empowers innovation and collaboration by enabling cross-institutional or cross-domain learning with fewer legal and ethical barriers.
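To make the idea of exchanging model updates instead of raw data more concrete, here is a minimal sketch of federated averaging, one common way to combine updates. It is an illustration under stated assumptions, not a description of how Blossom Sky works internally: the three "hospitals", their toy data, and the function names `local_update`, `federated_average`, and `train` are all hypothetical, and the model is a simple line y = w*x + b.

```python
from typing import List, Sequence, Tuple

Weights = List[float]  # [w, b] for the toy model y = w*x + b


def local_update(weights: Weights, data: Sequence[Tuple[float, float]],
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Average the site models, weighting each site by its sample count."""
    total = sum(sizes)
    return [
        sum(u[i] * n for u, n in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    ]


def train(sites: List[Sequence[Tuple[float, float]]], rounds: int = 200) -> Weights:
    """Coordinator loop: only model weights travel, never the samples."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


# Three hypothetical hospitals, each holding private samples of y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(0.5, 2.0), (1.5, 4.0)],
]
model = train(sites)
print(model)  # the learned [w, b] should end up close to [2.0, 1.0]
```

Each site only ever returns two numbers per round, and the coordinator never sees a single sample. Real deployments add secure transport, authentication, and often encryption of the updates on top of this loop.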
FL has already been applied [1] in various healthcare domains, such as medical imaging, remote health monitoring, genomics, and COVID-19 detection. Some examples are:
- The ABIDE project used FL to train models on sensitive fMRI imaging data for identifying disease biomarkers.
- The iPC project [2] used FL to train models on genomic data for personalized cancer treatment.
- The COVID-Collab project [3] used FL to train models on smartphone sensor data for monitoring COVID-19 symptoms.
Challenges and how Blossom Sky helps to solve them
Federated learning has its own challenges. To overcome them, researchers and FL companies like Databloom are developing novel techniques such as compression, aggregation, encryption, and automated data regulation. Databloom’s flagship product, Blossom Sky, can solve or mitigate some of these challenges. Here are the most frequently asked questions and our answers.
FL requires frequent communication between parties to exchange model updates, which can consume considerable bandwidth, especially when dealing with large models or datasets.
That is true, and it is the reason we developed our FL framework, Blossom Sky, in the first place. Blossom Sky organizes communication and minimizes the amount of transmitted data while ensuring that participating parties use only approved data. It features a comprehensive user interface that lets multiple parties collaborate on the same project, with changes tracked and made transparent to the entire team. It can be thought of as the “Google Docs of AI”.
FL involves heterogeneous parties that may differ in devices (e.g., smartphones vs. servers), datasets (e.g., size or distribution), objectives (e.g., accuracy vs. privacy), and more, all of which can affect the convergence and performance of FL algorithms.
Blossom Sky uses Apache Wayang at its core. Apache Wayang is a cross-platform data processing system that aims to decouple the business logic of data analytics applications from concrete data processing platforms such as Apache Flink, Apache Spark, TensorFlow, or any other data or AI framework. It is an API-first system designed to fully support cross-platform data processing, and it enables users to run data analytics over multiple data processing platforms, nodes, or devices without changing the native code. This allows for greater flexibility and makes it easier to work with different devices and datasets.
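As an illustration of how the amount of transmitted data can be reduced, here is a minimal sketch of top-k sparsification, one widely used compression technique for model updates. This is an assumption chosen for illustration and not a description of Blossom Sky's internal mechanism; the function names `sparsify_top_k` and `densify` and the sample update are hypothetical.

```python
from typing import Dict, List


def sparsify_top_k(update: List[float], k: int) -> Dict[int, float]:
    """Keep only the k entries with the largest magnitude, keyed by index."""
    largest = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)[:k]
    return {i: update[i] for i in sorted(largest)}


def densify(sparse: Dict[int, float], size: int) -> List[float]:
    """Rebuild a full-length update on the receiving side; dropped entries become 0."""
    dense = [0.0] * size
    for i, value in sparse.items():
        dense[i] = value
    return dense


# A hypothetical six-parameter update; only the two largest entries are sent.
update = [0.01, -0.90, 0.03, 0.40, -0.02, 0.00]
compressed = sparsify_top_k(update, k=2)
restored = densify(compressed, len(update))
print(compressed)  # {1: -0.9, 3: 0.4}
print(restored)    # [0.0, -0.9, 0.0, 0.4, 0.0, 0.0]
```

The trade-off is that small entries are dropped in each round, so practical systems often accumulate the dropped remainder locally and add it to the next update.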
FL still faces security threats such as malicious parties that may tamper with model updates or infer private information from them using various attacks such as poisoning or inference.
This is true: as with any AI/ML project, the outcome is only as good as the data behind it. There are several methods to defend against data poisoning attacks in federated learning. One approach is to use an isolation forest algorithm to detect anomalies in the data. Another approach is to use a genetic algorithm during the participation stage of FL to find a combination of data that avoids data poisoning attacks. Databloom invests in researching mitigation approaches and develops prototypes with universities and early adopters, which will be part of future releases of Blossom Sky.
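To show what anomaly detection on incoming contributions can look like, here is a deliberately simple sketch. It is not an isolation forest and not a Blossom Sky feature: as a plainly named stand-in, it flags model updates that lie far from the coordinate-wise median of all updates. The function name `flag_suspicious`, the threshold of 3.0, and the sample updates are assumptions made for illustration.

```python
import math
from statistics import median
from typing import List


def flag_suspicious(updates: List[List[float]], threshold: float = 3.0) -> List[int]:
    """Return indices of updates unusually far from the coordinate-wise median."""
    center = [median(column) for column in zip(*updates)]
    distances = [math.dist(u, center) for u in updates]
    typical = median(distances)
    if typical == 0:
        # All typical updates coincide with the center; anything else stands out.
        return [i for i, d in enumerate(distances) if d > 0]
    return [i for i, d in enumerate(distances) if d > threshold * typical]


# Four similar updates and one hypothetical poisoned update.
updates = [
    [0.10, -0.20],
    [0.12, -0.18],
    [0.09, -0.22],
    [0.11, -0.21],
    [5.00, 4.00],
]
suspicious = flag_suspicious(updates)
print(suspicious)  # [4]
```

A median-based check like this only catches crude outliers. Subtle poisoning that stays close to honest updates needs stronger methods, such as the isolation forest mentioned above.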
In conclusion, federated learning is a promising technique that can transform healthcare in the public sector by enabling privacy-preserving collaborative learning across multiple parties without sharing raw data. In this way, federated learning can unlock new opportunities for innovation, research, and improvement in healthcare while respecting ethical, legal, and social values.
Databloom is a federated data access and analytics company that develops the federated analytics platform “Blossom Sky” to enable decentralized AI. It provides a fast, interactive, enterprise-ready distribution with additional tooling and configurations that lets data scientists and analysts run AI models and training against ultra-sensitive data. Blossom Sky integrates with all major data processing and streaming frameworks, as well as AI and data tools like TensorFlow, pandas, and PyTorch. Our technology helps unlock new opportunities in the healthcare sector by allowing for more secure and efficient processing of sensitive patient data.
About Databloom
Databloom is a software company that has developed Blossom Sky, an AI-powered data platform offered as an Integration-as-a-Service. The platform enables users to unlock the full potential of their data by connecting data sources, enabling generative AI, and gaining performance by running data processing and AI directly at independent data sources. Blossom Sky allows for data collaboration, increased efficiency, and new insights by breaking down data silos and presenting them through a single, unified system view. The platform supports a wide range of ML and AI algorithms and models.
[1]: The future of digital health with federated learning | npj Digital Medicine (nature.com)
[2]: iPC squares off against Paediatric Cancer | iPC Project EU
[3]: Overview ‹ Pandemic Response CoLab | MIT Media Lab