The Problem: Balancing Patient Data Privacy Regulations and Needed Data Analytics
The US established patient privacy rules for healthcare data with HIPAA regulations, and the European Union has similar patient data privacy regulations with the General Data Protection Regulation (GDPR). The purpose of these regulations is to ensure that patient medical data is not accessed or used without patient permission other than for healthcare treatments.
Several healthcare projects are attempting to build large data repositories of de-identified patient data to advance healthcare data analytics and associated AI deep learning models and algorithms. These advancements are expected to provide beneficial insights into patient treatments, outcomes, medication efficacy, and protocols for treating disease and chronic illness. Examples of these projects include Truveta, the Google and Ascension collaboration, the Google and Mayo Clinic collaboration, and, globally, the Health Data Collaborative.
The challenge in all these efforts is to create an effective de-identification process for patient data that ensures patient privacy regulations are met. Several de-identification tools are available to process structured data, unstructured data, and images. Many of these tools have been developed by leading medical universities and are not commercial, off-the-shelf applications that are supported and frequently updated with new capabilities. Using these tools will likely create a significant amount of overhead for the data informaticists or scientists who are working to create large healthcare data lakes that can be supported and maintained for long periods of time. New commercial solutions for de-identifying data are emerging, and these new solutions may dramatically increase the value of healthcare data analytics and AI.
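To make the idea of de-identifying structured data concrete, the following is a minimal sketch in Python. It drops direct identifiers and coarsens a few quasi-identifiers. The field names (name, ssn, date_of_birth, zip, and so on) and the coarsening rules are assumptions chosen for illustration; they are not taken from any specific tool mentioned above, and a script like this is not sufficient on its own to satisfy HIPAA or GDPR.

```python
from datetime import date

# Hypothetical field names; a real patient schema will differ.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "address"}


def deidentify(record: dict, today: date) -> dict:
    """Return a copy of record with direct identifiers removed and
    quasi-identifiers (birth date, ZIP code) coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Replace the exact birth date with a ten-year age band.
    dob = out.pop("date_of_birth", None)
    if dob is not None:
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        age = today.year - dob.year - (0 if had_birthday else 1)
        if age >= 90:
            out["age_band"] = "90+"
        else:
            low = age // 10 * 10
            out["age_band"] = f"{low}-{low + 9}"

    # Keep only the first three digits of the ZIP code.
    zip_code = out.pop("zip", None)
    if zip_code is not None:
        out["zip3"] = str(zip_code)[:3]

    return out


if __name__ == "__main__":
    patient = {
        "name": "Jane Example",
        "ssn": "000-00-0000",
        "date_of_birth": date(1980, 6, 15),
        "zip": "55905",
        "diagnosis": "type 2 diabetes",
    }
    print(deidentify(patient, today=date(2024, 1, 1)))
```

Even this small example hints at the maintenance overhead described above: every new data source brings new fields that someone must classify and handle, and unstructured notes and images need entirely different techniques.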
The Solution: Synthetic Data Emerges to Resolve Healthcare Data De-Identification Challenges
Synthetic data solutions are emerging to assist data scientists with preparing de-identified data that can be used to create large aggregate databases to generate more accurate data analytics, AI models, and AI algorithms. Synthetic data is annotated information that computer simulations or algorithms generate as an alternative to real-world data. Synthetic data may be artificial, but it mathematically or statistically reflects real-world data. Several studies attest to the benefits of using synthetic data for AI models:
Synthetic data will become increasingly valuable for supporting deep learning AI models. Deep learning that supports neural programming, bioinformatics, and natural language processing will benefit from large-volume synthetic data sets.
Gartner estimates that by 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated. This projection suggests that the market is poised for rapid uptake of synthetic data, although healthcare providers tend to lag behind the adoption curves of other industries. Emerging commercial synthetic data solutions will drive higher adoption.
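As a rough illustration of what it means for artificial data to statistically reflect real-world data, here is a minimal Python sketch. It fits simple summary statistics to a handful of made-up records and then samples new records from those statistics. The function names (fit, sample), the two fields (age, diagnosis), and the independent-column model are all assumptions made for this example. Commercial and academic synthetic data tools use far richer models that also capture relationships between fields, and this sketch makes no formal privacy guarantee.

```python
import random
import statistics


def fit(records):
    """Summarize the records: mean and standard deviation of age,
    plus how often each diagnosis appears."""
    ages = [r["age"] for r in records]
    diagnoses = [r["diagnosis"] for r in records]
    values = sorted(set(diagnoses))
    return {
        "age_mean": statistics.mean(ages),
        "age_stdev": statistics.stdev(ages),
        "dx_values": values,
        "dx_weights": [diagnoses.count(v) for v in values],
    }


def sample(model, n, seed=0):
    """Draw n artificial records from the fitted summary.
    Each field is sampled independently of the other."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = round(rng.gauss(model["age_mean"], model["age_stdev"]))
        age = max(0, age)  # ages cannot be negative
        diagnosis = rng.choices(
            model["dx_values"], weights=model["dx_weights"]
        )[0]
        rows.append({"age": age, "diagnosis": diagnosis})
    return rows


if __name__ == "__main__":
    # Made-up example records, not real patient data.
    real = [
        {"age": 34, "diagnosis": "asthma"},
        {"age": 51, "diagnosis": "hypertension"},
        {"age": 67, "diagnosis": "hypertension"},
        {"age": 45, "diagnosis": "asthma"},
        {"age": 72, "diagnosis": "hypertension"},
        {"age": 58, "diagnosis": "diabetes"},
    ]
    for row in sample(fit(real), n=5):
        print(row)
```

None of the sampled rows corresponds to an actual input record, yet the age distribution and diagnosis frequencies of a large sample stay close to those of the source data. Because each field is sampled independently here, any link between age and diagnosis is lost, which is one reason production tools rely on more sophisticated generative models.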
The Justification: Synthetic Data Will Drive Higher Success Rates for AI Projects
Synthetic data solutions will allow healthcare organizations to generate the large data sets needed to produce more accurate analytics and AI models whose algorithms continue to improve their output. Synthetic data solutions will perform these functions while protecting the confidentiality of the underlying patient data and preventing the identification of individual patients. This approach to creating large patient data sets will also protect consumers from unauthorized use of the data by large technology companies (e.g., Google, Microsoft, and Amazon). We expect that some of the existing data collaborations between Google, Mayo Clinic, and Ascension will convert to using synthetic data if they are not doing so already. Synthetic data will be a catalyst for higher success rates with AI projects.
The Players: Emerging Commercial Synthetic Data Companies
While synthetic data solutions have been developed by universities for their needs, the healthcare provider market will require commercial solutions to provide the support functions expected by the market segment. The following are some of the emerging vendors.
Success Factors:
Summary:
Using large patient data sets while complying with patient privacy regulations makes it difficult for many organizations to implement and expand AI projects. Synthetic data provides a solution for overcoming these challenges. Large healthcare organizations and medical centers can create custom synthetic data solutions because they can hire skilled programmers and informaticists. Most healthcare provider organizations do not have the budgets to recruit and retain skilled resources to support their AI deep learning projects. Commercial synthetic data vendors are emerging and will allow many healthcare organizations to recruit data collaboration partners for sharing synthesized patient data to drive higher levels of AI success.
We hope that existing data collaboration partnerships between technology companies and healthcare organizations will convert patient data to synthetic data to ensure the privacy and confidentiality of patient data.
Synthetic data solutions will drive data sharing among growing healthcare provider organizations, which will help them achieve the AI benefits envisioned by the industry.
Photo Credit: fedrunovan, Adobe Stock