Big Data Environments Using Hadoop Have Peaked
Apache Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data (e.g., structured and unstructured), enormous processing power, and the ability to handle a very large number of concurrent tasks or jobs. Its file system, HDFS, automatically replicates data across multiple nodes, so the failure of a single server does not result in data loss. Much of the cloud environment for storing and processing large data sets was built on Hadoop. However, Hadoop has some limitations:
Tools such as Spark, another Apache open-source project, can improve Hadoop's processing performance through in-memory computation and provide better programming support, including SQL queries through Spark SQL and APIs for Scala, Java, Python, and R. However, there are currently no tools for Hadoop that offer comprehensive data standardization, data management, and data governance.
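Hadoop's protection against data loss comes from HDFS block replication: each file is split into fixed-size blocks, and each block is stored on several nodes. The sketch below uses the documented HDFS defaults (128 MB blocks, replication factor 3) to show the storage cost of that protection and why a single node failure loses nothing. The function names and the round-robin placement are hypothetical simplifications; real HDFS uses a rack-aware placement policy.

```python
import math

# Defaults documented for Apache Hadoop 2.x and later (dfs.blocksize, dfs.replication).
BLOCK_SIZE_MB = 128
REPLICATION = 3


def plan_storage(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of HDFS blocks, raw cluster storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times; the last block only uses its actual size.
    raw_mb = file_size_mb * replication
    return blocks, raw_mb


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Hypothetical round-robin placement of each block's replicas on distinct nodes.

    Real HDFS uses a rack-aware placement policy; this simplification only
    guarantees that no node holds two copies of the same block.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def survives_failure(placement, failed_nodes):
    """True if every block still has at least one replica outside the failed nodes."""
    failed = set(failed_nodes)
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks, raw_mb = plan_storage(1000)
    print(blocks, raw_mb)  # 8 3000
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(blocks, nodes)
    print(survives_failure(placement, ["node2"]))  # True
```

A 1,000 MB file therefore occupies 8 blocks and about 3,000 MB of raw cluster storage: the threefold storage cost is the price of tolerating node failures without a separate backup step.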
As healthcare moves to value-based care delivery and reimbursement models, the ability to gather, store, manipulate, and analyze large data sets from many modalities of care will be a requirement for the long-term success and viability of healthcare providers. Legacy enterprise data warehouse solutions based on Hadoop are unlikely to remain competitive for managing big data environments that span many data formats.
Kubernetes Emerges Out of Necessity to Improve Data Lake Analytics
The healthcare market has moved from siloed databases to data lake environments that both store and analyze structured and unstructured data from many internal and external sources. As data sets grew larger and strained existing storage and analytics platforms, Google developed Kubernetes, a system for deploying, scaling, and managing containerized applications, out of necessity to better manage its own workloads at massive scale.
Kubernetes is an ecosystem of components and tools that improves the efficiency of developing and running applications in public and private clouds. IT teams can deploy and manage applications quickly and predictably, scale them in real time, roll out new features without disrupting the application, and allocate hardware resources as the applications need them. The advantages of Kubernetes include:
Emerging data lake solutions from Amazon and Snowflake use Kubernetes, and because Kubernetes is open source, we expect its use in cloud solutions to increase dramatically over the next few years.
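The ability to "scale them in real time" mentioned above is handled in Kubernetes by components such as the Horizontal Pod Autoscaler, whose documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no action taken when the ratio is within a tolerance (0.1 by default) of 1.0. The sketch below implements only that rule; the function name, parameter names, and the min/max replica defaults are illustrative assumptions, and the real controller adds stabilization windows and pod-readiness handling.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1, min_replicas=1, max_replicas=10):
    """Sketch of the Horizontal Pod Autoscaler's core scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped to
    [min_replicas, max_replicas]. The min/max defaults here are illustrative.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 50% target: scale out.
    print(desired_replicas(4, 90, 50))  # 8
    # 4 pods averaging 20% CPU against a 50% target: scale in.
    print(desired_replicas(4, 20, 50))  # 2
```

Because the rule is proportional, capacity follows demand in both directions, which is what lets a cluster release hardware when an analytics workload quiets down.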
The Cloud Becomes the Borg Collective for Healthcare Computing
Healthcare IT solutions are increasingly cloud-based to lower costs, increase performance, improve fault tolerance, and likely improve security. In Star Trek, the Borg are cyborgs linked to a single collective, which makes their society highly controlled and very efficient. Healthcare is becoming a more data-driven environment for evaluating patient risk, evidence-based medicine, standardized care, clinical outcomes, and financial risk. Large volumes of data will need to be processed and analyzed as quickly as possible to ensure that appropriate care and financial decisions are made. Much like the Borg collective, healthcare organizations will need data lakes that pool data from several healthcare entities to generate more accurate analytics and business intelligence and survive the value-based care transformation.
Big Technology Players Drive Kubernetes Adoption
Several large technology companies use Kubernetes for their cloud services and applications. Representative companies include:
Success Factors
Summary
Technology companies with large client bases using their cloud-based solutions have been driven by necessity to move from data architectures based on Hadoop to the newer Kubernetes architecture to deliver more efficient data management and analytics services. Higher application performance, more efficient allocation of computing resources, broader support for existing AI/ML frameworks, and the ability to use edge servers to contain computing costs align well with the needs of healthcare organizations that must manage and analyze many types of internal and external data formats. Amazon's launch of a data lake solution built specifically for healthcare suggests that large technology companies are advancing cloud-based data management to support healthcare clients. Provider organizations will need to perform a strategic analysis of their data environments to determine the most cost-effective and least disruptive path to the newer data lake solutions based on Kubernetes. Resistance is futile.
Photo Credit: Adobe Stock, Kittiphat