Hi everyone! This is the first of our two-part blog series about data streaming in healthcare applications. Digital healthcare is one of the most highly regulated sectors across the globe.
Data analytics is of great significance to the healthcare industry, but it is anything but straightforward. Here, data is the most valuable asset, and analytics touches every aspect of healthcare, from research to putting its findings into practice.
In this first part, we focus on the challenges we faced while streaming healthcare data for analytics.
Health data sets are growing rapidly, and the upward trend is hard to miss. With the evolution of Artificial Intelligence/Machine Learning, we can do wonders in healthcare, provided we have a clean and correct set of data.
Running analytics on healthcare data is not as simple as it sounds. We cannot use the raw data directly from the database. We have to think about compliance and other security aspects, since the data contains sensitive information. We need to filter out all patient Personally Identifiable Information (PII) before running any analytics on it.
Analytics helps people in the healthcare sector enhance the patient experience, spot prevailing and recurring trends, and improve the overall quality of patient care.
How it started
One of our clients asked us to stream the application data (patient information) from the production database into another database where data scientists can perform their analytics.
As mentioned earlier, we couldn't use the data directly from the database. We ran into trouble, as always :P. Let me explain.
The challenges we faced
- Filter out ePHI:
  - When you work with healthcare applications, the data contains both PHI and non-PHI fields. When you stream data into the analytics database, you need to make sure you filter out all the PHI.
- Data Stream:
  - Our application is growing rapidly and we receive data from multiple sources. Whenever data lands in our production database, it has to be streamed to the analytics database, with one condition: it must not cause any interruption for our regular users.
- Data Transform:
  - The table structure we had in the application is not well suited to analytics. The same table has to be transformed into different structures that fit the analytics database.
  - So we built a middleware that transforms our data into a structure suited to the analytics database.
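To make the filtering and transformation challenges a bit more concrete, here is a minimal sketch in plain Ruby. Everything in it is an assumption made for illustration: the column names, the allowlist, and the output shape are hypothetical and are not taken from our application. The idea it shows is an allowlist: only columns that are explicitly permitted leave the application, so a newly added PHI column is excluded by default.

```ruby
require "date"

# Hypothetical allowlist of non-PHI columns (names are assumptions).
ALLOWED_COLUMNS = %i[id diagnosis_code visited_on].freeze

# Keep only the explicitly allowed columns; anything else
# (names, phone numbers, addresses, ...) is dropped.
def filter_phi(row, allowed = ALLOWED_COLUMNS)
  row.slice(*allowed)
end

# Reshape a filtered row into a flat structure that is easier to
# query in an analytics database (the target shape is an assumption).
def to_analytics_row(row)
  filtered = filter_phi(row)
  {
    source_id: filtered[:id],
    diagnosis_code: filtered[:diagnosis_code],
    visit_month: filtered[:visited_on]&.strftime("%Y-%m")
  }
end

row = {
  id: 7,
  name: "Jane Doe",        # PHI, must not be streamed
  phone: "555-0100",       # PHI, must not be streamed
  diagnosis_code: "J45",
  visited_on: Date.new(2021, 3, 14)
}

analytics_row = to_analytics_row(row)
```

An allowlist is usually a safer default than a blocklist here, because forgetting to update it hides a column from analytics rather than leaking it.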
Cross Stream
The first challenge for us was to filter out the PHI data while it is being pushed from our healthcare application to the analytics application. To overcome this challenge, we built our own Ruby gem.
Once the gem is installed, it has to be included in the models (tables) that need to be pushed. We might not need every field in a table, so we define the columns that should be pushed, and only those are sent to the next process.
Whenever a change happens in one of those tables, Cross Stream collects the defined columns and pushes them to an endpoint. We can configure different endpoints, or multiple endpoints, for the data.
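The flow described above can be sketched in plain Ruby. This is not the gem's actual source: the module name `CrossStreamSketch`, the methods `streamed_columns`, `stream_endpoints`, and `push_to_stream`, and the `MemoryEndpoint` class are all hypothetical names chosen for illustration. Only the `cross_stream :id, :column_1, :column_2` declaration comes from the gem itself.

```ruby
# Hypothetical sketch of a column-declaring macro with pluggable endpoints.
module CrossStreamSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare which columns are allowed to leave the application.
    def cross_stream(*columns)
      @streamed_columns = columns
    end

    def streamed_columns
      @streamed_columns || []
    end

    # An endpoint is any object that responds to #publish(payload).
    def stream_endpoints
      @stream_endpoints ||= []
    end
  end

  # In a Rails model this could be wired to a callback such as
  # after_commit (an assumption); here it is called by hand.
  def push_to_stream
    payload = attributes.slice(*self.class.streamed_columns)
    self.class.stream_endpoints.each { |endpoint| endpoint.publish(payload) }
    payload
  end
end

# Endpoint that only collects payloads in memory, for demonstration.
class MemoryEndpoint
  attr_reader :payloads

  def initialize
    @payloads = []
  end

  def publish(payload)
    @payloads << payload
  end
end

# Plain Ruby stand-in for a database-backed model.
class Patient
  include CrossStreamSketch
  cross_stream :id, :column_1, :column_2

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

memory = MemoryEndpoint.new
Patient.stream_endpoints << memory
Patient.new({ id: 1, column_1: "a", column_2: "b", patient_name: "Jane Doe" }).push_to_stream
```

After the last line, `memory.payloads` holds one hash with only `:id`, `:column_1`, and `:column_2`; the undeclared `patient_name` never reaches the endpoint.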
Currently, the gem pushes the data into AWS Kinesis, a service that makes it easy to collect, process, and analyze real-time streaming data.
In a model, the declaration is a single line that lists the columns to push:
cross_stream :id, :column_1, :column_2
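To give a feel for the Kinesis step, here is a hedged sketch of a hypothetical endpoint object with a `publish` method. The class names, the stream name `analytics-stream`, and the choice of the record id as partition key are assumptions for illustration, not how the gem is actually written. The injected client is expected to behave like `Aws::Kinesis::Client` from the `aws-sdk-kinesis` gem, whose `put_record` call takes `stream_name`, `data`, and `partition_key`. A small fake client stands in for it so the sketch runs without AWS credentials.

```ruby
require "json"

# Hypothetical endpoint that forwards a payload to a Kinesis stream.
class KinesisEndpointSketch
  # client: an object that responds to put_record like Aws::Kinesis::Client,
  #         e.g. Aws::Kinesis::Client.new(region: "us-east-1") in a real app.
  def initialize(client:, stream_name:)
    @client = client
    @stream_name = stream_name
  end

  def publish(payload)
    @client.put_record(
      stream_name: @stream_name,
      data: JSON.generate(payload),            # record body as JSON text
      partition_key: payload.fetch(:id).to_s   # assumption: shard by record id
    )
  end
end

# Stand-in client that records calls instead of talking to AWS.
class FakeKinesisClient
  attr_reader :records

  def initialize
    @records = []
  end

  def put_record(**params)
    @records << params
    params
  end
end

client = FakeKinesisClient.new
endpoint = KinesisEndpointSketch.new(client: client, stream_name: "analytics-stream")
endpoint.publish({ id: 42, column_1: "a", column_2: "b" })
```

Injecting the client keeps the endpoint easy to test and makes it simple to swap in another destination later.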
Here we come to the end of the first part. In the next part, I will explain the steps that happen after the data is pushed into AWS Kinesis.
Want to know how we overcame the other two challenges and built a real-time data stream for a healthcare application?
Then wait for part ✌🏻, coming soon 🙂