Case Study
Data Engineering

Transforming Patients' Lives with a Superior Care Experience and Data-Driven Healthcare Insights

About This Project

The client is a leading healthcare technology company dedicated to empowering clinicians and patients. They make it easy for clinicians and their patients to access their universal medical records while providing advanced AI tools to support more personalized health insights and decision-making. We collaborated with the customer to develop a robust ETL (extract, transform, load) pipeline. The ETL ensures secure, efficient, and compliant processing of medical records data while advancing modern healthcare's frontiers.


About the Client

Our client is a health-tech company based in the United States, specializing in providing comprehensive tools for clinicians and patients to enhance insights and decision-making. These tools empower clinicians by enabling access to extensive patient data, simultaneously saving time on documentation and treatment planning. Their patient-facing app allows individuals to delve into critical contributors to their well-being, such as food intake, sleep patterns, exercise, hydration, and more. Additionally, the application grants access to medical records, empowering individuals to take charge of their health history and ensure its accurate representation. Our client fosters the therapeutic alliance and promotes collaborative, patient-centered care by offering clinician- and patient-interfacing products.

Understanding the Challenge

The primary goal of this project was to design and implement an ETL pipeline with a holistic approach. The pipeline was envisioned to address several critical aspects of data management and processing, as follows:

Automated Data Retrieval - The customer wanted to develop mechanisms for seamless, automated extraction of diverse datasets with minimal manual intervention to enhance efficiency in data acquisition.

Efficient Communication with Data Providers - They aimed to establish robust communication channels with external data sources to ensure reliable and timely data updates from various providers.

Centralized Data Storage - A centralized repository for storing medical records data was required to enhance accessibility and streamline retrieval processes for authorized users.

Facilitation of Data Transformation - The client wanted to create a flexible framework for smooth and efficient data transformation to support the restructuring of data that aligns with specific formatting requirements.

HIPAA Compliance - The client wanted to integrate stringent security measures to maintain compliance with HIPAA regulations and safeguard patient information to uphold data privacy and confidentiality standards.

Scalability - They wanted to design the pipeline to accommodate the evolving and expanding volume of healthcare data and ensure scalability to meet the dynamic needs of the healthcare ecosystem.

While implementing the ETL pipeline with the above approach, the client ultimately wanted to improve accessibility for clinicians, enhance the usability of medical records data, provide a user-friendly interface for easy navigation and interpretation of medical records, and foster a more efficient and intuitive experience for healthcare professionals. The client aimed to elevate the patient care experience, empower healthcare providers with accurate and comprehensive data insights, and facilitate a more informed and personalized approach to patient care, all while maintaining the highest standards of data security and compliance.

"Velotio's team executed the ETL pipeline seamlessly, contributing to our strategic advantage in the healthcare technology sector. Catan facilitated an impressive 70% reduction in time for clinicians, and the data-driven analytics played a key role in improving patient satisfaction."

Product Lead, Healthtech company

How We Made It Happen

Velotio was crucial in helping the client design, develop, and implement the ETL pipeline, named "Catan." Our expertise in GoLang, microservices architecture, and data integration, along with our commitment to compliance, played a pivotal role in the successful execution of this project. The project team comprised senior software engineers and front-end developers. The successful implementation of the ETL pipeline gave the client a competitive edge in the healthcare technology sector, enabling them to stand out by offering innovative, efficient, and patient-centered healthcare solutions.

Below is an overview of the overall process and how Velotio assisted the client in achieving their goal of providing personalized, data-driven healthcare.

Project Initiation and Requirement Analysis - Velotio initiated the project by thoroughly understanding the client’s requirements and objectives, which included automating data retrieval, efficient communication with data providers, centralizing data storage, implementing data transformation, maintaining HIPAA compliance, and ensuring scalability.

Design and Architecture - We designed the architecture of the ETL pipeline, Catan, based on microservices principles. The pipeline was constructed using GoLang, which is known for its performance and suitability for microservices.

Automated Data Retrieval - We implemented the Catan consumer service, which continuously monitors event-driven systems, specifically Kafka topics. For example, when a new patient enters the system, Catan immediately initiates the data retrieval process. This automation of the retrieval and processing of medical records data helped clinicians save valuable time and streamline their workflows. It also enabled faster decision-making, more accurate diagnoses, and enhanced patient care.
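As an illustration, the event-driven trigger described above can be sketched in Go. The event schema, field names, and topic semantics here are assumptions for the example, not the client's actual message format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PatientEvent is a hypothetical shape for a "new patient" message
// consumed from a Kafka topic.
type PatientEvent struct {
	Type      string `json:"type"` // e.g. "patient.created"
	PatientID string `json:"patient_id"`
}

// ShouldFetchRecords decides whether an event should trigger
// medical-record retrieval from upstream data providers.
func ShouldFetchRecords(e PatientEvent) bool {
	return e.Type == "patient.created" && e.PatientID != ""
}

// HandleMessage parses one Kafka message payload and reports the
// patient whose records should be fetched, if any.
func HandleMessage(payload []byte) (string, bool, error) {
	var e PatientEvent
	if err := json.Unmarshal(payload, &e); err != nil {
		return "", false, err
	}
	return e.PatientID, ShouldFetchRecords(e), nil
}

func main() {
	// In the real pipeline this payload would come from a Kafka
	// consumer loop; here we feed a sample message directly.
	msg := []byte(`{"type":"patient.created","patient_id":"p-42"}`)
	id, fetch, err := HandleMessage(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, fetch)
}
```

In the real service, `HandleMessage` would be called from the Kafka consumer loop; keeping the decision logic in a pure function makes it straightforward to unit-test.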

Efficient Communication with Data Providers - Catan was designed to communicate efficiently with various data providers, including Health Gorilla. We also developed the necessary APIs and integration logic for seamless calls to these providers.
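A small sketch of the provider-facing integration logic in Go. The endpoint path and query parameter are hypothetical and do not reflect any provider's actual API:

```go
package main

import (
	"fmt"
	"net/url"
)

// BuildRecordsURL assembles the request URL for a provider's
// patient-records endpoint. The path and parameter names here are
// illustrative only.
func BuildRecordsURL(baseURL, patientID string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/patients/" + url.PathEscape(patientID) + "/records"
	q := u.Query()
	q.Set("format", "fhir+json") // providers commonly exchange FHIR resources
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := BuildRecordsURL("https://api.provider.example", "p-42")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Centralizing URL construction like this keeps per-provider quirks in one place, which helps when more data providers are added later.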

Centralized Data Storage - We facilitated the secure storage of patient health data retrieved from data providers in a centralized location, Amazon S3. This centralized storage improved data accessibility and management for downstream processes. This helped provide patient-centered care, fostering stronger alliances between healthcare providers and patients.

Data Transformation - We integrated transformation services into the pipeline responsible for converting raw data into a format usable by downstream systems. By streamlining healthcare data efficiently, the project laid the foundation for data-driven insights and analytics, potentially leading to improvements in healthcare outcomes.
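Although the production transformation service was written in Python, the normalization idea can be sketched in Go for consistency with the other examples here. Both record shapes below are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// RawRecord mimics a loosely structured record as received from a
// provider; the field names are illustrative.
type RawRecord struct {
	PatientID string `json:"patientId"`
	Code      string `json:"code"`
	Observed  string `json:"observed"` // e.g. "2023-05-17T10:00:00Z"
}

// Record is a hypothetical canonical shape used by downstream systems.
type Record struct {
	PatientID  string
	Code       string
	ObservedAt time.Time
}

// Transform normalizes one raw record: it trims identifiers,
// upper-cases the code, and parses the timestamp into a typed field.
func Transform(r RawRecord) (Record, error) {
	ts, err := time.Parse(time.RFC3339, r.Observed)
	if err != nil {
		return Record{}, fmt.Errorf("bad timestamp %q: %w", r.Observed, err)
	}
	return Record{
		PatientID:  strings.TrimSpace(r.PatientID),
		Code:       strings.ToUpper(strings.TrimSpace(r.Code)),
		ObservedAt: ts,
	}, nil
}

func main() {
	rec, err := Transform(RawRecord{PatientID: " p-42 ", Code: "loinc:718-7", Observed: "2023-05-17T10:00:00Z"})
	if err != nil {
		panic(err)
	}
	fmt.Println(rec.PatientID, rec.Code)
}
```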

HIPAA Compliance - Our team strictly adhered to HIPAA compliance standards throughout development. Security measures, data encryption, access controls, and audit trails were implemented to protect patient data. This adherence instilled trust in patients, ensuring their sensitive medical information was handled with the utmost care and in compliance with industry regulations.
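As a generic illustration of encryption at rest, one of the safeguards mentioned above, the sketch below seals a sensitive field with AES-256-GCM. Key handling is deliberately simplified; a real deployment would source keys from a managed KMS rather than construct them in code:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// EncryptField seals one sensitive field with AES-256-GCM, prepending
// the random nonce to the ciphertext.
func EncryptField(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 32 bytes for AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// DecryptField reverses EncryptField, splitting off the nonce before
// authenticating and decrypting the payload.
func DecryptField(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // placeholder; production keys come from a KMS
	sealed, err := EncryptField(key, []byte("sensitive value"))
	if err != nil {
		panic(err)
	}
	plain, err := DecryptField(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

GCM also authenticates the ciphertext, so any tampering is detected at decryption time, which complements the audit-trail requirement.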

Scalability - Catan was designed with scalability in mind, ensuring efficient handling of increased data volumes. As the client’s user base grew, the pipeline seamlessly accommodated more data, ensuring readiness for future growth.

Testing and Quality Assurance - We conducted rigorous testing and quality assurance to ensure the pipeline's reliability and accuracy. This included data validation, error handling, and performance testing.

Deployment and Monitoring - Once development and testing were complete, we assisted in deploying the Catan ETL pipeline into the client’s infrastructure. Continuous monitoring and maintenance were implemented to ensure the pipeline's ongoing performance and reliability.

Support and Iteration - We offered ongoing support and assistance to address any issues or modifications required by the client. The partnership continued to evolve as the project matured and new requirements emerged.

Unlocking the Critical Aspects of the Project

  • The patient data is quite sensitive, and HIPAA compliance was critical while securely handling this data. We implemented security measures, access controls, encryption, and auditing to meet standards and ensured HIPAA compliance at each project stage.
  • Integration with diverse data providers posed a challenge and risked gaps in communication. We designed the "Kafka-Consumer" service to handle various sources seamlessly and achieve integrated communication. Converting raw healthcare data also required meticulous planning; we developed multiple services that effectively handled diverse data sources and formats.
  • The project demanded real-time processing, which added complexity. Our team ensured fast and efficient data ingestion, transformation, and storage with the ETL pipeline.
  • Anticipating user-base growth, the ETL pipeline needed to be scalable. Our architecture assured seamless handling of ever-increasing data volumes without compromising performance. The system also demanded high reliability; we proactively addressed potential issues and ensured seamless data processing despite errors or failures.

Tech Stack

  1. Golang and Python Microservices
  • Golang was the primary programming language for developing microservices, including the "Kafka-Consumer" service.
  • Python was utilized for the transformation service and the data storage service.
  2. gRPC (Google Remote Procedure Call)
  • gRPC was vital in defining and implementing efficient, high-performance communication between microservices. It facilitated seamless communication between the various components of the ETL pipeline.
  3. Protocol Buffers (protobuf)
  • Protocol Buffers, commonly known as protobuf, were used to define the data structures and message formats exchanged between microservices.
  • Protobuf proved efficient in terms of serialization and deserialization, making it well-suited for high-throughput applications like this one.
  4. Kafka
  • Kafka functioned as the messaging system for real-time data ingestion and processing.
  5. AWS S3 (Amazon Simple Storage Service)
  • AWS S3 was the centralized storage solution for securely storing patient health data.
  • Data retrieved from data providers and processed by the ETL pipeline was stored in S3 for subsequent access and analysis.
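To illustrate the gRPC and protobuf items above, here is a sketch of what a message and service definition might look like. All names and fields are hypothetical; the project's actual .proto files are not public:

```
// Illustrative only; not the project's actual schema.
syntax = "proto3";

package catan.v1;

message MedicalRecord {
  string patient_id  = 1;
  string provider    = 2;
  string code        = 3;
  int64  observed_at = 4; // Unix seconds
}

message IngestAck {
  bool accepted = 1;
}

service RecordIngest {
  rpc Ingest(MedicalRecord) returns (IngestAck);
}
```

From a definition like this, the protoc compiler generates the Go and Python stubs that let the microservices exchange strongly typed messages over gRPC.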

This tech stack was meticulously chosen to meet the project's requirements, encompassing real-time data processing, efficient communication, data storage, and scalability. Golang and Python microservices, coupled with gRPC and protobuf, ensured robust and high-performance communication between components, while Kafka provided capabilities for real-time data handling. AWS S3 served as a secure and scalable storage solution for healthcare data, maintaining compliance with industry standards such as HIPAA.

How Velotio Made a Difference

Successfully implemented an ETL pipeline for automated data processing and retrieval, resulting in a 70% time savings for clinicians in managing medical records data.

Enabled onboarding of over 15K patients by ensuring HIPAA compliance and data security, fostering trust in handling sensitive medical information.

Delivered enhanced data-driven analytics and insights that drove a 30% increase in patient satisfaction, more innovative healthcare journeys, and a competitive advantage for the client.

With Velotio, achieve breakthroughs in your product development journey.

Over 90 global customers, including NASDAQ-listed enterprises, unicorn startups, and cutting-edge product companies have trusted us for our technology expertise to deliver delightful digital products.

Talk to us

Work with modern and scalable technologies

We leverage emerging technologies to build products that are designed for scalability and better usability.

Rated 4.6/5 on Clutch

325+ highly skilled engineers

With us as your tech partners, you get access to a pool of digital strategists, engineers, architects, project managers, UI/UX designers, Cloud & DevOps experts, product analysts and QA managers.

At Velotio, we hold ourselves to sky-high standards of excellence and expect the same from our customers.