Case Study
Data Engg & Analytics

Serverless ETL Datalake using Amazon Web Services

About This Project

With little prior experience in serverless technologies, the customer approached Velotio to set up a serverless datalake that can scale to petabytes of data.



About the Client

The customer is a B2B Customer Data Platform (CDP) that provides a unified view of the customer across all platforms. Its customers include leading brands like Staples, Walmart, and Cisco.

Understanding the Challenge

The client wanted to set up a multi-tenant serverless data lake with real-time and batch data ingestion and processing. The data ingestion system needed to support multiple file formats (CSV, TSV, XLS) and multiple sources, including AWS S3 buckets, FTP, and Dropbox.
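
To make the file-format requirement concrete, here is a minimal sketch of extension-based parsing for the two delimited formats. The function name `parse_delimited` and the `DELIMITERS` table are hypothetical and not taken from the client's platform. XLS is left out because it is a binary format that would need a third-party reader.

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to delimiter (an assumption for
# illustration; XLS is binary and would need a third-party library).
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def parse_delimited(key: str, body: str) -> list[dict[str, str]]:
    """Parse CSV or TSV text into a list of row dicts, chosen by file extension."""
    suffix = PurePosixPath(key).suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"unsupported format: {suffix or '(none)'}")
    reader = csv.DictReader(io.StringIO(body), delimiter=DELIMITERS[suffix])
    return list(reader)
```

Keying the choice on the object name keeps the parser independent of the source, so the same function could serve files that arrive from S3, FTP, or Dropbox.
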
The existing CDP was built using traditional technologies like Hadoop, Hive, HDFS, and YARN, which were difficult to manage, scale, and upgrade. The new solution needed minimal infrastructure maintenance and had to remove the undifferentiated heavy lifting of managing infrastructure as demand changes and technologies evolve.

As the client was signing on larger enterprises, data storage was expected to grow 10x, from terabytes to petabytes, but the current platform could not store unprocessed raw data cost-effectively. The data warehouse received data from a range of services, and any update to those services required manual updates to ETL jobs and tables. Because response times for these data sources were critical, the Velotio team took a data-driven approach to selecting a high-performance architecture.

This was our first time working with a remote team, but Velotio’s team didn't miss any deadlines despite having a tight schedule and won our trust early in the project. They excelled at reporting and addressing issues quickly. The communication with our on-site team was also extremely smooth. We're extremely happy with the progress we have made with them.

Director of Engineering, Customer Data Platform

How We Made It Happen

Velotio worked with the customer to understand the existing platform, data characteristics, and end goals.

Based on the requirements listed above, Velotio decided to change the data warehouse both operationally and architecturally. From an operational standpoint, we designed a new shared responsibility model for data ingestion. Architecturally, we chose a serverless model over a traditional relational database. These two decisions ended up driving every design and implementation decision that we made in our migration.

Serverless ETL Datalake Using Amazon Web Services

Velotio built the solution on AWS using serverless technologies like AWS Step Functions, AWS Lambda, AWS Glue, AWS Athena, and AWS S3. Velotio built a proof-of-concept in one month to demonstrate the solution addressing all the challenges. The complete solution was built in four months.
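
One way these services can be wired together is a Step Functions state machine that runs a Lambda function and then a Glue job. The sketch below builds such a definition in Amazon States Language. The state names and the resource names `sanitize-batch-file` and `transform-batch-file` are hypothetical, since the case study does not give the real ones, and the actual pipeline may be structured differently.

```python
import json

# Hypothetical resource names; the case study does not give the real ones.
SANITIZE_FUNCTION = "sanitize-batch-file"
GLUE_JOB = "transform-batch-file"


def build_state_machine(function_name: str, glue_job: str) -> dict:
    """Return an Amazon States Language definition: Lambda step, then Glue job."""
    return {
        "Comment": "Batch ETL sketch: sanitize with Lambda, transform with Glue",
        "StartAt": "Sanitize",
        "States": {
            "Sanitize": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
                # Pass only the function's return value to the next state.
                "OutputPath": "$.Payload",
                "Next": "Transform",
            },
            "Transform": {
                "Type": "Task",
                # The .sync suffix makes Step Functions wait for the job to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job},
                "End": True,
            },
        },
    }


# JSON text that could be supplied as a state machine definition.
definition_json = json.dumps(
    build_state_machine(SANITIZE_FUNCTION, GLUE_JOB), indent=2
)
```
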

The team developed the solution as follows:

  • Designed the batch-processing pipeline with AWS Step Functions, using AWS Lambda for basic data sanitization and AWS Glue for complex batch operations. AWS Glue handles the ETL job scheduling, and AWS Glue crawlers manage the metadata in the AWS Glue Data Catalog.
  • Set up AWS Kinesis and Kinesis Firehose to ingest real-time data for processing.
  • Leveraged AWS S3 to store raw and processed data and AWS Athena to query it. The platform can re-process raw data when the ETL rules or data parsing change.
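
The "basic data sanitization" step in the first bullet could look roughly like the Lambda-style handler below. This is a sketch under assumptions: the event shape (`{"rows": [...]}`), the function names, and the specific cleaning rules are all hypothetical, as the case study does not describe them.

```python
def sanitize_row(row: dict) -> dict:
    """Normalize header names and trim whitespace from string values."""
    cleaned = {}
    for key, value in row.items():
        name = key.strip().lower().replace(" ", "_")
        cleaned[name] = value.strip() if isinstance(value, str) else value
    return cleaned


def handler(event: dict, context: object = None) -> dict:
    """Lambda-style entry point; the event shape ({"rows": [...]}) is assumed."""
    rows = [sanitize_row(r) for r in event.get("rows", [])]
    # Drop rows where every value is empty after trimming.
    kept = [r for r in rows if any(v not in ("", None) for v in r.values())]
    return {"rows": kept, "dropped": len(rows) - len(kept)}
```

Keeping this step small and free of side effects suits Lambda's short-lived execution model, while heavier joins and aggregations stay in Glue.
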

How Velotio Made a Difference

The new serverless data analytics platform reduced the cost of data processing and storage by 10x.

AWS S3 with Athena can easily scale to store and process tens of petabytes of data.
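
Hive-style partition keys in S3 object paths are one common way to keep Athena scans bounded as data grows, because queries that filter on a partition column can skip unrelated prefixes. The sketch below shows a possible multi-tenant layout with separate raw and processed prefixes. The prefix names, the partition columns `tenant_id` and `dt`, and both function names are assumptions, not the client's actual layout.

```python
from datetime import date

# Hypothetical top-level prefixes separating untouched input from ETL output.
RAW_PREFIX = "raw"
PROCESSED_PREFIX = "processed"


def partition_path(prefix: str, tenant_id: str, day: date) -> str:
    """Build a Hive-style partition path such as raw/tenant_id=acme/dt=2024-01-31/."""
    return f"{prefix}/tenant_id={tenant_id}/dt={day.isoformat()}/"


def processed_key_for(raw_key: str) -> str:
    """Map a raw object key to its processed counterpart under the same partitions."""
    if not raw_key.startswith(RAW_PREFIX + "/"):
        raise ValueError(f"not a raw key: {raw_key}")
    return PROCESSED_PREFIX + raw_key[len(RAW_PREFIX):]
```

Because the raw objects are never overwritten, a change to the ETL rules only requires re-running the job over the raw prefix and rewriting the matching processed keys.
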

Leveraging AWS services and the serverless model reduced the ongoing operational costs by 50-60%.

The new platform can run TensorFlow-based machine learning models and analytics to understand customer behavior.
