Case Study

Built an AWS-based Serverless ETL Datalake in Less Than 4 Months

About This Project

The customer wanted to build an intelligent Customer Data Platform (CDP) that aggregates and organizes all of a customer's interactions with a brand, both online and offline, in one place and in real time. The goal is a unified view of the customer journey that yields insights for better decision-making.

Services

Data Engineering

SaaS

Technologies

About the Client

The New York-based client has received total funding of $32M.

Understanding the Challenge

The client wanted Velotio to build and automate a serverless ETL data lake that requires minimal infrastructure maintenance and scales easily on AWS as the data load increases.
It should also support disparate data sources, such as real-time stream data, live transactions, historic customer data, and batch records. The client also wanted to keep cloud costs within a limited budget and to be able to deploy and rebuild the solution from scratch without any manual intervention.

Velotio's Data Engineering Team is quite skilled and flexible. They were able to understand the business problem and architect a tailor-made solution. The engineering team delivered the solution against very tight deadlines. We were truly impressed.

VP, Engineering at Customer Data Platform Startup

How We Made It Happen

Velotio designed the data lake and ETL as follows:

  • Data Ingestion: Velotio designed an ingestion system using AWS ECS. Real-time streaming data was sent directly to Kafka using Amazon Managed Streaming for Apache Kafka (MSK). Certain data, such as CSV, TSV, and XLS files, was loaded directly into AWS S3.
  • Data Transformation: The team used serverless services such as AWS Glue (Spark) and AWS Lambda (Python) to sanitize the ingested data. Clean data was then pushed to another AWS S3 bucket.
  • Serverless Data Lake: AWS S3 is used as the data lake, with AWS Athena to perform basic queries on it.
  • Advanced Analytics and Machine Learning: The customer wanted to perform additional queries.
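
To make the transformation step more concrete, here is a minimal sketch of how an AWS Lambda function in Python might sanitize a delimited file from a raw S3 bucket and write the result to a clean bucket. It is an illustration only, not the client's actual code: the cleaning rules (trim whitespace, drop blank rows, drop exact duplicates), the bucket name `example-clean-bucket`, and the function names are all assumptions. The S3 client is passed in as an optional argument so the logic can be exercised without AWS access.

```python
import csv
import io
from urllib.parse import unquote_plus


def sanitize_rows(raw_text, delimiter=","):
    """Trim every cell, drop blank rows, and drop exact duplicate rows."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    seen = set()
    cleaned = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip rows with no content at all
        key = tuple(cells)
        if key in seen:
            continue  # skip rows already emitted
        seen.add(key)
        cleaned.append(cells)
    return cleaned


def rows_to_text(rows, delimiter=","):
    """Serialize rows back to delimited text with '\n' line endings."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def handler(event, context, s3=None, clean_bucket="example-clean-bucket"):
    """Hypothetical Lambda entry point for S3 object-created events."""
    if s3 is None:
        import boto3  # imported lazily so the pure functions run without AWS

        s3 = boto3.client("s3")
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        delimiter = "\t" if key.endswith(".tsv") else ","
        cleaned = rows_to_text(sanitize_rows(body, delimiter), delimiter)
        s3.put_object(Bucket=clean_bucket, Key=key, Body=cleaned.encode("utf-8"))
        written.append(key)
    return {"written": written}
```

Keeping the cleaning logic in plain functions, separate from the S3 calls, means the same rules could in principle be reused in a Glue job or run in a local unit test.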
(Image: Serverless ETL Solution)
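
As a rough illustration of querying the S3 data lake with Athena, the sketch below starts a query through an Athena client and polls until it finishes. It assumes a boto3 Athena client (`boto3.client("athena")`) is passed in; the function name, database name, and output location are hypothetical and do not come from the project.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, and return its execution ID."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING; check again shortly
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT COUNT(*) FROM events", "example_db", "s3://example-athena-results/")`, where the table, database, and results bucket are placeholders.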

How Velotio Made a Difference

Designed a serverless ETL solution with 50% lower costs than traditional solutions.

Delivered the solution in a record 4 months, in time for the next fundraising round.

With Velotio, achieve breakthroughs in your product development journey.

Over 90 global customers, including NASDAQ-listed enterprises, unicorn startups, and cutting-edge product companies have trusted us for our technology expertise to deliver delightful digital products.


Work with modern and scalable technologies

We leverage emerging technologies to build products that are designed for scalability and better usability.

325+ highly skilled engineers

With us as your tech partners, you get access to a pool of digital strategists, engineers, architects, project managers, UI/UX designers, Cloud & DevOps experts, product analysts and QA managers.

Rated 4.6/5 on Clutch

At Velotio, we hold ourselves to sky-high standards of excellence and expect the same from our customers.

Tech Stack Used
AWS Athena
AWS Glue
AWS Lambda
AWS S3
Kafka
TensorFlow




Our journey together so far

Exclusive office space

Right from renting out an exclusive office space to setting up robust technology architecture and handling payroll and other local administrative tasks

Dedicated recruitment team

Fast-track your hiring by selecting from our pool of carefully screened talent, or get dedicated recruiters to build your dream team of highly skilled engineers who match your precise requirements.

High confidentiality

Ensure foolproof NDAs. We honor them not only at the company level but also at the individual level: each member who joins your team signs one as well.

About Velotio


Velotio helps you deploy high-performance offshore teams on demand. We build teams that can design, develop and scale your vision in the most efficient way.

Our core areas of expertise include DevOps, data engineering, ML/AI, and full-stack development. We're among the top software developers on Clutch, with a rating of 4.8/5.

Here are a few reasons why our clients love working with us:
  • Great technical expertise. We come to the table with solutions, not problems.
  • We help you quickly add experienced and qualified engineers to your team, as and when you need them.
  • Soft skills are an important selection criterion for us. All our engineers have good English language skills, both written and oral.
  • Quick turnaround in spite of the time difference.
