Built an AWS-based Serverless ETL Datalake in Less Than 4 Months

Velotio's Data Engineering Team is quite skilled and flexible. They were able to understand the business problem and architect a tailor-made solution. The engineering team delivered the solution against very tight deadlines. We were truly impressed.

VP, Engineering at Customer Data Platform Startup
Data Engineering
New York
$32M
4 months
Tech Stack Used
AWS Athena
AWS Glue
AWS Lambda
AWS S3
Kafka
TensorFlow

The customer wanted to build an intelligent Customer Data Platform (CDP) that aggregates and organizes all of a customer's interactions with a brand, both online and offline, in one place and in real time. The goal was a unified view of the customer journey that yields insights for better decision-making.

Results

- Designed a serverless ETL solution at 50% lower cost than traditional solutions.

- Delivered the solution in a record 4 months, in time for the next fundraising round.


Business Context

The client wanted Velotio to build and automate a serverless ETL data lake that:

a) Requires minimal infrastructure maintenance

b) Scales easily with AWS as the data load increases 

c) Supports disparate data sources, such as real-time stream data, live transactions, historical customer data, and batch records

d) Keeps cloud costs within their limited budget

e) Can be deployed and rebuilt easily from scratch, without any manual intervention

How Velotio Helped

Velotio designed the data lake and ETL pipeline as follows:

  • Data Ingestion: Velotio designed an ingestion system using AWS ECS. Real-time streaming data was sent directly to Kafka through Amazon Managed Streaming for Apache Kafka (Amazon MSK). Files such as CSV, TSV, and XLS were loaded directly into AWS S3.

  • Data Transformation: The team used serverless services like AWS Glue (Spark) and AWS Lambda (Python) to sanitize the ingested data. Clean data was then pushed to another AWS S3 bucket.

  • Serverless Data Lake: AWS S3 served as the data lake, with AWS Athena used to run basic queries on it.

  • Advanced Analytics and Machine learning: The customer wanted to perform additional queries.

Figure: Serverless ETL Solution
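
To make the transformation step more concrete, here is a minimal Python sketch of what a Lambda-based sanitizer for files landing in S3 might look like. It is an illustration only, not Velotio's actual implementation: the cleaning rules (trimming whitespace, dropping blank and duplicate rows), the bucket name `example-clean-bucket`, and the function names `sanitize_csv` and `handler` are all assumptions made for this example.

```python
import csv
import io
from urllib.parse import unquote_plus

# Hypothetical destination bucket for cleaned files (an assumption for this sketch).
CLEAN_BUCKET = "example-clean-bucket"


def sanitize_csv(raw_text, delimiter=","):
    """Return cleaned delimited text.

    Assumed cleaning rules: trim whitespace in every cell, drop rows with
    no data, and drop exact duplicate rows (first occurrence wins).
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    seen = set()
    for row in reader:
        cleaned = tuple(cell.strip() for cell in row)
        if not any(cleaned):   # blank line or row of empty cells
            continue
        if cleaned in seen:    # exact duplicate of an earlier row
            continue
        seen.add(cleaned)
        writer.writerow(cleaned)
    return out.getvalue()


def handler(event, context):
    """Lambda entry point for an S3 "object created" notification."""
    # Imported lazily so sanitize_csv can be exercised without AWS dependencies.
    import boto3

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        delimiter = "\t" if key.lower().endswith(".tsv") else ","
        cleaned = sanitize_csv(body, delimiter)
        # Write the cleaned file to the separate "clean" bucket under the same key.
        s3.put_object(Bucket=CLEAN_BUCKET, Key=key, Body=cleaned.encode("utf-8"))
```

Keeping the cleaning logic in a pure function, separate from the S3 calls, means the rules can be tested locally without any AWS access. Larger or more complex datasets would more plausibly go through the AWS Glue (Spark) jobs mentioned above.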

Our journey together so far

Exclusive office space

We handle everything, from renting an exclusive office space and setting up a robust technology architecture to handling payroll and other local administrative tasks.

Dedicated recruitment team

Fast-track your hiring by selecting from our pipeline of carefully screened talent, or get dedicated recruiters to build your dream team of highly skilled engineers who match your precise requirements.

High confidentiality

We ensure foolproof NDAs. We honor them not only at the company level but also at the individual level, as each member who joins your team signs one as well.

About Velotio


Velotio helps you deploy high-performance offshore teams on demand. We build teams that can design, develop and scale your vision in the most efficient way.

Our core areas of expertise include DevOps, data engineering, ML/AI, and full-stack development. We're one of the top software developers on Clutch, with a rating of 4.8/5.

Here are a few reasons why our clients love working with us:

- Great technical expertise. We come to the table with solutions, not problems.

- We help you quickly add experienced and qualified engineers to your team, as and when you need them.

- Soft skills are an important selection criterion for us. All our engineers have good English language skills, both written and oral.

- Quick turnaround in spite of the time difference.
