Built an AWS-based Serverless ETL Datalake in Less Than 4 Months

Velotio's Data Engineering Team is quite skilled and flexible. They were able to understand the business problem and architect a tailor-made solution. The engineering team delivered the solution against very tight deadlines. We were truly impressed.

VP, Engineering at Customer Data Platform Startup
Data Engineering
New York
$32M
4 months
Tech Stack Used
AWS Athena
AWS Glue
AWS Lambda
AWS S3
Kafka
TensorFlow

The customer wanted to build an intelligent Customer Data Platform (CDP) that aggregates and organizes all of a brand's customer interactions, both online and offline, in one place and in real time. The goal is a unified view of the customer journey that yields insights for better decision-making.

Results

- Designed a serverless ETL solution with 50% lower costs than traditional solutions.

- Delivered the solution in a record 4 months, in time for the next fundraising round.

Business Context

The client wanted Velotio to build and automate a serverless ETL data lake that:

a) Requires minimal infrastructure maintenance

b) Scales easily with AWS as the data load increases 

c) Supports disparate data sources, such as real-time stream data, live transactions, historical customer data, and batch records

d) Can keep their cloud costs within their limited budget

e) Can be deployed and rebuilt easily from scratch, without any manual intervention.

How Velotio Helped

Velotio designed the data lake and ETL pipeline as follows:

  • Data Ingestion: Velotio designed an ingestion system using AWS ECS. Real-time streaming data was sent directly to Kafka through Amazon's managed streaming service. Files such as CSV, TSV, and XLS were loaded directly into AWS S3.

  • Data Transformation: The team used serverless services like AWS Glue (Spark) and AWS Lambda (Python) to sanitize the ingested data. Clean data was then pushed to another AWS S3 bucket.

  • Serverless Data Lake: AWS S3 serves as the data lake, with AWS Athena used to run basic queries on it.

  • Advanced Analytics and Machine learning: The customer wanted to perform additional queries.

Figure: Serverless ETL Solution
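
As an illustration of the data transformation step described above, here is a minimal sketch of how a Python Lambda function might sanitize a delimited file that lands in the raw S3 bucket and write the result to a second bucket. It is not Velotio's actual code. The sanitization rules (trimming whitespace, lower-casing header names, dropping empty rows), the CLEAN_BUCKET environment variable, and the function names are all assumptions made for the example.

```python
import csv
import io
import os
from urllib.parse import unquote_plus


def sanitize_csv(text: str, delimiter: str = ",") -> str:
    """Trim whitespace, normalize header names, and drop fully empty rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    header_written = False
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip rows that contain no data at all
        if not header_written:
            # Hypothetical convention: lower-case, underscore-separated names.
            cells = [c.lower().replace(" ", "_") for c in cells]
            header_written = True
        writer.writerow(cells)
    return out.getvalue()


def handler(event, context):
    """Lambda entry point for an S3 'object created' notification."""
    import boto3  # imported here so sanitize_csv can be tested without AWS

    s3 = boto3.client("s3")
    clean_bucket = os.environ["CLEAN_BUCKET"]  # hypothetical variable name
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        delimiter = "\t" if key.lower().endswith(".tsv") else ","
        cleaned = sanitize_csv(body.decode("utf-8"), delimiter)
        s3.put_object(Bucket=clean_bucket, Key=key, Body=cleaned.encode("utf-8"))
```

Keeping the cleaning logic in a pure function, separate from the S3 reads and writes, means it can be unit-tested locally. The same rules could also be reused in a Glue job for larger batches.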

About Velotio

Velotio Technologies is an offshore product development partner for mission-driven technology startups across the globe. We combine business expertise and cutting-edge technology to drive success for our customers and help them win in their chosen markets.
