Scalable Cloud-based Web Crawling And Scraping Solution for AI Hiring Platform

Velotio quickly adapts to our ever-changing needs. They have excellent business acumen with the ability to prioritize deliverables, while continually exceeding our expectations.

CTO, Washington DC-based startup
Artificial Intelligence / Machine Learning
Washington DC
$2 million
8 months
Tech Stack Used
Python
MySQL
Amazon RDS
Flask
Docker
Scrapy

The client is an AI-based recruitment platform that enables talent discovery and personalised interaction based on organisation alignment and profile matching. Their data-driven hiring solution helps companies spot candidates who best fit their needs and are likely to move, and then approach them through personalized interaction.

Results

- Automated the entire process of searching and storing content in the database.

- Significantly improved resource productivity and reduced manual effort.

- Simplified profile management with automated, regular data uploads.

Business Challenges

The client needed a solution to automate the search, retrieval, and storage of publicly available data from multiple websites. This was previously done manually and required a dedicated resource to manage end to end.

A data pipeline and data warehousing solution was also needed to manage the movement and transformation of the data, and to enable quick retrieval and analysis for reporting and decision-making.

How Velotio Helped

Velotio developed a crawling and scraping solution that automates the search and extraction of data from multiple sources and uploads it to the client's database for further processing. The solution crawls, extracts, and stores data based on pre-specified rules: the kinds of URLs to crawl and the types of data to extract and store in the database can all be configured. The time intervals for the crawling/extraction process and the quantum of data extracted can also be specified as required.
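
As a rough illustration of those knobs, a per-platform crawl configuration might look like the sketch below. The field names and values are hypothetical, not the client's actual settings.

```python
# Hypothetical per-platform crawl configuration; field names and values are
# illustrative only, not the client's actual settings.
CRAWL_CONFIG = {
    "platform": "example.com",                              # assumed target platform
    "url_patterns": [r"https://example\.com/profiles/.+"],  # which URLs to crawl
    "fields": ["name", "title", "skills", "location"],      # data types to extract
    "schedule_hours": 24,                                    # crawl/extraction interval
    "max_items_per_run": 5000,                               # quantum of data per run
}
```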

The Solution

The solution was organized into three layers: Crawler, Data Extractor, and Backend API Layer. Domain-specific rules and intelligence were used to crawl, extract, and store data, and basic NLP and machine learning were leveraged to reduce the effort of scraping each website or platform.

1. Crawler: The crawler visits specific websites and platforms, following domain-specific rules to extract data. The data is then uploaded to cloud storage (Amazon S3 or equivalent) for processing. The crawler consists of multiple spiders (processes), typically customized for each kind of platform, and runs periodically to keep the content up to date.
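
A minimal sketch of one such spider is shown below, using Scrapy's built-in S3 feed export; the target domain, CSS selectors, and bucket name are illustrative assumptions, not the client's actual targets.

```python
import scrapy


class ProfileSpider(scrapy.Spider):
    """One platform-specific spider; rules and selectors differ per platform."""

    name = "example_profiles"
    allowed_domains = ["example.com"]                # hypothetical target platform
    start_urls = ["https://example.com/profiles"]

    custom_settings = {
        # Upload raw crawl output to cloud storage for the extractor layer
        # (S3 feed storage requires botocore and AWS credentials).
        "FEEDS": {
            "s3://example-crawl-bucket/%(name)s/%(time)s.jl": {"format": "jsonlines"}
        },
        "DOWNLOAD_DELAY": 1.0,                       # stay polite to the target site
    }

    def parse(self, response):
        # Domain-specific rule: only follow links that look like profile pages.
        for href in response.css("a.profile-link::attr(href)").getall():
            yield response.follow(href, callback=self.parse_profile)

    def parse_profile(self, response):
        # Store the raw HTML with its URL; field-level extraction happens
        # later in the extractor layer.
        yield {"url": response.url, "html": response.text}
```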

2. Extractor: The data extractor takes the HTML page content and applies generic and platform/website-specific rules to pull out the relevant data, which is then saved to the database.
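
The sketch below illustrates the idea with parsel (the selector library Scrapy uses); the rule table and field names are assumptions for illustration, and the real extractor also applied generic and NLP-based heuristics.

```python
from parsel import Selector

# Per-platform extraction rules: field name -> CSS selector (illustrative only).
PLATFORM_RULES = {
    "example.com": {
        "name": "h1.candidate-name::text",
        "title": "span.current-title::text",
        "skills": "ul.skills li::text",
    }
}


def extract_fields(platform: str, html: str) -> dict:
    """Apply platform-specific rules to raw HTML and return cleaned fields."""
    selector = Selector(text=html)
    record = {}
    for field, css in PLATFORM_RULES.get(platform, {}).items():
        values = [v.strip() for v in selector.css(css).getall() if v.strip()]
        # Keep single values as scalars and multi-valued fields as lists.
        record[field] = values[0] if len(values) == 1 else (values or None)
    return record
```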

3. API Layer: A REST API server exposes the data in persistent storage over HTTP.
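
A minimal Flask endpoint along these lines might look as follows; the route, table, and connection details are illustrative, not the client's actual API.

```python
from flask import Flask, jsonify
import mysql.connector

app = Flask(__name__)


def get_connection():
    # In production this would point at the managed MySQL instance (Amazon RDS).
    return mysql.connector.connect(
        host="localhost", user="app", password="secret", database="profiles"
    )


@app.route("/profiles/<int:profile_id>", methods=["GET"])
def get_profile(profile_id):
    # Fetch one extracted profile record from persistent storage.
    conn = get_connection()
    cursor = conn.cursor(dictionary=True)
    cursor.execute("SELECT * FROM profiles WHERE id = %s", (profile_id,))
    row = cursor.fetchone()
    cursor.close()
    conn.close()
    if row is None:
        return jsonify({"error": "profile not found"}), 404
    return jsonify(row)


if __name__ == "__main__":
    app.run(port=5000)
```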

Database: MySQL was used as the relational database providing persistent storage for the data, and its full-text search feature supports the search APIs. The solution described above was delivered as a Docker container.
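
The search APIs can lean on MySQL's MATCH ... AGAINST syntax, roughly as sketched below; the table and column names are assumptions, not the client's actual schema.

```python
def search_profiles(cursor, query: str, limit: int = 20):
    """Rank profiles against a free-text query using MySQL full-text search.

    Assumes a FULLTEXT index exists, e.g.:
        ALTER TABLE profiles ADD FULLTEXT INDEX ft_profiles (name, title, skills);
    """
    cursor.execute(
        """
        SELECT id, name, title,
               MATCH(name, title, skills) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score
        FROM profiles
        WHERE MATCH(name, title, skills) AGAINST (%s IN NATURAL LANGUAGE MODE)
        ORDER BY score DESC
        LIMIT %s
        """,
        (query, query, limit),
    )
    return cursor.fetchall()
```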

Impact

The client was able to automate the entire process of searching for, extracting, and storing content in their database, resulting in considerable gains in resource productivity and a significant reduction in the effort needed to extract data.

Management of profiles on the client platform was simplified by the automated, regular uploads to the database. Because the rules for crawling/scraping can be pre-specified, the client database now contains only specific, relevant data.

Velotio's team provided us with the right combination of flexibility and technical skills enabling us to build critical features faster.

CTO, Washington DC-based startup

Our journey together so far

Exclusive office space

Right from renting out an exclusive office space to setting up a robust technology architecture, handling payroll, and taking care of other local administrative tasks.

Dedicated recruitment team

Fast-track your hiring by selecting from our pipeline of carefully screened talent, or get dedicated recruiters to build your dream team of highly skilled engineers that match your precise requirements.

High confidentiality

We ensure foolproof NDAs and honor them not only at a company level but also at an individual level: each member who joins your team signs one as well.

About Velotio


Velotio helps you deploy high-performance offshore teams on demand. We build teams that can design, develop and scale your vision in the most efficient way.

Our core areas of expertise include DevOps, data engineering, ML/AI, and full-stack development. We're among the top software developers on Clutch, with a rating of 4.8/5.

Here are a few reasons why our clients love working with us:
- Great technical expertise. We come to the table with solutions, not problems.
- We help you quickly add experienced and qualified engineers to your team, as and when you need them.
- Soft skills are an important selection criterion for us. All our engineers have strong English language skills, both written and spoken.
- Quick turnaround despite the time difference.

Talk to us