Scalable Cloud-based Web Crawling And Scraping Solution for AI Hiring Platform

Velotio quickly adapts to our ever-changing needs. They have excellent business acumen with the ability to prioritize deliverables, while continually exceeding our expectations.

CTO, Washington DC-based startup
Artificial Intelligence / Machine Learning
Washington DC
$2 million
8 months
Tech Stack Used
Python
MySQL
Amazon RDS
Flask
Docker
Scrapy

The client is an AI-based recruitment platform that enables talent discovery and personalised interaction based on organisation alignment and profile matching. Their data-driven hiring solution helps companies spot candidates who best fit their needs and are likely to move, and then approach them through personalized interaction.

Results

- Automated the entire process of searching and storing content in the database.

- Significantly reduced the time and effort spent on extracting data.

- Simplified profile management with automated and regular data upload.

Business Challenges

The client needed a solution to automate the search, retrieval and storage of publicly available data from multiple websites. Previously, this work was done manually and required a dedicated team member to manage the entire process.

A data pipeline and data warehousing solution was also needed to manage the movement and transformation of data, and to support quick retrieval and analysis for reporting and decision making.

How Velotio Helped

Velotio developed a crawling and scraping solution that automates the search for and extraction of data from multiple sources and uploads the data to the client's database for further processing. The solution crawls, extracts and stores data based on pre-specified rules. It also makes it possible to specify the kinds of URLs to crawl and the types of data to extract and store in the database. The time interval between crawling/extraction runs and the amount of data extracted can also be specified as required.
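To make the idea of pre-specified rules concrete, here is a minimal sketch of how one such rule might be represented. It is an illustration only, not the project's actual configuration format: the `CrawlRule` class, its field names and the example.com values are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class CrawlRule:
    """One pre-specified rule: what to crawl, what to keep, and how often."""
    name: str
    url_patterns: list[str]      # regexes for URLs the crawler may visit
    fields: dict[str, str]       # field name -> data type to extract and store
    interval_hours: int = 24     # time between crawl/extraction runs
    max_items: int = 1000        # cap on records extracted per run

    def allows(self, url: str) -> bool:
        """Return True if the URL matches any of the rule's patterns."""
        return any(re.search(p, url) for p in self.url_patterns)


# Hypothetical rule for an example site; none of these values come from the project.
rule = CrawlRule(
    name="example-profiles",
    url_patterns=[r"^https://example\.com/people/[\w-]+$"],
    fields={"name": "text", "headline": "text", "updated": "date"},
    interval_hours=12,
    max_items=500,
)

print(rule.allows("https://example.com/people/jane-doe"))  # True
print(rule.allows("https://example.com/jobs/123"))         # False
```

Keeping the rules as plain data, separate from the crawling code, is one way to let the URLs, data types, interval and volume change without touching the crawler itself.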

The Solution

The solution was organized into three layers: Crawler, Data Extractor and Backend API Layer. Domain-specific rules and intelligence were used to crawl, extract and store data. Basic NLP and machine learning were also used to reduce the effort needed to scrape each platform's website.

1. Crawler: The crawler visits specific websites and platforms, following domain-specific rules to collect data. The data is then uploaded to cloud storage (Amazon S3 or equivalent) for processing. The crawler runs multiple spiders (processes), typically one customized spider for each kind of platform, and runs periodically to keep the content up to date.

2. Extractor: The data extractor takes HTML page content and applies generic and platform/website-specific rules to get the relevant data, which is then saved to a database.

3. API Layer: A REST API server provides access to the data in persistent storage over HTTP.
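The extractor step described in the list above can be sketched as follows. This is a simplified illustration using only Python's standard-library `html.parser`, not the project's actual extractor (which, given the stack, was presumably built around Scrapy). The rule format, the `RuleExtractor` class and the example.com rules are hypothetical.

```python
from html.parser import HTMLParser

# Rules map a field name to a (tag, CSS class) pair; class None means "any class".
GENERIC_RULES = {"page_title": ("title", None)}
SITE_RULES = {  # hypothetical per-site rules
    "example.com": {"name": ("h1", "profile-name"), "headline": ("p", "headline")},
}


class RuleExtractor(HTMLParser):
    """Collects the text of the first element matching each rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.record = {}
        self._current = None      # field currently being captured
        self._current_tag = None  # tag that opened the capture

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            return  # already capturing; nested tags just contribute text
        classes = (dict(attrs).get("class") or "").split()
        for field_name, (rule_tag, rule_class) in self.rules.items():
            if field_name in self.record:
                continue  # keep only the first match per field
            if tag == rule_tag and (rule_class is None or rule_class in classes):
                self._current, self._current_tag = field_name, tag
                self.record[field_name] = ""
                break

    def handle_data(self, data):
        if self._current is not None:
            self.record[self._current] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current_tag:
            self.record[self._current] = self.record[self._current].strip()
            self._current = None


def extract(html, site):
    """Apply generic rules plus any site-specific rules to one HTML page."""
    rules = {**GENERIC_RULES, **SITE_RULES.get(site, {})}
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.record


page = """<html><head><title>Jane Doe | Example</title></head>
<body><h1 class="profile-name">Jane Doe</h1>
<p class="headline">Data engineer</p></body></html>"""
record = extract(page, "example.com")
print(record)
```

For a site with no entry in `SITE_RULES`, only the generic rules apply, which mirrors the split between generic and platform-specific rules described above.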

Database: MySQL was used as the relational database for persistent storage of the data. MySQL's full-text search feature was used to support the search APIs. The solution described above was delivered as a Docker container.
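As a rough illustration of how a search API might lean on MySQL for text search, the sketch below builds a parameterized query against a `FULLTEXT` index. It assumes the text-search feature mentioned above refers to MySQL's `FULLTEXT` indexes and `MATCH ... AGAINST` syntax; the `profiles` table, its columns and the `build_search` helper are hypothetical.

```python
# Hypothetical schema: table and column names are illustrative only.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS profiles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    summary TEXT,
    FULLTEXT KEY ft_profiles (name, summary)
) ENGINE=InnoDB
"""

# The search term appears twice: once to compute the relevance score,
# once to filter rows. %s is the placeholder style used by common
# MySQL drivers for Python.
SEARCH_SQL = """
SELECT id, name,
       MATCH(name, summary) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score
FROM profiles
WHERE MATCH(name, summary) AGAINST (%s IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT %s
"""


def build_search(term: str, limit: int = 20):
    """Return (sql, params) for a parameterized full-text search."""
    term = term.strip()
    if not term:
        raise ValueError("search term must not be empty")
    limit = max(1, min(int(limit), 100))  # clamp page size to 1..100
    return SEARCH_SQL, (term, term, limit)


sql, params = build_search("  data engineer ", limit=500)
print(params)  # ('data engineer', 'data engineer', 100)
```

With a DB-API driver, the pair would typically be passed as `cursor.execute(sql, params)`, so the search term is never interpolated into the SQL string by hand.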

Impact

The client was able to automate the entire process of searching for, extracting and storing content in their database. This resulted in considerable savings in the time and effort needed to extract data.

Profile management on the client platform became simpler thanks to automated, regular uploads to the database. Because the rules for crawling and scraping can be specified in advance, the client database now contains only specific, relevant data.

Velotio's team provided us with the right combination of flexibility and technical skills enabling us to build critical features faster.

CTO, Washington DC-based startup

Our journey together so far

Exclusive office space

We take care of everything, from renting an exclusive office space and setting up a robust technology architecture to handling payroll and other local administrative tasks.

Dedicated recruitment team

Fast-track your hiring by selecting from our carefully screened talent pool, or get dedicated recruiters to build your dream team of highly skilled engineers who match your precise requirements.

High confidentiality

Ensure foolproof NDAs. We honor them not only at the company level but also at the individual level, as each member who joins your team signs one as well.

About Velotio

Velotio Technologies is an offshore product development partner for mission-driven technology startups across the globe. We combine business expertise and cutting-edge technology to drive success for our customers and help them win in their chosen markets.
