The client needed a solution to automate the search, retrieval, and storage of publicly available data from multiple websites. This process was being done manually and required a dedicated resource to manage it end to end. They needed to automate searching for, extracting, and uploading this data to their database for further processing.
The client is an AI-based recruitment platform that enables talent discovery and personalized interaction based on organization alignment and profile matching. Their data-driven hiring solution helps companies spot candidates who best fit their needs and are likely to move, and then approach them through personalized interaction.
A data pipeline and data warehousing solution was needed to manage the movement and transformation of data and to enable quick retrieval and analysis for reporting and decision making.
Velotio developed a crawling and scraping solution that automates the searching and extraction of data from multiple sources and uploads it to the client database for further processing. The solution crawls, extracts, and stores data based on pre-specified rules: the kinds of URLs to crawl, the data types to be extracted and stored, the time intervals for the crawling/extraction process, and the quantum of data extracted can all be specified as per requirement.
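The rule-driven behavior described above can be sketched as a small configuration object. This is a hedged illustration, not the actual implementation: the class, field names, and the example URLs are all hypothetical, chosen only to show how URL patterns, extracted fields, run intervals, and data limits might be specified per requirement.

```python
import re
from dataclasses import dataclass, field

@dataclass
class CrawlRule:
    """One pre-specified crawl rule (hypothetical schema):
    which URLs to visit, what to extract, how often, and how much."""
    url_pattern: str                    # regex selecting the kind of URLs to crawl
    fields: list = field(default_factory=list)  # data types to extract and store
    interval_minutes: int = 60          # time interval between crawl runs
    max_items: int = 1000               # quantum of data extracted per run

    def matches(self, url: str) -> bool:
        """Return True if this rule applies to the given URL."""
        return re.search(self.url_pattern, url) is not None

# Example: crawl only profile pages on a (made-up) source site.
rule = CrawlRule(
    url_pattern=r"^https://example-jobs\.com/profiles/",
    fields=["name", "title", "location"],
    interval_minutes=30,
)

print(rule.matches("https://example-jobs.com/profiles/123"))  # True
print(rule.matches("https://example-jobs.com/about"))         # False
```

A scheduler could then run each rule every `interval_minutes` and stop a run once `max_items` records have been stored.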
“Velotio quickly adapts to our ever-changing needs. They have excellent business acumen with the ability to prioritize deliverables, while continually exceeding our expectations.”
The solution was organized into 3 layers – Crawler, Data Extractor, and Backend API Layer. Domain-specific rules and intelligence were used to crawl, extract, and store data. Basic NLP and machine learning were also leveraged to reduce the effort of scraping individual websites and platforms.
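The three-layer flow can be sketched as a simple pipeline. This is a minimal, assumption-laden sketch: the function names, the `/profiles/` URL convention, and the stubbed database upload are all hypothetical, and real crawling, HTML parsing, and NLP are replaced by pure functions so that only the layer wiring is shown.

```python
def crawler(seed_urls):
    """Crawler layer (stub): keep only URLs matching the domain rules.
    Here the 'rule' is a hardcoded path check for illustration."""
    return [u for u in seed_urls if "/profiles/" in u]

def extractor(url):
    """Data Extractor layer (stub): turn a crawled page into a structured
    record. A real extractor would fetch and parse the page's HTML."""
    return {"url": url, "profile_id": url.rstrip("/").split("/")[-1]}

def backend_api(records):
    """Backend API layer (stub): upload records to the client database.
    Returns the number of records 'inserted'."""
    return len(records)

seeds = [
    "https://example-jobs.com/profiles/101",
    "https://example-jobs.com/about",
    "https://example-jobs.com/profiles/102",
]
uploaded = backend_api([extractor(u) for u in crawler(seeds)])
print(uploaded)  # 2
```

Keeping the layers as separate components with narrow interfaces means each one (crawl rules, extraction logic, upload target) can evolve independently.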
Database: MySQL was used as the relational database providing persistent storage for the data. MySQL’s full-text search feature was used to support the search APIs. The solution described above was delivered as a Docker container.
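MySQL full-text search is exposed through `MATCH ... AGAINST` over columns carrying a `FULLTEXT` index. The sketch below only builds such a query string; the table and column names (`profiles`, `bio`) are hypothetical, and the actual schema and search API were not disclosed in the case study.

```python
def fulltext_query(table: str, column: str) -> str:
    """Return a parameterized MySQL full-text query in natural language mode.
    The search term is left as a %s placeholder so a DB driver can bind it
    safely; the column must carry a FULLTEXT index for MATCH to work."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s IN NATURAL LANGUAGE MODE)"
    )

sql = fulltext_query("profiles", "bio")
print(sql)
```

A driver such as `mysql.connector` would then execute `cursor.execute(sql, (search_term,))`, with relevance-ranked rows returned by default in natural language mode.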
Automated the entire process of searching and storing content in the database.
Significantly improved resource productivity and reduced manual effort.
Simplified profile management with automated and regular data upload.