Project Case
Reducing human factors and increasing data quality through a Data Collection Automation Process

Key outcomes:
- Reduced human factors
- Reduced data collection time
- Reduced yearly costs
- Increased interest factors collected
Background
Our client’s web application offers a performance analytics engine that uses validated data and mathematical modeling to improve performance transparency and provide realistic benchmarking. The data used in the model is gathered, assessed, and validated by our client’s internal team of analysts and by outside suppliers. The model takes data assigned to particular factors as raw input and produces detailed analysis, demographic analysis, and other outputs.
Before
Every month, a team member ran a specific task: connect to a specific portal, gather all the required data, download it as a CSV file, and import it into the web application. The task lasted 4 days, at 8 hours per day. Because of the effort required, the team member had to limit the collection to 600 interest factors out of the 3.000+ that could have been included.

Let’s analyze the facts
- One dedicated employee
- 35 hours per month
- 3.000€ per year
- Exposure to human error and availability issues
- 600 interest factors collected
Collecting the Data
The yearly cost is calculated based on the yearly salary rate in the client’s country, which is 14.000€. To keep things simple, we also assume that this specific employee is trained to do the job and:
- Will always be available during the specific week, every month
- Will do the job without any mistakes
Using the Data
Once the data collection was accomplished, the web application was updated by uploading CSV files, and the results of the update were received in the same way. The employee then had to inspect the CSV results and proceed with additional adjustments.

The Solution
In this project, our client deals with massive amounts of data, relies on real-time data analysis, stores data in the cloud, and combines data from multiple sources. The data flow itself can be unreliable: there are many points during transport from one system to another where corruption or bottlenecks can occur. To analyze all that data, we needed a single view of the entire data set.
Data Pipeline
A data pipeline is a series of processes that migrate data from a source to a destination database. As an example of a technical dependency, after data is assimilated from its sources it may be held in a central queue, subjected to further validations, and only then loaded into the destination. Typically, this means loading raw data into a staging table for interim storage, transforming it, and finally inserting it into the destination reporting tables.
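As a rough illustration of this staging-then-transform flow, the sketch below stages raw rows in a temporary table, validates and converts them, and then loads them into a reporting table. The table names, the SQLite storage, and the validation rule are illustrative assumptions, not the client’s actual schema.

```python
import sqlite3

def run_pipeline(raw_rows, db_path="pipeline.db"):
    """Stage raw rows, transform them, and load them into the reporting table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS staging_factors (name TEXT, value TEXT)")
    con.execute("CREATE TABLE IF NOT EXISTS reporting_factors (name TEXT, value REAL)")

    # 1. Load raw data into the staging table for interim storage.
    con.executemany("INSERT INTO staging_factors VALUES (?, ?)", raw_rows)

    # 2. Transform: keep only rows whose value parses as a number.
    cleaned = []
    for name, value in con.execute("SELECT name, value FROM staging_factors"):
        try:
            cleaned.append((name.strip(), float(value)))
        except ValueError:
            continue  # drop rows that fail validation

    # 3. Insert the transformed rows into the destination reporting table.
    con.executemany("INSERT INTO reporting_factors VALUES (?, ?)", cleaned)
    con.commit()
    con.close()

if __name__ == "__main__":
    run_pipeline([("factor_a", "12.5"), ("factor_b", "n/a")])
```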

- Data Collection (Source)
- Data Storage (Destination)
- Data Transformation (Processing)

Data Collection (Source)
The source is a web portal where the client’s user, after successful authentication, had to apply specific filters to retrieve the needed structured data from web pages. We used web scraping methods to collect all the structured data and a batch processing model to store it in the storage at set time intervals, scheduled during off-peak business hours.
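A minimal sketch of what such a collection step can look like, assuming a portal with a form-based login and an HTML results table; the URL, form fields, and CSS selectors below are hypothetical placeholders rather than the client’s actual portal.

```python
import requests
from bs4 import BeautifulSoup

PORTAL = "https://portal.example.com"  # hypothetical placeholder URL

def collect(username, password, factor_filter):
    """Authenticate, apply a filter, and scrape the structured result rows."""
    session = requests.Session()
    session.post(f"{PORTAL}/login", data={"user": username, "pass": password})

    page = session.get(f"{PORTAL}/data", params={"factor": factor_filter})
    soup = BeautifulSoup(page.text, "html.parser")

    rows = []
    for tr in soup.select("table#results tr")[1:]:  # skip the header row
        rows.append([td.get_text(strip=True) for td in tr.select("td")])
    return rows
```

In a batch setup like the one described above, a function of this kind would typically be triggered on a schedule (for example by a cron job) so that collection only runs during off-peak hours.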

Data Transformation (Processing)
The destination (data storage) is a temporary cloud repository where the data arrives at the end of its collection process, ready for transformation. Transformation makes the data ready for the production system and includes data standardization, sorting, deduplication, validation, and verification.
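The snippet below sketches what these transformation steps can look like on simple (name, value) pairs; the field layout and the validation rule are assumptions made for the example, not the client’s real rules.

```python
def transform(rows):
    """Standardize, deduplicate, validate, and sort collected (name, value) rows."""
    cleaned, seen = [], set()
    for name, raw_value in rows:
        name = name.strip().lower()            # standardization
        if name in seen:                       # deduplication
            continue
        try:
            value = float(raw_value)           # validation / verification
        except (TypeError, ValueError):
            continue
        seen.add(name)
        cleaned.append((name, value))
    return sorted(cleaned)                     # sorting by factor name

print(transform([(" Factor_A ", "12.5"), ("factor_a", "12.5"), ("factor_b", "n/a")]))
```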

Data Implementation (Production)
To ensure data integrity, we implemented several monitoring components that alert administrators about potential failure scenarios, including network congestion or an offline source. Additionally, upon completion, a structured data report is provided for further analysis and history logging.
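One simple form such a monitoring component can take is a reachability check paired with an e-mail alert, sketched below; the portal URL, SMTP host, and addresses are hypothetical placeholders.

```python
import smtplib
from email.message import EmailMessage

import requests

def source_is_reachable(url="https://portal.example.com", timeout=10):
    """Return True if the source portal answers; False on congestion or outage."""
    try:
        return requests.get(url, timeout=timeout).status_code == 200
    except requests.RequestException:
        return False

def alert_admin(reason, smtp_host="mail.example.com"):
    """Send a plain-text alert e-mail to the administrator."""
    msg = EmailMessage()
    msg["Subject"] = "Data pipeline alert"
    msg["From"] = "pipeline@example.com"
    msg["To"] = "admin@example.com"
    msg.set_content(f"Pipeline check failed: {reason}")
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    if not source_is_reachable():
        alert_admin("source portal unreachable")
```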

After
Moving away from a legacy data infrastructure is one of the best actions an enterprise can take towards becoming more data-driven, managing its infrastructure more effectively, and, in all reality, keeping up with the competition.
Not only is our client saving thousands of euros every year on payroll alone, but the time it takes to maintain the old process has also gone from hundreds of hours down to ten or so. This has a huge impact on our client’s business. Implementing anything new would have taken at least half a working month with the old system; now it is a matter of hours to get everything up and running.

Let’s analyze the facts
- No dedicated employee
- 1 hour per month
- 500€ per year
- Fully automated process
- 3.000+ interest factors collected
Collecting the Data
The investment cost for the initialization and implementation of the system was about 6.000€. The system is a hosting service running a Linux OS with a web application built in Python.
If we had kept the legacy system, the estimated effort for 3.000+ factors would come to 150 hours, 20 working days, and a cost of about 8.500€ for the first year alone.
Using the Data
Once the data collection is accomplished (20 hrs), the update of the web application is automated. The functionality of receiving the results of the update was kept, so the employee can still inspect the CSV results and proceed with additional adjustments.
The return on investment (ROI) is 100% and is reached in the 7th month of use!
Start a Project
Starting a project from scratch is like building a house: you start from the foundation, at the very beginning. A project is a blank slate, and you are the creator.

Change is upon us. Automation can help.
No matter the complexity of your environment or where you are on your IT modernization journey, an IT operations automation strategy can help you improve existing processes. With automation, you can save time, increase quality, improve employee satisfaction, and reduce costs throughout your organization.