Project Case
Reducing human factors and increasing data quality through a Data Collection Automation Process

Key outcomes:
- Reduced human factors
- Reduced data collection time
- Reduced yearly costs
- Increased interest factors collected
Background
Our client’s web application offers a performance analytics engine that uses validated data and mathematical modeling to improve performance transparency and provide realistic benchmarking. The data used in the model is gathered, assessed, and validated by our client’s internal team of analysts and by outside suppliers. The model takes data assigned to particular factors as raw input and produces detailed analysis, demographic analysis, and other outputs.
Before
Every month, a team member ran a specific task: connect to a specific portal, gather all the required data, download it as a CSV file, and import it into the web application. The task lasted 4 days, at 8 hours per day. Because of the effort required, the team member had to limit the collection to 600 interest factors out of the 3.000+ that could have been included.

Let’s analyze the facts
- One dedicated employee
- 35 hours per month
- 3.000€ per year
- Exposure to human error and availability issues
- 600 interest factors collected
Collecting the Data
The yearly cost is calculated based on the yearly salary rate in the client’s country, which is 14.000€. To keep things simple, we also assume that this specific employee is trained to do the job and:
- Will always be available during the specific week, every month
- Will do the job without any mistakes
Using the Data
Once the data collection was accomplished, the web application was updated by uploading CSV files, and the results of the update were received in the same way. The employee then had to inspect the CSV results and proceed with additional adjustments.

The Solution
In this project, our client deals with massive amounts of data, relies on real-time data analysis, stores data in the cloud, and combines data from multiple sources. The data flow itself can be unreliable: there are many points during transport from one system to another where corruption or bottlenecks can occur. To analyze all that data, we needed a single view of the entire data set.
Data Pipeline
A data pipeline is a series of processes that migrate data from a source to a destination database. As an example of a technical dependency, after data is assimilated from its sources it may be held in a central queue, subjected to further validations, and only then loaded into the destination. Typically, this means loading raw data into a staging table for interim storage, transforming it, and finally inserting it into the destination reporting tables.
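As a rough illustration of this staging-then-transform flow, the sketch below stages raw rows in a temporary table, validates and converts them, and then loads them into a reporting table. The table names, the SQLite storage, and the validation rule are illustrative assumptions, not the client’s actual schema.

```python
import sqlite3

def run_pipeline(raw_rows, db_path="pipeline.db"):
    """Stage raw rows, transform them, and load them into the reporting table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS staging_factors (name TEXT, value TEXT)")
    con.execute("CREATE TABLE IF NOT EXISTS reporting_factors (name TEXT, value REAL)")

    # 1. Load raw data into the staging table for interim storage.
    con.executemany("INSERT INTO staging_factors VALUES (?, ?)", raw_rows)

    # 2. Transform: keep only rows whose value parses as a number.
    cleaned = []
    for name, value in con.execute("SELECT name, value FROM staging_factors"):
        try:
            cleaned.append((name.strip(), float(value)))
        except ValueError:
            continue  # drop rows that fail validation

    # 3. Insert the transformed rows into the destination reporting table.
    con.executemany("INSERT INTO reporting_factors VALUES (?, ?)", cleaned)
    con.commit()
    con.close()

if __name__ == "__main__":
    run_pipeline([("factor_a", "12.5"), ("factor_b", "n/a")])
```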

- Data Collection (Source)
- Data Storage (Destination)
- Data Transformation (Processing)

Data Collection (Source)
The source is a web portal where the client’s user, after successful authentication, had to apply specific filters to retrieve the needed structured data from web pages. We used web scraping methods to collect all the structured data and a batch processing model to store it in the storage at set time intervals, scheduled during off-peak business hours.
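A minimal sketch of what such a collection step can look like, assuming a portal with a form-based login and an HTML results table; the URL, form fields, and CSS selectors below are hypothetical placeholders rather than the client’s actual portal.

```python
import requests
from bs4 import BeautifulSoup

PORTAL = "https://portal.example.com"  # hypothetical placeholder URL

def collect(username, password, factor_filter):
    """Authenticate, apply a filter, and scrape the structured result rows."""
    session = requests.Session()
    session.post(f"{PORTAL}/login", data={"user": username, "pass": password})

    page = session.get(f"{PORTAL}/data", params={"factor": factor_filter})
    soup = BeautifulSoup(page.text, "html.parser")

    rows = []
    for tr in soup.select("table#results tr")[1:]:  # skip the header row
        rows.append([td.get_text(strip=True) for td in tr.select("td")])
    return rows
```

In a batch setup like the one described above, a function of this kind would typically be triggered on a schedule (for example by a cron job) so that collection only runs during off-peak hours.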

Data Transformation (Processing)
The destination (data storage) is a temporary cloud repository where the data arrives at the end of its collection process, ready for transformation. Transformation makes the data ready for the production system and includes data standardization, sorting, deduplication, validation, and verification.
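The snippet below sketches what these transformation steps can look like on simple (name, value) pairs; the field layout and the validation rule are assumptions made for the example, not the client’s real rules.

```python
def transform(rows):
    """Standardize, deduplicate, validate, and sort collected (name, value) rows."""
    cleaned, seen = [], set()
    for name, raw_value in rows:
        name = name.strip().lower()            # standardization
        if name in seen:                       # deduplication
            continue
        try:
            value = float(raw_value)           # validation / verification
        except (TypeError, ValueError):
            continue
        seen.add(name)
        cleaned.append((name, value))
    return sorted(cleaned)                     # sorting by factor name

print(transform([(" Factor_A ", "12.5"), ("factor_a", "12.5"), ("factor_b", "n/a")]))
```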

Data Implementation (Production)
To ensure data integrity, we implemented several monitoring components that alert administrators about potential failure scenarios, including network congestion or an offline source. Additionally, upon completion, a structured data report is provided for further analysis and history logging.
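One simple form such a monitoring component can take is a reachability check paired with an e-mail alert, sketched below; the portal URL, SMTP host, and addresses are hypothetical placeholders.

```python
import smtplib
from email.message import EmailMessage

import requests

def source_is_reachable(url="https://portal.example.com", timeout=10):
    """Return True if the source portal answers; False on congestion or outage."""
    try:
        return requests.get(url, timeout=timeout).status_code == 200
    except requests.RequestException:
        return False

def alert_admin(reason, smtp_host="mail.example.com"):
    """Send a plain-text alert e-mail to the administrator."""
    msg = EmailMessage()
    msg["Subject"] = "Data pipeline alert"
    msg["From"] = "pipeline@example.com"
    msg["To"] = "admin@example.com"
    msg.set_content(f"Pipeline check failed: {reason}")
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    if not source_is_reachable():
        alert_admin("source portal unreachable")
```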

After
Moving away from a legacy data infrastructure is one of the best actions an enterprise can take towards becoming more data-driven, managing its infrastructure more effectively, and, in all reality, keeping up with the competition.
Not only is our client saving thousands of euros every year on payroll alone, but the time it takes to maintain the old process has also gone from hundreds of hours down to ten or so. This has a huge impact on our client’s business. Implementing anything new would have taken at least half a working month with the old system; now it is a matter of hours to get everything up and running.

Let’s analyze the facts
- No dedicated employee
- 1 hour per month
- 500€ per year
- Fully automated process
- 3.000+ interest factors collected
Collecting the Data
The investment cost for the initialization and implementation of the system was about 6.000€. The system is a hosting service running a Linux OS with a web application built in Python.
If we had kept the legacy system, the estimated effort for 3.000+ factors would come to 150 hours, 20 working days, and a cost of about 8.500€ for the first year alone.
Using the Data
Once the data collection is accomplished (20 hrs), the update of the web application is automated. The functionality of receiving the results of the update was kept, so the employee can still inspect the CSV results and proceed with additional adjustments.
The return on investment (ROI) is 100% and is reached in the 7th month of use!
Start a Project
Starting a project from scratch is like building a house: you start from the foundation, at the very beginning. A project is a blank slate, and you are the creator.

Change is upon us. Automation can help.
No matter the complexity of your environment or where you are on your IT modernization journey, an IT operations automation strategy can help you improve existing processes. With automation, you can save time, increase quality, improve employee satisfaction, and reduce costs throughout your organization.