At a glance
- Sector: Government — public finance & property tax
- Our role: We designed and built the automated data-validation platform that cleaned the records
- Timeframe: Delivered 2025
Challenge
A national government's property-tax authority could only bill what it could trust, and it could not trust its data. Its records had to be assembled from disparate sources (postal, electricity, water, mapping and registry systems), and no two spoke the same language. Each came with its own formats, its own level of detail, even its own units: addresses as free-form text in one system and structured fields in another, GPS coordinates in incompatible notations, names following no shared convention. There were no reliable identifiers to stitch the sources together, and the formats had drifted further apart over the years. On top of that, the records were riddled with faults (missing or malformed owner addresses, properties with no identifiable owner at all), and every flaw meant a tax notice that could not be delivered or revenue that was never billed. Reconciling millions of such records by hand was hopeless.
Approach
We designed and built an automated data-validation platform that ingests records from every source and first makes them comparable — normalising clashing formats, reconciling units and bringing inconsistent GPS coordinates onto a common reference before anything else. Only then does it reconcile records against authoritative postal and registry data and run a rules engine over every owner name and address. Each record is classified and every fault categorised — from a recoverable missing district to an owner that simply does not exist — and, wherever the evidence allows, corrected and enriched automatically rather than handed to a person. What had been a manual, one-off clean-up became a repeatable pipeline the authority could re-run as new data arrived.
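As a rough illustration of the two stages described above, the sketch below first brings GPS coordinates in mixed notations onto signed decimal degrees, then classifies a record and fills a missing district from postal reference data where it can. It is a minimal sketch, not the platform's actual code: the names `to_decimal_degrees`, `Record`, `classify`, the fault labels and the postcode lookup are all hypothetical, and the real rules engine covers far more cases.

```python
import re
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

# Illustrative pattern for degrees-minutes-seconds notation, e.g. 48°51'24"N.
_DMS = re.compile(r"""^(\d{1,3})°\s*(\d{1,2})'\s*(\d{1,2}(?:\.\d+)?)"\s*([NSEW])$""")


def to_decimal_degrees(raw: str) -> Optional[float]:
    """Return signed decimal degrees, or None if the notation is not recognised."""
    text = raw.strip()
    try:
        return float(text)  # already in decimal degrees
    except ValueError:
        pass
    match = _DMS.match(text.upper())
    if match is None:
        return None
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal shape of a property record."""
    owner_name: str
    address: str
    district: str
    postcode: str


def classify(
    record: Record, postcode_to_district: Dict[str, str]
) -> Tuple[Record, str, List[str]]:
    """Categorise faults and correct the ones the reference data can resolve."""
    faults: List[str] = []
    unresolved = False
    if not record.owner_name.strip():
        faults.append("missing_owner")  # cannot be recovered from postal data
        unresolved = True
    if not record.address.strip():
        faults.append("missing_address")
        unresolved = True
    if not record.district.strip():
        district = postcode_to_district.get(record.postcode.strip())
        if district:
            # Recoverable fault: enrich the record automatically.
            record = replace(record, district=district)
            faults.append("district_recovered")
        else:
            faults.append("missing_district")
            unresolved = True
    if not faults:
        status = "valid"
    elif unresolved:
        status = "needs_review"
    else:
        status = "corrected"
    return record, status, faults
```

With a hypothetical lookup such as `{"1000": "Central"}`, a record with an empty district and postcode `1000` comes back with status `corrected` and the district filled in, while a record with no owner name is flagged `needs_review` rather than guessed at.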
Outcome
The authority went from a registry it second-guessed to one it could bill on.
- Over 2 million property records, drawn from a sprawl of mismatched source systems, validated and reconciled into one consistent dataset
- More than 500,000 records corrected or enriched — addresses rebuilt, owners resolved, invalid entries caught
- Roughly one in four records carried an error the old process had missed
- The result: a clean, notification-ready dataset — and an automated pipeline to keep it that way
