Building your data foundation: why entity resolution must come before AI in procurement

Every procurement team knows the frustration:

  • The same supplier appearing in five different ways across your systems
  • Invoices with vague descriptions
  • Payment terms buried in PDFs

Duplicate payments slip through. Discount opportunities vanish. This reality isn’t just an annoying data quality issue; it’s a fundamental barrier to procurement transformation.

The truth is, fragmented data is the silent tax on every P2P operation. Yet most organisations rush toward AI and automation without addressing the underlying chaos in their data.

Why traditional approaches keep failing

For years, organisations have tried to solve data quality problems with Master Data Management (MDM) systems and rules-based classification. These approaches are rigid, labour-intensive, and become outdated the moment your business changes or people leave, taking their knowledge with them.

Think of it like navigating a modern city without a map. You can ask people for directions, but you are reliant on them having a good understanding of the neighbourhood, a good sense of direction, and knowledge of the street names. You might eventually reach your destination, but you’ll waste time, miss opportunities, and take several wrong turns along the way.

But there’s a deeper problem that reveals why traditional MDM fundamentally misses the mark, illustrated neatly by the multiple-address reality.

The multiple address problem

For a single supplier entity, your systems legitimately store:

  • Legal registered address
  • Ship-to address
  • Invoice-to address
  • Delivery-to address
  • Site-specific addresses
  • PO boxes created for specific clients or purposes

Each of these addresses serves a distinct business purpose. You can’t simply “synchronise one address everywhere” when an update comes through; that would break legitimate business processes.

Now multiply this complexity across dozens of systems, hundreds of suppliers, and years of mergers, migrations, and organisational changes. Traditional MDM systems aim to enforce a single “golden record,” but in reality, multiple versions of the truth coexist.

What you actually need isn’t one master record; it’s a way to understand how all these different addresses, identifiers, and attributes relate to each other across your entire procurement ecosystem. This complexity doesn’t exist in isolation. To understand why entity resolution is so challenging, we need to see where all this fragmented data actually lives.
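One way to picture this, sketched below in Python with invented field and role names, is an entity that keeps every address keyed by its business role instead of collapsing them into one "golden" value. An update to one role leaves the others intact:

```python
from dataclasses import dataclass, field

@dataclass
class SupplierEntity:
    """One resolved supplier entity. Addresses are stored per business
    role, not merged into a single 'golden' address."""
    legal_name: str
    # role -> address; each role serves a distinct business purpose
    addresses: dict = field(default_factory=dict)
    # system name -> the local identifier that system uses
    source_ids: dict = field(default_factory=dict)

acme = SupplierEntity(legal_name="Acme Industrial Ltd")
acme.addresses["legal_registered"] = "1 Corporate Way, London"
acme.addresses["ship_to"] = "Unit 7, Northfield Industrial Estate"
acme.addresses["invoice_to"] = "PO Box 42, Reading"
acme.source_ids["erp"] = "V-10482"
acme.source_ids["ap_platform"] = "ACME-LTD-01"

# A ship-to update touches only that role; the registered and
# invoicing addresses are untouched, so nothing downstream breaks.
acme.addresses["ship_to"] = "Unit 9, Northfield Industrial Estate"
```

The point of the sketch is structural: the entity relates the addresses and source IDs to each other rather than overwriting them with one synchronised value.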

The procurement data maze

The typical procure-to-pay landscape spans a myriad of different systems:

  • Contract management systems store supplier agreements and terms, often as standalone, disconnected systems from daily workflows.
  • Sourcing platforms handle RFx processes and supplier selection, generating contracts that may never fully integrate downstream.
  • Procurement systems manage requisitions and purchase orders for indirect spend, sometimes including supplier catalogues.
  • ERP systems serve as the “system of record” for direct spend, inventory, and production purchases, handling everything from goods receipts to invoice matching.
  • AP and payment systems process invoices and execute payments, often through separate automation tools.
  • Warehouse and inventory systems track stock movements and feed reconciliation data back to the ERP.

In an ideal world, these systems would share a unified view of every supplier. The reality is far messier.

Some connections are mandatory: your ERP, procurement system, and AP platform must integrate for basic PO-to-invoice matching. But the majority of connections remain weak, manual, or non-existent:

  • Contracts from sourcing rarely flow cleanly into procurement systems
  • Category responsibility mapping lives in spreadsheets that never sync
  • Warehouse systems operate in semi-isolation, requiring manual batch uploads
  • The same supplier exists across all systems under different IDs, names, and codes

Even when organisations attempt to maintain consistent supplier codes across platforms, the consistency breaks down over time. Different geographies use different naming conventions. Acquisitions introduce new ERP instances. Subsidiaries create separate entries.

The result? A single supplier can appear dozens, even hundreds, of times across your systems, making holistic analysis virtually impossible.

Without a way to connect these fragments, procurement loses leverage, finance loses visibility, and the same supplier appearing under dozens of different identities becomes an accepted reality rather than a solvable problem.

That’s the environment most organisations are trying to overlay with AI and automation. And it explains why so many initiatives fail to deliver.

Why this matters: building the data foundation before automation

Here’s what many organisations miss: AI initiatives built on fragmented data will underperform or fail.

Before you can deploy intelligent automation, predictive analytics, or AI-driven insights, you need three foundational capabilities:

Entity resolution: Connecting fragmented records of the same supplier, customer, or product across systems.

This isn’t about creating another master data system; it’s about building intelligent connections between disparate data sources while respecting that multiple legitimate versions may coexist.
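A minimal sketch of the matching step, using only Python's standard library: normalise supplier names (case, punctuation, common legal suffixes) and then compare them. The threshold and suffix list are illustrative; production entity resolution would also weigh tax IDs, bank details, and addresses before linking records.

```python
import difflib
import re

# Common legal-form suffixes to strip before comparing (illustrative list)
LEGAL_SUFFIXES = r"\b(ltd|limited|inc|incorporated|gmbh|plc|llc)\b\.?"

def normalise(name: str) -> str:
    """Lowercase, drop punctuation and common legal suffixes."""
    name = name.lower()
    name = re.sub(LEGAL_SUFFIXES, "", name)
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())

def same_supplier(a: str, b: str, threshold: float = 0.85) -> bool:
    """Crude pairwise match on normalised names."""
    return difflib.SequenceMatcher(
        None, normalise(a), normalise(b)
    ).ratio() >= threshold

print(same_supplier("ACME Industrial Ltd.", "Acme Industrial Limited"))  # True
print(same_supplier("ACME Industrial Ltd.", "Apex Logistics Inc"))       # False
```

Even this toy version shows why normalisation comes first: without stripping suffixes and case, "ACME Industrial Ltd." and "Acme Industrial Limited" would score as different companies.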

Semantic architecture: Creating a unified understanding of what data means across your organisation.

When one system calls it “vendor” and another calls it “supplier,” your downstream processes need to understand they’re the same concept, even if the underlying systems never change their terminology.
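In practice this is often an alias map from each system's field names to a shared vocabulary. The source-specific names below are hypothetical, not from any real schema:

```python
# Alias map: source-system field name -> canonical concept.
CANONICAL_FIELDS = {
    "vendor": "supplier",
    "supplier": "supplier",
    "vendor_name": "supplier",
    "ship_to_address": "ship_to_address",
    "delivery_location": "ship_to_address",
    "shipping_addr": "ship_to_address",
}

def to_canonical(record: dict) -> dict:
    """Rename a source record's fields to the shared vocabulary,
    leaving unmapped fields as-is so nothing is silently dropped."""
    return {CANONICAL_FIELDS.get(k, k): v for k, v in record.items()}

erp_row = {"vendor": "Acme Industrial Ltd", "shipping_addr": "Unit 7, Northfield"}
print(to_canonical(erp_row))
# {'supplier': 'Acme Industrial Ltd', 'ship_to_address': 'Unit 7, Northfield'}
```

The source systems keep calling it "vendor"; only the harmonised layer speaks the canonical language.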

Taxonomy harmonisation: Standardising how you classify, categorise and structure spend, suppliers, and relationships.

This is the prerequisite for everything else. You cannot resolve entities or apply advanced analytics without first normalising your data against a consistent taxonomy; however, it is also vitally important that any harmonisation system retains the flexibility needed to rapidly adapt to shifting business landscapes.
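As a simplified illustration, a rule-based classifier can map raw line descriptions onto a two-level category tree. The tree and keywords here are invented; real taxonomies such as UNSPSC are far larger, and the flexibility the text calls for comes from rules being data you can edit, not logic you must re-code:

```python
# Rules are data: (keyword set, (level-1, level-2) category) pairs,
# so the taxonomy can be extended without changing the classifier.
TAXONOMY_RULES = [
    ({"laptop", "monitor", "keyboard"}, ("IT", "Hardware")),
    ({"licence", "subscription", "saas"}, ("IT", "Software")),
    ({"courier", "freight", "pallet"}, ("Logistics", "Shipping")),
]

def classify(description: str) -> tuple:
    """Return the first category whose keywords overlap the description."""
    words = set(description.lower().split())
    for keywords, category in TAXONOMY_RULES:
        if words & keywords:
            return category
    return ("Unclassified", "Unclassified")

print(classify("Annual SaaS subscription renewal"))   # ('IT', 'Software')
print(classify("Pallet freight to Northfield site"))  # ('Logistics', 'Shipping')
```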

Getting the architecture right

So how do you solve this? Not by replacing all your systems; that’s neither practical nor necessary. The answer lies in creating an architecture that respects what your existing systems do well while adding the missing layer of intelligence.

The hybrid approach

Transactional data should stay in relational databases. Your ERP, AP, and procurement systems are optimised for structured, transactional data. They handle calculations efficiently, maintain referential integrity, and support the day-to-day operations that keep your business running.

Entity relationships and connections belong in a different layer. Once you’ve harmonised and normalised your data, you need a way to understand how entities relate across systems, but you don’t need to move your transactional history to do this. Technologies like knowledge graphs excel at representing these complex relationships and connections, whilst your operational systems continue handling what they do best: processing transactions.

The architecture should link both capabilities: transaction-level KPIs, payment histories, and operational metrics stay where they are. The relationship and entity resolution layer sits on top, creating connections that reveal the true structure of your procurement ecosystem.
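The relationship layer can be pictured as a small graph of typed edges linking a resolved entity to the records that represent it in each system. The node names, IDs, and edge types below are illustrative only:

```python
# Edges: (source node, relationship type, destination node).
# Transactional systems keep their own IDs; the graph only records
# how those IDs relate to the resolved entity and to each other.
edges = [
    ("entity:acme", "SAME_AS", "erp:V-10482"),
    ("entity:acme", "SAME_AS", "ap:ACME-LTD-01"),
    ("entity:acme", "SAME_AS", "sourcing:SUP-332"),
    ("entity:acme", "PARENT_OF", "entity:acme_france"),
]

def neighbours(node: str, rel: str) -> list:
    """Follow edges of one relationship type from a node."""
    return [dst for src, r, dst in edges if src == node and r == rel]

# Every source-system record resolved to the same supplier:
print(neighbours("entity:acme", "SAME_AS"))
# ['erp:V-10482', 'ap:ACME-LTD-01', 'sourcing:SUP-332']
```

A dedicated graph database would add indexing, traversal languages, and scale, but the shape of the data is exactly this: entities, identifiers, and typed relationships sitting above the transactional stores.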

Normalisation must come first

Here’s where many initiatives go wrong: they attempt entity resolution on raw, inconsistent data and wonder why the results are disappointing.

Remember the multiple address problem? Without normalisation, your entity resolution process doesn’t know that “ship_to_address,” “delivery_location,” and “shipping_addr” all refer to the same concept. It can’t systematically match entities when every source system uses different terminology.

Without proper normalisation against a known taxonomy, you’ll simply recreate the same problems in your harmonised layer: the same information stored under different property names, inconsistent categorisation across data sources, and no ability to query or analyse patterns systematically.

Keep your foundation current

A harmonised data foundation isn’t a one-time project; it requires ongoing maintenance to deliver sustained value, because procurement data ages quickly. Every time a supplier changes their legal address, updates their tax registration, or modifies their banking details, that information must be detected and cascaded in near real-time. Otherwise, you’re making decisions based on increasingly stale information.

The key is incremental change detection: automated routines that identify what’s changed in source systems and propagate only those deltas through your architecture. This approach reduces processing overhead by avoiding full data reloads, maintains system performance as data volumes grow, and ensures your entity resolution remains up to date.
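The core of delta detection can be sketched by diffing two snapshots of source records and emitting only the additions, removals, and changes. The record shapes and IDs below are made up for illustration:

```python
def detect_deltas(previous: dict, current: dict):
    """Compare two snapshots (id -> record) and return only what
    changed, so downstream layers avoid full reloads."""
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = {k: previous[k] for k in previous.keys() - current.keys()}
    changed = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return added, removed, changed

prev = {"V-10482": {"name": "Acme Ltd", "city": "London"},
        "V-20071": {"name": "Apex Inc", "city": "Leeds"}}
curr = {"V-10482": {"name": "Acme Ltd", "city": "Reading"},  # address change
        "V-33019": {"name": "Nova GmbH", "city": "Berlin"}}  # new supplier

added, removed, changed = detect_deltas(prev, curr)
print(sorted(added), sorted(removed), sorted(changed))
# ['V-33019'] ['V-20071'] ['V-10482']
```

Only the three affected records flow onward; the unchanged bulk of the data is never re-processed.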

But here’s the nuance: historical data shouldn’t be discarded. That old address? You still need it. When people search for a company, they may be referring to old paperwork or relying on their memory. Furthermore, you cannot assume all records have been updated, especially in ad hoc tools such as spreadsheets. So knowledge of the past is still important in the present.

The solution is maintaining both current and historical views, keeping old data available as reference points whilst ensuring the active, operational view reflects the latest information.
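A common way to keep both views is to version each attribute with a validity range: the record with no end date is the current one, and older documents can still be resolved "as of" their date. The dates and addresses below are invented:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class AddressVersion:
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None = still current

history = [
    AddressVersion("PO Box 42, Reading", date(2018, 1, 1), date(2023, 6, 30)),
    AddressVersion("1 Corporate Way, London", date(2023, 7, 1)),
]

def current(versions: list) -> AddressVersion:
    """The operational view: the open-ended version."""
    return next(v for v in versions if v.valid_to is None)

def as_of(versions: list, day: date):
    """The historical view: what old paperwork would have referred to."""
    for v in versions:
        if v.valid_from <= day and (v.valid_to is None or day <= v.valid_to):
            return v
    return None

print(current(history).value)                    # '1 Corporate Way, London'
print(as_of(history, date(2020, 3, 15)).value)   # 'PO Box 42, Reading'
```

Nothing is deleted on update; the old address simply gets an end date, so a search against a 2020 invoice still finds the right supplier.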

The value this unlocks

Get this foundation right, and the benefits compound across your entire organisation.

You gain unified visibility by breaking down silos, connecting disparate P2P data into a single, coherent view whilst respecting the source systems’ operational controls. Manual supplier deduplication and spend categorisation are replaced with consistent, accurate processing.

For risk and compliance, you can trace supplier relationships instantly for sanctions screening, ESG assessment, or concentration risk analysis across your true supplier base, not just your fragmented records. Strategic insights emerge naturally: vendor consolidation opportunities, rebate potential, and optimised payment terms across the entire relationship.

Perhaps most importantly, you deliver operational intelligence at the point of work, bringing entity resolution insights back to the systems where people actually work, making every transaction smarter without changing daily workflows.

All of this creates true AI readiness: the foundation for advanced automation, predictive analytics, and intelligent decision support that actually works.

The bottom line

The rush to adopt AI in procurement is understandable. The promises are compelling: automated processing, intelligent recommendations, predictive insights.

But AI built on fragmented, inconsistent data is like building a skyscraper on quicksand. It might look impressive initially, but it won’t deliver sustained value.

The organisations winning at procurement transformation aren’t the ones with the fanciest AI. They’re the ones who took time to build a robust data foundation first:

  • Normalising data against consistent taxonomies before attempting entity resolution
  • Harmonising disparate systems whilst letting each do what it does best
  • Building automated processes to keep their foundation current as business changes
  • Creating bidirectional flows so insights enhance operations at the point of work
  • Maintaining the discipline to prevent the data quality problems that plagued their legacy systems

This foundation work isn’t glamorous. It won’t generate the same excitement as announcing a new AI initiative. But it’s the difference between transformation initiatives that deliver measurable value and those that quietly disappoint.

Only then does automation deliver on its promise.


In our next article, we’ll explore how advanced entity resolution techniques reveal hidden relationships in your procurement ecosystem—and how to apply these capabilities to drive tangible business value.
