# Deduplication

Kernel identifies groups of duplicate accounts in your CRM and recommends the primary record to preserve in each group. It scans all accounts in the CRM using a proprietary, AI-driven algorithm.

Kernel provides the following data points for each account:

<table><thead><tr><th width="245.0625">Data</th><th>Definition</th></tr></thead><tbody><tr><td>Duplicate type</td><td>See the Duplicate types table below</td></tr><tr><td>Duplicate group</td><td>A number identifying the group of duplicate accounts to which the account belongs</td></tr><tr><td>Duplicate of ID</td><td>Salesforce ID of the account that this account is a duplicate of</td></tr><tr><td>Duplicate - Reasoning</td><td>Plain-text reasoning explaining the logic behind the duplicate analysis</td></tr></tbody></table>

### Duplicate types

Each record is associated with one of the following duplicate types:

<table><thead><tr><th width="190.5703125">Type</th><th>Definition</th></tr></thead><tbody><tr><td>Primary</td><td><p>Primary record</p><p><em>Note that duplicates of this account may exist, but this is the record that is recommended to survive the merge.</em></p></td></tr><tr><td>Exact</td><td>Two accounts are an exact match when they share the same Kernel ID, or when their legal name, legal country, name, and trading country all match, or when their URL, name, and legal name all align.</td></tr><tr><td>Location</td><td>Physical establishments of the same legal entity sharing the same domain — for example, hotel locations, offices, or stores operating at different URLs under the same root domain. One account must be classified as an Establishment.</td></tr><tr><td>Regional</td><td>The trading presence of a legal entity in a different country. For example, the UK trading entity of a company whose legal registration is in Germany. Identified when two accounts share the same legal name but operate in different trading countries.</td></tr><tr><td>Trading</td><td>The trading entity linked to its legal entity within the same country — for example, the operating brand of a holding company. Identified when two accounts share the same legal name, one as the legal identity and one as the trading identity.</td></tr><tr><td>Website</td><td>Accounts sharing the same URL and name. A softer match than Exact — legal name alignment is not required. Off by default; can be enabled per configuration.</td></tr></tbody></table>
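The Exact and Website rules in the table can be read as simple boolean conditions. The following is a minimal sketch of that reading, not Kernel's implementation: the `Account` class and all of its field names are hypothetical, and plain string equality stands in for whatever normalization Kernel applies before comparing values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    # Hypothetical field names standing in for the attributes named in the table.
    kernel_id: Optional[str]
    name: str
    legal_name: str
    legal_country: str
    trading_country: str
    url: str


def is_exact_match(a: Account, b: Account) -> bool:
    """Return True if any of the three Exact conditions from the table holds."""
    # Condition 1: same Kernel ID (a missing ID never matches).
    same_kernel_id = a.kernel_id is not None and a.kernel_id == b.kernel_id
    # Condition 2: legal name, legal country, name, and trading country all match.
    same_identity = (
        a.legal_name == b.legal_name
        and a.legal_country == b.legal_country
        and a.name == b.name
        and a.trading_country == b.trading_country
    )
    # Condition 3: URL, name, and legal name all align.
    same_url_and_names = (
        a.url == b.url and a.name == b.name and a.legal_name == b.legal_name
    )
    return same_kernel_id or same_identity or same_url_and_names


def is_website_match(a: Account, b: Account) -> bool:
    """Softer Website rule: same URL and name, legal name not required."""
    return a.url == b.url and a.name == b.name
```

Under this reading, every pair that satisfies the third Exact condition also satisfies the Website rule, which is why Website is described as the softer match.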

## How Kernel identifies duplicates in your CRM

Kernel's deduplication works in two steps:

{% stepper %}
{% step %}
**Candidate generation**

For each account in your CRM, Kernel scans the full CRM to create a longlist of potential duplicate candidates.
{% endstep %}

{% step %}
**Candidate selection**

Kernel crawls the websites of all candidates and uses data from the [Website analysis](/legacy/data/website-analysis.md) to determine whether each candidate pair is a true duplicate pair. This step also calculates the duplicate type and group.

Kernel uses a contextual, AI-based approach to determine duplicate pairs. For example, it can decide that `amazon.fr` is a regional duplicate of `amazon.com`, but that `apollo.de` is not a regional duplicate of `apollo.com`.
{% endstep %}
{% endstepper %}
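The two steps above follow a general generate-then-verify pattern. Kernel's algorithm is proprietary, so the sketch below only illustrates that pattern under loud assumptions: the blocking key (`domain_label`), the function names, and the caller-supplied `is_duplicate` verifier are all hypothetical, and the verifier stands in for the website-based, contextual decision.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlparse

Pair = Tuple[str, str]


def domain_label(url: str) -> str:
    """Crude, hypothetical blocking key: first host label, e.g. 'amazon' for amazon.fr."""
    host = urlparse(url if "//" in url else "//" + url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def generate_candidates(urls: Iterable[str]) -> List[Pair]:
    """Step 1: cheap longlist of pairs that share a blocking key."""
    buckets = defaultdict(list)
    for url in urls:
        buckets[domain_label(url)].append(url)
    pairs: List[Pair] = []
    for bucket in buckets.values():
        pairs.extend(combinations(sorted(bucket), 2))
    return pairs


def select_duplicates(
    pairs: Iterable[Pair], is_duplicate: Callable[[str, str], bool]
) -> List[Pair]:
    """Step 2: keep only pairs the (expensive, contextual) verifier confirms."""
    return [pair for pair in pairs if is_duplicate(*pair)]
```

With this split, `amazon.fr`/`amazon.com` and `apollo.de`/`apollo.com` would both surface as candidates in the first step, and it is the verifier in the second step that accepts one pair and rejects the other.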

### Primary record selection

When Kernel identifies duplicates, it designates one record in each group as `Primary`. Selection is determined in the following order:

{% stepper %}
{% step %}

#### Duplicate type priority

The type with the highest priority in the group takes precedence. Where multiple types are present, the hierarchy applied is:

**Exact > Location > Regional > Trading > Website**
{% endstep %}

{% step %}

#### Identity type

For **Regional** and **Trading** groups, the legal entity is preferred over the trading entity.

For **Location** groups, the parent entity is preferred over the establishment.
{% endstep %}

{% step %}

#### CRM field values

CRM fields configured for primary selection are compared across remaining candidates. By default, risk score and last activity date are used. The record with the highest value is preferred.
{% endstep %}
{% endstepper %}
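One possible reading of this ordering is a chain of tie-breakers, which can be expressed as a tuple sort key. The sketch below assumes that reading; the `Candidate` class, its field names, and the identity labels (`"legal"`, `"parent"`, and so on) are hypothetical, and the default CRM fields are taken to be a numeric risk score and a last activity date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hierarchy from the text: Exact > Location > Regional > Trading > Website.
TYPE_PRIORITY = ["Exact", "Location", "Regional", "Trading", "Website"]

# Preferred identity per type: legal entity for Regional and Trading,
# parent entity for Location. Other types have no identity preference.
PREFERRED_IDENTITY = {"Regional": "legal", "Trading": "legal", "Location": "parent"}


@dataclass
class Candidate:
    # Hypothetical field names for illustration only.
    account_id: str
    match_type: str      # one of TYPE_PRIORITY
    identity: str        # e.g. "legal", "trading", "parent", "establishment"
    risk_score: float
    last_activity: date


def selection_key(c: Candidate):
    """Larger tuples win: type priority, then identity, then CRM field values."""
    type_rank = -TYPE_PRIORITY.index(c.match_type)
    identity_rank = 1 if PREFERRED_IDENTITY.get(c.match_type) == c.identity else 0
    return (type_rank, identity_rank, c.risk_score, c.last_activity)


def pick_primary(group: List[Candidate]) -> Candidate:
    """Return the candidate that ranks highest under selection_key."""
    return max(group, key=selection_key)
```

Because tuples compare element by element, a later criterion only matters when all earlier ones are tied, which mirrors the "remaining candidates" wording in the last step.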

### Duplicate groups

All associated duplicates are assigned a Duplicate group ID. Each duplicate group can have only one `Primary` account.
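Before merging, it can be useful to check this constraint on exported data. The sketch below is a hypothetical helper, not part of Kernel: it takes `(group_id, duplicate_type)` pairs and assumes, beyond what is stated above, that every group should contain exactly one `Primary`, so groups with none are flagged as well.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def groups_without_single_primary(
    records: Iterable[Tuple[Hashable, str]]
) -> List[Hashable]:
    """Return the IDs of groups whose number of Primary records is not exactly one."""
    primaries = Counter()
    seen = set()
    for group_id, duplicate_type in records:
        seen.add(group_id)
        if duplicate_type == "Primary":
            primaries[group_id] += 1
    # Counter returns 0 for groups that never had a Primary.
    return sorted(g for g in seen if primaries[g] != 1)
```

An empty result means every group in the input has a single record marked as the survivor of the merge.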

