# Deduplication

Kernel uses its proprietary, AI-driven algorithm to scan all accounts in your CRM, group duplicate accounts together, and identify the primary record to preserve in each group.

The following data points are provided:

<table><thead><tr><th width="245.0625">Data</th><th>Definition</th></tr></thead><tbody><tr><td>Duplicate type</td><td>The account's duplicate type (see the table below)</td></tr><tr><td>Duplicate group</td><td>A number shared by all accounts that belong to the same duplicate group</td></tr><tr><td>Duplicate of ID</td><td>Salesforce ID of the account that this account duplicates</td></tr><tr><td>Duplicate - Reasoning</td><td>Plain-text explanation of the logic behind the duplicate analysis</td></tr></tbody></table>
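For illustration, one row of this output could be modelled as the record below. The field names (`account_id`, `duplicate_type`, `duplicate_group`, `duplicate_of_id`, `reasoning`) are hypothetical stand-ins for the data points above, not Kernel's actual field names, and leaving `duplicate_of_id` empty on the `Primary` record is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DuplicateResult:
    """Hypothetical shape of one account's deduplication output."""

    account_id: str                 # Salesforce ID of this account
    duplicate_type: str             # e.g. "Primary", "Exact", "Location", ...
    duplicate_group: int            # shared by every account in the same group
    duplicate_of_id: Optional[str]  # Salesforce ID of the account this one duplicates
    reasoning: str                  # plain-text explanation of the decision


# A Primary record and one Exact duplicate in the same group.
# The IDs are made-up placeholders, not real Salesforce IDs.
primary = DuplicateResult("001A", "Primary", 7, None, "Recommended survivor of group 7.")
dupe = DuplicateResult("001B", "Exact", 7, "001A", "Same legal name, name and countries.")
```

Both records carry the same `duplicate_group`, and the duplicate points back to the primary through `duplicate_of_id`.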

### Duplicate types

Each record is assigned one of the following duplicate types:

<table><thead><tr><th width="190.5703125">Type</th><th>Definition</th></tr></thead><tbody><tr><td>Primary</td><td><p>Primary record</p><p><em>Duplicates of this account may exist, but this is the record recommended to survive the merge.</em></p></td></tr><tr><td>Exact</td><td>Two accounts are an exact match when any of the following holds: they share the same Kernel ID; their legal name, legal country, name, and trading country all match; or their URL, name, and legal name all match.</td></tr><tr><td>Location</td><td>Physical establishments of the same legal entity that share the same root domain, for example hotel locations, offices, or stores operating at different URLs under that domain. One of the accounts must be classified as an Establishment.</td></tr><tr><td>Regional</td><td>The trading presence of a legal entity in a different country, for example the UK trading entity of a company whose legal registration is in Germany. Identified when two accounts share the same legal name but operate in different trading countries.</td></tr><tr><td>Trading</td><td>The trading entity linked to its legal entity within the same country, for example the operating brand of a holding company. Identified when two accounts share the same legal name, one as the legal identity and the other as the trading identity.</td></tr><tr><td>Website</td><td>Accounts sharing the same URL and name. A softer match than Exact: legal name alignment is not required. Disabled by default; can be enabled per configuration.</td></tr></tbody></table>
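The Exact and Website rules above are concrete enough to sketch as predicates. This is a minimal illustration only: the `Account` fields are hypothetical, and treating a missing value as a non-match and comparing strings case-insensitively after trimming are assumptions. Kernel's actual matching is AI-driven and may normalize values differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    # Hypothetical field names; a real CRM schema will differ.
    kernel_id: Optional[str]
    name: str
    legal_name: Optional[str]
    legal_country: Optional[str]
    trading_country: Optional[str]
    url: Optional[str]


def _same(a: Optional[str], b: Optional[str]) -> bool:
    """True only if both values are present and equal (trimmed, case-insensitive)."""
    return a is not None and b is not None and a.strip().lower() == b.strip().lower()


def is_exact(a: Account, b: Account) -> bool:
    # Rule 1: same Kernel ID.
    if _same(a.kernel_id, b.kernel_id):
        return True
    # Rule 2: legal name, legal country, name and trading country all match.
    if all(_same(x, y) for x, y in [
        (a.legal_name, b.legal_name),
        (a.legal_country, b.legal_country),
        (a.name, b.name),
        (a.trading_country, b.trading_country),
    ]):
        return True
    # Rule 3: URL, name and legal name all match.
    return all(_same(x, y) for x, y in [
        (a.url, b.url),
        (a.name, b.name),
        (a.legal_name, b.legal_name),
    ])


def is_website(a: Account, b: Account) -> bool:
    # Softer than Exact: only URL and name are compared; legal name is ignored.
    return _same(a.url, b.url) and _same(a.name, b.name)


a = Account(None, "Acme", "Acme GmbH", "DE", "DE", "acme.com")
b = Account(None, "Acme", None, None, "DE", "acme.com")
print(is_exact(a, b))    # False: b has no legal name, so no Exact rule is satisfied
print(is_website(a, b))  # True: URL and name match
```

The example shows why Website is the softer type: the pair fails every Exact rule because the legal name is missing, yet still matches on URL and name.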

## How Kernel identifies duplicates in your CRM

Kernel's deduplication works in two steps:

{% stepper %}
{% step %}
**Candidate generation**

For each account in your CRM, Kernel scans the full CRM to build a long list of potential duplicate candidates.
{% endstep %}

{% step %}
**Candidate selection**

Kernel crawls the website of every candidate and uses data from [website-analysis](https://docs.kernel.ai/legacy/data/website-analysis "mention") to determine whether each pair is a true duplicate pair. Kernel also calculates the duplicate type and duplicate group.

Kernel uses a contextual, AI-based approach to determine duplicate pairs. For example, it can decide that `amazon.fr` is a regional duplicate of `amazon.com`, but that `apollo.de` is not a regional duplicate of `apollo.com`.
{% endstep %}
{% endstepper %}
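The two steps can be sketched as a broad blocking pass followed by a stricter pairwise check. This is an assumption-laden illustration, not Kernel's implementation: the blocking key (`domain_label`, the label before the top-level domain) is hypothetical and ignores multi-part suffixes such as `.co.uk`, and `judge` is a placeholder callable standing in for Kernel's AI-based candidate selection.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple


def domain_label(url: str) -> str:
    """Hypothetical blocking key: the label before the TLD ('amazon' for 'amazon.fr')."""
    host = url.lower().split("//")[-1].split("/")[0]
    parts = host.split(".")
    return parts[-2] if len(parts) >= 2 else host


def generate_candidates(urls: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Step 1: a deliberately broad long list of account-ID pairs sharing a key."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for account_id, url in urls.items():
        blocks[domain_label(url)].append(account_id)
    pairs: Set[Tuple[str, str]] = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def select_duplicates(
    pairs: Set[Tuple[str, str]],
    judge: Callable[[str, str], bool],
) -> Set[Tuple[str, str]]:
    """Step 2: keep only the pairs the judge confirms as true duplicates."""
    return {p for p in pairs if judge(*p)}


urls = {
    "a1": "https://amazon.com", "a2": "https://amazon.fr",
    "p1": "https://apollo.com", "p2": "https://apollo.de",
}
candidates = generate_candidates(urls)
# Stub judge: pretends website analysis confirmed only the Amazon pair.
confirmed = select_duplicates(candidates, lambda x, y: {x, y} == {"a1", "a2"})
```

Both the Amazon and Apollo pairs reach the long list, because a simple key cannot tell them apart. Only the second step, which in Kernel's case relies on website context, separates the true regional duplicate from the unrelated companies. Blocking of this kind is a common way to avoid comparing every account with every other account; whether Kernel uses it is not stated here.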

### Primary record selection

When Kernel identifies duplicates, it designates one record in each group as `Primary`. Selection is determined by the following criteria, applied in order:

{% stepper %}
{% step %}

#### Duplicate type priority

When a group contains multiple duplicate types, the type with the highest priority takes precedence. The hierarchy is:

**Exact > Location > Regional > Trading > Website**
{% endstep %}

{% step %}

#### Identity type

For **Regional** and **Trading** groups, the legal entity is preferred over the trading entity.

For **Location** groups, the parent entity is preferred over the establishment.
{% endstep %}

{% step %}

#### CRM field values

The CRM fields configured for primary selection are then compared across the remaining candidates. By default, Kernel uses risk score and last activity date. The record with the highest value is preferred; for a date, that is the most recent one.
{% endstep %}
{% endstepper %}
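The three criteria can be read as a lexicographic sort key. The sketch below is one possible interpretation under stated assumptions: each `Candidate` is tagged with the duplicate type of its relationship in the group, `is_preferred_identity` is a hypothetical flag that is True for the legal entity (Regional, Trading) or the parent (Location) and simply True for everyone in Exact or Website groups, and comparing risk score before last activity date is an assumed ordering that the documentation does not specify.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Lower number = higher priority: Exact > Location > Regional > Trading > Website.
TYPE_PRIORITY = {"Exact": 0, "Location": 1, "Regional": 2, "Trading": 3, "Website": 4}


@dataclass(frozen=True)
class Candidate:
    account_id: str
    duplicate_type: str          # hypothetical: type of this record's relationship
    is_preferred_identity: bool  # hypothetical: legal entity / parent entity flag
    risk_score: float
    last_activity: date


def select_primary(candidates: List[Candidate]) -> Candidate:
    def key(c: Candidate):
        return (
            TYPE_PRIORITY[c.duplicate_type],  # 1. highest-priority duplicate type
            not c.is_preferred_identity,      # 2. preferred identity first (False sorts first)
            -c.risk_score,                    # 3a. highest risk score (assumed to come first)
            -c.last_activity.toordinal(),     # 3b. most recent last activity date
        )
    return min(candidates, key=key)


group = [
    Candidate("uk", "Regional", False, 0.9, date(2024, 5, 1)),
    Candidate("de", "Regional", True, 0.4, date(2023, 1, 1)),
]
print(select_primary(group).account_id)  # de
```

In this example the German legal entity wins even though the UK trading entity has a higher risk score and more recent activity, because identity type is checked before CRM field values.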

### Duplicate groups

All accounts in a set of associated duplicates share the same Duplicate group ID. Each duplicate group can have only one `Primary` account.
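One way to turn confirmed duplicate pairs into group numbers is a union-find (disjoint-set) structure, sketched below. Treating duplication as transitive (if A duplicates B and B duplicates C, all three share one group) and numbering groups from 1 in order of first appearance are assumptions made for illustration; the function name `assign_groups` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def assign_groups(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each account ID to a duplicate group number using union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the two accounts of every confirmed pair into one set.
    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Number the sets 1, 2, ... in order of first appearance.
    group_ids: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for account in list(parent):
        root = find(account)
        if root not in group_ids:
            group_ids[root] = len(group_ids) + 1
        result[account] = group_ids[root]
    return result


groups = assign_groups([("A", "B"), ("B", "C"), ("D", "E")])
print(groups)  # {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
```

Once each group is formed, a primary-selection rule is applied within it so that every group ends up with a single `Primary` record.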
