Deduplication policies

Duplicate types

Kernel assigns each account to one of five types:

Type
Definition

Primary

Primary record

Duplicates of this account may exist, but this is the record recommended to survive the merge.

Exact

Exact match found after ‘cleaning’ and standardizing the URLs

Subdomain

Same root domain as another account, differing only in the subdomain, e.g. shop.ccs.com vs. ccs.com

Regional

Regional variants of the same company's site. For example, fr.amazon.com, amazon.fr, and amazon.com are all regional duplicates, but apollo.de and apollo.com are not.

Potential

A catch-all category for the hard cases that require extensive work, e.g. corporate or careers sites, investor relations sites, or product/marketing sites.
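As an illustration of what "cleaning" and standardizing URLs for the Exact type could look like, here is a minimal sketch. The function names `normalize_url` and `is_exact_duplicate` are hypothetical, and the specific rules (lowercasing, dropping the scheme, path, port, and a leading "www.") are assumptions, not Kernel's documented behaviour.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a bare, lowercase hostname for comparison (assumed rules)."""
    url = url.strip().lower()
    # urlsplit only populates the hostname when a scheme separator is present.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    # Assumption: a leading "www." is noise, not a meaningful subdomain.
    if host.startswith("www."):
        host = host[4:]
    return host


def is_exact_duplicate(url_a: str, url_b: str) -> bool:
    """True when both URLs normalize to the same non-empty hostname."""
    host_a, host_b = normalize_url(url_a), normalize_url(url_b)
    return bool(host_a) and host_a == host_b
```

Under these assumed rules, "https://www.CCS.com/shop" and "ccs.com" compare as exact duplicates, while shop.ccs.com and ccs.com do not and would fall under the Subdomain type instead.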

Regional account policy

You can treat regional duplicates in one of two ways:

  1. Treat regional sites as subsidiaries (keep separate, e.g., amazon.fr is a child of amazon.com)

  2. Treat regional sites as duplicates (collapse into the global parent)

This setting affects which Cleaning action is determined for regional duplicates.
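The effect of the two options can be sketched as a simple mapping from policy to action. The names `RegionalPolicy` and `regional_action`, and the wording of the returned actions, are hypothetical and only illustrate the distinction; they are not Kernel settings or API names.

```python
from enum import Enum


class RegionalPolicy(Enum):
    SUBSIDIARY = "subsidiary"  # option 1: keep regional sites as separate child accounts
    DUPLICATE = "duplicate"    # option 2: collapse regional sites into the global parent


def regional_action(policy: RegionalPolicy, regional: str, global_parent: str) -> str:
    """Describe the (hypothetical) action taken for a regional duplicate."""
    if policy is RegionalPolicy.SUBSIDIARY:
        return f"keep {regional} as a child of {global_parent}"
    return f"merge {regional} into {global_parent}"
```

For example, `regional_action(RegionalPolicy.SUBSIDIARY, "amazon.fr", "amazon.com")` describes keeping amazon.fr as a child account, while the `DUPLICATE` policy describes merging it into amazon.com.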

Primary record selection

When the Cleaning Agent identifies duplicate records in your database, it automatically selects one record as the "primary" and marks the others as duplicates. The primary record becomes the master record that all duplicates will merge into or reference.

How primary records are selected

The system evaluates all duplicate records using the following criteria, applied in priority order until a clear winner emerges:

1. Geographic scope

Global web presences are preferred over regional variations. For instance, a company's international website takes priority over its country-specific sites.

2. Domain authority

The system prioritizes established domain extensions in this order:

  • Commercial domains (.com) - highest priority

  • Government and educational domains (.gov, .edu)

  • Organizational and network domains (.org, .net)

  • Newer domains (.io, .ai, .tech)

  • Regional domains (.co.uk, .de, .fr) - lower priority

3. Domain hierarchy

Root domains are preferred over subdomains. The main company website ranks higher than product-specific or functional subdomains.

4. Simplicity

When one domain name is a shorter version of another, the shorter version is selected as the canonical form.

5. Accessibility

Active, functioning websites are preferred over inactive or inaccessible ones.

6. Risk score

The record with the highest Account risk is preserved.

7. Redirect behaviour

Direct destinations are preferred over domains that redirect elsewhere, as the final destination typically represents the authoritative web presence.
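Applying criteria "in priority order until a clear winner emerges" is a lexicographic comparison, which can be sketched as a tuple sort key. Everything below is an illustrative assumption rather than Kernel's implementation: the `Candidate` fields, the `tld_rank` helper, the use of only the example extensions listed above, ranking unlisted extensions last, and approximating "simplicity" by domain length.

```python
from dataclasses import dataclass

# Tiers follow the extension order listed above; the suffixes are only the
# examples given there, not an exhaustive list.
TLD_TIERS = [
    (".com",),
    (".gov", ".edu"),
    (".org", ".net"),
    (".io", ".ai", ".tech"),
    (".co.uk", ".de", ".fr"),
]


def tld_rank(domain: str) -> int:
    """Return the tier index of a domain's extension (0 is the highest priority)."""
    domain = domain.lower().rstrip(".")
    for rank, suffixes in enumerate(TLD_TIERS):
        if domain.endswith(suffixes):
            return rank
    return len(TLD_TIERS)  # assumption: unlisted extensions rank last


@dataclass
class Candidate:
    domain: str
    is_global: bool     # criterion 1: global rather than regional presence
    is_subdomain: bool  # criterion 3: False for a root domain
    is_live: bool       # criterion 5: site is active and reachable
    risk_score: float   # criterion 6: higher is preferred
    redirects: bool     # criterion 7: True if the domain redirects elsewhere


def priority_key(c: Candidate) -> tuple:
    """Smaller tuples win; False sorts before True, so preferred traits map to False."""
    return (
        not c.is_global,
        tld_rank(c.domain),  # criterion 2
        c.is_subdomain,
        len(c.domain),       # criterion 4, approximated by length
        not c.is_live,
        -c.risk_score,
        c.redirects,
    )


def select_primary(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate that wins the first criterion on which the records differ."""
    return min(candidates, key=priority_key)
```

Because tuples compare element by element, a later criterion only matters when all earlier ones tie. Note that the length-based stand-in for simplicity is stricter than the rule as written, which applies only when one domain name is a shorter version of another.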
