# Website analysis

Kernel identifies and fixes accounts with missing or incorrect websites (for example, `gmail.com` or `bit.ly`). A correct mapping between the company and its website is a prerequisite for cleaning and enrichment.

Kernel uses the following data to make its recommendation:

* Related contact domains
* Billing/Shipping address
* Company name
* Alternative websites, billing emails, account notes, and existing LinkedIn profile

The output of the website analysis is the following data:

<table><thead><tr><th width="199.55078125">Data</th><th>Definition</th></tr></thead><tbody><tr><td>Website status</td><td>Whether the website is functional or not; a non-functional website includes 4xx and 5xx status codes, parked-domain pages, “out of business” messages, and absence of content</td></tr><tr><td>Resolved domain</td><td>The final website domain after following all redirects</td></tr><tr><td>Inferred domain</td><td>If the original website was incorrect, malformed, or invalid, the inferred domain shows the correct website</td></tr></tbody></table>

<figure><img src="https://1215786129-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvRYB7XIKCnmUi9oCEQGV%2Fuploads%2F2sgz4ai7qi1HKPgmoNvI%2Fimage.png?alt=media&#x26;token=afbd4b40-1834-40e6-8d9a-e944910dd65b" alt=""><figcaption></figcaption></figure>

## Hidden duplicates

Correcting account data also helps flag “obvious” duplicates.

<figure><img src="https://1215786129-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvRYB7XIKCnmUi9oCEQGV%2Fuploads%2FINNKEUsgYvzYVkYiVEzX%2Fimage.png?alt=media&#x26;token=a574423d-6b6b-4d0f-a05f-31c53d7f4871" alt="" width="563"><figcaption></figcaption></figure>

## Website analysis

Kernel automatically cleans and corrects website data to ensure accuracy. The process begins by identifying and handling invalid entries that often originate from form submissions.

### Step 1: Removing Invalid Domains

First, Kernel removes invalid domains by checking them against a static list of common invalid entries that Kernel maintains. This list targets entries such as:

* Public email providers like `gmail.com`, `mail.ru`, and `outlook.com`.
* Placeholder domains such as `test.com`.
* Link shorteners like `bit.ly` and `linktr.ee`. Kernel will first attempt to follow these short links to see if they resolve to a valid corporate website before discarding them.

This step distinguishes between `facebook.com`, the company, and `facebook.com/user-profile`, which is a profile page on Facebook (for example, an influencer's).
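The checks in this step can be illustrated with a minimal Python sketch. It is not Kernel's implementation: the function name `classify_website`, the returned labels, and the contents of the sets are assumptions made for illustration, seeded only with the examples listed above.

```python
from urllib.parse import urlparse

# Hypothetical sample lists, seeded with the examples from this page.
# The list Kernel actually maintains is not shown here.
PUBLIC_EMAIL_PROVIDERS = {"gmail.com", "mail.ru", "outlook.com"}
PLACEHOLDER_DOMAINS = {"test.com"}
LINK_SHORTENERS = {"bit.ly", "linktr.ee"}
SOCIAL_PLATFORMS = {"facebook.com"}


def classify_website(raw: str) -> str:
    """Return a coarse label for a website value stored on an account."""
    value = raw.strip().lower()
    if not value:
        return "missing"
    # urlparse only populates the host when a scheme is present.
    if "://" not in value:
        value = "http://" + value
    parsed = urlparse(value)
    host = parsed.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path.strip("/")

    if host in PUBLIC_EMAIL_PROVIDERS:
        return "invalid: public email provider"
    if host in PLACEHOLDER_DOMAINS:
        return "invalid: placeholder"
    if host in LINK_SHORTENERS:
        # Short links are followed first and only discarded afterwards.
        return "shortener: follow before discarding"
    if host in SOCIAL_PLATFORMS and path:
        # A bare platform domain is the company; a path is a profile page.
        return "invalid: profile page on a platform"
    return "keep"
```

With these sample lists, `classify_website("facebook.com")` keeps the value, while `classify_website("facebook.com/user-profile")` is flagged as a profile page.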

### Step 2: Inferring Missing Domains

For any account that lacks a valid website after the cleansing step, Kernel automatically infers the correct one. Kernel combines several techniques to identify the company's website and assign the correct domain, turning an incomplete record into an actionable one.

<details>

<summary>Missing domain techniques</summary>

<table><thead><tr><th width="253.4609375">Technique</th><th>Description</th></tr></thead><tbody><tr><td>Check if the CRM name is actually a website</td><td>If the name is <code>Acme.com</code>, Kernel will use this as the website and update the Name to <code>Acme</code>.</td></tr><tr><td>Existing LinkedIn company profile</td><td>If the account has an associated LinkedIn profile, Kernel will check if the profile has a valid domain associated with it.</td></tr><tr><td>Related contact domains</td><td>If the account has contacts with emails associated with it, Kernel will strip out the domain name and use this as evidence, e.g. <code>marcus@kernel.ai</code> yields <code>kernel.ai</code>.</td></tr><tr><td>URL typo</td><td>If the website appears to have been fat-fingered, e.g. <code>delotte.com</code> or <code>hubsport.com</code>, Kernel will replace it with the correctly typed version.</td></tr><tr><td>Alternative websites</td><td>If the CRM has custom account fields used to store domain names, Kernel will use these for evidence as well.</td></tr><tr><td>Address lookup</td><td>If the account has a valid billing or shipping address, Kernel will look up the company and find any associated domains.</td></tr><tr><td>Web search</td><td>Kernel searches across the Internet to find the company's website.</td></tr><tr><td>Web search (LinkedIn)</td><td>Kernel will search across all LinkedIn accounts for a suitable match.</td></tr></tbody></table>

</details>

Based on these techniques, Kernel produces a list of candidate websites. Kernel crawls the websites of all candidates, feeding their content into a proprietary AI-based algorithm to determine if the pairing is accurate.
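Two of the simpler techniques, treating a domain-like account name as a website and collecting domains from contact emails, can be sketched as follows. This is an illustrative approximation only: the helper names `domain_from_name` and `domains_from_contacts`, the regular expression, and the frequency-based ordering are assumptions, and the crawl-and-match step that validates candidates is not shown.

```python
import re
from collections import Counter

# Hypothetical sample of public providers to exclude from the evidence.
PUBLIC_EMAIL_PROVIDERS = {"gmail.com", "mail.ru", "outlook.com"}

# Deliberately simple pattern: dot-separated labels ending in a TLD.
DOMAIN_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$")


def domain_from_name(name):
    """If the account name is itself a domain, return (domain, cleaned name)."""
    candidate = name.strip().lower()
    if DOMAIN_RE.match(candidate):
        label = candidate.split(".")[0]
        return candidate, label.capitalize()
    return None


def domains_from_contacts(emails):
    """Return candidate domains from contact emails, most frequent first."""
    counts = Counter()
    for email in emails:
        _, sep, domain = email.strip().lower().rpartition("@")
        # Skip malformed addresses and public email providers.
        if sep and domain and domain not in PUBLIC_EMAIL_PROVIDERS:
            counts[domain] += 1
    return [domain for domain, _ in counts.most_common()]
```

For example, `domain_from_name("Acme.com")` yields the pair `("acme.com", "Acme")`, and a contact list containing `marcus@kernel.ai` contributes `kernel.ai` as a candidate.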

### Step 3: Website verification

Kernel comprehensively verifies each website in your CRM to determine its true operational status. Our multi-layered process goes far beyond a simple ping to deliver a verdict on a site's true business viability.

Key verification features include:

* **URL Path Resolution**: Traces the website's path to its final destination, automatically following redirects (e.g., `frontapp.com` to `front.com`) and handling common URL variations like the `www` prefix.
* **Unrestricted Global Access**: Works around regional firewalls and access restrictions, such as sites that block visitors from certain regions for GDPR reasons, so that a reliable connection can be established from anywhere in the world.
* **Intelligent Content Analysis**: Recognizes that a simple "200 OK" status code is not enough. The system analyzes page content to identify and flag non-operational sites, including:
  * Domain Parking: Generic landing pages from services like HugeDomains.
  * Business Closures: Explicit "out of business" or "service unavailable" messages.
  * Domain Misuse: Sites that have been acquired and repurposed for unrelated or illicit activities, such as gambling websites.
* **False Negative Prevention**: Cross-references any unresponsive site against a curated database of known corporate domains. This safeguard prevents legitimate sites from being incorrectly flagged as 'Not working' due to a temporary server glitch or network outage.

This comprehensive approach ensures the final verdict reflects a website's true business viability, not just its momentary technical uptime.
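The way these checks combine into a single verdict can be sketched roughly as below. This is a simplified illustration under stated assumptions, not Kernel's algorithm: `website_status`, the marker strings, and the `KNOWN_CORPORATE_DOMAINS` set are hypothetical, and plain substring matching stands in for the content analysis described above. The sketch also assumes that only missing responses and 5xx errors count as "unresponsive" for the false-negative check.

```python
# Hypothetical marker phrases; real content analysis is more involved.
PARKING_MARKERS = ("this domain is for sale", "hugedomains")
CLOSURE_MARKERS = ("out of business", "service unavailable")

# Hypothetical stand-in for a curated database of known corporate domains.
KNOWN_CORPORATE_DOMAINS = {"front.com"}


def website_status(domain, status_code, body):
    """Return "working" or "not working" for a fetched page.

    status_code is None when no response was received at all.
    """
    if status_code is None or status_code >= 500:
        # False negative prevention: a known corporate domain that is
        # unresponsive is assumed to be having a temporary outage.
        if domain in KNOWN_CORPORATE_DOMAINS:
            return "working"
        return "not working"
    if status_code >= 400:
        return "not working"

    # A 200 response alone is not enough, so inspect the content too.
    text = body.lower()
    if not text.strip():
        return "not working"  # absence of content
    if any(marker in text for marker in PARKING_MARKERS):
        return "not working"  # parked domain
    if any(marker in text for marker in CLOSURE_MARKERS):
        return "not working"  # business closure message
    return "working"
```

In this sketch, a parked page that returns `200` is still reported as not working, while a timeout on a domain in the known-domains set is not.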

{% hint style="success" %}
The website verification is a crucial factor in determining whether the cleaning action should be "Delete."
{% endhint %}
