# Website analysis

Kernel identifies and fixes accounts with missing or incorrect websites (for example, `gmail.com` or `bit.ly`). A correct mapping between the company and its website is a prerequisite for cleaning & enrichment.

Kernel uses the following data to make its recommendation:

* Related contact domains
* Billing/Shipping address
* Company name
* Alternative websites, billing emails, account notes, and existing LinkedIn profile

The output of the website analysis is the following data:

<table><thead><tr><th width="199.55078125">Data</th><th>Definition</th></tr></thead><tbody><tr><td>Website status</td><td>Whether the website is functional or not; a non-functional website includes 4xx and 5xx status codes, parked-domain pages, “out of business” messages, and absence of content</td></tr><tr><td>Resolved domain</td><td>The final website domain after following all redirects</td></tr><tr><td>Inferred domain</td><td>If the original website was incorrect, malformed, or invalid, the inferred domain shows the correct website</td></tr></tbody></table>

<figure><img src="/files/YxCNL6Du3p6gpZzkgMNE" alt=""><figcaption></figcaption></figure>

## Hidden duplicates

Account data corrections help flag “obvious” duplicates.

<figure><img src="/files/Nh2lhRBxxn8EgQBxbGcs" alt="" width="563"><figcaption></figcaption></figure>

## How website analysis works

Kernel automatically cleans and corrects website data to ensure accuracy. The process begins by identifying and handling invalid entries that often originate from form submissions.

### Step 1: Removing Invalid Domains

First, Kernel removes invalid domains by checking them against a static list of common errors that Kernel maintains. This list targets entries such as:

* Public email providers like `gmail.com`, `mail.ru`, and `outlook.com`.
* Placeholder domains such as `test.com`.
* Link shorteners like `bit.ly` and `linktr.ee`. Kernel will first attempt to follow these short links to see if they resolve to a valid corporate website before discarding them.

This step distinguishes between `facebook.com`, the company, and `facebook.com/user-profile`, which might be, for example, an influencer's page on Facebook.
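The check described above can be illustrated with a minimal sketch. It is not Kernel's implementation: the lists are abbreviated, and the names `classify_website`, `split_url`, and the `"keep"` / `"invalid"` / `"follow_redirect"` labels are hypothetical choices made for this example.

```python
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical, abbreviated lists; Kernel's actual static list is not shown here.
PUBLIC_EMAIL_PROVIDERS = {"gmail.com", "mail.ru", "outlook.com"}
PLACEHOLDER_DOMAINS = {"test.com"}
LINK_SHORTENERS = {"bit.ly", "linktr.ee"}
SOCIAL_PLATFORMS = {"facebook.com"}


def split_url(raw: str) -> Tuple[str, str]:
    """Return (host without a leading 'www.', path without a trailing slash)."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "http://" + raw  # urlparse only finds the host when a scheme is present
    parsed = urlparse(raw)
    host = parsed.hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    return host, parsed.path.rstrip("/")


def classify_website(raw: str) -> str:
    """Classify a raw CRM website value as 'keep', 'invalid', or 'follow_redirect'."""
    host, path = split_url(raw)
    if not host:
        return "invalid"
    if host in PUBLIC_EMAIL_PROVIDERS or host in PLACEHOLDER_DOMAINS:
        return "invalid"
    if host in LINK_SHORTENERS:
        # Short links are followed first and only discarded if they lead nowhere useful.
        return "follow_redirect"
    if host in SOCIAL_PLATFORMS and path:
        # facebook.com/user-profile is a page on the platform, not the company itself.
        return "invalid"
    return "keep"
```

In this sketch, the bare platform domain is kept while any URL with a path on that platform is treated as a profile page, which is one simple way to separate the two cases.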

### Step 2: Inferring Missing Domains

For any account that still lacks a valid website after the cleansing step, Kernel automatically infers one. Kernel combines several techniques to identify the company's website and assign the correct domain, turning an incomplete record into an actionable one.

<details>

<summary>Missing domain techniques</summary>

<table><thead><tr><th width="253.4609375">Technique</th><th>Description</th></tr></thead><tbody><tr><td>Check if the CRM name is actually a website</td><td>If the name is <code>Acme.com</code>, Kernel will use this as the website and update the Name to <code>Acme</code>.</td></tr><tr><td>Existing LinkedIn company profile</td><td>If the account has an associated LinkedIn profile, Kernel will check if the profile has a valid domain associated with it.</td></tr><tr><td>Related contact domains</td><td>If the account has contacts with email addresses, Kernel will extract the domain from each address and use it as evidence, e.g. <code>marcus@kernel.ai</code> -> <code>kernel.ai</code>.</td></tr><tr><td>URL typo</td><td>If the website appears to have been mistyped, e.g. <code>delotte.com</code> or <code>hubsport.com</code>, Kernel will replace it with the correctly typed version.</td></tr><tr><td>Alternative websites</td><td>If the CRM has custom account fields used to store domain names, Kernel will use these as evidence as well.</td></tr><tr><td>Address lookup</td><td>If the account has a valid billing or shipping address, Kernel will look up the company and find any associated domains.</td></tr><tr><td>Web search</td><td>Kernel searches across the Internet to find the company's website.</td></tr><tr><td>Web search (LinkedIn)</td><td>Kernel will search across all LinkedIn accounts for a suitable match.</td></tr></tbody></table>

</details>

Based on these techniques, Kernel produces a list of candidate websites. Kernel then crawls each candidate website and feeds its content into a proprietary AI-based algorithm to determine whether the company-website pairing is accurate.
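Three of the simpler techniques (name-as-website, related contact domains, and URL typo) can be sketched as follows. This is an illustration under stated assumptions, not Kernel's algorithm: `KNOWN_DOMAINS` is a hypothetical stand-in for whatever reference data is used, the similarity cutoff is arbitrary, and all function names are invented for this example.

```python
import re
from collections import Counter
from difflib import get_close_matches
from typing import List, Optional, Tuple

DOMAIN_RE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}$")
PUBLIC_EMAIL_PROVIDERS = {"gmail.com", "outlook.com", "mail.ru"}
KNOWN_DOMAINS = ["deloitte.com", "hubspot.com"]  # hypothetical reference list


def domain_from_name(name: str) -> Optional[Tuple[str, str]]:
    """If the account name looks like a domain, return (domain, cleaned name)."""
    stripped = name.strip()
    candidate = stripped.lower()
    if DOMAIN_RE.match(candidate):
        # Use the first label as the cleaned name, keeping its original casing.
        return candidate, stripped.split(".")[0]
    return None


def contact_domains(emails: List[str]) -> List[str]:
    """Return contact email domains, most frequent first, skipping public providers."""
    counts: Counter = Counter()
    for email in emails:
        _, sep, domain = email.strip().lower().rpartition("@")
        if sep and domain and domain not in PUBLIC_EMAIL_PROVIDERS:
            counts[domain] += 1
    return [domain for domain, _ in counts.most_common()]


def fix_typo(domain: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the closest known domain if it is similar enough, else None."""
    matches = get_close_matches(domain.lower(), KNOWN_DOMAINS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Each function only produces evidence; in the flow described above, the resulting candidates would still need to be verified against the content of the crawled site before being assigned to the account.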

### Step 3: Website Verification

Kernel verifies each website in your CRM to determine its operational status. The multi-layered process goes beyond a simple ping.

Key verification features include:

* **URL Path Resolution**: Traces the website's path to its final destination, automatically following redirects (e.g., `frontapp.com` to `front.com`) and handling common URL variations like the `www` prefix.
* **Unrestricted Global Access**: Bypasses regional firewalls and restrictions, such as GDPR-related geo-blocking, to ensure a reliable connection can be established from anywhere in the world.
* **Intelligent Content Analysis**: Recognizes that a simple "200 OK" status code is not enough. The system analyzes page content to identify and flag non-operational sites, including:
  * Domain Parking: Generic landing pages from services like HugeDomains.
  * Business Closures: Explicit "out of business" or "service unavailable" messages.
  * Domain Misuse: Sites that have been acquired and repurposed for unrelated or illicit activities, such as gambling websites.
* **False Negative Prevention**: Cross-references any unresponsive site against a curated database of known corporate domains. This safeguard prevents legitimate sites from being incorrectly flagged as 'Not working' due to a temporary server glitch or network outage.

This comprehensive approach ensures the final verdict reflects a website's true business viability, not just its momentary technical uptime.
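The layering of these checks can be sketched as below. This is a simplified illustration rather than Kernel's verification logic: the marker phrases, the `KNOWN_CORPORATE_DOMAINS` set, the function names, and the `"Working"` / `"Parked"` / `"Closed"` labels are assumptions made for the example, and the decision to apply the known-domain safeguard only to unreachable sites and 5xx responses is also an assumption.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical phrase lists; real content analysis would need far richer signals.
PARKING_MARKERS = ("this domain is for sale", "hugedomains")
CLOSURE_MARKERS = ("out of business", "service unavailable")
KNOWN_CORPORATE_DOMAINS = {"front.com"}  # stand-in for a curated database


def fetch(url: str, timeout: float = 10.0) -> Tuple[Optional[int], str, str]:
    """Return (status code or None, final URL after redirects, page body)."""
    try:
        # urlopen follows HTTP redirects by default; geturl() is the final URL.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(200_000).decode("utf-8", errors="replace")
            return resp.status, resp.geturl(), body
    except urllib.error.HTTPError as err:
        return err.code, url, ""
    except OSError:  # includes URLError and timeouts: the site is unreachable
        return None, url, ""


def website_status(status: Optional[int], final_url: str, body: str) -> str:
    """Combine status code, page content, and a known-domain safeguard."""
    domain = urlparse(final_url).hostname or ""
    if domain.startswith("www."):
        domain = domain[4:]
    if status is None or status >= 500:
        # A known corporate domain is not flagged because of a temporary outage.
        return "Working" if domain in KNOWN_CORPORATE_DOMAINS else "Not working"
    if status >= 400:
        return "Not working"
    text = body.lower()
    if not text.strip():
        return "Not working"  # a 200 response with no content
    if any(marker in text for marker in PARKING_MARKERS):
        return "Parked"
    if any(marker in text for marker in CLOSURE_MARKERS):
        return "Closed"
    return "Working"
```

Keeping `website_status` free of network calls makes the decision logic easy to test in isolation; a caller would pass it the tuple returned by `fetch`.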

{% hint style="success" %}
Website verification is a crucial factor in determining whether the cleaning action should be "Delete."
{% endhint %}


