Website analysis

Kernel identifies and fixes accounts with missing and incorrect websites

Kernel identifies and fixes accounts whose website is missing or incorrect (for example, gmail.com or bit.ly). A correct mapping between each company and its website is a prerequisite for cleaning and enrichment.

Kernel uses the following data to make its recommendation:

  • Related contact domains

  • Billing/Shipping address

  • Company name

  • Alternative websites, billing emails, account notes, and existing LinkedIn profile

The output of the website analysis is the following data:

| Data | Definition |
| --- | --- |
| Website status | Whether the website is functional. A website counts as non-functional if it returns a 4xx or 5xx status code, shows a parked-domain page or an “out of business” message, or has no content. |
| Resolved domain | The final website domain after following all redirects. |
| Inferred domain | If the original website was incorrect, malformed, or invalid, the inferred domain shows the correct website. |
| Hidden duplicates | Master data corrections help flag “obvious” duplicates. |
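As a rough illustration of how corrected domains can surface hidden duplicates, the sketch below groups accounts by their corrected domain and reports any domain shared by more than one account. The field names (`id`, `resolved_domain`, `inferred_domain`) and the rule "same domain means likely duplicate" are assumptions made for this example, not Kernel's documented schema or logic.

```python
from collections import defaultdict


def flag_hidden_duplicates(accounts: list[dict]) -> dict[str, list[str]]:
    """Group account IDs by corrected domain; keep domains with 2+ accounts.

    Hypothetical record shape: {"id", "resolved_domain", "inferred_domain"}.
    """
    by_domain = defaultdict(list)
    for account in accounts:
        # Assumption: an inferred domain (a correction) takes precedence
        # over the resolved domain of the original website.
        domain = account.get("inferred_domain") or account.get("resolved_domain")
        if domain:
            by_domain[domain.lower()].append(account["id"])
    return {domain: ids for domain, ids in by_domain.items() if len(ids) > 1}
```

Two accounts stored as frontapp.com and front.com would only be grouped once both carry the same corrected domain, which is why the correction has to happen first.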


Kernel automatically cleans and corrects website data to ensure accuracy. The process begins by identifying and handling invalid entries that often originate from form submissions.

Step 1: Removing Invalid Domains

First, Kernel removes invalid domains by checking them against a static list of common errors that we maintain. This list targets entries such as:

  • Public email providers like gmail.com, mail.ru, and outlook.com.

  • Placeholder domains such as test.com.

  • Link shorteners like bit.ly and linktr.ee. Kernel will first attempt to follow these short links to see if they resolve to a valid corporate website before discarding them.

This step can distinguish facebook.com, the company's own domain, from facebook.com/user-profile, a page hosted on Facebook (for example, an influencer's profile).
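The check described in this step can be sketched roughly as follows. The sets contain only the examples named above; the real list is maintained by Kernel and is not reproduced here. The function names, the returned labels, and the rule "a social platform host with a non-empty path is a profile" are assumptions for illustration. Following a short link to its destination requires a network request and is left out.

```python
from urllib.parse import urlsplit

# Illustrative entries only, taken from the examples in the text.
PUBLIC_EMAIL_PROVIDERS = {"gmail.com", "mail.ru", "outlook.com"}
PLACEHOLDER_DOMAINS = {"test.com"}
LINK_SHORTENERS = {"bit.ly", "linktr.ee"}
SOCIAL_PLATFORMS = {"facebook.com"}


def split_host_and_path(website: str) -> tuple[str, str]:
    """Return (host without a leading 'www.', path without outer slashes)."""
    raw = website.strip().lower()
    if "://" not in raw:
        raw = "http://" + raw  # urlsplit only finds the host after a scheme
    parts = urlsplit(raw)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.strip("/")


def classify_website(website: str) -> str:
    """Label a CRM website value; 'keep' means no known problem was found."""
    host, path = split_host_and_path(website)
    if not host:
        return "invalid"
    if host in PUBLIC_EMAIL_PROVIDERS:
        return "public_email_provider"
    if host in PLACEHOLDER_DOMAINS:
        return "placeholder"
    if host in LINK_SHORTENERS:
        return "link_shortener"  # candidate for resolving before discarding
    if host in SOCIAL_PLATFORMS and path:
        return "social_profile"  # e.g. facebook.com/user-profile
    return "keep"
```

Under these assumptions, `classify_website("facebook.com")` returns `"keep"` while `classify_website("facebook.com/user-profile")` returns `"social_profile"`.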

Step 2: Inferring Missing Domains

For any account that still lacks a valid website after the cleansing step, Kernel automatically infers the correct one. Kernel combines several techniques to identify the company's website and assign the proper domain, turning an incomplete record into an actionable one.

Missing domain techniques

| Technique | Description |
| --- | --- |
| Check if the CRM name is actually a website | If the name is Acme.com, Kernel will use this as the website and update the Name to Acme. |
| Existing LinkedIn company profile | If the account has an associated LinkedIn profile, Kernel will check if the profile has a valid domain associated with it. |
| Related contact domains | If the account has contacts with email addresses, Kernel will extract the domain from each address and use it as evidence, e.g. an address at kernel.ai -> kernel.ai. |
| URL typo | If the website appears to have been fat-fingered, e.g. delotte.com or hubsport.com, Kernel will replace it with the correctly typed version. |
| Alternative websites | If the CRM has custom account fields used to store domain names, Kernel will use these as evidence as well. |
| Address lookup | If the account has a valid billing or shipping address, Kernel will look up the company and find any associated domains. |
| Web search | Kernel searches across the Internet to find the company's website. |
| Web search (LinkedIn) | Kernel will search across all LinkedIn accounts for a suitable match. |

Based on these techniques, Kernel produces a list of candidate websites. Kernel crawls the websites of all candidates, feeding their content into a proprietary AI-based algorithm to determine if the pairing is accurate.
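Three of the techniques listed above (name-as-website, related contact domains, and URL typo) are simple enough to sketch without network access. This is a minimal illustration, not Kernel's implementation: the function names, the `KNOWN_DOMAINS` reference list, the regular expression, and the 0.85 similarity cutoff are all assumptions chosen for the example.

```python
import difflib
import re
from collections import Counter
from typing import Optional

PUBLIC_EMAIL_PROVIDERS = {"gmail.com", "mail.ru", "outlook.com"}
# Hypothetical reference list of correctly spelled domains.
KNOWN_DOMAINS = ["deloitte.com", "hubspot.com"]
# Loose pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_LIKE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}$")


def domain_from_name(name: str) -> Optional[str]:
    """Return the account name as a domain if it looks like one."""
    candidate = name.strip().lower()
    return candidate if DOMAIN_LIKE.match(candidate) else None


def domains_from_contacts(emails: list[str]) -> list[str]:
    """Return contact email domains, most frequent first.

    Public email providers are skipped because they say nothing
    about the company behind the account.
    """
    counts = Counter()
    for email in emails:
        _, sep, domain = email.strip().lower().rpartition("@")
        if sep and domain and domain not in PUBLIC_EMAIL_PROVIDERS:
            counts[domain] += 1
    return [domain for domain, _ in counts.most_common()]


def fix_typo(domain: str) -> Optional[str]:
    """Return the closest known domain if the input looks like a typo of it."""
    domain = domain.strip().lower()
    matches = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=0.85)
    if matches and matches[0] != domain:
        return matches[0]
    return None
```

Each function only produces candidates; deciding whether a candidate really belongs to the company is the job of the crawl-and-match step described above.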

Step 3: Website verification

Kernel comprehensively verifies each website in your CRM to determine its true operational status. Our multi-layered process goes far beyond a simple ping to deliver a verdict on a site's true business viability.

Key verification features include:

  • URL Path Resolution: Traces the website's path to its final destination, automatically following redirects (e.g., frontapp.com to front.com) and handling common URL variations like the www prefix.

  • Unrestricted Global Access: Works around regional access restrictions, such as sites that block visitors from certain regions because of GDPR, so that a reliable connection can be established wherever the site is hosted.

  • Intelligent Content Analysis: Recognizes that a simple "200 OK" status code is not enough. The system analyzes page content to identify and flag non-operational sites, including:

    • Domain Parking: Generic landing pages from services like HugeDomains.

    • Business Closures: Explicit "out of business" or "service unavailable" messages.

    • Domain Misuse: Sites that have been acquired and repurposed for unrelated or illicit activities, such as gambling websites.

  • False Negative Prevention: Cross-references any unresponsive site against a curated database of known corporate domains. This safeguard prevents legitimate sites from being incorrectly flagged as 'Not working' due to a temporary server glitch or network outage.

This comprehensive approach ensures the final verdict reflects a website's true business viability, not just its momentary technical uptime.
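The decision logic behind these checks might look roughly like the sketch below, which takes an already-fetched status code and page body and returns a verdict. The marker phrases, the `KNOWN_CORPORATE_DOMAINS` set, the returned labels, and the choice to treat both "no response" and 5xx as unresponsive are assumptions for illustration. Redirect following and the domain-misuse check are not covered.

```python
from typing import Optional

# Hypothetical marker phrases; real content analysis would be far richer.
PARKING_MARKERS = ("this domain is for sale", "hugedomains")
CLOSURE_MARKERS = ("out of business", "service unavailable")
# Hypothetical stand-in for a curated database of known corporate domains.
KNOWN_CORPORATE_DOMAINS = {"front.com"}


def website_status(domain: str, status_code: Optional[int], body: str) -> str:
    """Return 'working' or a reason the site is considered non-functional.

    status_code is None when the site did not respond at all.
    """
    if status_code is None or status_code >= 500:
        # False negative prevention: a known corporate domain is not
        # flagged because of what may be a temporary outage.
        if domain.lower() in KNOWN_CORPORATE_DOMAINS:
            return "working"
        return "unresponsive"
    if status_code >= 400:
        return "http_error"
    text = body.lower()
    if not text.strip():
        return "empty"
    # A 2xx response is not enough: inspect the page content as well.
    if any(marker in text for marker in PARKING_MARKERS):
        return "parked"
    if any(marker in text for marker in CLOSURE_MARKERS):
        return "closed"
    return "working"
```

Returning a reason instead of a bare boolean keeps the verdict explainable, which is useful when a reviewer wants to know why a site was marked as not working.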
