# S3 file exchange

## What it is

The S3 file exchange lets your team send account data to Kernel and receive enriched results back as files, through a shared Amazon S3 bucket. Nothing is installed in your CRM, and no direct system access is required in either direction.

It works with any source system — Salesforce, Dynamics, HubSpot, Snowflake, a CDP, or internal databases — because the only interface is a file.

## At a glance

* Your team drops an account export into the bucket; Kernel returns an enriched results file to the same bucket
* Access is scoped to a dedicated, per-customer AWS identity — no shared credentials
* Supports one-off deliveries, scheduled refreshes, and automated pipeline ingestion
* Cross-account IAM role assumption, customer-owned encryption keys (KMS), and region preferences are all supported

## How it works

```mermaid
sequenceDiagram
    autonumber
    participant You as Your team
    participant S3 as S3 bucket<br/>(input/ · output/)
    participant K as Kernel

    You->>S3: Upload account export to input/
    S3->>K: Kernel picks up the file
    Note over K: Identity resolution · deduplication<br/>hierarchy building · enrichment
    K->>S3: Results file written to output/
    S3->>You: Ingest results into CRM,<br/>warehouse, or CDP
```

The bucket has two folders: **`input/`** for files you send to Kernel, and **`output/`** for files Kernel sends back. Each side only ever writes to its own folder, which keeps the exchange easy to audit.

## Setup

Setup is a one-time exchange of access details and typically completes within a day or two.

```mermaid
flowchart LR
    A["You create the<br/>S3 bucket"] --> B["Kernel shares its<br/>AWS principal ARN"]
    B --> C["You grant that principal<br/>access in the bucket policy"]
    C --> D["Both sides test<br/>read + write"]
    D --> E["Exchange is live"]

    classDef step fill:#f0f4f1,stroke:#7c9885,color:#1a1a1a
    class A,B,C,D,E step
```
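Here is a sketch of step 3, assuming you host the bucket. The Python below builds a bucket policy that lets Kernel's principal list the two exchange folders, read `input/`, and write `output/`. It also denies any request not made over TLS. The bucket name, principal ARN, and function name are hypothetical placeholders. The exact permissions Kernel needs are agreed during onboarding, so treat this as a starting point rather than Kernel's required policy.

```python
import json


def build_bucket_policy(bucket: str, kernel_principal_arn: str) -> dict:
    """Return a hypothetical S3 bucket policy for the exchange bucket.

    Kernel may list input/ and output/, read input/, and write output/;
    every non-TLS request to the bucket is denied.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    kernel = {"AWS": kernel_principal_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KernelListExchangeFolders",
                "Effect": "Allow",
                "Principal": kernel,
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
                # Listing is limited to the two exchange prefixes.
                "Condition": {"StringLike": {"s3:prefix": ["input/*", "output/*"]}},
            },
            {
                "Sid": "KernelReadInput",
                "Effect": "Allow",
                "Principal": kernel,
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/input/*",
            },
            {
                "Sid": "KernelWriteOutput",
                "Effect": "Allow",
                "Principal": kernel,
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/output/*",
            },
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy(
        "acme-kernel-exchange",  # placeholder bucket name
        "arn:aws:iam::111122223333:role/kernel-exchange",  # placeholder Kernel ARN
    )
    print(json.dumps(policy, indent=2))
```

You could save the printed JSON as `policy.json` and apply it with `aws s3api put-bucket-policy --bucket <bucket> --policy file://policy.json`. Alternatively, paste it into the bucket's **Permissions** tab in the S3 console.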

### Hosting options

| Option                     | How it works                                                                        | Choose this if                                                          |
| -------------------------- | ----------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| **You host** (recommended) | You create the bucket in your AWS account and grant access to Kernel's AWS identity | You want the data to stay in your AWS account with your own audit trail |
| **Kernel hosts**           | Kernel creates the bucket and grants access to your AWS user                        | You don't have an AWS team readily available                            |

### For security teams: cross-account role assumption

If your security policy requires it, Kernel supports the stricter cross-account pattern:

* You provision an IAM role in your account; Kernel's service assumes it using `sts:AssumeRole` with an **ExternalID**
* Permissions can be scoped per direction (you write to `input/` and read from `output/`; the role Kernel assumes reads `input/` and writes `output/`), giving you least-privilege access and a single audit trail in your account
* **Customer-owned KMS encryption keys** on the bucket are fully supported
* Kernel's preferred AWS region is `eu-west-1`, but the bucket can live in whichever region your data residency requires

To set this up, your team shares the role ARN and ExternalID; Kernel shares the principal ARN to add to your trust policy.
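The two policy documents on your side can be sketched as follows. This assumes the role Kernel assumes only needs to read `input/` and write `output/`. All ARNs, the bucket name, the ExternalID value, and the function names are hypothetical placeholders. If the bucket uses a customer-owned KMS key, the role may also need to be permitted in that key's key policy, depending on how the key policy is written.

```python
import json
from typing import Optional


def build_trust_policy(kernel_principal_arn: str, external_id: str) -> dict:
    """Trust policy letting Kernel's principal call sts:AssumeRole on your role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": kernel_principal_arn},
                "Action": "sts:AssumeRole",
                # The AssumeRole call must present the agreed ExternalID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def build_role_permissions(bucket: str, kms_key_arn: Optional[str] = None) -> dict:
    """Permissions for the assumed role: the inverse of your side's access."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    statements = [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": f"{bucket_arn}/input/*"},
        {"Effect": "Allow", "Action": "s3:PutObject", "Resource": f"{bucket_arn}/output/*"},
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": bucket_arn,
            "Condition": {"StringLike": {"s3:prefix": ["input/*", "output/*"]}},
        },
    ]
    if kms_key_arn:
        # SSE-KMS: reading needs kms:Decrypt, writing needs kms:GenerateDataKey.
        statements.append({
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": kms_key_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    kernel_arn = "arn:aws:iam::111122223333:role/kernel-exchange"  # placeholder
    print(json.dumps(build_trust_policy(kernel_arn, "example-external-id"), indent=2))
    print(json.dumps(build_role_permissions("acme-kernel-exchange"), indent=2))
```

Keeping the ExternalID check in the trust policy and the S3 scoping in the role's permissions means each half can be reviewed separately by your security team.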

## File format

### What you send

A **CSV file**, one row per account. Only two fields are strictly required, but additional fields generally improve match accuracy:

| Field                        | Required    | Notes                                                             |
| ---------------------------- | ----------- | ----------------------------------------------------------------- |
| Your unique record ID        | Yes         | Passed through untouched — used to join results back to your data |
| Account name                 | Yes         |                                                                   |
| Website URL                  | Recommended | The strongest matching signal after name                          |
| Related domains              | Recommended | Comma-separated in a single quoted CSV cell                       |
| Street, city, state, country | Optional    |                                                                   |
| LinkedIn URL                 | Optional    |                                                                   |
| Legal entity name            | Optional    |                                                                   |
| Your internal parent ID      | Optional    | Helps validate hierarchy output                                   |

The exact column set is agreed during onboarding — Kernel provides an input template.
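The following is a minimal sketch of producing the input file with Python's standard `csv` module. The column names are hypothetical stand-ins for the fields in the table above; use the names from Kernel's input template instead. `csv.DictWriter` quotes any cell that contains a comma, which keeps a list of related domains in a single column.

```python
import csv
import io

# Hypothetical column names standing in for Kernel's input template.
COLUMNS = [
    "record_id", "account_name", "website", "related_domains",
    "street", "city", "state", "country",
    "linkedin_url", "legal_name", "parent_record_id",
]
REQUIRED = ("record_id", "account_name")


def write_input_csv(accounts, out) -> int:
    """Write account dicts to the text stream `out`; return rows written.

    Every row is validated before anything is written: a missing required
    field raises ValueError. Unknown keys are ignored and absent optional
    fields become empty cells.
    """
    rows = [dict(account) for account in accounts]
    for row in rows:
        missing = [field for field in REQUIRED if not row.get(field)]
        if missing:
            raise ValueError(f"missing required fields {missing}: {row!r}")
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        domains = row.get("related_domains")
        if isinstance(domains, (list, tuple)):
            # One comma-separated cell; DictWriter adds the quotes it needs.
            row["related_domains"] = ",".join(domains)
        writer.writerow(row)
    return len(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_input_csv(
        [{"record_id": "001A", "account_name": "Acme Corp",
          "website": "https://acme.example",
          "related_domains": ["acme.example", "acme.co.uk"]}],
        buf,
    )
    print(buf.getvalue())
```

When writing to disk, open the file with `newline=""` (for example `open("accounts.csv", "w", newline="")`), as the `csv` module documentation recommends.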

### What you get back

A results file with one row per input record, designed to join directly onto your data using your record ID:

| Field               | Description                                                             |
| ------------------- | ----------------------------------------------------------------------- |
| Your record ID      | Passthrough from your input file                                        |
| `kern_id`           | Kernel's persistent entity identifier ([KERN ID](/concepts/kern-id.md)) |
| `kern_parent_id`    | KERN ID of the identified parent entity                                 |
| Resolved name / URL | The verified identity of the entity                                     |
| `cleaning_action`   | Recommendation: **Associate**, **Merge**, **Delete**, or **None**       |
| Reasoning           | Why the parent / action was assigned, with sources                      |

Depending on your engagement scope, results can also include entity type and subtype, operational status, website status, duplicate groupings, and firmographic enrichment. See [Managing actions](/managing-actions.md) for what each recommendation means.
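Because every results row carries your record ID, joining results back reduces to a dictionary lookup. The sketch below assumes two things. First, the ID column is named `record_id`; this is a hypothetical name, and yours is whatever you sent. Second, your records are held in memory keyed by that ID. The sketch also tallies `cleaning_action` values so you can see how many of each recommendation came back.

```python
import csv
import io
from collections import Counter


def join_results(records, results_csv, id_field="record_id"):
    """Merge Kernel result rows into your records by the passthrough ID.

    `records` maps record ID -> your row (a dict); `results_csv` is a text
    stream. Kernel's columns win on name clashes. Returns a tuple of
    (merged rows keyed by ID, Counter of cleaning_action values).
    """
    merged = {}
    actions = Counter()
    for result in csv.DictReader(results_csv):
        record_id = result[id_field]
        if record_id not in records:
            # Each result row should echo an ID you sent; flag anything else.
            raise KeyError(f"results contain unknown record ID {record_id!r}")
        merged[record_id] = {**records[record_id], **result}
        actions[result.get("cleaning_action", "")] += 1
    return merged, actions


if __name__ == "__main__":
    mine = {"001A": {"record_id": "001A", "owner": "jdoe"}}
    # Made-up result row; real KERN IDs and columns come from Kernel's file.
    results = io.StringIO(
        "record_id,kern_id,kern_parent_id,cleaning_action\n"
        "001A,K-123,K-456,Associate\n"
    )
    merged, actions = join_results(mine, results)
    print(merged["001A"], dict(actions))
```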

## Delivery cadence

| Pattern                 | How it works                                                                                                    |
| ----------------------- | --------------------------------------------------------------------------------------------------------------- |
| **One-off delivery**    | A single bulk exchange — typical for proof-of-concepts and initial cleanups                                     |
| **Scheduled refresh**   | The same exchange repeated on an agreed cadence (e.g. quarterly or twice a year) to keep data current           |
| **Automated ingestion** | Kernel drops refreshed files into the bucket and your pipeline picks them up automatically — no manual handoffs |

For record-by-record processing of new accounts as they're created, the file exchange pairs with Kernel's API — see the [API documentation](/developer/api.md).
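For the automated pattern, a pipeline can pick up refreshed files by polling `output/` on a schedule and ingesting only keys it has not seen before. The sketch below assumes you record processed keys somewhere durable. The function names are hypothetical, and `list_output_objects` needs `boto3` plus credentials for the exchange bucket. S3 event notifications are an alternative to polling.

```python
from datetime import datetime, timezone


def pending_result_files(objects, processed_keys, prefix="output/"):
    """Return not-yet-ingested keys under `prefix`, oldest first.

    `objects` are dicts shaped like S3 ListObjectsV2 "Contents" entries,
    i.e. with at least "Key" and "LastModified".
    """
    new = [
        obj for obj in objects
        if obj["Key"].startswith(prefix)
        and not obj["Key"].endswith("/")  # skip folder placeholder objects
        and obj["Key"] not in processed_keys
    ]
    new.sort(key=lambda obj: obj["LastModified"])
    return [obj["Key"] for obj in new]


def list_output_objects(bucket, prefix="output/"):
    """Yield ListObjectsV2 entries under `prefix` (needs boto3 and credentials)."""
    import boto3  # imported here so pending_result_files has no dependencies

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])


if __name__ == "__main__":
    sample = [
        {"Key": "output/results_b.csv", "LastModified": datetime(2024, 3, 1, tzinfo=timezone.utc)},
        {"Key": "output/results_a.csv", "LastModified": datetime(2024, 2, 1, tzinfo=timezone.utc)},
        {"Key": "input/accounts.csv", "LastModified": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    ]
    print(pending_result_files(sample, processed_keys=set()))
```

Against the real bucket, the two functions combine as `pending_result_files(list_output_objects("<bucket>"), processed_keys)`.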

## FAQs

<details>

<summary>Do you support file formats other than CSV?</summary>

CSV is required for bulk processing. If your export pipeline produces Parquet or JSON, convert to CSV before dropping the file in the bucket.
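If your export arrives as JSON Lines (one JSON object per line), a minimal standard-library conversion might look like this. It assumes flat objects, since nested objects would need flattening first, and the function name is hypothetical. For Parquet, `pandas.read_parquet(path).to_csv(out_path, index=False)` does the same job if `pandas` and `pyarrow` are installed.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_in, csv_out) -> int:
    """Convert JSON Lines (one flat object per line) to CSV; return rows written."""
    rows = [json.loads(line) for line in jsonl_in if line.strip()]
    # Header is the union of keys in first-seen order, so no field is dropped.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(csv_out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Lists (e.g. related domains) become one comma-separated cell;
        # keys absent from a row are written as empty cells.
        writer.writerow({
            key: ",".join(map(str, value)) if isinstance(value, list) else value
            for key, value in row.items()
        })
    return len(rows)


if __name__ == "__main__":
    src = io.StringIO('{"record_id": "001A", "account_name": "Acme Corp"}\n')
    out = io.StringIO()
    jsonl_to_csv(src, out)
    print(out.getvalue())
```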

</details>

<details>

<summary>Is data encrypted?</summary>

Yes. All transfers use TLS, and files are encrypted at rest in S3. If you host the bucket, you can use your own KMS encryption keys.

</details>

<details>

<summary>Who can access the bucket?</summary>

Only the two principals in the bucket policy: your team's identity and the dedicated AWS identity Kernel provisions for your engagement. Kernel creates a separate identity per customer — credentials are never shared across engagements.

</details>

<details>

<summary>Can we move to a direct integration later?</summary>

Yes, and this is a common path. Many customers start with the file exchange (it requires no CRM installation and no package security review) and move to the [Salesforce integration](/integrations/salesforce-integration.md) or the API once procurement and security processes complete. The data model is the same, so nothing needs to be rebuilt.

</details>

<details>

<summary>What does Kernel need from us to get started?</summary>

Three things: the bucket name and region, the bucket ARN, and confirmation that Kernel's principal has been added to the bucket policy. Kernel provides its principal ARN as soon as the bucket exists.

</details>


