Connecting sources

Four ways to get content into VDF AI Data

Each method fits a different shape of source. Most teams end up using more than one — files for snapshots, connected apps for living docs, databases for structured data, and pasted text for quick moments.

Method	Best for	Stays current?	Reusable across products?
Direct file upload	Snapshots, one-off references, archived documents	No (manual re-upload)	Yes
Connected app	Living content — folders, spaces, projects that update	Yes	Yes
Database connection	Your operational data — customers, orders, telemetry, catalogs	Yes (queried live)	Yes
Pasted text in a prompt	Quick context for a single conversation	No	No

For anything that changes over time, prefer connected apps or database connections. For finished snapshots, upload directly.
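
The table above can be read as a small decision rule. The sketch below is only one way to express that reading; the function name and its three yes/no parameters are assumptions made for this example and are not part of VDF AI Data.

```python
def suggest_method(single_conversation: bool,
                   lives_in_database: bool,
                   changes_over_time: bool) -> str:
    """Map three yes/no questions about a source to one of the four methods."""
    if single_conversation:
        # Quick context that is not meant to be reused.
        return "Pasted text in a prompt"
    if lives_in_database:
        # Operational data such as customers, orders, telemetry, catalogs.
        return "Database connection"
    if changes_over_time:
        # Living folders, spaces, and projects.
        return "Connected app"
    # Finished snapshots and archived documents.
    return "Direct file upload"


print(suggest_method(False, False, True))
```

Real sources rarely fit one branch cleanly, which is why most teams end up combining methods.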

Databases are first-class sources. VDF AI Data isn't just for documents. The same product surface that searches your wiki can read from your data warehouse, your transactional database, or your analytics store — and use that data in EDA, feature discovery, semantic search, and fine-tuning workflows.

Uploading files directly

The simplest pattern. Drag a file in, it processes, it’s referenceable.

What to consider before uploading

Will this content change? If yes, a connected app is usually better. Uploads are snapshots.
Who should see this? Set visibility immediately after upload. The default may not match what you want.
Is it the right format? Plain text, PDF, DOCX, XLSX, CSV, PPTX, and transcripts all work well. Scanned PDFs require OCR — slower and sometimes lower quality.
Is it a sensible size? Very large files (hundreds of pages, gigabyte-scale spreadsheets) may take longer to process. Splitting one into several smaller files often gives better results than uploading a monster.

A useful naming convention

Files in Data are easier to reference when their names tell you what they are. A few patterns that pay off:

MSA-AcmeCorp-2025.pdf is searchable. final_v3.pdf is not.
Date prefixes (2025-03-12-call-transcript.txt) make chronological searches easier.
Project tags in the name (alpha-onboarding-checklist.docx) help filter when you have hundreds of files.
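
If you rename files in bulk before uploading, the patterns above are easy to script. The following is a minimal sketch, assuming you want lowercase, hyphen-separated names; the helper names `slugify` and `build_filename` are made up for this example.

```python
import re
from datetime import date
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase the text and collapse anything non-alphanumeric into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(description: str,
                   extension: str,
                   day: Optional[date] = None,
                   project: Optional[str] = None) -> str:
    """Build a name of the form [date-][project-]description.ext."""
    parts = []
    if day is not None:
        parts.append(day.isoformat())      # ISO dates sort chronologically
    if project:
        parts.append(slugify(project))     # project tag helps filtering
    parts.append(slugify(description))
    return "-".join(parts) + "." + extension.lstrip(".").lower()


print(build_filename("Call transcript", "txt", day=date(2025, 3, 12)))
print(build_filename("Onboarding checklist", ".docx", project="Alpha"))
```

The two calls produce `2025-03-12-call-transcript.txt` and `alpha-onboarding-checklist.docx`, matching the examples above.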

Connecting apps

This is where the real power lives. A connection means VDF AI can read live content from an app — and you don’t have to upload, re-upload, or manually sync.

Common apps you can connect

Google — Drive folders, Docs, Sheets, Slides
Microsoft — OneDrive, SharePoint, Outlook, Teams
Confluence — Spaces and pages
Jira — Projects, boards, individual tickets
GitHub — Repos, issues, pull requests
Slack — Channels and conversations
Zoom — Recorded meetings and transcripts
GitBook — Spaces and documents
Box — Folders and files

Your workspace may have additional connectors. Check the Connections area to see what’s available.

How a connection works

You start the connection. From the Connections area, pick the app and click "Connect."
The app asks you to authenticate. Sign in with your account on that platform. This proves to the app that the connection is authorized by you.
The app asks what to share. You choose what VDF AI should be able to access: a folder, a space, a project, a channel. Start narrow.
The connection becomes active. VDF AI can now reference content from the scoped area inside any conversation, agent, or network.
The connection refreshes over time. Some apps auto-refresh; others may need manual reauthorization periodically.

Scoping a connection right

The single best decision you make when connecting is what to scope it to.

Start with the smallest useful scope. A connected folder full of relevant content beats a connected drive full of everything. Narrow scopes produce sharper answers because there's less noise to wade through.

A useful pattern:

Connect by project, not by drive. “Q3 launches” folder, not all of Drive.
Connect by your team’s working space, not the company root. Your team’s Confluence space, not the whole Confluence instance.
Connect by active project, not archive. Current Jira project, not every ticket ever.

You can always expand later. Tightening a too-broad connection means re-scoping and possibly re-authenticating.

Connecting databases

For structured data — customers, orders, products, transactions, telemetry, catalogs — VDF AI Data connects directly to your databases. The same Data area that hosts your files and connected apps also holds your database connections, and the same downstream products (search, agents, networks, EDA, feature engineering) can use them.

Supported database types

VDF AI Data ships with first-party connectors for the most common operational and analytical stores. From the Data Connections screen, choose the type that matches your source:

PostgreSQL

The most common transactional database. Works with managed (RDS, Cloud SQL, Azure) and self-hosted Postgres.

MySQL

Including MariaDB-compatible deployments and managed MySQL on every major cloud.

Microsoft SQL Server

On-prem and Azure SQL. Pair with your existing service account or a connection-scoped read-only login.

Oracle

Enterprise Oracle deployments, including the standard listener/service-name configuration.

SAP HANA

SAP HANA Cloud and on-premise. Use a read-scoped database user with access to the schemas you want to make available.

Exasol

For analytics workloads sitting on Exasol's MPP database.

Presto

Connect to your Presto cluster as a federated query layer over multiple underlying stores.

JDBC (generic)

Anything with a JDBC driver. Use this when your source isn't a first-class option above — Snowflake, BigQuery via JDBC, Redshift, Trino, Vertica, and more.

Jira

Jira projects can also be added as a structured connection, useful when you want issues as queryable data rather than documents.

If your store isn’t named above, the JDBC option covers most stores with a published JDBC driver. Email us if you’d like a first-class connector for something specific.

What a database connection looks like

Each connection is a small set of fields. The screen shows them grouped so it’s clear what’s identity, what’s network, and what’s secret.

Field	What it’s for
Name	A friendly label your team will recognize (“Production Orders DB”, “Analytics Warehouse”).
Type	The database type — PostgreSQL, MySQL, Oracle, JDBC, etc.
Status	Where the connection is in its lifecycle (see the next section).
Database / Store	The database, schema, or catalog name to scope this connection to.
Host & port	The network address VDF AI Data uses to reach your database.
Credentials	A read-scoped username and password (or token). Stored encrypted; never shown back to anyone after save.
Description	Optional. A one-line note so your team knows what this connection is for.
Assets (known count)	The expected number of tables, views, or objects on the other side — used to validate the connection is seeing the right scope.

For credentials, you can either paste them directly or reference a secret managed elsewhere (your vault, your platform’s secrets store). Direct paste is faster for a first connection; secret references are the right pattern for production.
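
It can help to draft a connection's fields and sanity-check them before typing them into the screen. The sketch below mirrors the fields in the table as a plain Python record. It is not a VDF AI Data API: the class, its attribute names, the validation rules, and the example host and secret reference are all assumptions for illustration. Status is left out because the product reports it rather than you setting it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatabaseConnection:
    name: str
    type: str
    database: str
    host: str
    port: int
    username: str
    # repr=False keeps a pasted password out of logs and printed output.
    password: Optional[str] = field(default=None, repr=False)
    secret_ref: Optional[str] = None        # pointer to a secret managed elsewhere
    description: str = ""
    known_asset_count: Optional[int] = None

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty means the draft looks sane."""
        found = []
        if not self.name.strip():
            found.append("name is empty")
        if not 1 <= self.port <= 65535:
            found.append("port must be between 1 and 65535")
        # Exactly one credential source: pasted password or secret reference.
        if (self.password is None) == (self.secret_ref is None):
            found.append("provide exactly one of password or secret_ref")
        return found


conn = DatabaseConnection(
    name="Production Orders DB",
    type="PostgreSQL",
    database="orders",
    host="db.internal.example",          # hypothetical host
    port=5432,
    username="vdf_reader",
    secret_ref="vault/data/orders-reader",  # hypothetical secret path
)
print(conn.problems())
```

Treating the pasted password and the secret reference as mutually exclusive makes it obvious which credential source a given connection relies on.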

Use a read-only database user. VDF AI Data only ever reads — but defense in depth means the database account it uses should also only be able to read. Create a dedicated user, grant SELECT on the schemas you want available, and nothing else.
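
As one concrete case of the advice above, here is a minimal sketch that generates the statements a PostgreSQL administrator might run to create such a user. It assumes PostgreSQL; other databases use different syntax. The role, database, and schema names are placeholders, and the password is deliberately left to be set separately so it never appears in a script.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_grants(role: str, database: str, schemas: List[str]) -> List[str]:
    """Return SQL statements that create a login role limited to SELECT."""
    r, d = quote_ident(role), quote_ident(database)
    statements = [
        f"CREATE ROLE {r} LOGIN;",                  # set the password separately
        f"GRANT CONNECT ON DATABASE {d} TO {r};",
    ]
    for schema in schemas:
        s = quote_ident(schema)
        statements.append(f"GRANT USAGE ON SCHEMA {s} TO {r};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {r};")
    return statements


for statement in read_only_grants("vdf_reader", "orders", ["public"]):
    print(statement)
```

Note that `GRANT SELECT ON ALL TABLES` covers only tables that exist when it runs; tables created later need a fresh grant or a default-privileges rule, which your DBA may prefer to manage.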

Connection states

A database connection moves through a small set of states. Watch the status indicator on the connection card.

State	What it means	What to do
Configuring	The connection is being defined; not yet active	Fill in the remaining fields and save
Connected	The connection is live; VDF AI Data can read from it	Use it in downstream products (EDA, search, feature discovery)
Needs attention	Authentication failed, host unreachable, or scope changed	Update credentials or re-scope; re-test
Paused	Temporarily disabled (typically by a workspace admin)	Resume from the connection menu when ready
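
If you track connections in your own runbooks or scripts, the states and their follow-up actions in the table can be modelled as a small lookup. This is only a sketch; the enum and dictionary names are assumptions, and the state labels are simply copied from the table above.

```python
from enum import Enum


class ConnectionState(Enum):
    CONFIGURING = "Configuring"
    CONNECTED = "Connected"
    NEEDS_ATTENTION = "Needs attention"
    PAUSED = "Paused"


# What to do in each state, taken from the table.
NEXT_ACTION = {
    ConnectionState.CONFIGURING: "Fill in the remaining fields and save",
    ConnectionState.CONNECTED: "Use it in downstream products",
    ConnectionState.NEEDS_ATTENTION: "Update credentials or re-scope; re-test",
    ConnectionState.PAUSED: "Resume from the connection menu when ready",
}


def is_usable(state: ConnectionState) -> bool:
    """Only a Connected connection can be read from."""
    return state is ConnectionState.CONNECTED


state = ConnectionState("Needs attention")
print(is_usable(state), "->", NEXT_ACTION[state])
```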

Scoping a database connection well

The same principle as scoping connected apps — narrower is better.

Scope by database or schema, not “all databases”. “Production Orders” beats “everything the user can see.”
Use a dedicated read-only login. Don’t reuse the application’s database user.
Allow only the network paths you need. From the host running VDF AI Data to the database host — nothing more.
Document the connection. Use the Description field to record who owns the database and where to ask if something changes.

What you can do with a connected database

Once a database is connected, it becomes a first-class source across the rest of the Data area:

Exploratory Data Analysis (EDA) — profile tables, see column stats, find outliers, surface relationships without writing queries.
Feature engineering — build feature lists, run feature discovery across tables, and map feature associations across your schema.
Vector indexing — Vector DB Builder can produce semantic indexes over text-heavy columns so chats and agents can search them by meaning.
Fine-tune data preparation — assemble training datasets from your real production data.
Semantic search — answer natural-language questions over your structured data with citations back to specific tables and rows.

Each of these surfaces is just one click away from the connection — once your database is connected, every other VDF AI Data capability becomes available against it.

Refreshing and re-testing a connection

From the connection’s detail panel you can:

Test the connection. Confirms the host is reachable, the credentials are valid, and the scoped database/schema exists.
Refresh asset inventory. Re-discovers the tables, views, and columns currently in the scoped database.
Update credentials. Replace a password or token without breaking the connection’s identity.
Pause or remove. Disable the connection without deleting its history, or fully remove it when no longer needed.

A useful habit: test every database connection on the day you spin up a new VDF AI Data environment. A 30-second test catches blocked firewall paths and credential drift before a user notices.
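
When a test fails, it is often useful to separate "the network path is blocked" from "the credentials are wrong." The sketch below is a generic TCP reachability check you could run from the host where VDF AI Data is deployed. It is not the product's own test: it only confirms that something accepts connections on the given host and port, and says nothing about credentials or scope. The function name is an assumption for this example.

```python
import socket


def host_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the with-block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `host_reachable("db.internal.example", 5432)` (a hypothetical host) returning False points to a firewall rule, a DNS issue, or a stopped database rather than a credentials problem.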

A note on what stays on your side

Database connections are pull-on-demand. VDF AI Data doesn’t copy your tables wholesale into its own store — it queries the database when a downstream product asks. That means:

Your data stays in your database of record.
You stay in control of who-can-see-what at the database level.
You can pause or revoke a connection any time and reads stop immediately.

For the vector indexing and fine-tune dataset workflows, the relevant data is read once per build; you can rebuild on your own cadence. See Searching your knowledge for how semantic search uses both document and database sources.
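
The difference between a live query and a once-per-build read can be seen in a toy example. The sketch below uses an in-memory SQLite database as a stand-in for your database of record; the table, the function names, and the data are invented for illustration and do not reflect how VDF AI Data is implemented internally.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO orders (status) VALUES ('open'), ('open')")


def count_open_live(conn: sqlite3.Connection) -> int:
    """Pull-on-demand: query the database at the moment the question is asked."""
    row = conn.execute("SELECT COUNT(*) FROM orders WHERE status = 'open'").fetchone()
    return row[0]


def build_snapshot(conn: sqlite3.Connection) -> list:
    """Read-once-per-build: copy the rows needed for an index or training set."""
    return conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()


snapshot = build_snapshot(db)                       # build happens here
db.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")

live_open = count_open_live(db)                     # sees the update
snapshot_open = sum(1 for _, status in snapshot if status == "open")  # does not
print(live_open, snapshot_open)
```

The live count reflects the update while the snapshot still shows the older state, which is why index and dataset builds need to be re-run when the underlying data changes.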

Keeping connections fresh

A great connection on day one can degrade over time. A few things to watch:

Auto-refresh and manual refresh

Most connections refresh on their own — VDF AI checks for new content periodically. Some workspaces also offer manual refresh: a button that forces an immediate refresh when you know content just changed.

Use manual refresh when:

You just edited a doc and want to query it immediately.
You added a new folder to a connected drive.
You renamed or restructured connected content.

When a connection fails

You’ll see a notification in the Connections area. Common causes:

Permissions changed. Someone removed your access to the scoped area, or the app’s permissions model changed.
Authentication expired. Reauthorize to refresh the token.
The scoped content was moved or deleted. Re-scope to the new location.

Failed connections aren’t catastrophic — they just stop returning new results. Your existing references continue to work until the connection is back.

A monthly cleanup ritual

Once a month, ten minutes:

Open the Connections area.
Look at each connection’s status (active, needs attention, stale).
Reauthorize anything in “needs attention.”
Disconnect anything you no longer use.

A clean connections list produces sharper answers. A bloated one produces noisy ones.
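
If you keep your own inventory of connections (in a spreadsheet export, say), the ritual above reduces to a simple triage. This sketch assumes each record has a `name`, a `status`, and a `last_used` date, and treats 90 days without use as "no longer used." Those field names and the threshold are assumptions for this example, not something VDF AI Data defines.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def triage(connections: List[Dict], today: date,
           unused_after_days: int = 90) -> Tuple[List[str], List[str]]:
    """Split connections into those to reauthorize and those to disconnect."""
    reauthorize, disconnect = [], []
    for conn in connections:
        if conn["status"] == "needs attention":
            reauthorize.append(conn["name"])
        elif today - conn["last_used"] > timedelta(days=unused_after_days):
            disconnect.append(conn["name"])
    return reauthorize, disconnect


inventory = [
    {"name": "Q3 launches", "status": "active", "last_used": date(2025, 3, 10)},
    {"name": "Old wiki", "status": "active", "last_used": date(2024, 6, 1)},
    {"name": "Team Jira", "status": "needs attention", "last_used": date(2025, 3, 1)},
]
print(triage(inventory, today=date(2025, 3, 12)))
```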

What’s private, what’s shared

Visibility in Data has three usual levels:

Personal — only you can reference this source.
Team-shared — your team can reference it.
Workspace-shared — anyone in your workspace can reference it.

For new sources, the default visibility depends on workspace settings. Check after every upload or connection — visibility is one of the most common surprise misconfigurations.

Sensitive content needs deliberate scoping. Customer-specific data, internal financial details, or HR documents should be scoped narrowly and reviewed on a recurring schedule. See Privacy & Security for the full picture.

Permissions, in plain language

A common question: “Can VDF AI see things I shouldn’t see?”

No. Connections honor the access permissions of the account that authorized them. If your account can see a folder, the connection can see that folder. If your account can’t see a folder, the connection can’t either.

That means:

A team member’s connection sees what their account sees — not what your account sees.
If your access to a folder changes, the connection’s access changes too.
VDF AI can’t “elevate” through a connection — it has only the permissions you granted.
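
One way to picture these rules is as a set intersection: a connection can read an item only if the item is inside the scope chosen at connect time and the authorizing account can see it. The toy model below illustrates that reading; it is not how VDF AI enforces permissions internally, and the names and folder labels are invented.

```python
def readable_items(scope: set, account_access: set) -> set:
    """Items a connection can read: in scope AND visible to the authorizing account."""
    return scope & account_access


scope = {"Q3 launches", "Team wiki"}

# The authorizing account can see more than the scope; the extra item stays out.
access_before = {"Q3 launches", "Team wiki", "HR archive"}
print(sorted(readable_items(scope, access_before)))

# The account later loses access to one folder; the connection loses it too.
access_after = {"Q3 launches", "HR archive"}
print(sorted(readable_items(scope, access_after)))
```

In this model there is no operation that adds an item the account cannot see, which is the sense in which a connection cannot elevate.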

For tighter control over what a workspace’s AI can read, your workspace admin can scope connections at the workspace level.

Removing or replacing a source

To remove an uploaded file: delete it from the Data area. References to it in past conversations remain (as a record of what was asked) but new conversations will no longer see it.

To remove a connection: disconnect from the Connections area. The connection’s content stops being referenceable; the source app is unaffected.

To replace a source: upload the new file or rescope the connection. There’s no “version 2 of the same source” pattern — just remove the old and add the new.

A clean Data area is a multiplier

Teams that succeed with VDF AI tend to share a habit: they treat their Data area as a real piece of team infrastructure. They name files thoughtfully, scope connections tightly, refresh on a cadence, and clean out the stale.

Teams that skip these habits end up with noisy, drifting Data and a slow degradation of every answer the AI produces.

Where to go next

Searching your knowledge — how to ask great questions across the sources you’ve connected.
Use cases — six worked examples.
Privacy & Security — the full data-handling picture.
