VDF AI Data

Connecting sources

Bring files, connected apps, and databases into VDF AI as reliable, governed sources — and keep them fresh, scoped, and well-permissioned over time.

Four ways to get content into VDF AI Data

Each method fits a different shape of source. Most teams end up using more than one: files for snapshots, connected apps for living docs, databases for structured data, and pasted text for quick, one-off context.

| Method | Best for | Stays current? | Reusable across products? |
| --- | --- | --- | --- |
| Direct file upload | Snapshots, one-off references, archived documents | No (manual re-upload) | Yes |
| Connected app | Living content: folders, spaces, projects that update | Yes | Yes |
| Database connection | Your operational data: customers, orders, telemetry, catalogs | Yes (queried live) | Yes |
| Pasted text in a prompt | Quick context for a single conversation | No | No |

For anything that changes over time, prefer connected apps or database connections. For finished snapshots, upload directly.

Databases are first-class sources. VDF AI Data isn't just for documents. The same product surface that searches your wiki can read from your data warehouse, your transactional database, or your analytics store — and use that data in EDA, feature discovery, semantic search, and fine-tuning workflows.

Uploading files directly

The simplest pattern. Drag a file in, it processes, it’s referenceable.

What to consider before uploading

  • Will this content change? If yes, a connected app is usually better. Uploads are snapshots.
  • Who should see this? Set visibility immediately after upload. The default may not match what you want.
  • Is it the right format? Plain text, PDF, DOCX, XLSX, CSV, PPTX, and transcripts all work well. Scanned PDFs require OCR — slower and sometimes lower quality.
  • Is it a sensible size? Very large files (hundreds of pages, gigabyte-scale spreadsheets) may take longer to process. Splitting one into several smaller files often gives better results than uploading a monster.

A useful naming convention

Files in Data are easier to reference when their names tell you what they are. A few patterns that pay off:

  • MSA-AcmeCorp-2025.pdf is searchable. final_v3.pdf is not.
  • Date prefixes (2025-03-12-call-transcript.txt) make chronological searches easier.
  • Project tags in the name (alpha-onboarding-checklist.docx) help filter when you have hundreds of files.
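
The patterns above can be applied mechanically before upload. The following is a minimal Python sketch, not part of VDF AI: the helper names `slugify` and `build_source_filename` are hypothetical, and lowercasing every part of the name is a choice made here for consistency rather than a product requirement.

```python
import re
from datetime import date
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_source_filename(
    title: str,
    extension: str,
    day: Optional[date] = None,
    project: Optional[str] = None,
) -> str:
    """Build a descriptive name of the form [date-][project-]title.ext (hypothetical helper)."""
    parts = []
    if day is not None:
        parts.append(day.isoformat())  # ISO dates sort chronologically as plain text
    if project:
        parts.append(slugify(project))
    parts.append(slugify(title))
    return "-".join(parts) + "." + extension.lower().lstrip(".")


print(build_source_filename("Call transcript", "txt", day=date(2025, 3, 12)))
print(build_source_filename("Onboarding checklist", ".docx", project="Alpha"))
```

The two calls reproduce the date-prefix and project-tag examples from the list above.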

Connecting apps

This is where the real power lives. A connection means VDF AI can read live content from an app — and you don’t have to upload, re-upload, or manually sync.

Common apps you can connect

  • Google — Drive folders, Docs, Sheets, Slides
  • Microsoft — OneDrive, SharePoint, Outlook, Teams
  • Confluence — Spaces and pages
  • Jira — Projects, boards, individual tickets
  • GitHub — Repos, issues, pull requests
  • Slack — Channels and conversations
  • Zoom — Recorded meetings and transcripts
  • GitBook — Spaces and documents
  • Box — Folders and files

Your workspace may have additional connectors. Check the Connections area to see what’s available.

How a connection works

  1. You start the connection.

    From the Connections area, pick the app and click "Connect."

  2. The app asks you to authenticate.

    Sign in with your account on that platform. This proves to the app that the connection is authorized by you.

  3. The app asks what to share.

    You choose what VDF AI should be able to access — a folder, a space, a project, a channel. Start narrow.

  4. The connection becomes active.

    VDF AI can now reference content from the scoped area inside any conversation, agent, or network.

  5. The connection refreshes over time.

    Some apps auto-refresh; others may need manual reauthorization periodically.

Scoping a connection right

The single best decision you make when connecting is what to scope it to.

Start with the smallest useful scope. A connected folder full of relevant content beats a connected drive full of everything. Narrow scopes produce sharper answers because there's less noise to wade through.

A useful pattern:

  • Connect by project, not by drive. “Q3 launches” folder, not all of Drive.
  • Connect by team’s working space, not the company root. Your team’s Confluence space, not the whole Confluence instance.
  • Connect by active project, not archive. Current Jira project, not every ticket ever.

You can always expand later. Going the other way is harder: tightening a too-broad connection means re-scoping and possibly re-authenticating.

Connecting databases

For structured data — customers, orders, products, transactions, telemetry, catalogs — VDF AI Data connects directly to your databases. The same Data area that hosts your files and connected apps also holds your database connections, and the same downstream products (search, agents, networks, EDA, feature engineering) can use them.

Supported database types

VDF AI Data ships with first-party connectors for the most common operational and analytical stores. From the Data Connections screen, choose the type that matches your source:

PostgreSQL

The most common transactional database. Works with managed (RDS, Cloud SQL, Azure) and self-hosted Postgres.

MySQL

Including MariaDB-compatible deployments and managed MySQL on every major cloud.

Microsoft SQL Server

On-prem and Azure SQL. Pair with your existing service account or a connection-scoped read-only login.

Oracle

Enterprise Oracle deployments, including the standard listener/service-name configuration.

SAP HANA

SAP HANA Cloud and on-premise. Use a read-scoped database user with access to the schemas you want to make available.

Exasol

For analytics workloads sitting on Exasol's MPP database.

Presto

Connect to your Presto cluster as a federated query layer over multiple underlying stores.

JDBC (generic)

Anything with a JDBC driver. Use this when your source isn't a first-class option above — Snowflake, BigQuery via JDBC, Redshift, Trino, Vertica, and more.

Jira

Jira projects can also be added as a structured connection, useful when you want issues as queryable data rather than documents.

If your store isn’t named above, the JDBC option covers most stores with a published JDBC driver. Email us if you’d like a first-class connector for something specific.

What a database connection looks like

Each connection is a small set of fields. The screen shows them grouped so it’s clear what’s identity, what’s network, and what’s secret.

| Field | What it’s for |
| --- | --- |
| Name | A friendly label your team will recognize (“Production Orders DB”, “Analytics Warehouse”). |
| Type | The database type: PostgreSQL, MySQL, Oracle, JDBC, etc. |
| Status | Where the connection is in its lifecycle (see the next section). |
| Database / Store | The database, schema, or catalog name to scope this connection to. |
| Host & port | The network address VDF AI Data uses to reach your database. |
| Credentials | A read-scoped username and password (or token). Stored encrypted; never shown back to anyone after save. |
| Description | Optional. A one-line note so your team knows what this connection is for. |
| Assets (known count) | The expected number of tables, views, or objects on the other side, used to validate that the connection is seeing the right scope. |

For credentials, you can either paste them directly or reference a secret managed elsewhere (your vault, your platform’s secrets store). Direct paste is faster for a first connection; secret references are the right pattern for production.

Use a read-only database user. VDF AI Data only ever reads — but defense in depth means the database account it uses should also only be able to read. Create a dedicated user, grant SELECT on the schemas you want available, and nothing else.
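
As an illustration of the read-only pattern, here is a minimal Python sketch that only builds the SQL text for a dedicated read-only login. It assumes PostgreSQL syntax; other engines use different statements. The role name `vdf_reader`, the database `orders`, and the schema `sales` are hypothetical placeholders. Review the output and run it with your own admin tooling.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_grants(role: str, database: str, schemas: List[str]) -> List[str]:
    """Return PostgreSQL statements for a login role that can only read the given schemas."""
    role_q = quote_ident(role)
    statements = [
        f"CREATE ROLE {role_q} LOGIN;",  # set the password separately, outside this script
        f"GRANT CONNECT ON DATABASE {quote_ident(database)} TO {role_q};",
    ]
    for schema in schemas:
        schema_q = quote_ident(schema)
        statements.append(f"GRANT USAGE ON SCHEMA {schema_q} TO {role_q};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema_q} TO {role_q};")
    return statements


# Hypothetical names, for illustration only.
for statement in read_only_grants("vdf_reader", "orders", ["sales"]):
    print(statement)
```

In PostgreSQL, `GRANT SELECT ON ALL TABLES IN SCHEMA` covers only tables that exist when it runs. Tables created later need a new grant or an `ALTER DEFAULT PRIVILEGES` rule.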

Connection states

A database connection moves through a small set of states. Watch the status indicator on the connection card.

| State | What it means | What to do |
| --- | --- | --- |
| Configuring | The connection is being defined; not yet active | Fill in the remaining fields and save |
| Connected | The connection is live; VDF AI Data can read from it | Use it in downstream products (EDA, search, feature discovery) |
| Needs attention | Authentication failed, host unreachable, or scope changed | Update credentials or re-scope; re-test |
| Paused | Temporarily disabled (typically by a workspace admin) | Resume from the connection menu when ready |
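
If you track connections in your own runbook or dashboard, the states above map naturally onto a small lookup. The sketch below is an illustrative model only, not a VDF AI API. The names `ConnectionState`, `NEXT_ACTION`, `is_usable`, and `next_action` are hypothetical, and the rule that only Connected is usable is an inference from the table.

```python
from enum import Enum


class ConnectionState(Enum):
    """The four database connection states described in this section."""
    CONFIGURING = "Configuring"
    CONNECTED = "Connected"
    NEEDS_ATTENTION = "Needs attention"
    PAUSED = "Paused"


# Recommended follow-up per state, copied from the "What to do" column.
NEXT_ACTION = {
    ConnectionState.CONFIGURING: "Fill in the remaining fields and save",
    ConnectionState.CONNECTED: "Use it in downstream products (EDA, search, feature discovery)",
    ConnectionState.NEEDS_ATTENTION: "Update credentials or re-scope; re-test",
    ConnectionState.PAUSED: "Resume from the connection menu when ready",
}


def is_usable(state: ConnectionState) -> bool:
    """Assumption drawn from the table: only Connected connections can be read."""
    return state is ConnectionState.CONNECTED


def next_action(state: ConnectionState) -> str:
    """Look up the recommended follow-up for a state."""
    return NEXT_ACTION[state]


state = ConnectionState("Needs attention")
print(is_usable(state), "-", next_action(state))
```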

Scoping a database connection well

The same principle as scoping connected apps — narrower is better.

  • Scope by database or schema, not “all databases”. “Production Orders” beats “everything the user can see.”
  • Use a dedicated read-only login. Don’t reuse the application’s database user.
  • Allow only the network paths you need. From the host running VDF AI Data to the database host — nothing more.
  • Document the connection. Use the Description field to record who owns the database and where to ask if something changes.

What you can do with a connected database

Once a database is connected, it becomes a first-class source across the rest of the Data area:

  • Exploratory Data Analysis (EDA) — profile tables, see column stats, find outliers, surface relationships without writing queries.
  • Feature engineering — build feature lists, run feature discovery across tables, and map feature associations across your schema.
  • Vector indexing — Vector DB Builder can produce semantic indexes over text-heavy columns so chats and agents can search them by meaning.
  • Fine-tune data preparation — assemble training datasets from your real production data.
  • Semantic search — answer natural-language questions over your structured data with citations back to specific tables and rows.

Each of these surfaces is just one click away from the connection — once your database is connected, every other VDF AI Data capability becomes available against it.

Refreshing and re-testing a connection

From the connection’s detail panel you can:

  • Test the connection. Confirms the host is reachable, the credentials are valid, and the scoped database/schema exists.
  • Refresh asset inventory. Re-discovers the tables, views, and columns currently in the scoped database.
  • Update credentials. Replace a password or token without breaking the connection’s identity.
  • Pause or remove. Disable the connection without deleting its history, or fully remove it when no longer needed.

A useful habit: test every database connection on the day you spin up a new VDF AI Data environment. A 30-second test catches firewall problems and credential drift before a user notices.
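
When a test fails, it helps to separate network problems from credential problems. The following Python sketch checks only the first part: whether a TCP connection to the database host and port opens at all. It says nothing about credentials or schema scope. The function name `host_reachable` is hypothetical, and the address in the example is a placeholder. Run it from the host that runs VDF AI Data, since that is the network path that matters.

```python
import socket


def host_reachable(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with your database host and port.
    print(host_reachable("127.0.0.1", 5432))
```

If this returns False, look at firewall rules and routing first. If it returns True but the connection test still fails, credentials or scope are the more likely cause.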

A note on what stays on your side

Database connections are pull-on-demand. VDF AI Data doesn’t copy your tables wholesale into its own store — it queries the database when a downstream product asks. That means:

  • Your data stays in your database of record.
  • You stay in control of who-can-see-what at the database level.
  • You can pause or revoke a connection any time and reads stop immediately.

For the vector indexing and fine-tune dataset workflows, the relevant data is read once per build; you can re-build at your cadence. See Searching your knowledge for how semantic search uses both document and database sources.

Keeping connections fresh

A great connection on day one can degrade over time. A few things to watch:

Auto-refresh and manual refresh

Most connections refresh on their own — VDF AI checks for new content periodically. Some workspaces also offer manual refresh: a button that forces an immediate refresh when you know content just changed.

Use manual refresh when:

  • You just edited a doc and want to query it immediately.
  • You added a new folder to a connected drive.
  • You renamed or restructured connected content.

When a connection fails

You’ll see a notification in the Connections area. Common causes:

  • Permissions changed. Someone removed your access to the scoped area, or the app’s permissions model changed.
  • Authentication expired. Reauthorize to refresh the token.
  • The scoped content was moved or deleted. Re-scope to the new location.

Failed connections aren’t catastrophic — they just stop returning new results. Your existing references continue to work until the connection is back.

A monthly cleanup ritual

Once a month, ten minutes:

  1. Open the Connections area.
  2. Look at each connection’s status (active, needs attention, stale).
  3. Reauthorize anything in “needs attention.”
  4. Disconnect anything you no longer use.

A clean connections list produces sharper answers. A bloated one produces noisy ones.
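
If you keep your own inventory of connections (for example, a list maintained by hand), the ritual above can be expressed as a small triage function. Everything in this Python sketch is an assumption for illustration. The `ConnectionRecord` fields, the `last_used` date, and the 90-day cutoff for "no longer used" are hypothetical and do not come from VDF AI.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class ConnectionRecord:
    name: str
    status: str      # status label as shown in the Connections area
    last_used: date  # hypothetical field from your own inventory


def triage(
    connections: List[ConnectionRecord],
    today: date,
    unused_after_days: int = 90,  # assumed threshold; pick what suits your team
) -> Dict[str, List[str]]:
    """Sort connections into reauthorize / disconnect / keep buckets."""
    cutoff = today - timedelta(days=unused_after_days)
    plan: Dict[str, List[str]] = {"reauthorize": [], "disconnect": [], "keep": []}
    for connection in connections:
        if connection.last_used < cutoff:
            plan["disconnect"].append(connection.name)   # step 4: no longer used
        elif connection.status == "needs attention":
            plan["reauthorize"].append(connection.name)  # step 3
        else:
            plan["keep"].append(connection.name)
    return plan


inventory = [
    ConnectionRecord("Q3 launches folder", "active", date(2025, 5, 20)),
    ConnectionRecord("Team Confluence space", "needs attention", date(2025, 5, 1)),
    ConnectionRecord("Old Jira project", "active", date(2024, 12, 1)),
]
print(triage(inventory, today=date(2025, 6, 1)))
```

Unused connections are checked first, so a connection that needs attention but is no longer used is disconnected rather than reauthorized.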

What’s private, what’s shared

Visibility in Data has three usual levels:

  • Personal — only you can reference this source.
  • Team-shared — your team can reference it.
  • Workspace-shared — anyone in your workspace can reference it.

For new sources, the default visibility depends on workspace settings. Check after every upload or connection — visibility is one of the most common surprise misconfigurations.

Sensitive content needs deliberate scoping. Customer-specific data, internal financial details, or HR documents should be scoped narrowly and reviewed on a recurring schedule. See Privacy & Security for the full picture.

Permissions, in plain language

A common question: “Can VDF AI see things I shouldn’t see?”

No. Connections honor the access permissions of the account that authorized them. If your account can see a folder, the connection can see that folder. If your account can’t see a folder, the connection can’t either.

That means:

  • A team member’s connection sees what their account sees — not what your account sees.
  • If your access to a folder changes, the connection’s access changes too.
  • VDF AI can’t “elevate” through a connection — it has only the permissions you granted.
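
One way to picture the rules above: a connection can read a resource only if the authorizing account can see it and it falls inside the scope that account granted. The Python sketch below is a toy model of that idea, not how VDF AI implements permissions. The function name and the resource names are hypothetical.

```python
from typing import Set


def connection_can_read(
    resource: str,
    authorizer_access: Set[str],
    connection_scope: Set[str],
) -> bool:
    """A connection reads only what is both visible to its authorizer and in scope."""
    return resource in authorizer_access and resource in connection_scope


# Hypothetical example: your account sees two areas but scoped the connection to one.
your_access = {"q3-launches", "team-wiki"}
scope = {"q3-launches"}

print(connection_can_read("q3-launches", your_access, scope))  # in scope and visible to you
print(connection_can_read("team-wiki", your_access, scope))    # visible to you, not in scope
print(connection_can_read("hr-records", your_access, scope))   # not visible to you at all
```

Removing an item from `your_access` removes it from what the connection can read, which mirrors the second bullet above.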

For tighter control over what a workspace’s AI can read, your workspace admin can scope connections at the workspace level.

Removing or replacing a source

To remove an uploaded file: delete it from the Data area. References to it in past conversations remain (as a record of what was asked) but new conversations will no longer see it.

To remove a connection: disconnect from the Connections area. The connection’s content stops being referenceable; the source app is unaffected.

To replace a source: upload the new file or rescope the connection. There’s no “version 2 of the same source” pattern — just remove the old and add the new.

A clean Data area is a multiplier

Teams that succeed with VDF AI tend to share a habit: they treat their Data area as a real piece of team infrastructure. They name files thoughtfully, scope connections tightly, refresh on a cadence, and clean out the stale.

Teams that skip these habits end up with noisy, drifting Data, and with it a slow degradation of every answer the AI produces.
