Four ways to get content into VDF AI Data
Each method fits a different shape of source. Most teams end up using more than one — files for snapshots, connected apps for living docs, databases for structured data, and pasted text for quick moments.
| Method | Best for | Stays current? | Reusable across products? |
|---|---|---|---|
| Direct file upload | Snapshots, one-off references, archived documents | No (manual re-upload) | Yes |
| Connected app | Living content — folders, spaces, projects that update | Yes | Yes |
| Database connection | Your operational data — customers, orders, telemetry, catalogs | Yes (queried live) | Yes |
| Pasted text in a prompt | Quick context for a single conversation | No | No |
For anything that changes over time, prefer connected apps or database connections. For finished snapshots, upload directly.
Databases are first-class sources. VDF AI Data isn't just for documents. The same product surface that searches your wiki can read from your data warehouse, your transactional database, or your analytics store — and use that data in EDA, feature discovery, semantic search, and fine-tuning workflows.
Uploading files directly
The simplest pattern. Drag a file in, it processes, it’s referenceable.
What to consider before uploading
- Will this content change? If yes, a connected app is usually better. Uploads are snapshots.
- Who should see this? Set visibility immediately after upload. The default may not match what you want.
- Is it the right format? Plain text, PDF, DOCX, XLSX, CSV, PPTX, and transcripts all work well. Scanned PDFs require OCR — slower and sometimes lower quality.
- Is it a sensible size? Very large files (hundreds of pages, gigabyte-scale spreadsheets) may take longer to process. Splitting one into several smaller files often gives better results than uploading a monster.
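For oversized spreadsheets, the split can be scripted before upload. The following is a minimal sketch, not part of VDF AI Data: the function name `split_csv`, the `-partNNN` naming, and the rows-per-file threshold are all illustrative assumptions. It repeats the header row in every part so each file still makes sense on its own.

```python
import csv
from pathlib import Path


def split_csv(source: Path, rows_per_file: int, out_dir: Path) -> list[Path]:
    """Split a CSV into parts of at most rows_per_file data rows each.

    Hypothetical helper: the header row is copied into every part.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return parts  # empty file: nothing to split
        out = None
        writer = None
        try:
            for index, row in enumerate(reader):
                if index % rows_per_file == 0:
                    # Start a new part file every rows_per_file data rows.
                    if out is not None:
                        out.close()
                    part = out_dir / f"{source.stem}-part{len(parts) + 1:03d}.csv"
                    out = part.open("w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    parts.append(part)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return parts
```

Splitting by row count is the simplest choice. Splitting by a meaningful column (year, region, product line) usually produces files that are easier to name and reference.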
A useful naming convention
Files in Data are easier to reference when their names tell you what they are. A few patterns that pay off:
- `MSA-AcmeCorp-2025.pdf` is searchable. `final_v3.pdf` is not.
- Date prefixes (`2025-03-12-call-transcript.txt`) make chronological searches easier.
- Project tags in the name (`alpha-onboarding-checklist.docx`) help filter when you have hundreds of files.
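If your team wants these patterns applied consistently, a small helper can build the names. This is a sketch under assumptions: `source_file_name` is a hypothetical function, and lowercasing everything into hyphenated slugs is one possible house style, not a VDF AI Data requirement.

```python
import re
from datetime import date
from typing import Optional


def source_file_name(
    label: str,
    extension: str,
    project: Optional[str] = None,
    on: Optional[date] = None,
) -> str:
    """Build a name with an optional date prefix and project tag."""

    def slug(text: str) -> str:
        # Lowercase, then collapse every run of other characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = []
    if on is not None:
        parts.append(on.isoformat())  # ISO dates sort chronologically as text
    if project:
        parts.append(slug(project))
    parts.append(slug(label))
    return "-".join(parts) + "." + extension.lstrip(".").lower()


print(source_file_name("Call transcript", "txt", on=date(2025, 3, 12)))
print(source_file_name("Onboarding checklist", ".docx", project="Alpha"))
```

The two calls reproduce the example names above: `2025-03-12-call-transcript.txt` and `alpha-onboarding-checklist.docx`.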
Connecting apps
This is where the real power lives. A connection means VDF AI can read live content from an app — and you don’t have to upload, re-upload, or manually sync.
Common apps you can connect
- Google — Drive folders, Docs, Sheets, Slides
- Microsoft — OneDrive, SharePoint, Outlook, Teams
- Confluence — Spaces and pages
- Jira — Projects, boards, individual tickets
- GitHub — Repos, issues, pull requests
- Slack — Channels and conversations
- Zoom — Recorded meetings and transcripts
- GitBook — Spaces and documents
- Box — Folders and files
Your workspace may have additional connectors. Check the Connections area to see what’s available.
How a connection works
1. You start the connection. From the Connections area, pick the app and click "Connect."
2. The app asks you to authenticate. Sign in with your account on that platform. This proves to the app that the connection is authorized by you.
3. The app asks what to share. You choose what VDF AI should be able to access: a folder, a space, a project, or a channel. Start narrow.
4. The connection becomes active. VDF AI can now reference content from the scoped area inside any conversation, agent, or network.
5. The connection refreshes over time. Some apps auto-refresh; others may need manual reauthorization periodically.
Scoping a connection right
The single best decision you make when connecting is what to scope it to.
Start with the smallest useful scope. A connected folder full of relevant content beats a connected drive full of everything. Narrow scopes produce sharper answers because there's less noise to wade through.
A useful pattern:
- Connect by project, not by drive. “Q3 launches” folder, not all of Drive.
- Connect by your team’s working space, not the company root. Your team’s Confluence space, not the whole Confluence instance.
- Connect by active project, not archive. Current Jira project, not every ticket ever.
You can always expand later. Tightening a too-broad connection means re-scoping and possibly re-authenticating.
Connecting databases
For structured data — customers, orders, products, transactions, telemetry, catalogs — VDF AI Data connects directly to your databases. The same Data area that hosts your files and connected apps also holds your database connections, and the same downstream products (search, agents, networks, EDA, feature engineering) can use them.
Supported database types
VDF AI Data ships with first-party connectors for the most common operational and analytical stores. From the Data Connections screen, choose the type that matches your source:
PostgreSQL
The most common transactional database. Works with managed (RDS, Cloud SQL, Azure) and self-hosted Postgres.
MySQL
Including MariaDB-compatible deployments and managed MySQL on every major cloud.
Microsoft SQL Server
On-prem and Azure SQL. Pair with your existing service account or a connection-scoped read-only login.
Oracle
Enterprise Oracle deployments, including the standard listener/service-name configuration.
SAP HANA
SAP HANA Cloud and on-premise. Use a read-scoped database user with access to the schemas you want to make available.
Exasol
For analytics workloads sitting on Exasol's MPP database.
Presto
Connect to your Presto cluster as a federated query layer over multiple underlying stores.
JDBC (generic)
Anything with a JDBC driver. Use this when your source isn't a first-class option above — Snowflake, BigQuery via JDBC, Redshift, Trino, Vertica, and more.
Jira
Jira projects can also be added as a structured connection, useful when you want issues as queryable data rather than documents.
If your store isn’t named above, the JDBC option covers most stores with a published JDBC driver. Email us if you’d like a first-class connector for something specific.
What a database connection looks like
Each connection is a small set of fields. The screen shows them grouped so it’s clear what’s identity, what’s network, and what’s secret.
| Field | What it’s for |
|---|---|
| Name | A friendly label your team will recognize (“Production Orders DB”, “Analytics Warehouse”). |
| Type | The database type — PostgreSQL, MySQL, Oracle, JDBC, etc. |
| Status | Where the connection is in its lifecycle (see the next section). |
| Database / Store | The database, schema, or catalog name to scope this connection to. |
| Host & port | The network address VDF AI Data uses to reach your database. |
| Credentials | A read-scoped username and password (or token). Stored encrypted; never shown back to anyone after save. |
| Description | Optional. A one-line note so your team knows what this connection is for. |
| Assets (known count) | The expected number of tables, views, or objects on the other side — used to validate the connection is seeing the right scope. |
For credentials, you can either paste them directly or reference a secret managed elsewhere (your vault, your platform’s secrets store). Direct paste is faster for a first connection; secret references are the right pattern for production.
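To make the field grouping concrete, here is a sketch of the same fields as a plain record. Everything in it is an assumption for illustration: the class name `DatabaseConnection`, the attribute names, the `vault://` reference format, and the exact-match asset check are hypothetical and do not describe VDF AI Data's internal model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatabaseConnection:
    """Hypothetical record mirroring the fields on the connection screen."""

    # Identity
    name: str        # friendly label, e.g. "Production Orders DB"
    db_type: str     # "PostgreSQL", "MySQL", "Oracle", "JDBC", ...
    database: str    # database, schema, or catalog to scope to
    # Network
    host: str
    port: int
    # Secret: a pointer into a vault or secrets store, never the password itself
    username: str
    secret_ref: str
    # Optional extras
    description: str = ""
    known_asset_count: Optional[int] = None

    def problems(self) -> list[str]:
        """Return human-readable issues to fix before saving."""
        issues = []
        if not self.name.strip():
            issues.append("name is empty")
        if not self.host.strip():
            issues.append("host is empty")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        if not self.secret_ref.strip():
            issues.append("secret reference is empty")
        return issues

    def sees_expected_scope(self, discovered_assets: int) -> bool:
        """Compare discovered tables and views with the known count, if given."""
        return self.known_asset_count is None or discovered_assets == self.known_asset_count
```

Keeping only a secret reference in the record means the record itself can be logged, reviewed, or version-controlled without exposing a password.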
Use a read-only database user. VDF AI Data only ever reads — but defense in depth means the database account it uses should also only be able to read. Create a dedicated user, grant SELECT on the schemas you want available, and nothing else.
Connection states
A database connection moves through a small set of states. Watch the status indicator on the connection card.
| State | What it means | What to do |
|---|---|---|
| Configuring | The connection is being defined; not yet active | Fill in the remaining fields and save |
| Connected | The connection is live; VDF AI Data can read from it | Use it in downstream products (EDA, search, feature discovery) |
| Needs attention | Authentication failed, host unreachable, or scope changed | Update credentials or re-scope; re-test |
| Paused | Temporarily disabled (typically by a workspace admin) | Resume from the connection menu when ready |
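If you track connection health in your own scripts or runbooks, the table above maps naturally onto an enumeration. This is a sketch only: the names `ConnectionState`, `NEXT_ACTION`, `is_readable`, and `next_action` are hypothetical, and the labels are copied from the table rather than taken from any VDF AI Data API.

```python
from enum import Enum


class ConnectionState(Enum):
    CONFIGURING = "Configuring"
    CONNECTED = "Connected"
    NEEDS_ATTENTION = "Needs attention"
    PAUSED = "Paused"


# What to do in each state, as listed in the table.
NEXT_ACTION = {
    ConnectionState.CONFIGURING: "Fill in the remaining fields and save",
    ConnectionState.CONNECTED: "Use it in downstream products",
    ConnectionState.NEEDS_ATTENTION: "Update credentials or re-scope; re-test",
    ConnectionState.PAUSED: "Resume from the connection menu when ready",
}


def is_readable(state: ConnectionState) -> bool:
    """Only a Connected connection is live and can be read from."""
    return state is ConnectionState.CONNECTED


def next_action(status_label: str) -> str:
    """Look up the suggested action from the label shown on the connection card."""
    return NEXT_ACTION[ConnectionState(status_label)]
```

An unknown label raises `ValueError`, which is usually preferable to silently treating a new or misspelled status as healthy.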
Scoping a database connection well
The same principle as scoping connected apps — narrower is better.
- Scope by database or schema, not “all databases”. “Production Orders” beats “everything the user can see.”
- Use a dedicated read-only login. Don’t reuse the application’s database user.
- Allow only the network paths you need. From the host running VDF AI Data to the database host — nothing more.
- Document the connection. Use the Description field to record who owns the database and where to ask if something changes.
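For the dedicated read-only login, the statements depend on your database. As one example, the sketch below generates standard PostgreSQL statements for a login role limited to SELECT on chosen schemas. The function names and the role name `vdf_reader` are illustrative assumptions; other databases use different syntax, so treat this as a starting point for your DBA rather than a prescribed setup.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_setup(role: str, schemas: list[str]) -> list[str]:
    """Return PostgreSQL statements for a login role that can only SELECT."""
    r = quote_ident(role)
    # The password is deliberately left out: set it separately so it never
    # appears in a script or in shell history.
    statements = [f"CREATE ROLE {r} LOGIN;"]
    for schema in schemas:
        s = quote_ident(schema)
        statements.append(f"GRANT USAGE ON SCHEMA {s} TO {r};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {r};")
    return statements


for statement in read_only_setup("vdf_reader", ["orders"]):
    print(statement)
```

In PostgreSQL, `GRANT ... ON ALL TABLES IN SCHEMA` applies only to tables and views that exist when it runs. Objects created later need a new grant or an `ALTER DEFAULT PRIVILEGES` rule.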
What you can do with a connected database
Once a database is connected, it becomes a first-class source across the rest of the Data area:
- Exploratory Data Analysis (EDA) — profile tables, see column stats, find outliers, surface relationships without writing queries.
- Feature engineering — build feature lists, run feature discovery across tables, and map feature associations across your schema.
- Vector indexing — Vector DB Builder can produce semantic indexes over text-heavy columns so chats and agents can search them by meaning.
- Fine-tune data preparation — assemble training datasets from your real production data.
- Semantic search — answer natural-language questions over your structured data with citations back to specific tables and rows.
Each of these is reachable from the connection itself: once your database is connected, every other VDF AI Data capability can run against it.
Refreshing and re-testing a connection
From the connection’s detail panel you can:
- Test the connection. Confirms the host is reachable, the credentials are valid, and the scoped database/schema exists.
- Refresh asset inventory. Re-discovers the tables, views, and columns currently in the scoped database.
- Update credentials. Replace a password or token without breaking the connection’s identity.
- Pause or remove. Disable the connection without deleting its history, or fully remove it when no longer needed.
A useful habit: test every database connection on the day you spin up a new VDF AI Data environment. A 30-second test catches blocked firewall paths and credential drift before a user notices.
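The network half of that test can also be run independently, from the machine that hosts VDF AI Data, before anyone opens the product. The sketch below covers only reachability: it confirms that a TCP connection to the host and port can be opened. It does not check credentials or scope, and the function name `host_reachable` is a hypothetical helper, not a product feature.

```python
import socket


def host_reachable(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A `False` result points at DNS, routing, or firewall rules rather than at the database account, which narrows down who needs to be involved in the fix.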
A note on what stays on your side
Database connections are pull-on-demand. VDF AI Data doesn’t copy your tables wholesale into its own store — it queries the database when a downstream product asks. That means:
- Your data stays in your database of record.
- You stay in control of who-can-see-what at the database level.
- You can pause or revoke a connection at any time, and reads stop immediately.
For the vector indexing and fine-tune dataset workflows, the relevant data is read once per build; you can re-build at your cadence. See Searching your knowledge for how semantic search uses both document and database sources.
Keeping connections fresh
A great connection on day one can degrade over time. A few things to watch:
Auto-refresh and manual refresh
Most connections refresh on their own — VDF AI checks for new content periodically. Some workspaces also offer manual refresh: a button that forces an immediate refresh when you know content just changed.
Use manual refresh when:
- You just edited a doc and want to query it immediately.
- You added a new folder to a connected drive.
- You renamed or restructured connected content.
When a connection fails
You’ll see a notification in the Connections area. Common causes:
- Permissions changed. Someone removed your access to the scoped area, or the app’s permissions model changed.
- Authentication expired. Reauthorize to refresh the token.
- The scoped content was moved or deleted. Re-scope to the new location.
Failed connections aren’t catastrophic — they just stop returning new results. Your existing references continue to work until the connection is back.
A monthly cleanup ritual
Once a month, ten minutes:
- Open the Connections area.
- Look at each connection’s status (active, needs attention, stale).
- Reauthorize anything in “needs attention.”
- Disconnect anything you no longer use.
A clean connections list produces sharper answers. A bloated one produces noisy ones.
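If you keep your own inventory of connections (for example, in a spreadsheet), the ritual above reduces to a simple sort into three piles. This is a sketch under assumptions: `ConnectionRecord`, `monthly_triage`, the `last_used` field, and the 90-day threshold for "no longer used" are all hypothetical and should be adjusted to your team's norms.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ConnectionRecord:
    name: str
    status: str      # e.g. "active" or "needs attention"
    last_used: date  # assumed to come from your own records


def monthly_triage(
    connections: list[ConnectionRecord],
    today: date,
    unused_after_days: int = 90,
) -> dict[str, list[str]]:
    """Sort connection names into reauthorize, disconnect, and keep piles."""
    plan: dict[str, list[str]] = {"reauthorize": [], "disconnect": [], "keep": []}
    cutoff = today - timedelta(days=unused_after_days)
    for conn in connections:
        if conn.last_used < cutoff:
            # Unused connections are removed rather than repaired.
            plan["disconnect"].append(conn.name)
        elif conn.status == "needs attention":
            plan["reauthorize"].append(conn.name)
        else:
            plan["keep"].append(conn.name)
    return plan
```

Checking for disuse first avoids spending time reauthorizing a connection that nobody relies on any more.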
What’s private, what’s shared
Visibility in Data has three usual levels:
- Personal — only you can reference this source.
- Team-shared — your team can reference it.
- Workspace-shared — anyone in your workspace can reference it.
For new sources, the default visibility depends on workspace settings. Check after every upload or connection — visibility is one of the most common surprise misconfigurations.
Sensitive content needs deliberate scoping. Customer-specific data, internal financial details, or HR documents should be scoped narrowly and reviewed on a recurring schedule. See Privacy & Security for the full picture.
Permissions, in plain language
A common question: “Can VDF AI see things I shouldn’t see?”
No. Connections honor the access permissions of the account that authorized them. If your account can see a folder, the connection can see that folder. If your account can’t see a folder, the connection can’t either.
That means:
- A team member’s connection sees what their account sees — not what your account sees.
- If your access to a folder changes, the connection’s access changes too.
- VDF AI can’t “elevate” through a connection — it has only the permissions you granted.
For tighter control over what a workspace’s AI can read, your workspace admin can scope connections at the workspace level.
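One way to picture the rule is as a set intersection: a connection can read only what is both inside its scope and visible to the account that authorized it. The sketch below is a simplified mental model with hypothetical names, not a description of how VDF AI enforces permissions internally.

```python
def readable_through_connection(
    account_access: set[str], connection_scope: set[str]
) -> set[str]:
    """Content a connection can read: in scope AND visible to the authorizer."""
    return account_access & connection_scope


scope = {"Q3 launches", "Q3 budget"}
my_access = {"Q3 launches", "Q3 budget", "Team wiki"}
print(sorted(readable_through_connection(my_access, scope)))

# If access to "Q3 budget" is later removed from the account,
# the connection loses it too, without any change to the connection.
my_access.discard("Q3 budget")
print(sorted(readable_through_connection(my_access, scope)))
```

The first call returns both scoped folders. After access to "Q3 budget" is removed, the second returns only "Q3 launches". "Team wiki" never appears because it was never in scope.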
Removing or replacing a source
To remove an uploaded file: delete it from the Data area. References to it in past conversations remain (as a record of what was asked) but new conversations will no longer see it.
To remove a connection: disconnect from the Connections area. The connection’s content stops being referenceable; the source app is unaffected.
To replace a source: upload the new file or re-scope the connection. There’s no “version 2 of the same source” pattern; you remove the old source and add the new one.
A clean Data area is a multiplier
Teams that succeed with VDF AI tend to share a habit: they treat their Data area as a real piece of team infrastructure. They name files thoughtfully, scope connections tightly, refresh on a cadence, and clean out the stale.
Teams that skip these habits end up with noisy, drifting Data and a slow degradation of every answer the AI produces.
Where to go next
- Searching your knowledge — how to ask great questions across the sources you’ve connected.
- Use cases — six worked examples.
- Privacy & Security — the full data-handling picture.