VDF AI Data

Connecting databases

How to connect VDF AI Data to your transactional and analytical databases — PostgreSQL, MySQL, Oracle, SQL Server, SAP HANA, Presto, Exasol, Jira, and anything with a JDBC driver — and how to keep those connections healthy over time.


Why a database connection is different from a file upload

A file upload is a snapshot. A database connection is a living source. The same row your team edits in your operational system this morning is the row VDF AI sees this afternoon — without anyone having to export, upload, or re-sync.

That live link is what turns VDF AI Data from a document-search layer into a real working layer over your business: customers, orders, products, inventory, telemetry, support tickets, anything that lives in a database. Every other thing the platform does — semantic search, exploratory analysis, feature engineering, fine-tuning preparation — can sit on top of it.

You don't need to be a database administrator to do this. If you have a hostname, a database name, and a read-only account, you have everything you need. The screens guide the rest.

Before you start

A handful of things make the first connection go smoothly. Have them ready:

  • A hostname. The address your database listens on. Often something like db-prod.acme.internal or a cloud-provider endpoint.
  • A database or schema name. The specific store you want VDF AI Data to look at — not the whole server.
  • A read-only username and password. Created just for VDF AI Data, with permission to read the schemas you want surfaced and nothing else.
  • Network reachability. From wherever VDF AI Data runs, your database must be reachable. If you use a private network, your platform team may need to allowlist a single egress address.
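Reachability can be checked before you open the connection form. The sketch below is one way to do it, assuming you can run Python from the same network segment VDF AI Data runs in. The function name `can_reach` is illustrative and is not part of the product.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

For example, `can_reach("db-prod.acme.internal", 5432)` would probe the default PostgreSQL port on the sample hostname above. A `False` result usually points at firewall or routing rules rather than at credentials.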

Always use a dedicated read-only account. Don't reuse the same login your application uses. A separate account makes it easy to audit what VDF AI Data is doing and to revoke access in one click if you ever need to.
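As an illustration of what a dedicated read-only account can look like, here is a minimal sketch that generates PostgreSQL statements for one. It assumes PostgreSQL specifically; other databases use different grant syntax. The helper name `readonly_grants` and the role name `vdf_reader` are hypothetical, and your database administrator may prefer a different setup.

```python
import re

# Only plain identifiers are accepted, because the names are
# interpolated directly into SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def readonly_grants(role: str, database: str, schema: str) -> list[str]:
    """Build PostgreSQL statements for a login role that can only read one schema."""
    for name in (role, database, schema):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # The password is set separately so it never appears in a script.
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
        # Covers tables created later by the role that runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO {role};",
    ]
```

Calling `readonly_grants("vdf_reader", "orders", "public")` returns five statements to review and run as an administrator. Because the role has no write grants, revoking access later is a single `DROP ROLE`.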

The databases you can connect

VDF AI Data ships with first-party support for the most common operational and analytical databases.

PostgreSQL

A widely used transactional database. Works with cloud-managed Postgres (RDS, Cloud SQL, Azure Database) and self-hosted instances.

MySQL

Includes MariaDB-compatible deployments and managed MySQL on every major cloud.

Microsoft SQL Server

On-premises SQL Server and Azure SQL. Pairs cleanly with an Active Directory service account or a connection-scoped login.

Oracle

Enterprise Oracle deployments, including the standard listener and service-name configuration.

SAP HANA

SAP HANA Cloud and on-premises. Useful when your team's data of record lives inside SAP.

Presto

Federated query layer over multiple underlying stores. One connection, many backends.

Exasol

For high-performance analytics on Exasol's MPP database.

Jira (as a structured source)

Connect Jira projects as queryable data — issues, fields, transitions — rather than as documents.

JDBC (everything else)

Anything with a published JDBC driver: Snowflake, BigQuery, Redshift, Trino, Vertica, and many more.

If your store isn’t listed by name, JDBC is almost always the answer. Tell us what you’re connecting to and we’ll confirm the right driver to use.
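For orientation, JDBC drivers are addressed with a URL whose shape depends on the driver. The sketch below builds URLs for three of the stores named above, using each driver's documented URL prefix. Whether VDF AI Data asks for a full URL or for separate fields is not stated here, so treat `jdbc_url` as a hypothetical helper for checking what you plan to enter.

```python
def jdbc_url(kind: str, host: str, port: int, database: str) -> str:
    """Return a JDBC URL for a few common drivers."""
    templates = {
        "redshift": "jdbc:redshift://{host}:{port}/{database}",
        "vertica": "jdbc:vertica://{host}:{port}/{database}",
        # For Trino, the path segment is the catalog name.
        "trino": "jdbc:trino://{host}:{port}/{database}",
    }
    if kind not in templates:
        raise ValueError(f"no template for driver kind: {kind!r}")
    return templates[kind].format(host=host, port=port, database=database)
```

For instance, `jdbc_url("trino", "trino.acme.internal", 8080, "hive")` yields `jdbc:trino://trino.acme.internal:8080/hive`. The hostname here is made up for the example.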

What setting up a connection looks like

You’ll see a single short form. Each field has a clear purpose.

Field | What it’s for
Name | A friendly label your team recognizes, such as “Production Orders DB” or “Analytics Warehouse.”
Type | The database type you’re connecting to.
Host & port | How VDF AI Data reaches your database over the network.
Database / schema | The specific store this connection should look at.
Credentials | A read-only username and password. Stored encrypted; never displayed back.
Description | Optional. A line about what this connection is for, who owns it, and where to ask if something changes.
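If you keep connection details in a script or a runbook before entering them, the form fields above map naturally onto a small record type. This is only a sketch: the class name `ConnectionConfig` and its field names are assumptions made for illustration, not the product's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectionConfig:
    name: str          # friendly label, e.g. "Production Orders DB"
    db_type: str       # e.g. "postgresql"
    host: str
    port: int
    database: str      # database or schema to look at
    username: str
    password: str = field(repr=False)  # keep the secret out of logs and reprs
    description: str = ""              # optional: purpose and owner

    def __post_init__(self) -> None:
        # Every field except the description must be non-empty.
        for label in ("name", "db_type", "host", "database", "username", "password"):
            if not getattr(self, label).strip():
                raise ValueError(f"{label} is required")
        if not 1 <= self.port <= 65535:
            raise ValueError("port must be between 1 and 65535")
```

Marking the password with `repr=False` mirrors the "never displayed back" behavior described in the table: printing the object shows every field except the secret.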

Two patterns we recommend from day one:

  1. Names that describe purpose, not infrastructure. “Customer-360 Warehouse” reads better than “redshift-prod-1.” Future you, six months from now, will be grateful.
  2. A short description on every connection. A sentence is enough. “Read-only mirror of our operational CRM, refreshed nightly. Owner: data-platform@” gives a teammate everything they need to act on it.

Testing a connection

After saving, run the Test connection action. VDF AI Data does three things:

  1. Reaches the host on the network.
  2. Authenticates with the credentials you provided.
  3. Confirms the database or schema you named exists and is readable.

If any step fails, the screen tells you exactly which one. Most early failures are network or scope problems, not credential problems, so check your firewall rules and your service-account permissions first.
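The three steps above run in order and stop at the first failure, which is why the screen can name the step that failed. A minimal sketch of that pattern follows. It shows the general idea and not VDF AI Data's implementation; `CheckResult` and `run_connection_test` are hypothetical names, and each check is any callable that raises on failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    failed_step: Optional[str] = None
    detail: str = ""


def run_connection_test(steps: list[tuple[str, Callable[[], None]]]) -> CheckResult:
    """Run named checks in order and report the first one that raises."""
    for name, check in steps:
        try:
            check()
        except Exception as exc:
            # Stop here: later steps depend on this one having succeeded.
            return CheckResult(ok=False, failed_step=name, detail=str(exc))
    return CheckResult(ok=True)
```

In practice the steps would be something like `("network", ...)`, `("authentication", ...)`, and `("scope", ...)`, each wrapping a real driver call. Ordering matters because an authentication error is meaningless if the host was never reached.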

Connection states, in plain language

Every connection moves through a short lifecycle. The status indicator on the connection card tells you where you are.

State | What it means | What to do
Configuring | The connection is being set up; not active yet | Finish filling in the fields and save
Connected | Live and ready; downstream products can read from it | Start using it for discovery, EDA, search, or fine-tuning
Needs attention | A test failed on authentication, host, or scope | Update what’s wrong and re-test
Paused | Temporarily disabled by a workspace admin | Resume when you’re ready
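One way to reason about the lifecycle is as a small state machine. The four states come from the table above. The allowed transitions in this sketch are assumptions inferred from the "What to do" column and are not a documented specification.

```python
from enum import Enum


class State(Enum):
    CONFIGURING = "Configuring"
    CONNECTED = "Connected"
    NEEDS_ATTENTION = "Needs attention"
    PAUSED = "Paused"


# Assumed transitions: a test can pass or fail from most states,
# and an admin can pause a connection once it has left setup.
ALLOWED = {
    State.CONFIGURING: {State.CONNECTED, State.NEEDS_ATTENTION},
    State.CONNECTED: {State.NEEDS_ATTENTION, State.PAUSED},
    State.NEEDS_ATTENTION: {State.CONNECTED, State.PAUSED},
    State.PAUSED: {State.CONNECTED, State.NEEDS_ATTENTION},
}


def transition(current: State, target: State) -> State:
    """Return the target state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Under these assumptions, a connection never returns to Configuring once saved, and a paused connection must be resumed and re-tested before it is trusted again.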

Scoping a connection well

The most important decision is what to scope a connection to. The same principle that applies to connected apps applies twice as much to databases.

Scope each connection to a schema, not an entire server. Narrower scope produces sharper answers and tighter security. You can always add a second connection later for another schema.

A few patterns that pay off:

  • One connection per logical purpose. “Production Orders,” “Marketing Events,” “Support Tickets” — not one mega-connection across everything.
  • Read-only at the database level. Defense in depth. Even if a downstream product tried to write, the database wouldn’t let it.
  • Document the owner. Use the Description field for “who owns this database and where to ask if something changes.”
  • Pair the connection with a refresh cadence. Decide once whether asset inventory refreshes nightly, on-demand, or both — and stick with it.

What you can do once a database is connected

A connected database becomes a first-class source across the rest of VDF AI Data. From the connection’s detail panel, every other capability (semantic search, exploratory analysis, feature engineering, fine-tuning preparation) is one click away.

You don’t have to plan to use all of these on day one. Connect, discover, and decide where to go from what you see.

What stays in your database (and what doesn’t)

This is the question almost everyone asks first. The short answer: your data stays in your database.

  • VDF AI Data queries your database on demand — it does not copy your tables wholesale.
  • Vector indexes and fine-tuning datasets are produced from a snapshot read at build time; you decide when to rebuild.
  • Pausing or removing a connection stops all reads immediately.
  • The database’s own access control stays in charge — VDF AI Data only ever sees what the connection account can see.

For the full picture of how VDF AI Data handles your data, see Privacy & Security.

Keeping connections healthy

A connection is a living relationship. A few small habits keep yours sharp.

Test on the day the environment was set up

Five minutes of testing right after setup catches the things that break later — firewall rules, password rotation policies, schema drift.

Refresh asset inventory after upstream changes

When your team adds or removes tables, the connection’s view of what’s available won’t update automatically. A manual Refresh inventory action keeps it in sync.
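To see why a refresh matters, compare the table list the connection last recorded with the list the database reports now. The sketch below computes that difference. How VDF AI Data stores its inventory is not described here, so both inputs are simply sets of table names, and `inventory_diff` is a hypothetical helper.

```python
def inventory_diff(known: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare a cached table list with the live one."""
    return {
        "added": sorted(current - known),    # new upstream, not yet visible
        "removed": sorted(known - current),  # still listed, but gone upstream
    }
```

If either list is non-empty, the connection's view is stale and a Refresh inventory run is due.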

Rotate credentials on a calendar

If your organization rotates database passwords, set a reminder to update the connection at the same time. The connection’s status will move to “needs attention” the moment a rotation lands.
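The reminder date is simple arithmetic on your rotation policy. As a sketch, assuming a fixed rotation interval in days and a few days of lead time (both numbers are yours to choose, and `next_rotation` is an illustrative name):

```python
from datetime import date, timedelta


def next_rotation(last_rotated: date, every_days: int, lead_days: int = 3) -> tuple[date, date]:
    """Return (reminder date, due date) for the next credential rotation."""
    due = last_rotated + timedelta(days=every_days)
    remind = due - timedelta(days=lead_days)
    return remind, due
```

With a 90-day policy and a last rotation on 1 January 2024, this returns a reminder on 28 March 2024 and a due date of 31 March 2024.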

Document ownership

The Description field is small. Use it. “Owned by data-platform@. Source of truth for orders. Escalate via #data-platform on Slack.” That one line saves a teammate forty minutes.

A short troubleshooting list

Symptom | Likely cause | What to try
“Host unreachable” on test | Firewall, VPC, or routing | Check egress allowlist; ask your platform team
“Authentication failed” on test | Wrong credentials or rotated password | Update credentials and re-test
“Database not found” on test | Wrong database/schema name or the account can’t see it | Confirm the name and the account’s grant
Connection works, but no tables appear | Asset inventory hasn’t been run yet | Trigger Refresh inventory from the connection
Worked yesterday, fails today | Password rotation, network change, or the database was paused | Re-test; update credentials if needed
