External tables (dbt-external-tables)

This adapter provides compatibility macros for the dbt-external-tables package, enabling you to define external sources that query files stored in Azure Blob Storage, Azure Data Lake Storage Gen2, or Fabric OneLake directly from your Fabric Data Warehouse.

How it works

The dbt-external-tables package includes a Fabric plugin, but it uses CREATE EXTERNAL TABLE syntax designed for Azure Synapse, which Fabric Data Warehouse does not support. Instead, Fabric Data Warehouse uses the T-SQL OPENROWSET(BULK ...) function for external data access.

This adapter provides override macros that replace the package's Fabric plugin. When you run dbt run-operation stage_external_sources, the overridden macros create views that wrap OPENROWSET queries. This means:

  • External sources are queryable as regular views via {{ source('my_external', 'my_table') }}
  • Data is always fresh -- OPENROWSET reads directly from the file on each query
  • No refresh is needed (the refresh_external_table macro is a no-op)
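As a rough illustration, the staged view for a Parquet source might wrap a query like the following. This is a sketch with placeholder names and URL; the exact SQL the macros emit may differ:

create view dbo.my_table as
select *
from openrowset(
    bulk 'https://onelake.dfs.fabric.microsoft.com/<workspace-id>/<lakehouse-id>/Files/data/my_table.parquet',
    format = 'PARQUET'
) as [result]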

Supported file formats: Parquet, CSV, JSONL.

Setup

1. Install dbt-external-tables

Add the package to your packages.yml:

packages:
  - package: dbt-labs/dbt_external_tables
    version: "0.11.0"  # or latest

Then run:

dbt deps

2. Configure dispatch

Add the following to your dbt_project.yml so that dbt-external-tables uses the OPENROWSET macros from this adapter instead of the Synapse-style macros bundled with the package:

dispatch:
  - macro_namespace: dbt_external_tables
    search_order: ['dbt', 'dbt_external_tables']

The dbt entry in the search order tells dbt to check the adapter's built-in macros before falling back to the package's defaults. This is how the adapter's OPENROWSET-based macros take priority over the package's CREATE EXTERNAL TABLE macros.

Quick start

You can try OPENROWSET with a CSV file stored in your Fabric OneLake. First, upload a file to your lakehouse's Files section, then define a source pointing to its OneLake URL.

Add this source to your sources.yml:

sources:
  - name: my_external
    schema: dbo
    tables:
      - name: sample_csv
        external:
          location: "https://onelake.dfs.fabric.microsoft.com/<workspace-id>/<lakehouse-id>/Files/data/sample.csv"
          file_format: csv
          options:
            header_row: "true"
            fieldterminator: ","
            parser_version: "2.0"
        columns:
          - name: id
            data_type: int
          - name: name
            data_type: "varchar(100)"
          - name: amount
            data_type: "decimal(10,2)"

Replace <workspace-id> and <lakehouse-id> with your Fabric workspace and lakehouse GUIDs. You can find these in the OneLake URL shown in the Fabric portal.

Stage the source and query it:

dbt run-operation stage_external_sources

-- In a dbt model
select *
from {{ source('my_external', 'sample_csv') }}
where amount > 20.00
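Behind the scenes, the adapter compiles the source definition into an OPENROWSET call. A sketch of what the staged view's body would look like for sample_csv (the exact output may vary by adapter version):

select *
from openrowset(
    bulk 'https://onelake.dfs.fabric.microsoft.com/<workspace-id>/<lakehouse-id>/Files/data/sample.csv',
    format = 'CSV',
    header_row = true,
    fieldterminator = ',',
    parser_version = '2.0'
)
with (
    id int,
    name varchar(100),
    amount decimal(10,2)
) as [result]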

Source configuration

Define external sources in your sources.yml files. The external property controls how OPENROWSET is called.

Parquet

sources:
  - name: my_external
    schema: dbo
    tables:
      - name: sales
        external:
          location: "https://onelake.dfs.fabric.microsoft.com/<workspace-id>/<lakehouse-id>/Files/data/sales.parquet"
          file_format: parquet
        columns:
          - name: id
            data_type: int
          - name: amount
            data_type: "decimal(10,2)"
          - name: sale_date
            data_type: "datetime2(6)"

For Parquet files, defining columns is optional. If omitted, OPENROWSET will automatically infer the schema from the file. Defining columns explicitly generates a WITH clause that gives you precise control over data types.
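For the three columns above, the WITH clause appended to the OPENROWSET call would look roughly like this (a sketch of the adapter's output):

with (
    id int,
    amount decimal(10,2),
    sale_date datetime2(6)
)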

CSV

sources:
  - name: my_external
    schema: dbo
    tables:
      - name: transactions
        external:
          location: "https://myaccount.blob.core.windows.net/container/data/transactions.csv"
          file_format: csv
          options:
            header_row: "true"
            fieldterminator: ","
            parser_version: "2.0"
        columns:
          - name: id
            data_type: int
          - name: description
            data_type: "varchar(200)"
          - name: amount
            data_type: "decimal(10,2)"

JSONL

sources:
  - name: my_external
    schema: dbo
    tables:
      - name: events
        external:
          location: "https://myaccount.dfs.core.windows.net/container/events/latest.jsonl"
          file_format: jsonl

Auto-detected format

If file_format is not specified, the adapter detects the format from the file extension:

Extension                  Detected format
.parquet                   PARQUET
.csv, .tsv                 CSV
.jsonl, .ldjson, .ndjson   JSONL

If the extension is not recognized and file_format is not set, OPENROWSET is called without a FORMAT option and Fabric will attempt to detect the format automatically.
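For example, this source (hypothetical names) omits file_format, so the .ndjson extension causes the adapter to emit FORMAT = 'JSONL':

sources:
  - name: my_external
    schema: dbo
    tables:
      - name: clickstream
        external:
          location: "https://onelake.dfs.fabric.microsoft.com/<workspace-id>/<lakehouse-id>/Files/events/clicks.ndjson"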

Configuration reference

external.location

Required. The full URL to the file. Supported storage locations:

  • Azure Blob Storage: https://<account>.blob.core.windows.net/<container>/<path>
  • Azure Data Lake Storage Gen2: https://<account>.dfs.core.windows.net/<container>/<path>
  • Fabric OneLake: https://onelake.dfs.fabric.microsoft.com/<workspace-id>/<lakehouse-id>/Files/<path>

Wildcards are supported (e.g., *.parquet to read all Parquet files in a folder).

external.file_format

Optional. One of: parquet, csv, jsonl. Case-insensitive. If omitted, the format is detected from the file extension.

external.options

Optional dictionary of additional OPENROWSET options. These are format-specific:

Option            Applies to   Description
header_row        CSV          true or false -- whether the first row contains column names
fieldterminator   CSV          Character separating fields (default: ,)
rowterminator     CSV          Character separating rows (default: \r\n)
fieldquote        CSV          Character used to quote fields
escapechar        CSV          Escape character for special characters
parser_version    CSV          1.0 or 2.0 (2.0 required for HEADER_ROW)
firstrow          CSV, JSONL   Row number to start reading from
codepage          CSV, JSONL   Character encoding
data_source       All          Name of an external data source (when using relative paths)
rows_per_batch    All          Hint for batch processing
maxerrors         All          Maximum number of errors before failing
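For instance, data_source lets you pair a relative location with an external data source defined in the warehouse. A minimal fragment, assuming an external data source named my_ext_ds already exists (the name is hypothetical):

external:
  location: "data/transactions.csv"   # relative to the data source
  file_format: csv
  options:
    data_source: "my_ext_ds"          # assumes this external data source exists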

Columns with data_type

When columns are defined with data_type, the adapter generates a WITH clause that explicitly maps columns to specific SQL types. This is recommended for CSV and JSONL files where automatic type inference may not produce the desired types.

Usage

After configuring your sources, stage the external tables:

dbt run-operation stage_external_sources

This creates views in your Fabric Data Warehouse. You can then reference them in your models:

select *
from {{ source('my_external', 'sales') }}
where sale_date > '2024-01-01'

To recreate all views (e.g., after changing the source configuration):

dbt run-operation stage_external_sources --vars 'ext_full_refresh: true'

Limitations

  • Fabric Data Warehouse does not support the DELTA format in OPENROWSET. For Delta Lake tables, use cross-database queries to a Lakehouse instead (see the sketch after this list).
  • OPENROWSET queries may be slower than querying data stored directly in the warehouse. For frequently accessed data, consider ingesting it into warehouse tables.
  • Authentication to external storage is handled by Fabric. The files must be accessible from your Fabric workspace (either publicly accessible or via configured storage credentials).
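For the Delta Lake case, a cross-database query can read a Lakehouse table directly from the warehouse using three-part naming. A minimal sketch, assuming a Lakehouse named MyLakehouse in the same workspace with a table dbo.sales:

select *
from MyLakehouse.dbo.sales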