Detailed comparison with microsoft/dbt-fabricspark¶
This report provides a detailed technical comparison between the FabricSpark adapter in this package and Microsoft's dedicated dbt-fabricspark repository. Both target the same compute engine -- Microsoft Fabric Lakehouse with Spark SQL via Livy sessions -- but take fundamentally different architectural approaches.
| | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| PyPI package | `dbt-fabric-samdebruyn[spark]` | `dbt-fabricspark` |
| Latest version | v1.11.3b0 | v1.11.0 |
Last updated: 2026-05-16
Architecture¶
This is the most significant difference and influences nearly every other comparison point.
This adapter: multiple inheritance from dbt-spark¶
This adapter's FabricSpark adapter uses multiple inheritance: FabricSparkAdapter(BaseFabricAdapter, SparkAdapter). It inherits from dbt-spark's SparkAdapter and a shared BaseFabricAdapter also used by the T-SQL adapter.
- Plugin registration declares `dependencies=["spark"]`, so dbt-spark's macros are available at runtime.
- Adapter code is thin (~749 LOC) because it delegates heavily to dbt-spark and the shared base.
- Macros (24 files) are primarily overrides of dbt-spark macros for Fabric-specific behavior.
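To make the diamond concrete, here is a minimal sketch of how Python's method resolution order works for a class declared as `FabricSparkAdapter(BaseFabricAdapter, SparkAdapter)`. The class names follow the report; the bodies are hypothetical stand-ins, not the package's real code:

```python
class BaseAdapter:
    """Stand-in for dbt's shared base adapter."""
    def quote(self, identifier: str) -> str:
        return f'"{identifier}"'

class SparkAdapter(BaseAdapter):
    """Stand-in for dbt-spark's adapter: Spark SQL quotes with backticks."""
    def quote(self, identifier: str) -> str:
        return f"`{identifier}`"

class BaseFabricAdapter(BaseAdapter):
    """Stand-in for the shared Fabric base (also used by the T-SQL adapter)."""
    pass

class FabricSparkAdapter(BaseFabricAdapter, SparkAdapter):
    """Thin subclass: inherits behavior from both parents via the MRO."""
    pass

# C3 linearization: Fabric overrides win first, then dbt-spark, then the base.
print([c.__name__ for c in FabricSparkAdapter.__mro__])
# ['FabricSparkAdapter', 'BaseFabricAdapter', 'SparkAdapter', 'BaseAdapter', 'object']

print(FabricSparkAdapter().quote("my_table"))  # `my_table` (resolved to SparkAdapter.quote)
```

Any method the thin subclass does not override falls through to `BaseFabricAdapter` first and then to `SparkAdapter`, which is why the adapter can stay small.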
Upstream: standalone SQLAdapter¶
The upstream is fully standalone: FabricSparkAdapter(SQLAdapter). No dbt-spark dependency.
- Plugin registration has no `dependencies` -- all Spark SQL behavior is self-contained.
- Adapter code is significantly larger (~4,387 LOC) because it reimplements everything dbt-spark would provide.
- Macros (34 files) include utility functions normally inherited from dbt-spark.
| Aspect | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| Code reuse | High (inherits dbt-spark + shared base) | None (self-contained) |
| Maintenance burden | Lower per-adapter, coupled to dbt-spark | Higher total LOC, no external coupling |
| dbt-spark compatibility | Automatic (inherits macros/behaviors) | Manual (must reimplement) |
| Customization surface | Limited by what dbt-spark exposes | Full control |
Features¶
Materializations¶
| Materialization | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| Table | Yes (via dbt-spark) | Yes (custom implementation) |
| View | Not yet (Spark SQL views are not yet supported in schema-enabled Lakehouses) | Yes |
| Incremental | append, merge, insert_overwrite, microbatch | append, merge, insert_overwrite, microbatch |
| Snapshot | Yes | Yes |
| Ephemeral | Yes | Yes |
| Materialized View / Lake View | Yes (standard dbt MV pattern) | Yes (Fabric-specific MLV with REST API refresh) |
| Clone | Yes | Yes |
| Seed | Yes (via dbt-spark) | Yes (custom implementation) |
Notable differences:
- View: The upstream supports Spark SQL views, but these are not yet supported in schema-enabled Lakehouses. Schema-enabled Lakehouses are the default when creating new Lakehouses. Microsoft has announced that Spark SQL view support is coming. This adapter will add view support once it becomes available (#163).
- Materialized Lake View: The upstream uses the Fabric REST API for on-demand and scheduled refresh. This adapter uses a standard `CREATE OR REPLACE` without REST API calls.
Authentication methods¶
| Method | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| Azure CLI | Yes | Yes |
| Service Principal | Yes | Yes |
| Token Credential | Yes | Yes |
| Workload Identity | Yes (federated OIDC) | No |
| Static Access Token | Yes | Yes |
| Fabric Notebook | No | Yes |
Livy session management¶
| Feature | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| High-concurrency Livy | Yes (HC-only, instance-based lifecycle) | Yes (default on, atexit cleanup) |
| Session creation | `FabricApiClient` singleton | `LivySessionManager` with static globals |
| Session reuse | Deterministic session tag (HC) | Via `session_id_file` + `reuse_session` flag (singleton) / deterministic session tag (HC) |
| HC session cleanup | Connection manager `close()` path | `atexit` handler (fragile; see Code quality) |
| Polling interval | Fixed 3 seconds | Adaptive (configurable) |
| Session idle timeout | 15 min default | 30 min default, configurable |
| Local Livy mode | No | Yes (livy_mode: local) |
| Statement timeout | 24 hours | 12 hours (configurable) |
| Thread-safe token refresh | No | Yes (_token_lock) |
Unique to this adapter¶
| Feature | Description |
|---|---|
| Purview integration | Sync dbt metadata to Microsoft Purview |
| Python model support | Submit Python models to Livy |
| Workload identity auth | Federated OIDC for CI/CD |
| Shared T-SQL + Spark | One package, two adapters |
| Capability declarations | SchemaMetadataByRelations, TableLastModifiedMetadata |
| PEP 249 cursor | Proper type conversion for all Spark SQL types |
Unique to upstream¶
| Feature | Description |
|---|---|
| MLV REST API | On-demand refresh, scheduled refresh via Fabric API |
| OneLake shortcuts | ShortcutClient for shortcut CRUD |
| Fabric Notebook auth | Ambient auth inside notebooks |
| Local Livy mode | Connect to local Livy for development |
| Spark SQL views | CREATE OR REPLACE VIEW support (not yet available in schema-enabled Lakehouses) |
| Cross-workspace 4-part naming | Full read+write for workspace.lakehouse.schema.table |
| Credential validation | UUID format, HTTPS domain whitelist |
Lakehouse schema support¶
| Aspect | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| Schema detection | Via dbt-spark | Auto-detected via API, process-level cache |
| Schema-enabled naming | Always 3-part | Dynamic: 3-part or 2-part based on detection |
| Non-schema mode | Not explicitly handled | Full support with identifier prefixing |
Test suite¶
| Metric | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| Test files | 60 | 50 |
| Test classes | ~183 | ~141 |
| Unit/functional split | All integration | Unit (mock) + functional (real infra) |
| Schema mode toggle | No | Yes (--schema-mode CLI flag) |
| Session sharding | No | Yes (--session-id-files for xdist workers) |
| Fail-fast sentinel | No | Yes (cross-worker abort on first failure) |
| Session reuse assertion | No | Yes (verifies no extra sessions created) |
This adapter covers areas the upstream does not: Purview tests and broader dbt-tests-adapter base class coverage (183 vs 141 classes).
The upstream covers areas this adapter does not: unit tests (mock-based), cross-workspace tests, MLV lifecycle tests, OneLake shortcut tests, dual schema-mode testing, the fail-fast sentinel, and session reuse verification.
dbt Core compatibility¶
| Aspect | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| dbt-adapters | >=1.22.6, <2.0 | >=1.7, <2.0 |
| dbt-common | >=1.37.3, <2.0 | >=1.10, <2.0 |
| dbt-core (dev) | >=1.9.6, <1.13.0 | >=1.8.0 |
| dbt-spark | >=1.10.1 (optional) | Not used |
| Python | >=3.11, <3.14 | >=3.10, <3.14 |
| azure-identity | >=1.12.0 | >=1.21.0 |
dbt best practices¶
| Practice | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| Inherits official base | Yes (SparkAdapter + BaseFabricAdapter) | Partially (SQLAdapter only) |
| Capability declarations | Yes | No |
| `@available` methods | Yes (inherited) | Yes (MLV, schema detection) |
| Plugin dependencies | `dependencies=["spark"]` | None |
| Dispatch fallback | dbt-spark macros available | Must reimplement everything |
Maturity¶
| | dbt-fabric-samdebruyn | microsoft/dbt-fabricspark |
|---|---|---|
| Total commits | 500+ since Jan 2025 | 329 total, ~278 since Jan 2025 |
| Release tags | 67+ (v1.4.0rc1 to v1.11.3b0) | 8 (v1.7.0rc1 to v1.11.0) |
| Python | 3.11-3.13 | 3.10-3.13 |
| Documentation | Docs website + development guide | README + CONTRIBUTING.md |
| Code style | ruff, PEP 604, line-length 99 | ruff, older typing style |
Both repositories use the MIT License and the hatchling build system.
Code quality¶
A detailed review of the upstream's Python source code reveals several significant issues that affect reliability and maintainability.
Global mutable state¶
The upstream stores critical runtime state in module-level and class-level global variables:
- Authentication token (`livysession.py` line 35): A single `accessToken: AccessToken = None` global shared by all threads. While a `_token_lock` protects the refresh path, other code reads `accessToken.token` after releasing the lock, creating a data race in multi-threaded dbt runs.
- Livy session (`livysession.py` line 1327): `LivySessionManager.livy_global_session` is a class variable mutated from multiple threads. The lock only protects `connect()`/`disconnect()`, but `is_new_session_required` is set outside the lock at multiple call sites.
- Connection managers (`connections.py` line 93): A class-level `connection_managers = {}` dict mutated at runtime, with no cleanup between test runs.
- Relation state (`relation.py` lines 44-45): `_schemas_enabled` and `_identifier_prefix` are `ClassVar` attributes mutated at connection time, meaning all relation instances across all threads share the same value.
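The `ClassVar` problem in the last bullet is easy to demonstrate in isolation. This is a stripped-down illustration of the pattern, not the upstream's actual class:

```python
from typing import ClassVar

class Relation:
    # Class-level state, as in the upstream's relation.py: every instance
    # reads this through the class, so there is no per-instance isolation.
    _schemas_enabled: ClassVar[bool] = True

r1 = Relation()
r2 = Relation()

# Mutated once at connection time...
Relation._schemas_enabled = False

# ...and every existing instance, in every thread, observes the new value.
print(r1._schemas_enabled, r2._schemas_enabled)  # False False
```

If two connections in one process needed different schema modes, this design could not represent that; instance attributes set in `__init__` would.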
This adapter uses proper instance-based encapsulation: `FabricTokenProvider` (per-scope token caching), `FabricApiClient` (singleton with a thread-safe session lock), and no module-level mutable state.
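The instance-based alternative can be sketched as follows. The class and method names here are illustrative, not the package's actual API; the point is that the cache and its lock live on the instance, and reads happen under the same lock as refreshes, avoiding the read-after-unlock race described above:

```python
import threading

class TokenProvider:
    """Illustrative per-scope token cache (hypothetical names, not the real API)."""

    def __init__(self, fetch):
        self._fetch = fetch                      # callable(scope) -> token string
        self._cache: dict[str, str] = {}
        self._lock = threading.Lock()

    def get_token(self, scope: str) -> str:
        # Read and refresh happen under the same lock: no thread can observe
        # a half-updated token, unlike a module-level global read after unlock.
        with self._lock:
            if scope not in self._cache:
                self._cache[scope] = self._fetch(scope)
            return self._cache[scope]

provider = TokenProvider(lambda scope: f"token-for-{scope}")
print(provider.get_token("https://api.fabric.microsoft.com/.default"))
```

Two providers with different fetch logic can coexist in one process, which a module-level `accessToken` global cannot support.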
atexit handler for session cleanup¶
The upstream registers `atexit` handlers at module import time (in both `singleton_livy.py` and `concurrent_livy.py`) to delete Livy sessions and HC sessions on process exit. This is fragile: `atexit` handlers run in undefined order, logging/network may already be torn down, and merely importing the module registers the handler even if no session was created. The HC implementation adds a second `atexit` handler with a global `_active_sessions` set, compounding the global mutable state problem.
This adapter manages session lifecycle through dbt's normal connection manager `close()` path.
Exception swallowing¶
Both `LivySession.__exit__` and `LivyCursor.__exit__` return `True` (`livysession.py` lines 489-495, 855-859), which suppresses every exception raised inside any `with` block using these objects, including database errors, timeouts, and `KeyboardInterrupt`.
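Per Python's context manager protocol, a truthy return from `__exit__` tells the interpreter the exception was handled. A minimal reproduction of the pattern (not the upstream's actual class):

```python
class SwallowingCursor:
    """Minimal context manager reproducing the upstream's __exit__ bug."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return True  # truthy return suppresses ANY in-flight exception

try:
    with SwallowingCursor():
        raise RuntimeError("database error")  # silently discarded
    survived = True
except RuntimeError:
    survived = False

print(survived)  # True: the error never propagates to the caller
```

A correct `__exit__` would return `False` (or `None`) and let dbt's error handling see the failure, doing only cleanup in the method body.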
Misleading security comment with actual regex bug¶
`_getLivySQL()` (`livysession.py` lines 980-988) contains alarming security comments ("repurcursions of code injection... arbritary Python code" [sic]) about code that now merely strips SQL block comments; the comment is a leftover from a previous implementation. Additionally, `re.sub(r"\s*/\*(.|\n)*?\*/\s*", "\n", sql, re.DOTALL)` passes `re.DOTALL` (integer value 16) as the positional `count` parameter instead of `flags=re.DOTALL`, so it limits replacements to 16 instead of enabling dotall mode.
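The bug is easy to reproduce, because `re.sub`'s signature is `re.sub(pattern, repl, string, count=0, flags=0)` and `re.DOTALL` is an `IntFlag` equal to 16:

```python
import re

# 20 SQL block comments in a row.
sql = "/* a */ x " * 20

# Buggy call (as in the upstream): re.DOTALL lands in the positional
# `count` slot, so only the first 16 comments are stripped and dotall
# mode is never actually enabled.
buggy = re.sub(r"\s*/\*(.|\n)*?\*/\s*", "\n", sql, re.DOTALL)

# Correct call: pass it as a keyword flag.
fixed = re.sub(r"\s*/\*(.|\n)*?\*/\s*", "\n", sql, flags=re.DOTALL)

print(buggy.count("/*"), fixed.count("/*"))  # 4 0
```

The bug goes unnoticed in practice because the `(.|\n)` alternation already matches newlines without dotall mode, and models rarely contain more than 16 block comments. Note that Python 3.13 deprecates passing `count` positionally for exactly this reason.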
Dead code and copy-paste artifacts¶
- Thrift exception handling (`connections.py` lines 97-113): References `thrift_resp.status.errorMessage`, a pattern from the Apache Thrift transport used by dbt-spark. The upstream talks to Livy over HTTP, not Thrift, so this code path is dead.
- AWS logging (`connections.py` lines 39-46): Sets `botocore` and `boto3` (AWS libraries) to DEBUG level at import time; leftovers from a Spark/Databricks ancestor.
- Hardcoded 2028 timestamp (`livysession.py` lines 194-198): The `int_tests` auth path creates a token with `expires_on = 1845972874` (a date in 2028), bypassing all token refresh logic.
- Duplicated functions: `_parse_retry_after` is copied identically into both `livysession.py` and `mlv_api.py`, and uses the deprecated `datetime.utcnow()`.
- Dead parameter: `get_headers()` has a `tokenPrint` parameter that logs the full bearer token when `True`, but it is never called with `True`.
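A deduplicated `Retry-After` parser using timezone-aware datetimes could look like the sketch below. This is a hypothetical replacement illustrating the fix, not the upstream's `_parse_retry_after`; `Retry-After` values are either a number of seconds or an HTTP-date:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value: str) -> float:
    """Return seconds to wait from a Retry-After header value (sketch)."""
    try:
        return float(value)  # numeric form, e.g. "120"
    except ValueError:
        # HTTP-date form. Compare against an aware UTC now, not the
        # deprecated naive datetime.utcnow().
        target = parsedate_to_datetime(value)
        return max(0.0, (target - datetime.now(timezone.utc)).total_seconds())

print(parse_retry_after("120"))  # 120.0
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT"))  # 0.0 (date in the past)
```

Defining it once in a shared module removes the copy in `mlv_api.py`, so a future fix cannot land in one copy and miss the other.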
Inconsistent style¶
The upstream mixes camelCase (`tokenPrint`, `accessToken`, `_submitLivyCode`, `_getLivySQL`) with snake_case throughout. Deprecated pre-3.9 typing aliases (`Dict`, `List`) and pre-PEP 604 `Optional`/`Union` annotations are used despite the package supporting Python 3.13.
Summary¶
This adapter deliberately targets schema-enabled Lakehouses, which is the default when creating new Lakehouses in the Fabric portal (schemas are enabled by default). This means some upstream features that only work without schemas (e.g., Spark SQL views) are not yet supported. Microsoft has announced that Spark SQL views are coming to schema-enabled Lakehouses, and this adapter will add support when they become available (#163).
This adapter takes a code-reuse approach (a thin adapter on dbt-spark), while the upstream takes a self-contained approach (everything reimplemented). This adapter's approach results in dramatically less code (749 LOC vs 4,387 LOC), with instance-based lifecycle management and no global mutable state.
The upstream has more Fabric-specific features (MLV REST API refresh, OneLake shortcuts, cross-workspace 4-part naming, local Livy mode), while this adapter offers broader dbt ecosystem integration (dbt-spark inheritance, Purview, capability declarations, shared T-SQL + Spark in one package) and significantly higher code quality.