Using Unity Catalog with Structured Streaming

This page shows how to use Structured Streaming with Unity Catalog to manage data governance for your incremental and streaming workloads on Azure Databricks.

What Structured Streaming functionality does Unity Catalog support?

Unity Catalog doesn't add any explicit limits for Structured Streaming sources and sinks available on Azure Databricks.

With Unity Catalog and Structured Streaming you can:

For Structured Streaming checkpoints, you must use paths in external locations managed by Unity Catalog. To learn more about securely connecting storage with Unity Catalog, see Connect to cloud object storage using Unity Catalog.
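The following sketch shows a streaming write whose checkpoint lives under an external location. It assumes an active SparkSession named spark (predefined in Databricks notebooks). The helper name start_table_stream and its arguments are hypothetical, and the checkpoint path you pass is assumed to be an abfss:// URI inside a Unity Catalog external location.

```python
def start_table_stream(spark, source_name, target_table, checkpoint_path):
    """Hypothetical helper: incrementally copy a Unity Catalog table or view.

    checkpoint_path is assumed to point inside a Unity Catalog external
    location, for example
    abfss://<container>@<storage-account>.dfs.core.windows.net/checkpoints/<name>.
    """
    return (
        spark.readStream
        .table(source_name)                             # streaming read by table or view name
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # stream progress is tracked here
        .toTable(target_table)                          # starts the query; returns a StreamingQuery
    )
```

For example, call start_table_stream(spark, "main.default.events", "main.default.events_copy", "abfss://<container>@<storage-account>.dfs.core.windows.net/checkpoints/events_copy") with your own names substituted. Calling toTable starts the query and returns a StreamingQuery handle you can monitor or stop.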

Read a Unity Catalog view as a stream

In Databricks Runtime 14.3 LTS and above, you can use Structured Streaming to read from views registered with Unity Catalog. The underlying tables must use the Delta Lake format. For other limitations, see Limitations.

To read a view with Structured Streaming, use the .table() method with the view's identifier:

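# `spark` is the active SparkSession (predefined in Databricks notebooks).
# The result is a streaming DataFrame backed by the view's underlying Delta tables.
df = (spark.readStream
  .table("demoView")
)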

Users must have SELECT privileges on the target view.

If you modify the view definition to add or change the tables referenced in the view, you can't reuse the existing streaming checkpoint. Restart the stream with a new checkpoint location.
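One way to avoid accidentally reusing a checkpoint after a view change is to derive the checkpoint directory from the view's SQL text. This is a hypothetical convention, not a Databricks feature: checkpoint_for_view hashes the definition with SHA-256, so any edit to the text (even a cosmetic one) yields a new path under a base directory that is assumed to sit in an external location.

```python
import hashlib
import posixpath


def checkpoint_for_view(base_path: str, view_name: str, view_sql: str) -> str:
    """Hypothetical helper: return a checkpoint path tied to the view definition.

    base_path is assumed to be inside a Unity Catalog external location.
    Any change to view_sql (other than leading/trailing whitespace)
    produces a different path, which is conservative but safe.
    """
    digest = hashlib.sha256(view_sql.strip().encode("utf-8")).hexdigest()[:12]
    # Join with "/" separators regardless of the local OS.
    return posixpath.join(base_path.rstrip("/"), view_name, f"def-{digest}")
```

Because a new checkpoint starts the stream from scratch, the query reprocesses the view's current data unless you also set a starting point such as startingTimestamp.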

Supported streaming options

The streaming reader applies options to the files and metadata of the underlying Delta tables for the specified view.

The following options are supported:

  • maxFilesPerTrigger
  • maxBytesPerTrigger
  • ignoreDeletes
  • skipChangeCommits
  • withEventTimeOrder
  • startingTimestamp
  • startingVersion

Reads on views with UNION ALL don't support the withEventTimeOrder and startingVersion options.
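Because unsupported options surface only as an AnalysisException from Spark, a pipeline might check its option map up front. The hypothetical helper below mirrors the option list above, including the UNION ALL restriction. Databricks remains the authority on what is accepted, and the check ignores case because Spark option keys are generally case-insensitive.

```python
SUPPORTED_VIEW_OPTIONS = frozenset({
    "maxFilesPerTrigger", "maxBytesPerTrigger", "ignoreDeletes",
    "skipChangeCommits", "withEventTimeOrder", "startingTimestamp", "startingVersion",
})
# Options that reads on views with UNION ALL don't support.
UNSUPPORTED_WITH_UNION_ALL = frozenset({"withEventTimeOrder", "startingVersion"})


def unsupported_view_options(options: dict, view_uses_union_all: bool) -> list:
    """Hypothetical check: return option names a streaming view read would reject."""
    allowed = SUPPORTED_VIEW_OPTIONS
    if view_uses_union_all:
        allowed = allowed - UNSUPPORTED_WITH_UNION_ALL
    allowed_lower = {name.lower() for name in allowed}
    # Compare case-insensitively and return the offending names in sorted order.
    return sorted(name for name in options if name.lower() not in allowed_lower)
```

When the returned list is empty, the same dict can be passed to the reader, for example spark.readStream.options(**opts).table("demoView").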

If you provide unsupported options, such as readChangeFeed, Spark raises this exception:

AnalysisException: [UNSUPPORTED_STREAMING_OPTIONS_FOR_VIEW.UNSUPPORTED_OPTION] Unsupported for streaming a view. Reason: option <option> is not supported.

Supported streaming operations

Supported operations include:

  • Project (SELECT... FROM...): controls column-level permissions. Example: CREATE VIEW project_view AS SELECT id, value FROM source_table
  • Filter (WHERE...): controls row-level permissions. Example: CREATE VIEW filter_view AS SELECT * FROM source_table WHERE value > 100
  • Union all (UNION ALL): combines results from multiple tables. Example: CREATE VIEW union_view AS SELECT id, value FROM source_table1 UNION ALL SELECT id, value FROM source_table2

Unsupported operations include aggregations, sorting, and table-valued functions such as table_changes(). For details on table-valued functions, see Table-valued function (TVF) invocation.

If you stream from a view with an unsupported operation, Spark raises this exception:

UnsupportedOperationException: [UNEXPECTED_OPERATOR_IN_STREAMING_VIEW] Unexpected operator <operator> in the CREATE VIEW statement as a streaming source. A streaming view query must consist only of SELECT, WHERE, and UNION ALL operations.

Limitations