DataStreamReader

Interface used to load a streaming DataFrame from external storage systems (for example, file systems and key-value stores). Use spark.readStream to access this.

Syntax

# Access through SparkSession
spark.readStream

Methods

Method Description
format(source) Specifies the input data source format.
schema(schema) Specifies the schema of the streaming DataFrame.
option(key, value) Adds an input option for the underlying data source.
options(**options) Adds multiple input options for the underlying data source.
load(path) Loads the streaming DataFrame from the given path and returns it.
json(path) Loads a JSON file stream and returns a DataFrame.
orc(path) Loads an ORC file stream and returns a DataFrame.
parquet(path) Loads a Parquet file stream and returns a DataFrame.
text(path) Loads a text file stream and returns a DataFrame.
csv(path) Loads a CSV file stream and returns a DataFrame.
xml(path) Loads an XML file stream and returns a DataFrame.
table(tableName) Defines a streaming DataFrame on the given table and returns it.

Examples

spark.readStream
# <...streaming.readwriter.DataStreamReader object ...>

Load a rate stream, apply a transformation, write to the console, and stop after 3 seconds.

import time

# Generate rows continuously with the built-in rate source.
df = spark.readStream.format("rate").load()
df = df.selectExpr("value % 3 as v")

# Print each micro-batch to the console, then stop after 3 seconds.
q = df.writeStream.format("console").start()
time.sleep(3)
q.stop()