DataStreamReader

Interface used to load a streaming DataFrame from external storage systems (for example, file systems and key-value stores). Use spark.readStream to access this.

Syntax

# Access through SparkSession
spark.readStream

Methods

Method Description
format(source) Specifies the input data source format.
schema(schema) Specifies the schema of the streaming DataFrame.
option(key, value) Adds an input option for the underlying data source.
options(**options) Adds multiple input options for the underlying data source.
load(path) Loads the streaming DataFrame from the given path and returns it.
json(path) Loads a JSON file stream and returns a DataFrame.
orc(path) Loads an ORC file stream and returns a DataFrame.
parquet(path) Loads a Parquet file stream and returns a DataFrame.
text(path) Loads a text file stream and returns a DataFrame.
csv(path) Loads a CSV file stream and returns a DataFrame.
xml(path) Loads an XML file stream and returns a DataFrame.
table(tableName) Defines a streaming DataFrame on the given table and returns it.

Examples

spark.readStream
# <...streaming.readwriter.DataStreamReader object ...>

Load a rate stream, apply a transformation, write to the console, and stop after 3 seconds.

import time

# Generate rows continuously with the built-in rate source.
df = spark.readStream.format("rate").load()
df = df.selectExpr("value % 3 as v")

# Print each micro-batch to the console, then stop after 3 seconds.
q = df.writeStream.format("console").start()
time.sleep(3)
q.stop()