Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Interface used to load a streaming DataFrame from external storage systems (for example, file systems and key-value stores). Use spark.readStream to access this.
Syntax
# Access through SparkSession
spark.readStream
Methods
| Method | Description |
|---|---|
format(source) |
Specifies the input data source format. |
schema(schema) |
Specifies the schema of the streaming DataFrame. |
option(key, value) |
Adds an input option for the underlying data source. |
options(**options) |
Adds multiple input options for the underlying data source. |
load(path) |
Loads the streaming DataFrame from the given path and returns it. |
json(path) |
Loads a JSON file stream and returns a DataFrame. |
orc(path) |
Loads an ORC file stream and returns a DataFrame. |
parquet(path) |
Loads a Parquet file stream and returns a DataFrame. |
text(path) |
Loads a text file stream and returns a DataFrame. |
csv(path) |
Loads a CSV file stream and returns a DataFrame. |
xml(path) |
Loads an XML file stream and returns a DataFrame. |
table(tableName) |
Loads a streaming Delta table and returns a DataFrame. |
name(source_name) |
Assigns a name to the streaming source for checkpoint evolution. |
changes(tableName) |
Returns row-level changes (Change Data Capture) from the specified table as a streaming DataFrame. |
Examples
spark.readStream
# <...streaming.readwriter.DataStreamReader object ...>
Load a rate stream, apply a transformation, write to the console, and stop after 3 seconds.
import time
df = spark.readStream.format("rate").load()
df = df.selectExpr("value % 3 as v")
q = df.writeStream.format("console").start()
time.sleep(3)
q.stop()