Interface used to write a streaming DataFrame to external storage systems (for example, file systems and key-value stores). Use df.writeStream to access this.
Syntax
# Access through DataFrame
df.writeStream
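Because this interface writes a streaming DataFrame, it can help to check that the DataFrame is actually streaming before accessing it. The following is a minimal sketch; `get_stream_writer` is a hypothetical helper name, not part of the API, and it relies only on the `isStreaming` property of a DataFrame.

```python
def get_stream_writer(df):
    """Return df.writeStream, failing early if df is not a streaming DataFrame.

    `get_stream_writer` is an illustrative helper, not a PySpark function.
    """
    # DataFrame.isStreaming is True for DataFrames created via spark.readStream.
    if not df.isStreaming:
        raise ValueError("writeStream requires a streaming DataFrame")
    return df.writeStream
```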
Methods
| Method | Description |
|---|---|
| `outputMode(outputMode)` | Specifies how data of a streaming DataFrame is written to the sink. Options are `append`, `complete`, and `update`. |
| `format(source)` | Specifies the output data source format. |
| `option(key, value)` | Adds an output option for the underlying data source. |
| `options(**options)` | Adds multiple output options for the underlying data source. |
| `partitionBy(*cols)` | Partitions the output by the given columns on the file system. |
| `clusterBy(*cols)` | Clusters the output by the given columns. |
| `queryName(queryName)` | Specifies the name of the streaming query. |
| `trigger(**kwargs)` | Sets the trigger for the streaming query execution. |
| `foreach(f)` | Sets the output of the streaming query to be processed by the given function or object. |
| `foreachBatch(func)` | Sets the output of each microbatch to be processed by the given function. |
| `start(path)` | Starts the execution of the streaming query and returns a StreamingQuery object. |
| `table(tableName)` | Alias for `toTable()`. Writes data to the specified table and returns a StreamingQuery object. |
| `toTable(tableName)` | Starts the execution of the streaming query, continually outputting results to the given table. |
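The configuration methods each return the writer, so they can be chained before calling `start()`. The sketch below illustrates that pattern under a few assumptions: `build_console_query` is a hypothetical helper name, the default query name is arbitrary, and the 5-second processing-time trigger is only an example value.

```python
def build_console_query(df, checkpoint_dir, name="rate_mod3"):
    """Configure a writer for the console sink without starting it.

    `df` is assumed to be a streaming DataFrame; `checkpoint_dir` and
    `name` are illustrative values chosen by the caller.
    """
    return (
        df.writeStream
        .format("console")                              # output data source format
        .outputMode("append")                           # emit only new rows
        .queryName(name)                                # name of the streaming query
        .option("checkpointLocation", checkpoint_dir)   # where progress is tracked
        .trigger(processingTime="5 seconds")            # run a micro-batch every 5 seconds
    )
```

Nothing runs until `start()` is called on the returned writer, for example `q = build_console_query(df, "/tmp/checkpoints/rate_mod3").start()`, where the checkpoint path is a placeholder.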
Examples
Load a rate stream, apply a transformation, write to the console, and stop after 3 seconds.
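import time

# Assumes an active SparkSession is available as `spark`.
# The rate source generates rows with `timestamp` and `value` columns.
df = spark.readStream.format("rate").load()
df = df.selectExpr("value % 3 as v")
# start() launches the query in the background and returns a StreamingQuery.
q = df.writeStream.format("console").start()
time.sleep(3)
q.stop()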
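As a further illustration, `foreachBatch` hands each micro-batch to a function as a regular (non-streaming) DataFrame together with a batch ID, so the batch writer API can be used inside it. This is a minimal sketch, not a definitive pattern: `make_batch_handler` and `start_batch_query` are hypothetical helper names, and the table name and checkpoint directory are supplied by the caller.

```python
def make_batch_handler(table_name):
    """Return a foreachBatch callback that appends each micro-batch to a table."""
    def handle_batch(batch_df, batch_id):
        # batch_df is a non-streaming DataFrame holding one micro-batch;
        # batch_id identifies that micro-batch and is unused in this sketch.
        batch_df.write.mode("append").saveAsTable(table_name)
    return handle_batch


def start_batch_query(df, table_name, checkpoint_dir):
    """Start a streaming query that processes each micro-batch with the handler."""
    return (
        df.writeStream
        .foreachBatch(make_batch_handler(table_name))
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A call such as `q = start_batch_query(df, "rate_mod3_batches", "/tmp/checkpoints/batches")` would return a StreamingQuery that can later be stopped with `q.stop()`; both arguments here are placeholder values.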