Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Sets the output of the streaming query to be processed using the provided function. Supported only in micro-batch execution mode (that is, when the trigger is not continuous). In every micro-batch, the provided function is called with the output rows as a DataFrame and the batch identifier. The batch ID can be used to deduplicate and transactionally write the output to external systems.
Syntax
foreachBatch(func)
Parameters
| Parameter | Type | Description |
|---|---|---|
func |
callable | A function that takes a DataFrame and a batch ID (int) as input. |
Returns
DataStreamWriter
Notes
In Spark Connect mode, the provided function does not have access to variables defined outside of it.
Examples
import time
df = spark.readStream.format("rate").load()
def func(batch_df, batch_id):
batch_df.collect()
q = df.writeStream.foreachBatch(func).start()
time.sleep(3)
q.stop()