DataStreamWriter.start(path=None, format=None, outputMode=None, partitionBy=None, queryName=None, **options)

Streams the contents of the DataFrame to a data source.

The data source is specified by the format and a set of options. If format is not specified, the default data source configured by spark.sql.sources.default will be used.
Note: Evolving.
path – the path in a Hadoop supported file system
format – the format used to save
outputMode – specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
append: Only the new rows in the streaming DataFrame/Dataset will be written to the sink
complete: All the rows in the streaming DataFrame/Dataset will be written to the sink every time there are some updates
update: Only the rows that were updated in the streaming DataFrame/Dataset will be written to the sink every time there are some updates. If the query doesn't contain aggregations, it will be equivalent to append mode.
partitionBy – names of partitioning columns
queryName – unique name for the query
options – All other string options. You may want to provide a checkpointLocation for most streams; however, it is not required for a memory stream.
>>> sq = sdf.writeStream.format('memory').queryName('this_query').start()
>>> sq.isActive
True
>>> sq.name
'this_query'
>>> sq.stop()
>>> sq.isActive
False
>>> sq = sdf.writeStream.trigger(processingTime='5 seconds').start(
...     queryName='that_query', outputMode="append", format='memory')
>>> sq.name
'that_query'
>>> sq.isActive
True
>>> sq.stop()
New in version 2.0.