DataFrameWriter.parquet(path, mode=None, partitionBy=None, compression=None)
Saves the content of the DataFrame in Parquet format at the specified path.
path – the path in any Hadoop-supported file system
mode – specifies the behavior of the save operation when data already exists:
    append: Append contents of this DataFrame to existing data.
    overwrite: Overwrite existing data.
    ignore: Silently ignore this operation if data already exists.
    error or errorifexists (default): Throw an exception if data already exists.
partitionBy – names of partitioning columns
compression – compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd). This will override spark.sql.parquet.compression.codec. If None is set, it uses the value specified in spark.sql.parquet.compression.codec.
>>> df.write.parquet(os.path.join(tempfile.mkdtemp(), 'data'))
New in version 1.4.