pyspark.sql.DataFrameWriter.parquet

DataFrameWriter.parquet(path, mode=None, partitionBy=None, compression=None)

Saves the content of the DataFrame in Parquet format at the specified path.

Parameters
  • path – the path in any Hadoop-supported file system

  • mode

    specifies the behavior of the save operation when data already exists (see the sketch after this list for an example using overwrite).

    • append: Append contents of this DataFrame to existing data.

    • overwrite: Overwrite existing data.

    • ignore: Silently ignore this operation if data already exists.

    • error or errorifexists (default): Throw an exception if data already exists.

  • partitionBy – names of partitioning columns

  • compression – compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd). This overrides spark.sql.parquet.compression.codec; if None is set, the value of spark.sql.parquet.compression.codec is used.
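
These parameters can be combined in a single call. The following is a minimal sketch, not part of the original example set: it assumes an active SparkSession bound to spark, and the DataFrame and its year/value columns are made up for illustration.

>>> import os, tempfile
>>> df = spark.createDataFrame([(2020, 1.0), (2021, 2.0)], ['year', 'value'])
>>> out = os.path.join(tempfile.mkdtemp(), 'data')
>>> # Overwrite any existing data, write one directory per 'year' value,
>>> # and compress the data files with snappy.
>>> df.write.parquet(out, mode='overwrite', partitionBy='year', compression='snappy')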

>>> df.write.parquet(os.path.join(tempfile.mkdtemp(), 'data'))
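
As a quick round-trip check (again a sketch, reusing the out directory and the two-row df defined above), the written files can be read back with the reader's counterpart method:

>>> spark.read.parquet(out).count()
2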

New in version 1.4.