pyspark.sql.DataFrameWriter.parquet

DataFrameWriter.parquet(path, mode=None, partitionBy=None, compression=None)

Saves the content of the DataFrame in Parquet format at the specified path.

Parameters
  • path – the path in any Hadoop-supported file system

  • mode

    specifies the behavior of the save operation when data already exists (see the sketch after this list for an example using overwrite).

    • append: Append contents of this DataFrame to existing data.

    • overwrite: Overwrite existing data.

    • ignore: Silently ignore this operation if data already exists.

    • error or errorifexists (default): Throw an exception if data already exists.

  • partitionBy – names of partitioning columns

  • compression – compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd). This overrides spark.sql.parquet.compression.codec; if None is set, the value of spark.sql.parquet.compression.codec is used.
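
These parameters can be combined in a single call. The following is a minimal sketch, not part of the original example set: it assumes an active SparkSession bound to spark, and the DataFrame and its year/value columns are made up for illustration.

>>> import os, tempfile
>>> df = spark.createDataFrame([(2020, 1.0), (2021, 2.0)], ['year', 'value'])
>>> out = os.path.join(tempfile.mkdtemp(), 'data')
>>> # Overwrite any existing data, write one directory per 'year' value,
>>> # and compress the data files with snappy.
>>> df.write.parquet(out, mode='overwrite', partitionBy='year', compression='snappy')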

>>> df.write.parquet(os.path.join(tempfile.mkdtemp(), 'data'))
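
As a quick round-trip check (again a sketch, reusing the out directory and the two-row df defined above), the written files can be read back with the reader's counterpart method:

>>> spark.read.parquet(out).count()
2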

New in version 1.4.