pyspark.sql.DataFrameWriter.partitionBy

DataFrameWriter.partitionBy(*cols)

Partitions the output by the given columns on the file system.

If specified, the output is laid out on the file system in the same way as Hive's partitioning scheme: one subdirectory per distinct value of each partition column, named col=value.

Parameters

cols – names of the columns to partition the output by.

>>> df.write.partitionBy('year', 'month').parquet(os.path.join(tempfile.mkdtemp(), 'data'))
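The directory layout this produces can be illustrated without a Spark cluster. The sketch below is a minimal pure-Python illustration of the Hive-style col=value path scheme that partitionBy follows; the rows, column names, and values are invented for illustration and stand in for a DataFrame's contents.

```python
import os
import tempfile

# Hypothetical rows standing in for a DataFrame partitioned by year and month.
rows = [
    {"year": 2023, "month": 1, "value": 10},
    {"year": 2023, "month": 2, "value": 20},
    {"year": 2024, "month": 1, "value": 30},
]

def hive_partition_path(base, row, cols):
    """Build the Hive-style directory path (col=value/col=value/...)
    that partitionBy(*cols) would place this row's data under."""
    parts = [f"{c}={row[c]}" for c in cols]
    return os.path.join(base, *parts)

base = tempfile.mkdtemp()
for row in rows:
    os.makedirs(hive_partition_path(base, row, ("year", "month")), exist_ok=True)

# Leaf directories mirror the layout Spark writes data files into.
leaves = sorted(
    os.path.relpath(dirpath, base)
    for dirpath, dirnames, _ in os.walk(base)
    if not dirnames
)
print(leaves)
```

Rows sharing the same partition-column values land in the same directory, which is what lets readers prune partitions by path when filtering on those columns.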

New in version 1.4.