pyspark.sql.DataFrameWriter.partitionBy

DataFrameWriter.partitionBy(*cols)

Partitions the output by the given columns on the file system.

If specified, the output is laid out on the file system in the same way as Hive's partitioning scheme: one subdirectory per distinct value of each partition column, named col=value.

Parameters

cols – names of the columns to partition the output by.

>>> df.write.partitionBy('year', 'month').parquet(os.path.join(tempfile.mkdtemp(), 'data'))
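The directory layout this produces can be illustrated without a Spark cluster. The sketch below is a minimal pure-Python illustration of the Hive-style col=value path scheme that partitionBy follows; the rows, column names, and values are invented for illustration and stand in for a DataFrame's contents.

```python
import os
import tempfile

# Hypothetical rows standing in for a DataFrame partitioned by year and month.
rows = [
    {"year": 2023, "month": 1, "value": 10},
    {"year": 2023, "month": 2, "value": 20},
    {"year": 2024, "month": 1, "value": 30},
]

def hive_partition_path(base, row, cols):
    """Build the Hive-style directory path (col=value/col=value/...)
    that partitionBy(*cols) would place this row's data under."""
    parts = [f"{c}={row[c]}" for c in cols]
    return os.path.join(base, *parts)

base = tempfile.mkdtemp()
for row in rows:
    os.makedirs(hive_partition_path(base, row, ("year", "month")), exist_ok=True)

# Leaf directories mirror the layout Spark writes data files into.
leaves = sorted(
    os.path.relpath(dirpath, base)
    for dirpath, dirnames, _ in os.walk(base)
    if not dirnames
)
print(leaves)
```

Rows sharing the same partition-column values land in the same directory, which is what lets readers prune partitions by path when filtering on those columns.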

New in version 1.4.