DataFrameWriter.bucketBy(numBuckets, col, *cols)
Buckets the output by the given columns. If specified, the output is laid out on the file system similarly to Hive’s bucketing scheme, but with a different bucket hash function, so it is not compatible with Hive’s bucketing.
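As a rough illustration of what bucketing means, the sketch below assigns rows to a fixed number of buckets by hashing the values of the bucketing columns. It is not Spark's implementation: the helper `assign_bucket` is hypothetical, and `zlib.crc32` is only a deterministic stand-in for Spark's own hash function.

```python
import zlib


def assign_bucket(key_values, num_buckets):
    """Map a tuple of bucketing-column values to an index in [0, num_buckets)."""
    # Assumption: crc32 is a stand-in hash; Spark uses its own hash function.
    encoded = "\x1f".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(encoded) % num_buckets


rows = [
    {"year": 2017, "month": 1, "value": 10},
    {"year": 2017, "month": 1, "value": 20},
    {"year": 2018, "month": 6, "value": 30},
]

# Bucket by ('year', 'month') into 100 buckets, as in bucketBy(100, 'year', 'month').
buckets = [assign_bucket((r["year"], r["month"]), 100) for r in rows]
```

Rows that share the same values in the bucketing columns always receive the same bucket index, which is the property that lets a reader of the table locate all rows for a key in one bucket.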
numBuckets – the number of buckets to save
col – a name of a column, or a list of names.
cols – additional names (optional). If col is a list it should be empty.
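The rule for col and cols can be sketched in plain Python. The helper `normalize_bucket_columns` below is hypothetical and only mirrors the documented behavior (a single list, or one name followed by more names); it is not the library's own code, and the exact errors PySpark raises may differ.

```python
def normalize_bucket_columns(col, *cols):
    """Return the bucketing column names as a flat list of strings."""
    if isinstance(col, (list, tuple)):
        # When col is already a list, no additional names may be passed.
        if cols:
            raise ValueError("col is a list, so cols must be empty")
        names = list(col)
    else:
        names = [col, *cols]
    if not names or not all(isinstance(n, str) for n in names):
        raise TypeError("all column names must be strings")
    return names


# Both call styles describe the same pair of bucketing columns.
from_varargs = normalize_bucket_columns("year", "month")
from_list = normalize_bucket_columns(["year", "month"])
```

Under this reading, `bucketBy(100, 'year', 'month')` and `bucketBy(100, ['year', 'month'])` name the same columns, while mixing a list with extra names is rejected.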
Note
Applicable for file-based data sources in combination with DataFrameWriter.saveAsTable().
>>> (df.write.format('parquet')
...     .bucketBy(100, 'year', 'month')
...     .mode("overwrite")
...     .saveAsTable('bucketed_table'))
New in version 2.3.