pyspark.sql.functions.spark_partition_id

pyspark.sql.functions.spark_partition_id()

A column for partition ID.

Note

This is non-deterministic because it depends on data partitioning and task scheduling.

>>> df = spark.range(2)
>>> df.repartition(1).select(spark_partition_id().alias("pid")).collect()
[Row(pid=0), Row(pid=0)]

New in version 1.6.