pyspark.sql.DataFrame.sortWithinPartitions

DataFrame.sortWithinPartitions(*cols, **kwargs)

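Returns a new DataFrame with each partition sorted by the specified column(s). Rows are ordered only within their own partition; unlike sort() or orderBy(), no ordering across partitions is guaranteed.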

Parameters
  • cols – list of Column or column names to sort by.

  • ascending – boolean or list of boolean (default True). Sort ascending vs. descending. Pass a list to give each column its own sort order; the list must have the same length as cols.

>>> df.sortWithinPartitions("age", ascending=False).show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+
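
The output above is not in descending order overall, which is consistent with the two rows sitting in different partitions (an assumption; the example does not show how df was built). The sketch below is a pure-Python model of the per-partition semantics, not Spark's implementation: partitions are modeled as plain lists of dicts, and sort_within_partitions, partitions, result and flattened are hypothetical names introduced only for illustration.

```python
from typing import Any, Dict, List, Sequence, Union

Row = Dict[str, Any]


def sort_within_partitions(
    partitions: Sequence[Sequence[Row]],
    cols: Sequence[str],
    ascending: Union[bool, Sequence[bool]] = True,
) -> List[List[Row]]:
    """Hypothetical model: sort each partition on its own, never across partitions."""
    if isinstance(ascending, bool):
        # A single boolean applies to every sort column.
        flags = [ascending] * len(cols)
    else:
        flags = list(ascending)
        if len(flags) != len(cols):
            raise ValueError("ascending list must have the same length as cols")

    result = []
    for part in partitions:
        rows = list(part)  # copy so the input partition is left untouched
        # Python's sort is stable, so applying the keys from last to first
        # yields a multi-column ordering with a per-column direction.
        for col, asc in reversed(list(zip(cols, flags))):
            rows.sort(key=lambda r, c=col: r[c], reverse=not asc)
        result.append(rows)
    return result


# Two partitions; the rows are assumed data for illustration only.
partitions = [
    [{"age": 2, "name": "Alice"}, {"age": 7, "name": "Carol"}],
    [{"age": 1, "name": "Dave"}, {"age": 5, "name": "Bob"}],
]

result = sort_within_partitions(partitions, ["age"], ascending=False)

# Reading the partitions back in order shows each one sorted descending,
# while the combined sequence is not globally sorted.
flattened = [row["age"] for part in result for row in part]
print(flattened)  # [7, 2, 5, 1]
```

When a total order over the whole DataFrame is needed, sort() or orderBy() is the appropriate call.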

New in version 1.6.