pyspark.sql.DataFrame.explain

DataFrame.explain(extended=None, mode=None)

Prints the logical and physical plans to the console for debugging purposes.

Parameters
  • extended – boolean, default False. If False, prints only the physical plan; if True, prints the logical plans as well.

  • mode – specifies the expected output format of plans:

    • simple: Print only a physical plan.

    • extended: Print both logical and physical plans.

    • codegen: Print a physical plan and the generated code, if available.

    • cost: Print a logical plan and statistics if they are available.

    • formatted: Split explain output into two sections: a physical plan outline and node details.

>>> df.explain()
== Physical Plan ==
*(1) Scan ExistingRDD[age#0,name#1]
>>> df.explain(True)
== Parsed Logical Plan ==
...
== Analyzed Logical Plan ==
...
== Optimized Logical Plan ==
...
== Physical Plan ==
...
>>> df.explain(mode="formatted")
== Physical Plan ==
* Scan ExistingRDD (1)
(1) Scan ExistingRDD [codegen id : 1]
Output [2]: [age#0, name#1]
...

Changed in version 3.0.0: Added optional argument mode to specify the expected output format of plans.

New in version 1.3.