pyspark.sql.DataFrame.checkpoint

DataFrame.checkpoint(eager=True)

Returns a checkpointed version of this DataFrame. Checkpointing can be used to truncate the logical plan of this DataFrame, which is especially useful in iterative algorithms where the plan may grow exponentially. The checkpointed data is saved to files inside the checkpoint directory set with SparkContext.setCheckpointDir().

Parameters

eager – Whether to checkpoint this DataFrame immediately. Defaults to True.

Note

Experimental

New in version 2.1.
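
Examples

The following is a minimal sketch of the iterative use case described above, not an official example. The names iterate_with_checkpoint, step, checkpoint_every, and run_demo are hypothetical and exist only for illustration; the only Spark calls used are SparkContext.setCheckpointDir() and DataFrame.checkpoint(eager=...). The helper applies a transformation repeatedly and checkpoints every few iterations so that the logical plan does not keep growing.

```python
import tempfile


def iterate_with_checkpoint(df, step, num_iterations, checkpoint_every=5):
    """Apply `step` repeatedly, checkpointing every `checkpoint_every` iterations.

    Hypothetical helper for this sketch; `step` is any function that takes a
    DataFrame and returns a new DataFrame.
    """
    for i in range(1, num_iterations + 1):
        df = step(df)
        if i % checkpoint_every == 0:
            # Materialize the data now and truncate the accumulated logical plan.
            df = df.checkpoint(eager=True)
    return df


def run_demo():
    """Illustrative driver; requires a working PySpark installation."""
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("checkpoint-demo")
        .getOrCreate()
    )
    # The checkpoint directory must be set before checkpoint() is called.
    # A temporary local directory is an assumption suitable only for local runs.
    spark.sparkContext.setCheckpointDir(tempfile.mkdtemp())

    df = spark.range(10).withColumn("value", F.lit(0))
    result = iterate_with_checkpoint(
        df,
        lambda d: d.withColumn("value", F.col("value") + 1),
        num_iterations=20,
    )
    result.show()
    spark.stop()
```

Calling run_demo() on a machine with PySpark installed runs 20 iterations and checkpoints after iterations 5, 10, 15, and 20. On a cluster, the checkpoint directory would normally point to a shared, fault-tolerant file system rather than a local temporary directory.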