pyspark.sql.DataFrame.sameSemantics

DataFrame.sameSemantics(other)

Returns True when the logical query plans inside both DataFrames are equal and therefore return the same results.

Note

The equality comparison is simplified by tolerating cosmetic differences such as attribute names.

Note

This API can compare two DataFrames very quickly, but it can still return False for DataFrames that return the same results, for instance when they are built from different plans. This false-negative semantic can be useful, for example, when caching.

Note

This method is a DeveloperApi.

>>> df1 = spark.range(10)
>>> df2 = spark.range(10)
>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df2.withColumn("col1", df2.id * 2))
True
>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df2.withColumn("col1", df2.id + 2))
False
>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df2.withColumn("col0", df2.id * 2))
True

New in version 3.1.