pyspark.sql.functions.array_except(col1, col2)
Collection function: returns an array of the elements in col1 but not in col2, without duplicates.
col1 – name of the column containing the array whose elements are returned
col2 – name of the column containing the array whose elements are excluded
>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import array_except
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])])
>>> df.select(array_except(df.c1, df.c2)).collect()
[Row(array_except(c1, c2)=['b'])]
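The "without duplicates" behaviour is easier to see on plain lists. The following is a minimal pure-Python sketch of the semantics described above, not Spark's implementation. The helper name `array_except_model` is hypothetical, and two details are assumptions rather than statements from this entry: that the result keeps elements in order of first occurrence in col1, and that a null input array gives a null result.

```python
def array_except_model(a, b):
    """Plain-Python model of array_except for hashable elements.

    Hypothetical helper for illustration only; it does not call Spark.
    """
    # Assumption: if either input array is null (None), the result is null.
    if a is None or b is None:
        return None

    exclude = set(b)   # elements that must not appear in the result
    seen = set()       # elements already emitted, used to drop duplicates
    result = []
    for x in a:
        if x in exclude or x in seen:
            continue
        seen.add(x)
        # Assumption: order of first occurrence in `a` is preserved.
        result.append(x)
    return result


# Same inputs as the example above.
print(array_except_model(["b", "a", "c"], ["c", "d", "a", "f"]))  # ['b']

# A repeated element in the first array is returned only once.
print(array_except_model(["b", "a", "b", "c"], ["c"]))  # ['b', 'a']
```

The second call shows that deduplication applies to the result itself, not only to elements shared with col2.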
New in version 2.4.