pyspark.sql.functions.zip_with

pyspark.sql.functions.zip_with(col1, col2, f)[source]

Merge two given arrays, element-wise, into a single array using a function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying the function.

Parameters
  • col1 – name of the first column or expression

  • col2 – name of the second column or expression

  • f – a binary function (x1: Column, x2: Column) -> Column... Can use methods of pyspark.sql.Column, functions defined in pyspark.sql.functions and Scala UserDefinedFunctions. Python UserDefinedFunctions are not supported (SPARK-27052).

Returns

a pyspark.sql.Column

>>> df = spark.createDataFrame([(1, [1, 3, 5, 8], [0, 2, 4, 6])], ("id", "xs", "ys"))
>>> df.select(zip_with("xs", "ys", lambda x, y: x ** y).alias("powers")).show(truncate=False)
+---------------------------+
|powers                     |
+---------------------------+
|[1.0, 9.0, 625.0, 262144.0]|
+---------------------------+
>>> df = spark.createDataFrame([(1, ["foo", "bar"], [1, 2, 3])], ("id", "xs", "ys"))
>>> df.select(zip_with("xs", "ys", lambda x, y: concat_ws("_", x, y)).alias("xs_ys")).show()
+-----------------+
|            xs_ys|
+-----------------+
|[foo_1, bar_2, 3]|
+-----------------+

New in version 3.1.