pyspark.sql.DataFrame.drop

DataFrame.drop(*cols)[source]

Returns a new DataFrame that drops the specified column. This is a no-op if schema doesn’t contain the given column name(s).

Parameters

cols – a string name of the column to drop, or a Column to drop, or a list of string name of the columns to drop.

>>> df.drop('age').collect()
[Row(name='Alice'), Row(name='Bob')]
>>> df.drop(df.age).collect()
[Row(name='Alice'), Row(name='Bob')]
>>> df.join(df2, df.name == df2.name, 'inner').drop(df.name).collect()
[Row(age=5, height=85, name='Bob')]
>>> df.join(df2, df.name == df2.name, 'inner').drop(df2.name).collect()
[Row(age=5, name='Bob', height=85)]
>>> df.join(df2, 'name', 'inner').drop('age', 'height').collect()
[Row(name='Bob')]

New in version 1.4.