pyspark.sql.DataFrame.dropna

DataFrame.dropna(how='any', thresh=None, subset=None)

Returns a new DataFrame omitting rows with null values. DataFrame.dropna() and DataFrameNaFunctions.drop() are aliases of each other.

Parameters
  • how – ‘any’ or ‘all’. If ‘any’, drop a row if it contains any nulls. If ‘all’, drop a row only if all its values are null.

  • thresh – int, default None. If specified, drop rows that have fewer than thresh non-null values. This overrides the how parameter.

  • subset – optional list of column names to consider.
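The way the three parameters interact can be easier to see in a small model. The following is a minimal plain-Python sketch of the row-dropping rule described above, not Spark's implementation. The helper name dropna_rows and the sample rows are hypothetical. It assumes rows are dicts and that only None counts as null.

```python
def dropna_rows(rows, how="any", thresh=None, subset=None):
    """Plain-Python model of the dropna rule (hypothetical helper, not a Spark API).

    rows is a list of dicts; only None is treated as null here.
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    kept = []
    for row in rows:
        # Columns to inspect: every column unless subset narrows them.
        cols = list(row) if subset is None else list(subset)
        if thresh is None:
            # 'any': every inspected column must be non-null to keep the row.
            # 'all': one non-null inspected column is enough to keep the row.
            needed = len(cols) if how == "any" else 1
        else:
            # An explicit thresh takes precedence over how.
            needed = thresh
        non_null = sum(1 for c in cols if row[c] is not None)
        if non_null >= needed:
            kept.append(row)
    return kept


# Hypothetical sample data for illustration only.
rows = [
    {"age": 10, "height": 80, "name": "Alice"},
    {"age": 5, "height": None, "name": "Bob"},
    {"age": None, "height": None, "name": "Tom"},
    {"age": None, "height": None, "name": None},
]

only_complete = dropna_rows(rows)                     # how='any'
not_all_null = dropna_rows(rows, how="all")           # drops only the all-null row
at_least_two = dropna_rows(rows, thresh=2)            # keeps rows with 2+ non-null values
by_subset = dropna_rows(rows, subset=["age", "name"]) # checks only age and name
```

In this model, how only selects a default threshold, which is why supplying thresh makes how irrelevant.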

>>> df4.na.drop().show()
+---+------+-----+
|age|height| name|
+---+------+-----+
| 10|    80|Alice|
+---+------+-----+

New in version 1.3.1.