pyspark.sql.Row
A row in a DataFrame. Its fields can be accessed:
like attributes (row.key)
like dictionary values (row[key])
key in row searches through the row's keys.
Row can be used to create a row object by passing named arguments. A named argument may not be omitted to indicate that its value is None or missing; in that case, set it explicitly to None.
NOTE: As of Spark 3.0.0, Rows created from named arguments no longer have their field names sorted alphabetically; fields are kept in the order entered. To enable sorting for Rows compatible with Spark 2.x, set the environment variable “PYSPARK_ROW_FIELD_SORTING_ENABLED” to “true”. This option is deprecated and will be removed in future versions of Spark. For Python versions < 3.6, the order of named arguments is not guaranteed to be the same as entered, see https://www.python.org/dev/peps/pep-0468. In this case, a warning will be issued and the Row will fall back to sorting the field names automatically.
NOTE: Examples with Row in pydocs are run with the environment variable “PYSPARK_ROW_FIELD_SORTING_ENABLED” set to “true” which results in output where fields are sorted.
>>> row = Row(name="Alice", age=11)
>>> row
Row(age=11, name='Alice')
>>> row['name'], row['age']
('Alice', 11)
>>> row.name, row.age
('Alice', 11)
>>> 'name' in row
True
>>> 'wrong_key' in row
False
Row can also be used to create another Row-like class, which can then be used to create Row objects, such as
>>> Person = Row("name", "age")
>>> Person
<Row('name', 'age')>
>>> 'name' in Person
True
>>> 'wrong_key' in Person
False
>>> Person("Alice", 11)
Row(name='Alice', age=11)
This form can also be used to create rows as tuple values, i.e. with unnamed fields. Beware that such Row objects have different equality semantics:
>>> row1 = Row("Alice", 11)
>>> row2 = Row(name="Alice", age=11)
>>> row1 == row2
False
>>> row3 = Row(a="Alice", b=11)
>>> row1 == row3
True
__init__
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__()             Initialize self.
asDict([recursive])    Return as a dict.
count                  Return number of occurrences of value.
index                  Return first index of value.