pyspark.sql.DataFrameWriter.json

DataFrameWriter.json(path, mode=None, compression=None, dateFormat=None, timestampFormat=None, lineSep=None, encoding=None, ignoreNullFields=None)[source]

Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path.
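
For illustration, each row is written as one JSON object on its own output line. A minimal sketch, assuming an active SparkSession bound to the name spark and a hypothetical output path:

>>> people_df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
>>> people_df.write.json("/tmp/people")

Each resulting part file then contains lines such as {"id":1,"name":"Alice"}.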

Parameters
  • path – the path in any Hadoop-supported file system

  • mode

    specifies the behavior of the save operation when data already exists (see the sketch following this parameter list).

    • append: Append contents of this DataFrame to existing data.

    • overwrite: Overwrite existing data.

    • ignore: Silently ignore this operation if data already exists.

    • error or errorifexists (default case): Throw an exception if data already exists.

  • compression – compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, bzip2, gzip, lz4, snappy and deflate).

  • dateFormat – sets the string that indicates a date format. Custom date formats follow the patterns described in Spark's Datetime Patterns for Formatting and Parsing guide. This applies to date type. If None is set, it uses the default value, yyyy-MM-dd.

  • timestampFormat – sets the string that indicates a timestamp format. Custom timestamp formats follow the patterns described in Spark's Datetime Patterns for Formatting and Parsing guide. This applies to timestamp type. If None is set, it uses the default value, yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX].

  • lineSep – defines the line separator that should be used for writing. If None is set, it uses the default value, \n.

  • encoding – specifies the encoding (charset) of the saved JSON files. If None is set, the default UTF-8 charset is used.

  • ignoreNullFields – whether to ignore null fields when generating JSON objects. If None is set, it uses the default value, true.
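
The options above can be combined in a single call. A sketch, assuming a DataFrame df with date and timestamp columns and a hypothetical output path:

>>> df.write.json("/tmp/events", mode="overwrite", compression="gzip",
...               dateFormat="yyyy/MM/dd", ignoreNullFields=False)

Passing the options as keyword arguments here is equivalent to chaining .mode("overwrite").option("compression", "gzip") on the writer before calling json.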

>>> import os, tempfile
>>> df.write.json(os.path.join(tempfile.mkdtemp(), 'data'))
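
To confirm the round trip, a written directory can be read back with spark.read.json. A sketch; the path variable is introduced only for this example:

>>> path = os.path.join(tempfile.mkdtemp(), 'data')
>>> df.write.json(path)
>>> spark.read.json(path).count() == df.count()
True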

New in version 1.4.