pyspark.sql.DataFrameWriter.json

DataFrameWriter.json(path, mode=None, compression=None, dateFormat=None, timestampFormat=None, lineSep=None, encoding=None, ignoreNullFields=None)[source]

Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path.
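
For illustration, each row is written as one JSON object on its own output line. A minimal sketch, assuming an active SparkSession bound to the name spark and a hypothetical output path:

>>> people_df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
>>> people_df.write.json("/tmp/people")

Each resulting part file then contains lines such as {"id":1,"name":"Alice"}.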

Parameters
  • path – the path in any Hadoop-supported file system

  • mode

    specifies the behavior of the save operation when data already exists (see the sketch following this parameter list).

    • append: Append contents of this DataFrame to existing data.

    • overwrite: Overwrite existing data.

    • ignore: Silently ignore this operation if data already exists.

    • error or errorifexists (default case): Throw an exception if data already exists.

  • compression – compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, bzip2, gzip, lz4, snappy and deflate).

  • dateFormat – sets the string that indicates a date format. Custom date formats follow the patterns described in Spark's Datetime Patterns for Formatting and Parsing guide. This applies to date type. If None is set, it uses the default value, yyyy-MM-dd.

  • timestampFormat – sets the string that indicates a timestamp format. Custom timestamp formats follow the patterns described in Spark's Datetime Patterns for Formatting and Parsing guide. This applies to timestamp type. If None is set, it uses the default value, yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX].

  • lineSep – defines the line separator that should be used for writing. If None is set, it uses the default value, \n.

  • encoding – specifies the encoding (charset) of the saved JSON files. If None is set, the default UTF-8 charset is used.

  • ignoreNullFields – whether to ignore null fields when generating JSON objects. If None is set, it uses the default value, true.
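
The options above can be combined in a single call. A sketch, assuming a DataFrame df with date and timestamp columns and a hypothetical output path:

>>> df.write.json("/tmp/events", mode="overwrite", compression="gzip",
...               dateFormat="yyyy/MM/dd", ignoreNullFields=False)

Passing the options as keyword arguments here is equivalent to chaining .mode("overwrite").option("compression", "gzip") on the writer before calling json.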

>>> import os, tempfile
>>> df.write.json(os.path.join(tempfile.mkdtemp(), 'data'))
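
To confirm the round trip, a written directory can be read back with spark.read.json. A sketch; the path variable is introduced only for this example:

>>> path = os.path.join(tempfile.mkdtemp(), 'data')
>>> df.write.json(path)
>>> spark.read.json(path).count() == df.count()
True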

New in version 1.4.