pyspark.sql.DataFrameWriter.orc

DataFrameWriter.orc(path, mode=None, partitionBy=None, compression=None)

Saves the content of the DataFrame in ORC format at the specified path.

Parameters
  • path – the path in any Hadoop-supported file system

  • mode – specifies the behavior of the save operation when data already exists.

    • append: Append contents of this DataFrame to existing data.

    • overwrite: Overwrite existing data.

    • ignore: Silently ignore this operation if data already exists.

    • error or errorifexists (default): Throw an exception if data already exists.

  • partitionBy – names of partitioning columns

  • compression – compression codec to use when saving to file. This can be one of the known case-insensitive short names (none, snappy, zlib, and lzo). This overrides orc.compress and spark.sql.orc.compression.codec. If None, the value of spark.sql.orc.compression.codec is used.

>>> import os
>>> import tempfile
>>> orc_df = spark.read.orc('python/test_support/sql/orc_partitioned')
>>> orc_df.write.orc(os.path.join(tempfile.mkdtemp(), 'data'))

New in version 1.5.