pyspark.sql.DataFrameWriter.orc

DataFrameWriter.orc(path, mode=None, partitionBy=None, compression=None)

Saves the content of the DataFrame in ORC format at the specified path.

Parameters

path – the path in any Hadoop-supported file system
mode –
specifies the behavior of the save operation when data already exists.
- append: Append contents of this DataFrame to existing data.
- overwrite: Overwrite existing data.
- ignore: Silently ignore this operation if data already exists.
- error or errorifexists (default case): Throw an exception if data already exists.
partitionBy – name, or list of names, of the partitioning columns
compression – compression codec to use when saving to file. This can be one of the known case-insensitive short names (none, snappy, zlib, and lzo). This overrides orc.compress and spark.sql.orc.compression.codec. If None, the value of spark.sql.orc.compression.codec is used.

>>> import os, tempfile
>>> orc_df = spark.read.orc('python/test_support/sql/orc_partitioned')
>>> orc_df.write.orc(os.path.join(tempfile.mkdtemp(), 'data'))

New in version 1.5.