Welcome to Spark Python API Docs!

Contents:

pyspark package

Subpackages

pyspark.sql module

Module Context
pyspark.sql.types module
pyspark.sql.functions module

pyspark.streaming module

Module contents
pyspark.streaming.kafka module

pyspark.ml package

ML Pipeline APIs
pyspark.ml.param module
pyspark.ml.feature module
pyspark.ml.classification module
pyspark.ml.clustering module
pyspark.ml.recommendation module
pyspark.ml.regression module
pyspark.ml.tuning module
pyspark.ml.evaluation module

pyspark.mllib package

pyspark.mllib.classification module
pyspark.mllib.clustering module
pyspark.mllib.evaluation module
pyspark.mllib.feature module
pyspark.mllib.fpm module
pyspark.mllib.linalg module
pyspark.mllib.linalg.distributed module
pyspark.mllib.random module
pyspark.mllib.recommendation module
pyspark.mllib.regression module
pyspark.mllib.stat module
pyspark.mllib.tree module
pyspark.mllib.util module

Contents

Core classes:

pyspark.SparkContext

Main entry point for Spark functionality.
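
A minimal sketch of starting a SparkContext locally and distributing a small collection; the application name and master URL here are illustrative placeholders, not required values::

    from pyspark import SparkConf, SparkContext

    # Configure and start a SparkContext; "local[2]" runs Spark
    # locally with two worker threads (an illustrative choice).
    conf = SparkConf().setAppName("example-app").setMaster("local[2]")
    sc = SparkContext(conf=conf)

    # Distribute a local Python collection as an RDD and sum it.
    rdd = sc.parallelize(range(100))
    print(rdd.sum())

    sc.stop()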

pyspark.RDD

A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
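
A minimal sketch of common RDD transformations and actions, assuming the SparkContext sc from the sketch above::

    # Transformations such as map() and reduceByKey() are lazy;
    # they only describe the computation.
    words = sc.parallelize(["spark", "rdd", "spark", "python"])
    counts = (words.map(lambda w: (w, 1))
                   .reduceByKey(lambda a, b: a + b))

    # collect() is an action: it triggers the computation and
    # returns the results to the driver.
    print(counts.collect())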

pyspark.sql.SQLContext

Main entry point for DataFrame and SQL functionality.
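
A minimal sketch of wrapping an existing SparkContext sc in a SQLContext, building a DataFrame, and querying it with SQL; the table, column names, and rows are illustrative::

    from pyspark.sql import SQLContext

    sqlContext = SQLContext(sc)

    # Build a DataFrame from local tuples with named columns.
    df = sqlContext.createDataFrame(
        [("Alice", 34), ("Bob", 45)], ["name", "age"])

    # Register the DataFrame as a temporary table and query it with SQL.
    df.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 40").show()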

pyspark.sql.DataFrame

A distributed collection of data grouped into named columns.
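
A minimal sketch of column-oriented DataFrame operations, reusing the df built in the SQLContext sketch above::

    from pyspark.sql import functions as F

    # select() and filter() mirror their SQL counterparts.
    df.select("name").show()
    df.filter(df.age > 40).show()

    # Aggregate using functions from pyspark.sql.functions.
    df.groupBy().agg(F.avg(df.age).alias("avg_age")).show()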
