Welcome to Spark Python API Docs!
Contents:
pyspark package
    Subpackages
        pyspark.ml package
            ML Pipeline APIs
            pyspark.ml.param module
            pyspark.ml.feature module
            pyspark.ml.classification module
            pyspark.ml.clustering module
            pyspark.ml.recommendation module
            pyspark.ml.regression module
            pyspark.ml.tuning module
            pyspark.ml.evaluation module
        pyspark.mllib package
            pyspark.mllib.classification module
            pyspark.mllib.clustering module
            pyspark.mllib.evaluation module
            pyspark.mllib.feature module
            pyspark.mllib.fpm module
            pyspark.mllib.linalg module
            pyspark.mllib.linalg.distributed module
            pyspark.mllib.random module
            pyspark.mllib.recommendation module
            pyspark.mllib.regression module
            pyspark.mllib.stat module
            pyspark.mllib.tree module
            pyspark.mllib.util module
    Contents
Core classes:
pyspark.SparkContext
Main entry point for Spark functionality.
pyspark.RDD
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
pyspark.sql.SQLContext
Main entry point for DataFrame and SQL functionality.
pyspark.sql.DataFrame
A distributed collection of data grouped into named columns.
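The sketch below is a minimal, illustrative example of how these core classes relate, assuming a local Spark installation where pyspark is importable; the application name, sample data, and column names are placeholders only, not part of the API.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # SparkContext: main entry point for Spark functionality
    sc = SparkContext("local[2]", "example-app")
    # SQLContext: main entry point for DataFrame and SQL functionality
    sqlContext = SQLContext(sc)

    # RDD: a distributed collection built from a local list
    rdd = sc.parallelize([("alice", 1), ("bob", 2)])
    doubled = rdd.mapValues(lambda v: v * 2)   # lazy transformation
    print(doubled.collect())                   # action triggers computation

    # DataFrame: the same data grouped into named columns
    df = sqlContext.createDataFrame(doubled, ["name", "count"])
    df.filter(df["count"] > 2).show()

    sc.stop()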