pyspark.sql.DataFrameReader.jdbc

DataFrameReader.jdbc(url, table, column=None, lowerBound=None, upperBound=None, numPartitions=None, predicates=None, properties=None)

Construct a DataFrame representing the database table named table, accessible via the JDBC URL url and the given connection properties.

Partitions of the table will be retrieved in parallel if either column or predicates is specified. lowerBound, upperBound, and numPartitions are required when column is specified.

If both column and predicates are specified, column will be used.
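When splitting a single column into even strides does not fit the data, predicates lets each partition be defined by hand: every string in the list becomes the WHERE clause of one partition. The sketch below builds one half-open date range per calendar month. The helper monthly_predicates, the order_date column, and the assumption that the database accepts quoted ISO date literals are all illustrative and not part of the PySpark API.

```python
from datetime import date
from typing import List


def monthly_predicates(column: str, start: date, months: int) -> List[str]:
    """Return one predicate per calendar month, starting at start's month.

    Hypothetical helper: assumes `column` is a DATE column and that the
    database accepts ISO-8601 date literals in single quotes.
    """
    preds = []
    year, month = start.year, start.month
    for _ in range(months):
        lo = date(year, month, 1)
        # Step to the first day of the next month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        hi = date(year, month, 1)
        # Half-open [lo, hi) ranges never overlap, so no row is read twice.
        preds.append(f"{column} >= '{lo.isoformat()}' AND {column} < '{hi.isoformat()}'")
    return preds
```

Assuming an active SparkSession named spark and a hypothetical orders table, the list would be passed as spark.read.jdbc(url, "orders", predicates=monthly_predicates("order_date", date(2023, 11, 1), 3), properties=props). Because each predicate runs as its own query, rows matched by no predicate (here, NULL dates or dates outside the three months) are not read, and overlapping predicates would return the same rows more than once; the half-open ranges avoid that overlap.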

Note

Don’t create too many partitions in parallel on a large cluster. Each partition queries the database over its own connection, so too many concurrent partitions might overwhelm or crash your external database systems.

Parameters

url – a JDBC URL of the form jdbc:subprotocol:subname
table – the name of the table
column – the name of a column of numeric, date, or timestamp type to use for partitioning; if specified, numPartitions, lowerBound (inclusive), and upperBound (exclusive) determine the partition strides of the generated WHERE clause expressions that split the range of column evenly
lowerBound – the minimum value of column used to decide the partition stride
upperBound – the maximum value of column used to decide the partition stride. lowerBound and upperBound only decide the stride; they do not filter rows, so all rows of the table are returned
numPartitions – the number of partitions
predicates – a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame
properties – a dictionary of JDBC database connection arguments, normally including at least the properties "user" and "password" with their corresponding values, for example {'user': 'SYSTEM', 'password': 'mypassword'}
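The stride scheme described for column, lowerBound, upperBound, and numPartitions can be illustrated with a simplified pure-Python model. stride_where_clauses is a hypothetical helper, not part of PySpark; it assumes integer bounds and only approximates the logic in Spark's JDBC data source, whose exact stride arithmetic and clause text may differ between versions. It shows why the bounds shape partitions without filtering rows: the first partition has no lower edge (and also collects NULLs), and the last has no upper edge.

```python
from typing import List, Optional


def stride_where_clauses(column: str, lower_bound: int, upper_bound: int,
                         num_partitions: int) -> List[Optional[str]]:
    """Simplified model of column-based JDBC partitioning (integer bounds only).

    Hypothetical helper: approximates, but does not exactly reproduce,
    the stride logic in Spark's JDBC data source.
    """
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    # Never create more partitions than integer values in [lower_bound, upper_bound).
    num_partitions = min(num_partitions, upper_bound - lower_bound)
    # A single unpartitioned query (no WHERE clause) when there is nothing to split.
    if num_partitions <= 1:
        return [None]
    stride = (upper_bound - lower_bound) // num_partitions

    clauses: List[Optional[str]] = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower edge, so values below lower_bound land here.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        # The last partition has no upper edge, so values >= upper_bound land here.
        upper = f"{column} < {current}" if i < num_partitions - 1 else None
        if lower is None:
            clauses.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            clauses.append(lower)
        else:
            clauses.append(f"{lower} AND {upper}")
    return clauses
```

For example, stride_where_clauses("id", 0, 100, 4) yields four clauses: id < 25 OR id IS NULL, id >= 25 AND id < 50, id >= 50 AND id < 75, and id >= 75. A row with id = 500 still lands in the last partition, and a row with id = -3 in the first.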

Returns

a DataFrame

New in version 1.4.
