All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
auth.conf.factory | DefaultAuthConfFactory | Name of a Scala module or class implementing AuthConfFactory providing custom authentication configuration |
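For example, a custom factory can be plugged in through SparkConf; a minimal sketch, where `com.example.CustomAuthConfFactory` is a hypothetical class implementing `AuthConfFactory`:

```scala
import org.apache.spark.SparkConf

// Hypothetical class implementing AuthConfFactory; substitute your own.
val conf = new SparkConf()
  .set("spark.cassandra.auth.conf.factory", "com.example.CustomAuthConfFactory")
```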
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
connection.compression | None | Compression to use (LZ4, SNAPPY or NONE) |
connection.factory | DefaultConnectionFactory | Name of a Scala module or class implementing CassandraConnectionFactory providing connections to the Cassandra cluster |
connection.host | localhost | Contact point to connect to the Cassandra cluster |
connection.keep_alive_ms | 250 | Period of time to keep unused connections open |
connection.local_dc | None | The local DC to connect to (other nodes will be ignored) |
connection.port | 9042 | Cassandra native connection port |
connection.reconnection_delay_ms.max | 60000 | Maximum period of time to wait before reconnecting to a dead node |
connection.reconnection_delay_ms.min | 1000 | Minimum period of time to wait before reconnecting to a dead node |
connection.timeout_ms | 5000 | Maximum period of time to attempt connecting to a node |
query.retry.count | 10 | Number of times to retry a timed-out query |
query.retry.delay | 4 * 1.5 | The delay between subsequent retries (can be constant, like 1000; linearly increasing, like 1000+100; or exponential, like 1000*2) |
read.timeout_ms | 120000 | Maximum period of time to wait for a read to return |
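A minimal sketch of overriding the connection defaults via SparkConf; the host and timeout values here are illustrative, not recommendations:

```scala
import org.apache.spark.SparkConf

// Illustrative values only: a remote contact point, the default port,
// and a doubled connect timeout for a slow network.
val conf = new SparkConf()
  .set("spark.cassandra.connection.host", "10.0.0.10")
  .set("spark.cassandra.connection.port", "9042")
  .set("spark.cassandra.connection.timeout_ms", "10000")
```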
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
table.size.in.bytes | None | Used internally by DataFrames; will be updated in a future release to retrieve the size from C*. Can be set manually now |
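Until the connector retrieves the size from C* itself, the estimate can be supplied by hand; a sketch, with a made-up size of 1 GB:

```scala
import org.apache.spark.SparkConf

// Made-up estimate (1 GB); used by Spark for planning, not enforced.
val conf = new SparkConf()
  .set("spark.cassandra.table.size.in.bytes", "1000000000")
```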
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
sql.cluster | default | Sets the default Cluster to inherit configuration from |
sql.keyspace | None | Sets the default keyspace |
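A sketch of setting a default keyspace so Spark SQL queries need not fully qualify table names; `test_keyspace` is a hypothetical name:

```scala
import org.apache.spark.SparkConf

// Hypothetical keyspace name; queries may then reference bare table names.
val conf = new SparkConf()
  .set("spark.cassandra.sql.keyspace", "test_keyspace")
```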
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
connection.ssl.enabled | false | Enable secure connection to Cassandra cluster |
connection.ssl.enabledAlgorithms | Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA) | SSL cipher suites |
connection.ssl.protocol | TLS | SSL protocol |
connection.ssl.trustStore.password | None | Trust store password |
connection.ssl.trustStore.path | None | Path for the trust store being used |
connection.ssl.trustStore.type | JKS | Trust store type |
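A sketch of enabling SSL with a JKS trust store; the path and password below are placeholders:

```scala
import org.apache.spark.SparkConf

// Placeholder path and password; the trust store type defaults to JKS.
val conf = new SparkConf()
  .set("spark.cassandra.connection.ssl.enabled", "true")
  .set("spark.cassandra.connection.ssl.trustStore.path", "/etc/cassandra/truststore.jks")
  .set("spark.cassandra.connection.ssl.trustStore.password", "changeit")
```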
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
input.consistency.level | LOCAL_ONE | Consistency level to use when reading |
input.fetch.size_in_rows | 1000 | Number of CQL rows fetched per driver request |
input.metrics | true | Sets whether to record connector specific metrics on read |
input.split.size_in_mb | 64 | Approximate amount of data to be fetched into a single Spark partition |
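A sketch of tuning reads for a large scan; all values are illustrative, assuming the cluster can serve bigger pages:

```scala
import org.apache.spark.SparkConf

// Illustrative values: larger Spark partitions, bigger driver pages,
// and a stronger consistency level than the LOCAL_ONE default.
val conf = new SparkConf()
  .set("spark.cassandra.input.split.size_in_mb", "128")
  .set("spark.cassandra.input.fetch.size_in_rows", "5000")
  .set("spark.cassandra.input.consistency.level", "QUORUM")
```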
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
output.batch.grouping.buffer.size | 1000 | How many batches per single Spark task can be stored in memory before sending to Cassandra |
output.batch.grouping.key | Partition | Determines how insert statements are grouped into batches. Available values are: `none` (a batch may contain any statements), `replica_set` (a batch may contain only statements to be written to the same replica set) and `partition` (a batch may contain only statements for rows sharing the same partition key value) |
output.batch.size.bytes | 1024 | Maximum total size of the batch in bytes. Overridden by spark.cassandra.output.batch.size.rows |
output.batch.size.rows | None | Number of rows per single batch. The default is 'auto', which means the connector will adjust the number of rows based on the amount of data in each row |
output.concurrent.writes | 5 | Maximum number of batches executed in parallel by a single Spark task |
output.consistency.level | LOCAL_ONE | Consistency level for writing |
output.metrics | true | Sets whether to record connector specific metrics on write |
output.throughput_mb_per_sec | 2.147483647E9 | *(Floating points allowed)* Maximum write throughput allowed per single core in MB/s. For stability on long (8+ hour) runs, limit this to 70% of the max throughput seen on a smaller job |
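A sketch of a throttled write configuration; the 50 MB/s cap and the concurrency value are illustrative, following the 70%-of-max guideline above:

```scala
import org.apache.spark.SparkConf

// Illustrative values: more parallel batches per task and a 50 MB/s
// per-core throughput cap for long-running jobs.
val conf = new SparkConf()
  .set("spark.cassandra.output.concurrent.writes", "10")
  .set("spark.cassandra.output.throughput_mb_per_sec", "50")
```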