All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
auth.conf.factory | DefaultAuthConfFactory | Name of a Scala module or class implementing AuthConfFactory providing custom authentication configuration |
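For example, a custom factory can be plugged in through SparkConf; a minimal sketch, where `com.example.CustomAuthConfFactory` is a hypothetical class implementing `AuthConfFactory`:

```scala
import org.apache.spark.SparkConf

// Hypothetical class implementing AuthConfFactory; substitute your own.
val conf = new SparkConf()
  .set("spark.cassandra.auth.conf.factory", "com.example.CustomAuthConfFactory")
```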
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
connection.compression | None | Compression to use (LZ4, SNAPPY or NONE) |
connection.factory | DefaultConnectionFactory | Name of a Scala module or class implementing CassandraConnectionFactory providing connections to the Cassandra cluster |
connection.host | localhost | Contact point to connect to the Cassandra cluster |
connection.keep_alive_ms | 250 | Period of time to keep unused connections open |
connection.local_dc | None | The local DC to connect to (other nodes will be ignored) |
connection.port | 9042 | Cassandra native connection port |
connection.reconnection_delay_ms.max | 60000 | Maximum period of time to wait before reconnecting to a dead node |
connection.reconnection_delay_ms.min | 1000 | Minimum period of time to wait before reconnecting to a dead node |
connection.timeout_ms | 5000 | Maximum period of time to attempt connecting to a node |
query.retry.count | 10 | Number of times to retry a timed-out query |
query.retry.delay | 4 * 1.5 | The delay between subsequent retries (can be constant, like 1000; linearly increasing, like 1000+100; or exponential, like 1000*2) |
read.timeout_ms | 120000 | Maximum period of time to wait for a read to return |
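A minimal sketch of overriding the connection defaults via SparkConf; the host and timeout values here are illustrative, not recommendations:

```scala
import org.apache.spark.SparkConf

// Illustrative values only: a remote contact point, the default port,
// and a doubled connect timeout for a slow network.
val conf = new SparkConf()
  .set("spark.cassandra.connection.host", "10.0.0.10")
  .set("spark.cassandra.connection.port", "9042")
  .set("spark.cassandra.connection.timeout_ms", "10000")
```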
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
table.size.in.bytes | None | Used internally by DataFrames; will be updated in a future release to retrieve the size from C*. Can be set manually now |
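Until the connector retrieves the size from C* itself, the estimate can be supplied by hand; a sketch, with a made-up size of 1 GB:

```scala
import org.apache.spark.SparkConf

// Made-up estimate (1 GB); used by Spark for planning, not enforced.
val conf = new SparkConf()
  .set("spark.cassandra.table.size.in.bytes", "1000000000")
```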
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
sql.cluster | default | Sets the default Cluster to inherit configuration from |
sql.keyspace | None | Sets the default keyspace |
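A sketch of setting a default keyspace so Spark SQL queries need not fully qualify table names; `test_keyspace` is a hypothetical name:

```scala
import org.apache.spark.SparkConf

// Hypothetical keyspace name; queries may then reference bare table names.
val conf = new SparkConf()
  .set("spark.cassandra.sql.keyspace", "test_keyspace")
```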
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
connection.ssl.enabled | false | Enable secure connection to Cassandra cluster |
connection.ssl.enabledAlgorithms | Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA) | SSL cipher suites |
connection.ssl.protocol | TLS | SSL protocol |
connection.ssl.trustStore.password | None | Trust store password |
connection.ssl.trustStore.path | None | Path for the trust store being used |
connection.ssl.trustStore.type | JKS | Trust store type |
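A sketch of enabling SSL with a JKS trust store; the path and password below are placeholders:

```scala
import org.apache.spark.SparkConf

// Placeholder path and password; the trust store type defaults to JKS.
val conf = new SparkConf()
  .set("spark.cassandra.connection.ssl.enabled", "true")
  .set("spark.cassandra.connection.ssl.trustStore.path", "/etc/cassandra/truststore.jks")
  .set("spark.cassandra.connection.ssl.trustStore.password", "changeit")
```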
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
input.consistency.level | LOCAL_ONE | Consistency level to use when reading |
input.fetch.size_in_rows | 1000 | Number of CQL rows fetched per driver request |
input.metrics | true | Sets whether to record connector specific metrics on read |
input.split.size_in_mb | 64 | Approximate amount of data to be fetched into a single Spark partition |
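A sketch of tuning reads for a large scan; all values are illustrative, assuming the cluster can serve bigger pages:

```scala
import org.apache.spark.SparkConf

// Illustrative values: larger Spark partitions, bigger driver pages,
// and a stronger consistency level than the LOCAL_ONE default.
val conf = new SparkConf()
  .set("spark.cassandra.input.split.size_in_mb", "128")
  .set("spark.cassandra.input.fetch.size_in_rows", "5000")
  .set("spark.cassandra.input.consistency.level", "QUORUM")
```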
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
output.batch.grouping.buffer.size | 1000 | How many batches per single Spark task can be stored in memory before sending to Cassandra |
output.batch.grouping.key | Partition | Determines how insert statements are grouped into batches. Available values are: `none` (a batch may contain any statements), `replica_set` (a batch may contain only statements to be written to the same replica set) and `partition` (a batch may contain only statements for rows sharing the same partition key value) |
output.batch.size.bytes | 1024 | Maximum total size of the batch in bytes. Overridden by spark.cassandra.output.batch.size.rows |
output.batch.size.rows | None | Number of rows per single batch. The default is 'auto', which means the connector will adjust the number of rows based on the amount of data in each row |
output.concurrent.writes | 5 | Maximum number of batches executed in parallel by a single Spark task |
output.consistency.level | LOCAL_ONE | Consistency level for writing |
output.metrics | true | Sets whether to record connector specific metrics on write |
output.throughput_mb_per_sec | 2.147483647E9 | *(Floating points allowed)* Maximum write throughput allowed per single core in MB/s. For stability on long (8+ hour) runs, limit this to 70% of the max throughput seen on a smaller job |
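A sketch of a throttled write configuration; the 50 MB/s cap and the concurrency value are illustrative, following the 70%-of-max guideline above:

```scala
import org.apache.spark.SparkConf

// Illustrative values: more parallel batches per task and a 50 MB/s
// per-core throughput cap for long-running jobs.
val conf = new SparkConf()
  .set("spark.cassandra.output.concurrent.writes", "10")
  .set("spark.cassandra.output.throughput_mb_per_sec", "50")
```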