Releases: ClickHouse/spark-clickhouse-connector
v0.8.1
v0.8.0
Breaking Changes
- gRPC support is removed, now HTTP is the only option
- project groupId is renamed from com.github.housepower to com.clickhouse.spark
- class xenon.clickhouse.ClickHouseCatalog is renamed to com.clickhouse.spark.ClickHouseCatalog
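When upgrading an existing deployment to 0.8.0, the renames above can be applied mechanically to a Spark properties file. The following is a minimal sketch, assuming a `spark-defaults.conf`-style file in which the catalog class and any `spark.jars.packages` coordinates appear literally; `migrate_conf` is a hypothetical helper, not something shipped with the connector. It rewrites the old class name and groupId in one pass (keeping a `.bak` copy) and fails if a `protocol ... grpc` setting remains, because gRPC support is removed in this release. It deliberately does not touch artifact versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the connector): migrate a
# spark-defaults.conf-style file to the names introduced in v0.8.0.
set -eu

migrate_conf() {
  local conf="$1"
  # Single sed pass; the original file is kept as "$conf.bak".
  #  - catalog class: xenon.clickhouse.ClickHouseCatalog -> com.clickhouse.spark.ClickHouseCatalog
  #  - Maven groupId: com.github.housepower -> com.clickhouse.spark
  #    (artifact versions are NOT changed here; bump them separately)
  sed -i.bak \
    -e 's/xenon\.clickhouse\.ClickHouseCatalog/com.clickhouse.spark.ClickHouseCatalog/g' \
    -e 's/com\.github\.housepower:/com.clickhouse.spark:/g' \
    "$conf"
  # gRPC support was removed in v0.8.0: print leftover protocol=grpc lines and fail.
  if grep -n 'protocol[[:space:]=]*grpc' "$conf"; then
    echo "gRPC is no longer supported; switch these settings to protocol=http" >&2
    return 1
  fi
}

# Example (path is an assumption about your layout):
#   migrate_conf "$SPARK_HOME/conf/spark-defaults.conf"
```

Using a single `sed` invocation with two `-e` expressions keeps the `.bak` file identical to the pre-migration input; `-i.bak` with an attached suffix is accepted by both GNU and BSD `sed`.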
What's Changed
- Support Spark 3.5
- Upgrade to Java client version 0.6.3
- Tested against ClickHouse Cloud
- Added dedicated user agent
v0.7.3
v0.7.2
Change Logs
- Build: Upgrade clickhouse-java 0.4.6 (#235)
- Test: Upload ClickHouse server logs when CI failed (#249)
- Spark: ClickHouse FixedString map to Spark BinaryType (#251)
- Spark: Fix ArrayIndexOutOfBoundsException when all columns are pruned and agg pushdown does not take effect (#256)
- Core: Support parsing Date64 values that contain nanoseconds (#258)
v0.7.1
Change Logs
- Spark: Fix Decimal precision in JSON mode on reading
v0.7.0
Notable Changes
This release supports Spark 3.3 and 3.4, and is compatible w/ clickhouse-jdbc:0.4.5.
As of this version, gRPC is deprecated and may be removed in the future.
Change Logs
- Core: Bump clickhouse-java 0.4.5 (#211)
- Core: Deprecate gRPC protocol (#233)
- Spark: Initial support for Spark 3.4 (#228)
- Spark: Polish configuration's doc
- Spark: Fix custom options (#231)
- Spark 3.3: Remove ConfigurationSuite
- Docs: Mention 0.6.0 important changes
- Docs: Add Compatible Matrix
- Docs: Fix link for configuration page
- Docs: Correct the Spark version for integration tests
- Test: Test against ClickHouse 23.3 (#232)
- Playground: Update Kyuubi 1.7.0
v0.6.1
v0.6.0
Notable Changes
This release only supports Spark 3.3, and is compatible w/ clickhouse-jdbc:0.3.2-patch11.
The default protocol is changed to HTTP, as suggested by ClickHouse/clickhouse-java#1252 (comment)
gRPC is experimental and problematic; I should probably drop it someday to avoid confusion.
Change Logs
- Core: Respect ClickHouse ORDER BY Clause default behavior
- Spark: Change default protocol to HTTP (#190)
- Spark: Fix Decimal reading in JSON format (#220)
- Spark: Support Date type as partition column in dropPartition (#218)
- Spark: Support tcp_port in catalog option (#223)
- Spark: Fix timestamp value transformation (#216)
- Spark: Use clickhouse java client to parse schema (#215)
- Spark: Allow setting arbitrary options for ClickHouseClient (#203)
- Spark: Support reading Bool type (#207)
- Spark: Rename and reorganize functions (#198)
- Spark: Simplify spark.clickhouse.write.format values
- Spark: Support RowBinary format in reading (#195)
- Spark: Support read metrics (#191)
- Spark: Test parse LowCardinality column definition (#217)
- Playground: Switch minio image back to bitnami/minio
- Playground: Restructure directories and upgrade components (#212)
- Playground: Remove python
- Playground: Fix S3 magic committer confs
- Playground: Use eclipse-temurin:8-focal as base image (#188)
- Docs: Syntax improvements
- Docs: Remove incubating from Kyuubi reference (#209)
- Docs: Bump mkdocs-material 9.0.9
- Docs: Remove unused var spark_version
- Docs: Auto generate configuration docs
- Docs: Fix documentation --jars/--packages usage (#186)
- Docs: Polish sentence
- Docs: Supply demo for native SQL execution
- Docs: Use docker compose V2 command
- Docs: Update Rebalance image
- Docs: Improve sentence
- Docs: Enrich Catalog Management
- Infra: Enable spotless (#208)
- Infra: Upgrade CI runner image and actions (#214)
- Build: Polish gradle scripts
- Build: Bump Spark 3.3.2 (#219)
- Build: Bump Gradle 7.6 (#213)
- Build: Bump testcontainers-scala 0.40.12
- Build: Bump gradle rat plugin 0.8.0
- Build: Bump gradle scoverage plugin 7.0.1 (#193)
- Build: Remove unused snapshot repo
- Build: Remove Spark 3.2 support (#189)
- Build: Bump Jackson 2.13.4 (#192)
- Build: Rename SonarQube workflow
- Build: Testing w/ multi clickhouse versions (#183)
- Test: Allow testing w/ non-grpc versions (#182)
- Test: Correct configuring log4j2
v0.5.0
Notable Changes
As of 0.5.0, this connector switches from the ClickHouse raw gRPC client to the official ClickHouse Java client, which brings HTTP protocol support and extends the range of supported ClickHouse Server versions. Meanwhile, gzip and zstd write compression support has been removed; the currently supported codecs are none and lz4 (default).
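Before upgrading, it may be worth checking that an existing configuration does not still request a removed codec. The snippet below is a minimal sketch, assuming the write codec is set through `spark.clickhouse.write.compression.codec` (key name as we understand it from the connector's configuration docs; treat it as an assumption) in a whitespace-separated `spark-defaults.conf`-style file, and that an unset key means the default `lz4`. `check_write_codec` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upgrade check (not part of the connector) for the 0.5.0 codec change.
set -eu

check_write_codec() {
  local conf="$1" codec
  # Take the last value of the (assumed) key; only the "key value" form is handled.
  codec=$(awk '$1 == "spark.clickhouse.write.compression.codec" { v = $2 } END { print v }' "$conf")
  # An unset key falls back to the default codec, lz4.
  codec="${codec:-lz4}"
  case "$codec" in
    none|lz4) echo "ok: $codec" ;;
    *) echo "unsupported write codec since 0.5.0: $codec (use none or lz4)" >&2; return 1 ;;
  esac
}

# Example (path is an assumption about your layout):
#   check_write_codec "$SPARK_HOME/conf/spark-defaults.conf"
```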
If you upgrade from a previous version, use ONE of the following jars together with clickhouse-spark-runtime-3.3_2.12-0.5.0.jar, instead of the single clickhouse-spark-runtime-3.3_2.12-0.4.0.jar:
- clickhouse-jdbc-0.3.2-patch11-all.jar
- clickhouse-jdbc-0.3.2-patch11-grpc.jar
- clickhouse-jdbc-0.3.2-patch11-http.jar
- clickhouse-grpc-client-0.3.2-patch11-okhttp.jar
- clickhouse-grpc-client-0.3.2-patch11-netty.jar
- clickhouse-grpc-client-0.3.2-patch11-shaded.jar
- clickhouse-http-client-0.3.2-patch11-shaded.jar
If you want to connect to ClickHouse through gRPC, use:

```shell
$SPARK_HOME/bin/spark-shell \
  --conf spark.sql.catalog.clickhouse=xenon.clickhouse.ClickHouseCatalog \
  --conf spark.sql.catalog.clickhouse.host=<clickhouse-host> \
  --conf spark.sql.catalog.clickhouse.protocol=grpc \
  --conf spark.sql.catalog.clickhouse.grpc_port=<clickhouse-grpc-port> \
  --conf spark.sql.catalog.clickhouse.user=<username> \
  --conf spark.sql.catalog.clickhouse.password=<password> \
  --conf spark.sql.catalog.clickhouse.database=<default-database> \
  --jars /path/clickhouse-spark-runtime-3.3_2.12-0.5.0.jar,/path/clickhouse-jdbc-0.3.2-patch11-all.jar
```

If you prefer HTTP, just change

```shell
  --conf spark.sql.catalog.clickhouse.protocol=grpc \
  --conf spark.sql.catalog.clickhouse.grpc_port=<clickhouse-grpc-port> \
```

to

```shell
  --conf spark.sql.catalog.clickhouse.protocol=http \
  --conf spark.sql.catalog.clickhouse.http_port=<clickhouse-http-port> \
```
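Because the grpc/http switch touches only the protocol and port flags, the catalog flags can be assembled with the protocol as a single parameter. The sketch below is illustrative only: `build_clickhouse_args` and `CH_ARGS` are hypothetical names, host and port are placeholders, and the catalog class is the 0.5.0-era `xenon.clickhouse.ClickHouseCatalog` used in the example (later renamed in 0.8.0, where gRPC is also removed). A Bash array is used instead of a string so each `--conf` value survives word splitting intact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the 0.5.0-style --conf flags into the global array CH_ARGS.
set -eu

build_clickhouse_args() {
  local protocol="$1" host="$2" port="$3"
  local prefix="spark.sql.catalog.clickhouse"
  # Flags shared by both protocols.
  CH_ARGS=(
    --conf "$prefix=xenon.clickhouse.ClickHouseCatalog"
    --conf "$prefix.host=$host"
    --conf "$prefix.protocol=$protocol"
  )
  # Only the port key differs between the two protocols.
  case "$protocol" in
    grpc) CH_ARGS+=(--conf "$prefix.grpc_port=$port") ;;
    http) CH_ARGS+=(--conf "$prefix.http_port=$port") ;;
    *) echo "unsupported protocol: $protocol" >&2; return 1 ;;
  esac
}

# Usage (not executed here; user, password, database and --jars as in the example):
#   build_clickhouse_args http <clickhouse-host> <clickhouse-http-port>
#   "$SPARK_HOME/bin/spark-shell" "${CH_ARGS[@]}" --conf ... --jars ...
```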
Change Logs
- Core: Deserializer consumes InputStream instead of ByteString (#162)
- Core: Throw ClickHouseException instead of gRPC Exception (#163)
- Core: Rename CHException (#164)
- Core: Use ClickHouse Java client
- Core: Remove gRPC
- Core: Support compression on reading
- Core: Simplify deserializeStream (#173)
- Core: CHException should propagate root cause (#181)
- Spark: Use ClickHouse Java client
- Spark: Support compression on reading
- Spark: Add column comment when creating ClickHouse table (#176)
- Spark: Fix reading decimal values (#180)
- Spark 3.3: Remove zstd support in writing (#166)
- Spark 3.3: Support write metrics (#169)
- Docs: Remove gzip compression
- Docs: Mention HTTP support
- Docs: Upgrade mkdocs-material
- Docs: Replace versions w/ variables
- Docs: Add spark.clickhouse.read.compression.codec
- Docs: Enrich internal docs
- Build: Bump clickhouse-jdbc 0.3.2-patch11
- Build: Bump gradle rat plugin 0.7.1
- Build: Remove scala-xml version restriction (#175)
- Build: Align Jackson version w/ Spark (#177)
- Build: Bump Gradle 7.5.1
- Playground: Switch to ClickHouse Java client
- Playground: Fix dev setup
- Test: Remove unused SparkClickHouseSingleTestHelper
- Test: Bump testcontainers-scala 0.40.10 (#168)
v0.4.0
Notable Changes
- Core: Fix DistributedEngineSpec#is_distributed
- Core: Support parsing ColumnExprPrecedence
- Core: Replace Using by tryWithResource
- Spark: Support ignoring unsupported transforms
- Spark: Support constructing InputPartition by virtual col _partition_id
- Spark: Improve writer's memory usage efficiency
- Spark: Improve writer's log format
- Spark: Reorganize test suites
- Spark 3.2: Bump Spark 3.2.2 (#158)
- Spark 3.2: Support GZIP, LZ4 in write
- Spark 3.3: Support GZIP, LZ4, ZSTD in write
- Spark 3.3: Support writing format ArrowStream
- Spark 3.3: Cast non-nullable if the table column is not null
- Spark 3.3: ArrowStream should close out in each batch
- Spark 3.3: Fix ArrowStream writer summary
- Spark 3.3: Fix ArrowStream writer memory leak and add metrics
- Spark 3.3: Count serialize time of writeRow
- Spark 3.3: Remove spark.clickhouse.write.batchSize upper bound limitation
- Build: Bump gRPC 1.47.0 (#150)
- Build: Switch default Maven Central mirror to Apache
- Build: Daily SonarQube report
- Build: Shade Jackson to avoid class conflict (#153)
- Build: Bump Gradle 7.5
- Test: Align isTesting w/ Spark
- Test: Bump clickhouse-jdbc 0.3.2-patch10 (#151)
- Test: Remove obsolete settings (#156)
- Test: Bump testcontainers-scala 0.40.8
- Docs: Document Spark versions support policy
- Docs: Add overview image
- Docs: Update developers docs
- Docs: Basic internal docs
- Playground: Expose ports of clickhouse-s1r1
- Playground: Add back iceberg
- Playground: Bump Iceberg 0.14.0
- Playground: Upgrade Kyuubi TPC-DS/TPC-H connector version