Commit

0.8.0 Release (#400)
* initial new JSON spiking

* Basic Variant type reads

* Checkpoint for JSON data type reads

* Checkpoint for JSON data type reads

* Checkpoint for JSON data type reads

* Some lint and test cleanup

* Exclude Python 3.13 testing

* Clean up TLS test configuration

* Fix tls test lint

* Cloud test fixes

* Fix lint

* Add HTTP streaming buffer

* Improve variant type handling

* Add new mechanism for DateTime64 binding

* Add tls_mode client parameter, update changelog
genzgd authored Sep 26, 2024
1 parent b90cdf9 commit d113091
Showing 63 changed files with 1,045 additions and 530 deletions.
19 changes: 0 additions & 19 deletions .docker/clickhouse/single_node/config.xml
@@ -37,24 +37,5 @@
<table>session_log</table>
</session_log>

<http_options_response>
<header>
<name>Access-Control-Allow-Origin</name>
<value>*</value>
</header>
<header>
<name>Access-Control-Allow-Headers</name>
<value>accept, origin, x-requested-with, content-type, authorization</value>
</header>
<header>
<name>Access-Control-Allow-Methods</name>
<value>POST, GET, OPTIONS</value>
</header>
<header>
<name>Access-Control-Max-Age</name>
<value>86400</value>
</header>
</http_options_response>

<custom_settings_prefixes>SQL_</custom_settings_prefixes>
</clickhouse>
2 changes: 1 addition & 1 deletion .docker/clickhouse/single_node_tls/Dockerfile
@@ -1,4 +1,4 @@
-FROM clickhouse/clickhouse-server:24.3-alpine
+FROM clickhouse/clickhouse-server:24.8-alpine
COPY .docker/clickhouse/single_node_tls/certificates /etc/clickhouse-server/certs
RUN chown clickhouse:clickhouse -R /etc/clickhouse-server/certs \
&& chmod 600 /etc/clickhouse-server/certs/* \
2 changes: 2 additions & 0 deletions .docker/clickhouse/single_node_tls/config.xml
@@ -43,4 +43,6 @@
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>1000</flush_interval_milliseconds>
</query_log>

<custom_settings_prefixes>SQL_</custom_settings_prefixes>
</clickhouse>
17 changes: 1 addition & 16 deletions .github/workflows/clickhouse_ci.yml
@@ -28,7 +28,7 @@ jobs:
- name: "Add distribution info" # This lets SQLAlchemy find entry points
run: python setup.py develop

- name: run ClickHouse Cloud SMT tests
- name: run ClickHouse Cloud tests
env:
CLICKHOUSE_CONNECT_TEST_PORT: 8443
CLICKHOUSE_CONNECT_TEST_CLOUD: 'True'
@@ -42,18 +42,3 @@ jobs:
run: pytest tests/integration_tests
- name: remove latest container
run: docker compose down -v

- name: run ClickHouse Cloud tests
env:
CLICKHOUSE_CONNECT_TEST_PORT: 8443
CLICKHOUSE_CONNECT_TEST_INSERT_QUORUM: 3
CLICKHOUSE_CONNECT_TEST_HOST: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_HOST }}
CLICKHOUSE_CONNECT_TEST_PASSWORD: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_PASSWORD }}
run: pytest tests/integration_tests

- name: Run ClickHouse Container (HEAD)
run: CLICKHOUSE_VERSION=head docker compose up -d clickhouse
- name: Run HEAD tests
run: pytest tests/integration_tests
- name: remove head container
run: docker compose down -v
13 changes: 4 additions & 9 deletions .github/workflows/on_push.yml
@@ -58,17 +58,15 @@ jobs:
strategy:
matrix:
python-version:
- '3.8'
- '3.9'
- '3.10'
- '3.11'
- '3.12'
clickhouse-version:
- '23.8'
- '23.12'
- '24.1'
- '24.2'
- '24.3'
- '24.6'
- '24.7'
- '24.8'
- latest

name: Local Tests Py=${{ matrix.python-version }} CH=${{ matrix.clickhouse-version }}
@@ -99,14 +97,11 @@
sudo echo "127.0.0.1 server1.clickhouse.test" | sudo tee -a /etc/hosts
- name: Run tests
env:
CLICKHOUSE_CONNECT_TEST_TLS: 1
CLICKHOUSE_CONNECT_TEST_DOCKER: 'False'
CLICKHOUSE_CONNECT_TEST_FUZZ: 50
SQLALCHEMY_SILENCE_UBER_WARNING: 1
run: pytest tests
- name: Run TLS tests
env:
CLICKHOUSE_CONNECT_TEST_TLS: 1
run: pytest tests/tls

check-secret:
runs-on: ubuntu-latest
67 changes: 66 additions & 1 deletion CHANGELOG.md
@@ -3,10 +3,75 @@
### WARNING -- Impending Breaking Change - Server Settings in DSN
When creating a DBAPI Connection using the Connection constructor or a SQLAlchemy DSN, the library currently
converts any unrecognized keyword argument/query parameter to a ClickHouse server setting. Starting in the next minor
-release (0.8.0), unrecognized arguments/keywords for these methods of creating a DBAPI connection will raise an exception
+release (0.9.0), unrecognized arguments/keywords for these methods of creating a DBAPI connection will raise an exception
instead of being passed as ClickHouse server settings. This is in conjunction with some refactoring in Client construction.
The supported method of passing ClickHouse server settings is to prefix such arguments/query parameters with `ch_`.

## 0.8.0, 2024-09-26
### Experimental Feature - "New" JSON/Dynamic/Variant DataTypes
#### Usage Notes
- JSON data can be inserted as either a Python dictionary or a JSON string containing a JSON object `{}`. Other
forms of JSON data are not supported.
- Valid formats for the JSON type are 'native', which returns a Python dictionary, and 'string', which returns a JSON string.
- Any value can be inserted into a Variant column, and ClickHouse will try to determine the correct Variant
type for the value, based on its String representation.
- More complete documentation for the new types will be provided in the future.
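The first two rules above can be sketched as a small validation helper. This is illustrative only — `to_json_dict` is not part of the library, and real inserts go through the normal `Client.insert`/`query` methods:

```python
import json

def to_json_dict(value):
    """Normalize a value destined for a JSON column.

    Mirrors the usage notes above: accept a Python dict, or a JSON string
    containing a JSON object `{}`; reject every other form of JSON data.
    """
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        parsed = json.loads(value)
        if isinstance(parsed, dict):
            return parsed
        raise ValueError('JSON strings must contain a JSON object, not ' + type(parsed).__name__)
    raise TypeError('JSON values must be a dict or a JSON object string')
```

So both `to_json_dict('{"a": 1}')` and `to_json_dict({'a': 1})` succeed, while a bare JSON array such as `'[1, 2]'` is rejected.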

#### Known limitations:
- Each of these types must be enabled in the ClickHouse settings before use. The "new" JSON type is available starting
with the 24.8 ClickHouse release.
- Returned JSON objects will contain at most `max_dynamic_paths` elements (which defaults to 1024). This
will be fixed in a future release.
- Values inserted into `Dynamic` columns are always stored as the String representation of the Python value. This will be fixed
in a future release.
- The implementation for the new types has not been optimized in C code, so performance may be somewhat slower than for
simpler, established data types.

This is the first time that a new `clickhouse_connect` feature has been labeled "experimental", but these new
datatypes are complex and still experimental in the ClickHouse server. Current test coverage for these types is also
quite limited. Please don't hesitate to report issues with the new types.

### Bug Fixes
- When operating ClickHouse Server in `strict` TLS mode, HTTPS connections [require](https://github.com/ClickHouse/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/Context.h#L84-L89) a client certificate even if that
certificate is not used for authentication. A new client parameter `tls_mode='strict'` can be used in this situation, where
username/password authentication is combined with client certificates. Other valid values for the new `tls_mode` setting
are `'proxy'`, for when TLS termination occurs at a proxy, and `'mutual'`, to specify that mutual TLS authentication is used by
the ClickHouse server. If `tls_mode` is not set and a client certificate and key are provided, `'mutual'` is assumed.
- The server timezone was not being used for parameter binding if parameters were sent as a list instead of a dictionary.
This should fully fix the reopened https://github.com/ClickHouse/clickhouse-connect/issues/377.
- String port numbers (such as those read from environment variables) are now correctly interpreted to determine the correct interface/protocol.
Fixes https://github.com/ClickHouse/clickhouse-connect/issues/395
- Insert commands with a `SELECT FROM ... LIMIT 0` will no longer raise an exception. Closes https://github.com/ClickHouse/clickhouse-connect/issues/389.
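The string-port fix above amounts to coercing the value before inspecting it. A minimal sketch — the function name and port mapping here are illustrative, not the library's internals:

```python
def infer_interface(port):
    """Decide the HTTP interface from a port that may arrive as a string.

    Ports read from environment variables are strings ('8443'), so they
    must be converted to int before comparison with known TLS ports.
    """
    port = int(port)
    return 'https' if port in (443, 8443) else 'http'
```

With the conversion in place, `infer_interface('8443')` and `infer_interface(8443)` agree, which is the essence of the fix.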

### Improvements
- Some low-level errors for problems with Native format inserts and queries now include the relevant column name in the
error message. Thanks to [Angus Holder](https://github.com/angusholder) for the PR!
- There is a new intermediate buffer for HTTP streaming/chunked queries. The buffer stores raw data from the HTTP response
until it is actually requested in a stream. This allows some lag between reading the data from ClickHouse and processing
the same data. Previously, if processing of the data stream fell 30 seconds behind the ClickHouse HTTP writes to the stream,
the ClickHouse server would close the connection, aborting the query and stream processing. This is now mitigated by
storing the data stream in the new intermediate buffer. By default, this buffer is 10 megabytes, but for slow
processing of large queries where memory is not an issue, the buffer size can be increased using the new `common` setting
`http_buffer_size`. This fixes some cases of https://github.com/ClickHouse/clickhouse-connect/issues/399, but note that
slow processing of large queries will still cause connection and processing failures if the data cannot be buffered.
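The idea behind the intermediate buffer can be sketched with a bounded queue between the network reader and the stream consumer. This illustrates the mechanism only — it is not clickhouse-connect's actual implementation:

```python
import queue
import threading

def buffered_stream(chunks, max_buffered=4):
    """Yield chunks through a bounded buffer filled by a background reader.

    The reader thread can run ahead of the consumer until the buffer fills
    (the bound plays the role of http_buffer_size), so slow processing no
    longer stalls the network read directly.
    """
    buf = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking end of stream

    def reader():
        for chunk in chunks:
            buf.put(chunk)  # blocks only when the buffer is full
        buf.put(done)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        chunk = buf.get()
        if chunk is done:
            break
        yield chunk
```

If the consumer falls too far behind, `buf.put` blocks the reader — which is why very slow processing of large queries can still time out once the buffer is exhausted.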
- It is now possible to correctly bind `DateTime64` type parameters when calling Client `query` methods through one of two approaches:
- Wrap the Python `datetime.datetime` value in the new DT64Param class, e.g.
```python
from datetime import datetime

# import path for DT64Param assumed for this release
from clickhouse_connect.driver.binding import DT64Param

query = 'SELECT {p1:DateTime64(3)}'  # Server side binding with dictionary
parameters = {'p1': DT64Param(dt_value)}  # dt_value is a datetime.datetime

query = 'SELECT %s as string, toDateTime64(%s,6) as dateTime'  # Client side binding with list
parameters = ['a string', DT64Param(datetime.now())]
```
- If using a dictionary of parameter values, append the string `_64` to the parameter name
```python
query = 'SELECT {p1:DateTime64(3)}, {a1:Array(DateTime64(3))}'  # Server side binding with dictionary

parameters = {'p1_64': dt_value, 'a1_64': [dt_value1, dt_value2]}  # the _64 suffix is stripped before binding
```
This closes https://github.com/ClickHouse/clickhouse-connect/issues/396; see also the similar issue https://github.com/ClickHouse/clickhouse-connect/issues/212.
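The `_64` suffix convention can be illustrated with a short sketch. The helper below only mirrors the naming rule described above; the library's actual binding code differs:

```python
def split_dt64_params(parameters):
    """Strip the `_64` suffix from parameter names.

    Returns the cleaned parameter dictionary plus the set of names whose
    values should be formatted with DateTime64 (microsecond) precision.
    """
    cleaned = {}
    dt64_names = set()
    for name, value in parameters.items():
        if name.endswith('_64'):
            name = name[:-3]
            dt64_names.add(name)
        cleaned[name] = value
    return cleaned, dt64_names
```

So a parameter dictionary `{'p1_64': dt_value}` binds to the query placeholder `{p1:DateTime64(3)}` under the name `p1`.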


## 0.7.19, 2024-08-23
### Bug Fix
- Insertion of large strings was triggering an exception. This has been fixed.
2 changes: 1 addition & 1 deletion clickhouse_connect/__version__.py
@@ -1 +1 @@
-version = '0.7.19'
+version = '0.8.0'
2 changes: 1 addition & 1 deletion clickhouse_connect/cc_sqlalchemy/datatypes/base.py
@@ -5,7 +5,7 @@

from clickhouse_connect.datatypes.base import ClickHouseType, TypeDef, EMPTY_TYPE_DEF
from clickhouse_connect.datatypes.registry import parse_name, type_map
-from clickhouse_connect.driver.query import str_query_value
+from clickhouse_connect.driver.binding import str_query_value

logger = logging.getLogger(__name__)

4 changes: 2 additions & 2 deletions clickhouse_connect/cc_sqlalchemy/ddl/custom.py
@@ -1,7 +1,7 @@
from sqlalchemy.sql.ddl import DDL
from sqlalchemy.exc import ArgumentError

-from clickhouse_connect.driver.query import quote_identifier
+from clickhouse_connect.driver.binding import quote_identifier


# pylint: disable=too-many-ancestors,abstract-method
@@ -31,7 +31,7 @@ def __init__(self, name: str, engine: str = None, zoo_path: str = None, shard_na
super().__init__(stmt)


# pylint: disable=too-many-ancestors,abstract-method
# pylint: disable=too-many-ancestors,abstract-method
class DropDatabase(DDL):
"""
Alternative DDL statement for built in SqlAlchemy DropSchema DDL class
2 changes: 1 addition & 1 deletion clickhouse_connect/cc_sqlalchemy/dialect.py
@@ -8,7 +8,7 @@
from clickhouse_connect.cc_sqlalchemy.sql.ddlcompiler import ChDDLCompiler
from clickhouse_connect.cc_sqlalchemy import ischema_names, dialect_name
from clickhouse_connect.cc_sqlalchemy.sql.preparer import ChIdentifierPreparer
-from clickhouse_connect.driver.query import quote_identifier, format_str
+from clickhouse_connect.driver.binding import quote_identifier, format_str


# pylint: disable=too-many-public-methods,no-self-use,unused-argument
2 changes: 1 addition & 1 deletion clickhouse_connect/cc_sqlalchemy/sql/__init__.py
@@ -2,7 +2,7 @@

from sqlalchemy import Table

-from clickhouse_connect.driver.query import quote_identifier
+from clickhouse_connect.driver.binding import quote_identifier


def full_table(table_name: str, schema: Optional[str] = None) -> str:
2 changes: 1 addition & 1 deletion clickhouse_connect/cc_sqlalchemy/sql/ddlcompiler.py
@@ -2,7 +2,7 @@
from sqlalchemy.sql.compiler import DDLCompiler

from clickhouse_connect.cc_sqlalchemy.sql import format_table
-from clickhouse_connect.driver.query import quote_identifier
+from clickhouse_connect.driver.binding import quote_identifier


class ChDDLCompiler(DDLCompiler):
2 changes: 1 addition & 1 deletion clickhouse_connect/cc_sqlalchemy/sql/preparer.py
@@ -1,6 +1,6 @@
from sqlalchemy.sql.compiler import IdentifierPreparer

-from clickhouse_connect.driver.query import quote_identifier
+from clickhouse_connect.driver.binding import quote_identifier


class ChIdentifierPreparer(IdentifierPreparer):
3 changes: 3 additions & 0 deletions clickhouse_connect/common.py
@@ -81,3 +81,6 @@ def _init_common(name: str, options: Sequence[Any], default: Any):
_init_common('use_protocol_version', (True, False), True)

_init_common('max_error_size', (), 1024)

# HTTP raw data buffer for streaming queries. This should not be reduced below 64KB to ensure compatibility with LZ4 compression
_init_common('http_buffer_size', (), 10 * 1024 * 1024)
2 changes: 2 additions & 0 deletions clickhouse_connect/datatypes/__init__.py
@@ -4,4 +4,6 @@
import clickhouse_connect.datatypes.special
import clickhouse_connect.datatypes.string
import clickhouse_connect.datatypes.temporal
import clickhouse_connect.datatypes.dynamic
import clickhouse_connect.datatypes.registry
import clickhouse_connect.datatypes.postinit
28 changes: 14 additions & 14 deletions clickhouse_connect/datatypes/base.py
@@ -3,7 +3,7 @@

from abc import ABC
from math import log
from typing import NamedTuple, Dict, Type, Any, Sequence, MutableSequence, Optional, Union, Collection
from typing import NamedTuple, Dict, Type, Any, Sequence, MutableSequence, Union, Collection

from clickhouse_connect.driver.common import array_type, int_size, write_array, write_uint64, low_card_version
from clickhouse_connect.driver.context import BaseQueryContext
@@ -94,6 +94,10 @@ def name(self):
name = f'{wrapper}({name})'
return name

@property
def insert_name(self):
return self.name

def data_size(self, sample: Sequence) -> int:
if self.low_card:
values = set(sample)
@@ -104,10 +108,13 @@ def data_size(self, sample: Sequence) -> int:
d_size += 1
return d_size

-    def _data_size(self, _sample: Collection) -> int:
+    def _data_size(self, sample: Collection) -> int:
if self.byte_size:
return self.byte_size
return 0
total = 0
for x in sample:
total += len(str(x))
return total / len(sample) + 1

def write_column_prefix(self, dest: bytearray):
"""
@@ -119,7 +126,7 @@ def write_column_prefix(self, dest: bytearray):
if self.low_card:
write_uint64(low_card_version, dest)

-    def read_column_prefix(self, source: ByteSource):
+    def read_column_prefix(self, source: ByteSource, _ctx: QueryContext):
"""
Read the low cardinality version. Like the write method, this has to happen immediately for container classes
:param source: The native protocol binary read buffer
@@ -139,7 +146,7 @@ def read_column(self, source: ByteSource, num_rows: int, ctx: QueryContext) -> S
:param ctx: QueryContext for query specific settings
:return: The decoded column data as a sequence and the updated location pointer
"""
-        self.read_column_prefix(source)
+        self.read_column_prefix(source, ctx)
return self.read_column_data(source, num_rows, ctx)

def read_column_data(self, source: ByteSource, num_rows: int, ctx: QueryContext) -> Sequence:
@@ -274,18 +281,11 @@ def _write_column_low_card(self, column: Sequence, dest: bytearray, ctx: InsertC
write_uint64(len(index), dest)
self._write_column_binary(index, dest, ctx)
write_uint64(len(keys), dest)
-        write_array(array_type(1 << ix_type, False), keys, dest, ctx)
+        write_array(array_type(1 << ix_type, False), keys, dest, ctx.column_name)

def _active_null(self, _ctx: QueryContext) -> Any:
return None

def _first_value(self, column: Sequence) -> Optional[Any]:
if self.nullable:
return next((x for x in column if x is not None), None)
if len(column):
return column[0]
return None


EMPTY_TYPE_DEF = TypeDef()
NULLABLE_TYPE_DEF = TypeDef(wrappers=('Nullable',))
@@ -338,7 +338,7 @@ def _finalize_column(self, column: Sequence, ctx: QueryContext) -> Sequence:
def _write_column_binary(self, column: Union[Sequence, MutableSequence], dest: bytearray, ctx: InsertContext):
if len(column) and self.nullable:
column = [0 if x is None else x for x in column]
-        write_array(self._array_type, column, dest, ctx)
+        write_array(self._array_type, column, dest, ctx.column_name)

def _active_null(self, ctx: QueryContext):
if ctx.as_pandas and ctx.use_extended_dtypes: