Include the column name in the error message for an unexpected NULL #397

Merged
merged 5 commits into from
Sep 19, 2024

Conversation

angusholder
Contributor

Previously, if you inserted a NULL into a column that isn't Nullable, the error you got was:

Unable to create Python array. This is usually caused by trying to insert None values into a ClickHouse column that is not Nullable

which is unhelpful for working out which column is the problem. I've modified that error path so it can include the column name in the error, like so:

Failed to write column 'bus_voltage': Unable to create Python array. This is usually caused by trying to insert None values into a ClickHouse column that is not Nullable

This felt like a pretty small change, so I didn't add tests or file an issue. I hope that's okay.
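For readers who have not hit this error, here is a minimal sketch of the underlying failure, assuming the column is packed with the standard `struct` module as in the `write_array` hunk further down. The column values and the `i` (32-bit integer) format code are made up for illustration.

```python
import struct

# Hypothetical values destined for a non-Nullable integer column.
column = [12, None, 7]

failure = None
try:
    # Same packing pattern as write_array: one little-endian struct for the whole column.
    struct.Struct(f'<{len(column)}i').pack(*column)
except (TypeError, OverflowError, struct.error) as ex:
    # Nothing in the exception says which column was being written.
    failure = ex
    print(f'{type(ex).__name__}: {ex}')
```

The exception carries no column name, which is the gap this PR closes.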

@@ -198,3 +199,12 @@ def _convert_numpy(self, np_array):
                 data[ix] = data[ix].tolist()
         self.column_oriented = True
         return data
 
+    def start_column(self, name: str):
Contributor Author

@angusholder angusholder Sep 19, 2024

This gets called during the insert to tell us which column is currently being inserted. I store its name here so that we have it if we run into an error while inserting that column.

+        self._column_name = name
+
+    def make_data_error(self, error_message: str) -> DataError:
+        if self._column_name is not None:
Contributor Author

@angusholder angusholder Sep 19, 2024

Here's where we use the column name that was stored by start_column(). I don't know whether it's possible to reach this point during a column insert without start_column() having been called, but I handled None just in case.

Collaborator

It's not possible, so it's safe to remove the None check.
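The pattern discussed in this thread can be sketched roughly as follows. This is an illustration, not the merged code: `SketchInsertContext` is a hypothetical stand-in for the real context class, `DataError` is redefined locally so the snippet runs on its own, and the `None` branch is kept only to mirror the hunk shown above (the review says it can be dropped).

```python
from typing import Optional


class DataError(Exception):
    """Local stand-in for the driver's DataError so the sketch runs on its own."""


class SketchInsertContext:
    """Hypothetical context that remembers which column is being written."""

    def __init__(self) -> None:
        self._column_name: Optional[str] = None

    def start_column(self, name: str) -> None:
        # Called before each column is serialized.
        self._column_name = name

    def make_data_error(self, error_message: str) -> DataError:
        # Prefix the message with the column name when one has been recorded.
        if self._column_name is not None:
            return DataError(f"Failed to write column '{self._column_name}': {error_message}")
        return DataError(error_message)


ctx = SketchInsertContext()
ctx.start_column('bus_voltage')
print(ctx.make_data_error('Unable to create Python array.'))
```

Note that `make_data_error` returns the exception rather than raising it, so call sites can still write `raise ... from ex` and keep the original cause attached.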

@@ -54,8 +55,8 @@ def write_array(code: str, column: Sequence, dest: MutableSequence):
         buff = struct.Struct(f'<{len(column)}{code}')
         dest += buff.pack(*column)
     except (TypeError, OverflowError, struct.error) as ex:
-        raise DataError('Unable to create Python array. This is usually caused by trying to insert None ' +
-                        'values into a ClickHouse column that is not Nullable') from ex
+        raise ctx.make_data_error('Unable to create Python array. This is usually caused by trying to insert None ' +
Contributor Author

This was the only error I really needed improved, but I also modified the other places in string.py where DataError could be raised.
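Putting the hunk together, a self-contained sketch of the changed call site might look like this. `SketchContext` and the local `DataError` are hypothetical stand-ins, and the column name and values are invented; only the `try`/`except` body follows the diff above.

```python
import struct
from typing import MutableSequence, Sequence


class DataError(Exception):
    """Local stand-in for the driver's DataError so the sketch runs on its own."""


class SketchContext:
    """Hypothetical context that only knows the current column name."""

    def __init__(self, column_name: str) -> None:
        self._column_name = column_name

    def make_data_error(self, error_message: str) -> DataError:
        return DataError(f"Failed to write column '{self._column_name}': {error_message}")


def write_array(code: str, column: Sequence, dest: MutableSequence, ctx: SketchContext) -> None:
    try:
        buff = struct.Struct(f'<{len(column)}{code}')
        dest += buff.pack(*column)
    except (TypeError, OverflowError, struct.error) as ex:
        # The context adds the column name; "from ex" keeps the original cause.
        raise ctx.make_data_error('Unable to create Python array. This is usually caused by trying to insert None ' +
                                  'values into a ClickHouse column that is not Nullable') from ex


ctx = SketchContext('bus_voltage')
try:
    write_array('i', [1, None, 3], bytearray(), ctx)
except DataError as err:
    print(err)
```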

Collaborator

@genzgd genzgd left a comment

Thanks so much, I agree it's hard to track down obscure insert errors and this will help. Stashing the column name in InsertContext makes a lot of sense, btw -- I haven't looked deeply, but maybe we add that to the BaseQueryContext instead since it doesn't hurt to have it around for queries as well.

@@ -38,12 +38,13 @@ def array_type(size: int, signed: bool):
     return code if signed else code.upper()
 
 
-def write_array(code: str, column: Sequence, dest: MutableSequence):
+def write_array(code: str, column: Sequence, dest: MutableSequence, ctx):
Collaborator

I think you can add the type annotation for ctx here too?
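As an aside, one way to annotate a parameter like `ctx` without importing a concrete context class is a `typing.Protocol`. This is only an illustration of the idea; the PR presumably annotates with the library's own context type, and every name below (`SupportsDataError`, `FixedColumnContext`, `describe_failure`) is hypothetical.

```python
from typing import Protocol


class DataError(Exception):
    """Local stand-in for the driver's DataError so the sketch runs on its own."""


class SupportsDataError(Protocol):
    """Structural type: anything that can build a DataError from a message."""

    def make_data_error(self, error_message: str) -> DataError: ...


class FixedColumnContext:
    """Hypothetical context; it satisfies the protocol without inheriting from it."""

    def __init__(self, column_name: str) -> None:
        self._column_name = column_name

    def make_data_error(self, error_message: str) -> DataError:
        return DataError(f"Failed to write column '{self._column_name}': {error_message}")


def describe_failure(ctx: SupportsDataError, message: str) -> str:
    # A type checker verifies that ctx provides make_data_error with this signature.
    return str(ctx.make_data_error(message))


print(describe_failure(FixedColumnContext('bus_voltage'), 'Unable to create Python array.'))
```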

@angusholder
Contributor Author

angusholder commented Sep 19, 2024

> Thanks so much, I agree it's hard to track down obscure insert errors and this will help. Stashing the column name in InsertContext makes a lot of sense, btw -- I haven't looked deeply, but maybe we add that to the BaseQueryContext instead since it doesn't hurt to have it around for queries as well.

Sounds like a good idea. I've moved it to BaseQueryContext as you suggested.

It should be ready for another CI run now.

@genzgd genzgd merged commit b90cdf9 into ClickHouse:main Sep 19, 2024
33 checks passed
@genzgd
Collaborator

genzgd commented Sep 19, 2024

Thanks again! I'm hoping to do another release next week that will include this.

@angusholder angusholder deleted the better-column-errors branch September 19, 2024 19:03
@angusholder
Contributor Author

No problem! Great to hear, thanks
