Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log a SQL warning when historical values don't fit in dolt_diff_<tablename>'s schema #6459

Merged
merged 4 commits into from
Aug 7, 2023

Conversation

fulghum
Copy link
Contributor

@fulghum fulghum commented Aug 4, 2023

We previously made a change (#6399) to automatically widen all columns in the dolt_diff_<tablename> system table to their widest setting (e.g. varchar(10)TEXT), so that any historical values could still be displayed, even if the table's schema has been changed to a more restrictive/narrow type. This had an unintended side effect of changing data types for customers using the dolt_diff_<tablename> system table, even if they didn't have any schema changes in their table's history.

Since then, I've investigated a few alternate approaches of how to handle this case. One idea was to search through the commit history, look for schema changes, and use the widest historical type for each column. This adds extra latency to all queries against the dolt_diff_<tablename> tables, since the commit graph has to be traversed an extra time before any results can be returned. It also turns out to be fairly involved to compare types – we don't have any APIs for that (yet) and there are some edge cases such as if a Decimal type is changed multiple times with various precision and scale that aren't all valid together.

As that code got more complicated, I thought a simpler approach might be better to start with for handling this edge case where a historical value cannot be converted to the current table's schema. This PR applies the same behavior when a value can't be coerced to a type (e.g. trying to coerce a float to a geometry type) – it will truncate the value to NULL in the table and log a SQL warning in the session.

This means that some historical values will be displayed as NULLs in the dolt_diff_<tablename> system table, which is still an improvement over the previous behavior where they would cause the query to error out and not return any result set. It also means that the column types will be the same types as on the current schema, making it easier for customer to know what type to expect in responses.

We may still want to make this more sophisticated in the future, but this felt like a good tradeoff for now given that this edge case has not been a significant problem for customers so far.

@fulghum fulghum requested a review from zachmu August 4, 2023 21:40
@fulghum fulghum marked this pull request as ready for review August 4, 2023 21:48
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, needs to be a minor version bump though

go/libraries/doltcore/rowconv/row_converter.go Outdated Show resolved Hide resolved
Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, needs to be a minor version bump though

@fulghum fulghum merged commit ce8beef into main Aug 7, 2023
19 checks passed
@Hydrocharged Hydrocharged deleted the fulghum/diff branch August 24, 2023 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants