Log a SQL warning when historical values don't fit in dolt_diff_<tablename>'s schema
#6459
We previously made a change (#6399) to automatically widen all columns in the dolt_diff_<tablename> system table to their widest setting (e.g. varchar(10) → TEXT), so that any historical values could still be displayed, even if the table's schema has since been changed to a more restrictive or narrower type. This had the unintended side effect of changing data types for customers using the dolt_diff_<tablename> system table, even if they didn't have any schema changes in their table's history.
Since then, I've investigated a few alternate approaches for handling this case. One idea was to search through the commit history, look for schema changes, and use the widest historical type for each column. This adds extra latency to all queries against the dolt_diff_<tablename> tables, since the commit graph has to be traversed an extra time before any results can be returned. It also turns out to be fairly involved to compare types: we don't have any APIs for that (yet), and there are some edge cases, such as a Decimal type that is changed multiple times with various precision and scale values that aren't all valid together.
As that code got more complicated, I thought a simpler approach might be better to start with for handling this edge case, where a historical value cannot be converted to the current table's schema. This PR applies the same behavior we use when a value can't be coerced to a type (e.g. trying to coerce a float to a geometry type): the value is replaced with NULL in the table and a SQL warning is logged in the session.
This means that some historical values will be displayed as NULLs in the dolt_diff_<tablename> system table, which is still an improvement over the previous behavior, where they would cause the query to error out and not return any result set. It also means that the column types will be the same as in the current schema, making it easier for customers to know what type to expect in responses.
We may still want to make this more sophisticated in the future, but this felt like a good tradeoff for now, given that this edge case has not been a significant problem for customers so far.
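To make the fallback described above concrete, here is a minimal, self-contained sketch in Go. It is not the code in this PR: Session, Warning, converter, toInt, and coerceOrNull are all hypothetical names invented for illustration, and the real session and type-conversion APIs are different. It only shows the shape of the behavior: a historical value that can't be converted to the current column type becomes NULL (nil here) and a warning is recorded, instead of the whole query failing.

```go
package main

import (
	"fmt"
	"strconv"
)

// Warning is a hypothetical stand-in for a SQL session warning.
type Warning struct {
	Column  string
	Message string
}

// Session is a hypothetical stand-in that only collects warnings.
type Session struct {
	Warnings []Warning
}

// converter tries to fit a historical value into the current column type.
type converter func(v any) (any, error)

// toInt is an example converter for a column whose current type is an integer.
func toInt(v any) (any, error) {
	switch x := v.(type) {
	case int:
		return x, nil
	case string:
		n, err := strconv.Atoi(x)
		if err != nil {
			return nil, fmt.Errorf("cannot convert %q to int", x)
		}
		return n, nil
	default:
		return nil, fmt.Errorf("cannot convert %T to int", v)
	}
}

// coerceOrNull returns the converted value. If conversion fails, it records
// a warning on the session and returns nil (SQL NULL) instead of an error.
func coerceOrNull(s *Session, column string, v any, conv converter) any {
	if v == nil {
		return nil // NULL stays NULL and needs no warning
	}
	out, err := conv(v)
	if err != nil {
		s.Warnings = append(s.Warnings, Warning{Column: column, Message: err.Error()})
		return nil
	}
	return out
}

func main() {
	s := &Session{}
	// Historical values for one column, written under older schemas.
	history := []any{"42", "not a number", 7}
	for _, v := range history {
		fmt.Println(coerceOrNull(s, "from_c1", v, toInt))
	}
	fmt.Println(len(s.Warnings), "warning(s) recorded")
}
```

In this sketch, the row with the unconvertible value is still returned with a nil in that column, and the caller can inspect the collected warnings afterwards, much as a SQL client would run SHOW WARNINGS after the query.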