-
Notifications
You must be signed in to change notification settings - Fork 271
Support key columns of any data type #539
Comments
Hi I am using data-diff to mainly diff tables between a production database and a test database. objid , objkey, name (Text) , description (Text) So ofcourse the objid values are different between the two databases but the Name column is a column which is unique for the system. So ideally I would like to use the name column as my key column so that I can compare the tables between the two systems. Currently this is not possible, but the system guarantees that the name column really acts like a key column even though it is just a Text() column. So what would be a good approach to solve this and diff the tables by "forcing" data-diff to treat the column as an id so somehow overrule it to say as a user, yes, I can assure you that this column is a key column? Is there anything particular you need with regards to key columns that you do not accept TEXT() as a key column? Thanks! |
@mariahjrogers @koenvb which database providers are you encountering this with? |
@dlawin For my use case I am using the oracle connection. I fetch it straight from the erp system. |
I just started using data-diff to compare data across Postgres databases. |
Context on why this is not supported yet: We currently support certain strings as PKs. They need to be alphanums with a fixed length (and only these characters) The reason for this is that we need to be able to perform arithmetic operations on the keys in order to segment and bisect the tables when the database is of type ThreadedDatabase. It’s possible but not trivial to implement this for arbitrary types For fixed length alphanums, you can see we convert them to numbers in order to support them as keys Line 181 in 312a9c5
|
Hi all! I'm sorry for the delay in following up on this. @mariahjrogers thank you for taking the time to raise this issue! We made a hard decision to sunset the data-diff package and won't provide further development or support. If that's of interest, over the past few months, we have rewritten the diffing engine in Datafold Cloud and solved many issues that existed in this package's diffing algorithm. -Gleb |
Is your feature request related to a problem? Please describe.
I am trying to run a diff on a table without one single primary key column, so I'm passing in multiple key columns. Some of these columns are numeric, some are timestamps, and some are VARCHARs. I get an error thrown by the tool when it encounters the TIMESTAMP column:
ERROR - Cannot use a column of type Timestamp(precision=6, rounds=True) as a key
. I would happily pass it in as a VARCHAR, but I can't modify the type of the field in the table directly, so I tried passing<tstamp_field>::VARCHAR
as the key column value, and then got this error:ERROR - Column 'date_day::varchar' not found in table 1
.Describe the solution you'd like
Data-diff should support key columns of any type and handle different types behind the scenes.
Failing that, it would be great to be able to pass in a column with a type cast and not get a "column not found" error.
Describe alternatives you've considered
One alternative to be able to run my diff with non-VARCHAR key columns would be for me to change the types of the columns in my production table, which isn't an acceptable alternative.
Another alternative would be to generate a new column which is a composite key, but this also isn't an acceptable alternative because it requires first modifying the production version of the table before being able to run a diff against it, when the whole point is that I want to diff my changes before making any impact on the production table.
Additional context
The text was updated successfully, but these errors were encountered: