Optimizations to the dictionary comparison strategy #51
Take these two JSON files as an example:
By default, Graphtage used to try all possible matchings between dictionary key/value pairs; running `graphtage f1.json f2.json` would result in the "foo" key being replaced by "bar" and the "f" in the "oof" key being moved to the front of the string.

This sort of matching is polynomial time in the size of the input, but is often still intractable for large files. Therefore, Graphtage had an option, `--no-key-edits` or `-k`, that would prevent two dictionary key/value pairs from being compared to each other unless their keys were identical. Running `graphtage -k f1.json f2.json` would have resulted in the `2` being replaced by `"two"`, the entire "oof" key/value pair being removed, and the entire "foo" key/value pair being added.

This PR…
- `--dict-strategy`/`-ds` option which sets the strategy: `match` for the old default behavior and `none` for the old `--no-key-edits` behavior. The `--no-key-edits` option still exists, but is now an alias for `--dict-strategy none`.
- `--dict-strategy auto`, which is now the default, that behaves exactly the same as the `match` strategy, except that if two key/value pairs have the exact same key, they are automatically matched.

`graphtage --dict-strategy auto f1.json f2.json` will now result in `2` being replaced with `"two"`, `oof` being replaced by `bar`, and `"two"` being replaced by `2`.
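
To make the difference between the strategies concrete, here is a minimal Python sketch of the key-partitioning idea. It is not Graphtage's implementation: the helper names `split_by_key` and `diff_none` and the sample dictionaries are hypothetical, chosen only for illustration. The sketch assumes that `none` only ever compares pairs whose keys are identical, and that `auto` pre-matches those same pairs and leaves only the leftover pairs for the more expensive matching step.

```python
def split_by_key(before: dict, after: dict):
    """Pair up entries whose keys are identical (hypothetical helper).

    Returns (matched, leftover_before, leftover_after). Under an
    `auto`-like strategy, only the two leftover dicts would still need
    the expensive all-pairs matching.
    """
    shared = before.keys() & after.keys()
    matched = {k: (before[k], after[k]) for k in shared}
    leftover_before = {k: v for k, v in before.items() if k not in shared}
    leftover_after = {k: v for k, v in after.items() if k not in shared}
    return matched, leftover_before, leftover_after


def diff_none(before: dict, after: dict) -> dict:
    """Sketch of a `none`-like strategy: keys are never edited.

    Identical keys are compared by value; every other pair is reported
    as a whole removal or a whole insertion.
    """
    matched, removed, added = split_by_key(before, after)
    changed = {k: pair for k, pair in matched.items() if pair[0] != pair[1]}
    return {"changed": changed, "removed": removed, "added": added}


if __name__ == "__main__":
    # Hypothetical inputs; these are not the f1.json / f2.json from the PR.
    before = {"a": 1, "b": 2, "c": 3}
    after = {"a": 1, "b": "two", "d": 3}

    print(diff_none(before, after))

    _, left, right = split_by_key(before, after)
    # Only these leftovers would be candidates for key edits under `auto`.
    print(left, right)
```

With these inputs, the `none`-style diff reports `"b"` as a value change, `"c"` as removed, and `"d"` as added. An `auto`-style strategy would instead be free to match the leftover `"c"` and `"d"` pairs against each other as a key edit, while never reconsidering `"a"` or `"b"`, which is where the savings over trying all possible matchings would come from.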