Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[py-tx][backwards incompatible] Split Update Format and Index Format #1014

Merged
merged 6 commits into from
May 26, 2022

Conversation

Dcallies
Copy link
Contributor

Summary

The flow of information through the system is:

API.fetch() => Delta => Store => Index => Metadata

NCMEC's API is like the tag-based ThreatExchange API, and it's not stored as hash => record, it's actually stored record => [hash, hash, hash] which is incompatible with many assumptions made along the way.

After lots of mulling, I think the solution is to:

  1. Separate considerations about storing the record that the fetch() API produces
  2. Converting the format from the fetch() update store to the signal => metadata

As a part of that, I need to mess with a lot of the interfaces above. There are more changes that I want to make, and I tried to checkpoint with this diff.

The final version I want to end up with is:

  1. Storing what the update format is from fetch should move from Delta => SignalExchangeAPI
  2. Delta should become a trivial object, with no need to override it

As a side effect, the storage format for the CLI changed from JSON to pickle. It's possible I can actually revert this in a few more PRs, but right now Delta is the only object that knows the correct format.

Test Plan

Added a test

Also went through:

  1. Fetch from threatexchange
  2. Match some content
  3. Run the dataset command to make sure it isn't broken (yet).

@Dcallies Dcallies added python-threatexchange Items related to the threatexchange python tool / library backwards incompatible labels May 25, 2022
@Dcallies
Copy link
Contributor Author

The error doesn't reproduce in 3.8 in devcontainer, let me see if I can get more information.

@Dcallies
Copy link
Contributor Author

Looks like it's fixed in 3.7 python/typing#511

Copy link
Contributor

@BarrettOlson BarrettOlson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve to unblock. I'd rather merge this as a checkpoint than an even bigger PR later. Thanks!

@Dcallies Dcallies merged commit d74c3e1 into facebook:main May 26, 2022
@Dcallies Dcallies deleted the delta_v2 branch May 26, 2022 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards incompatible CLA Signed python-threatexchange Items related to the threatexchange python tool / library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants