Refactor deduper and write additional tests #649

Merged
merged 5 commits into from
Sep 6, 2016

Contributor

@dianashk dianashk commented Sep 3, 2016

Refactored the deduper. Here's what got done:

  • Break out the diff functions to make testing them easier
  • Add tests for the newly separated diff functions
  • In the diff function, compare the layer, parent.*_id, address_parts.*, and name.*
    • Only compare zip if both records have it set, otherwise ignore
    • When comparing the parent.*_id properties, skip the check where *===item.layer
    • When comparing name.* only expect name.default to match, but only check additional languages if both items have it set, otherwise ignore
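The comparison rules in the list above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (`sameValue`, `differsWhenBothSet`, `isParentDifferent`, and so on) are hypothetical, values are compared as exact strings, and it assumes that a `parent.*_id` or non-zip `address_parts.*` value set on only one record counts as a difference.

```javascript
'use strict';

// Hypothetical helpers sketching the rules in the PR description.
// Names and the exact-string comparison are assumptions, not the PR's code.

// Compare two values as strings; two missing values count as equal.
function sameValue(a, b) {
  return String(a) === String(b);
}

// True only when both values are set and they differ.
function differsWhenBothSet(a, b) {
  const isSet = (v) => v !== undefined && v !== null;
  return isSet(a) && isSet(b) && !sameValue(a, b);
}

function isLayerDifferent(item1, item2) {
  return item1.layer !== item2.layer;
}

function isParentDifferent(item1, item2) {
  const p1 = item1.parent || {};
  const p2 = item2.parent || {};
  const keys = new Set(Object.keys(p1).concat(Object.keys(p2)));
  for (const key of keys) {
    if (!key.endsWith('_id')) { continue; }
    // Skip the id that belongs to the item's own layer.
    if (key === item1.layer + '_id') { continue; }
    if (!sameValue(p1[key], p2[key])) { return true; }
  }
  return false;
}

function isNameDifferent(item1, item2) {
  const n1 = item1.name || {};
  const n2 = item2.name || {};
  // name.default must always match.
  if (!sameValue(n1.default, n2.default)) { return true; }
  // Other languages are checked only when both items define them.
  return Object.keys(n1).some(
    (lang) => lang !== 'default' && differsWhenBothSet(n1[lang], n2[lang])
  );
}

function isAddressDifferent(item1, item2) {
  const a1 = item1.address_parts || {};
  const a2 = item2.address_parts || {};
  const keys = new Set(Object.keys(a1).concat(Object.keys(a2)));
  for (const key of keys) {
    // zip is ignored unless both records have it set.
    const differs = key === 'zip' ?
      differsWhenBothSet(a1.zip, a2.zip) :
      !sameValue(a1[key], a2[key]);
    if (differs) { return true; }
  }
  return false;
}

// Two items are duplicates only when none of the checks report a difference.
function isDifferent(item1, item2) {
  return isLayerDifferent(item1, item2) ||
    isParentDifferent(item1, item2) ||
    isNameDifferent(item1, item2) ||
    isAddressDifferent(item1, item2);
}
```

Keeping each check in its own function is what makes the per-rule tests mentioned above straightforward: each helper can be exercised with two small objects.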

Fixes #604

@dianashk
Contributor Author

dianashk commented Sep 3, 2016

TBD: I'd like to add a function in the deduper middleware to check which of the two items is preferred when a dupe is found. This would allow us to pick WOF records over Geonames, and OA over OSM, etc.
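One possible shape for such a preference check, as a sketch only. The function name `isPreferred`, the `source` property, and the source identifiers below are assumptions made for illustration; nothing here is part of this PR.

```javascript
'use strict';

// Hypothetical preference pairs: [preferred source, less preferred source].
// The identifiers are assumed names for WOF, Geonames, OA and OSM.
const PREFERRED_OVER = [
  ['whosonfirst', 'geonames'],
  ['openaddresses', 'openstreetmap']
];

// Returns true when `candidate` should replace `existing` among duplicates.
// Pairs that are not listed express no preference, so the function returns false.
function isPreferred(candidate, existing) {
  return PREFERRED_OVER.some(
    ([better, worse]) => candidate.source === better && existing.source === worse
  );
}
```

Listing explicit pairs, rather than one global ranking, avoids implying an order between sources the comment does not compare (for example WOF versus OA).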

* @returns {boolean}
* @throws {Error}
*/
function isDiffLayer(item1, item2) {
Member

@missinglink missinglink Sep 5, 2016

the method signature of this function is not very intuitive: it's called isDiffLayer, which I think is a great name if it returns a boolean value.

looking at the calling code, it seems as though the return value is not actually used and the function's main purpose is validation via throw.

a bit nit-picky, but would this function name be better as validateDiffLayer (which returns nothing and throws on error) or isDiffLayer (which does not throw and returns bool so the caller can handle the error case)?

the same question applies to all the is* functions here.
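To make the two options concrete, here is a rough sketch of each. The function bodies are assumptions written for illustration (the PR's actual implementation is not shown in this thread); only the names `isDiffLayer` and `validateDiffLayer` come from the comment above.

```javascript
'use strict';

// Option 1: a predicate. It never throws and returns a boolean,
// so the caller decides how to handle a mismatch.
function isDiffLayer(item1, item2) {
  return item1.layer !== item2.layer;
}

// Option 2: a validator. It returns nothing and throws when the layers differ.
function validateDiffLayer(item1, item2) {
  if (item1.layer !== item2.layer) {
    throw new Error('different layers: ' + item1.layer + ' vs ' + item2.layer);
  }
}
```

With the predicate form, calling code can read as `if (isDiffLayer(a, b)) { ... }`; with the validator form, the caller wraps a sequence of `validate*` calls in a single `try`/`catch`.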

@dianashk
Contributor Author

dianashk commented Sep 6, 2016

Thanks @missinglink for the helpful feedback. Right on and not at all nitpicky. 👍 I think I've addressed all your comments.

@trescube
Contributor

trescube commented Sep 6, 2016

Very helpful description and I think this will solve our deduping issues. :shipit:

@missinglink
Member

:shipit:

@dianashk
Contributor Author

dianashk commented Sep 6, 2016

Thanks all for feedback. Merging this and will create a separate issue for the isPreferred functionality mentioned in a comment above. 🚀

@dianashk dianashk merged commit 321001d into add-support-for-additional-scoring Sep 6, 2016
@orangejulius orangejulius deleted the update-deduper branch June 26, 2018 21:59