Bindings for near-duplicate detection and address deduping #31

Open
iantabolt opened this issue May 10, 2018 · 3 comments

@iantabolt

After reading over openvenues/libpostal#294, we are very interested in making use of some of these new features. Which methods would be best to expose in the jpostal bindings?

I saw that you mentioned in that libpostal PR that you already have pypostal bindings for using the new API from lieu, but I can't seem to find them. If you could point me towards these Python bindings, I'd be happy to port them over to jpostal and open a PR.

Many many thanks!!

@iantabolt
Author

iantabolt commented Jun 6, 2018

I am working on this now, starting with the dedupe.h bindings. To answer my own question for reference, the Python bindings are found at https://github.com/openvenues/pypostal/blob/master/postal/pydedupe.c and https://github.com/openvenues/pypostal/blob/master/postal/dedupe.py

@albarrentine
Contributor

Hey, sorry, have been super booked lately working on a voting rights restoration campaign for November. That's exciting, and yes, those are the files to look at on the pypostal side. Also keep in mind the concurrency/synchronization stuff we do for the other jpostal bindings (it's just on the Java side so should be more familiar to folks, see e.g. https://github.com/openvenues/jpostal/blob/master/src/main/java/com/mapzen/jpostal/AddressExpander.java for details).
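
For anyone following along, here is a rough sketch of what Java-side synchronization for a new binding could look like. It assumes the pattern is a lazily created singleton whose calls are serialized with `synchronized`; check AddressExpander.java for what jpostal actually does. Every name here (`DedupeSketch`, `Status`, `isStreetDuplicate`, `compareStandIn`) is hypothetical and not part of jpostal or libpostal, and the native call is replaced by a plain Java stand-in so the example runs on its own.

```java
import java.util.Locale;

// Hypothetical class for illustration only; not part of jpostal.
final class DedupeSketch {
    // Hypothetical result type; the real dedupe API defines its own statuses.
    enum Status { NON_DUPLICATE, EXACT_DUPLICATE }

    // volatile is required for double-checked locking to be safe.
    private static volatile DedupeSketch instance = null;

    private DedupeSketch() {
        // A real binding would load the native library and run its setup here.
    }

    // Lazily create a single shared instance, locking only on first use.
    static DedupeSketch getInstance() {
        if (instance == null) {
            synchronized (DedupeSketch.class) {
                if (instance == null) {
                    instance = new DedupeSketch();
                }
            }
        }
        return instance;
    }

    // Serialize access so only one thread at a time reaches the wrapped call.
    Status isStreetDuplicate(String a, String b) {
        if (a == null || b == null) {
            throw new NullPointerException("inputs must not be null");
        }
        synchronized (this) {
            return compareStandIn(a, b);
        }
    }

    // Stand-in for the native call (assumption): a trivial normalized
    // string comparison, nothing like libpostal's real matching logic.
    private static Status compareStandIn(String a, String b) {
        String x = a.trim().toLowerCase(Locale.ROOT);
        String y = b.trim().toLowerCase(Locale.ROOT);
        return x.equals(y) ? Status.EXACT_DUPLICATE : Status.NON_DUPLICATE;
    }
}
```

Callers would go through `DedupeSketch.getInstance().isStreetDuplicate(a, b)`, so every thread shares one instance and one lock. In a real binding the stand-in would be replaced by a `native` method.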

@iantabolt
Author

Absolutely no problem. Thanks for the heads up and all the awesome work you've already done!

I am basing it closely on the code that already exists in AddressExpander and AddressParser, so it's pretty straightforward. Once I finish the bindings for the fuzzy and toponym duplicate methods, I'll open the PR. This is my WIP branch: https://github.com/openvenues/jpostal/compare/master...iantabolt:dedupe-bindings?expand=1
