replace addressit with pelias native parser #1287
Conversation
I did some tests (only in France for now) and it seems that this solves #1279 🎉
(force-pushed 536104c to cfe5851)
Here is the current status of acceptance tests on our planet build.
What are we targeting? As good as addressit?
I'd like to see all the sane tests still passing; if tests are brittle or just bad then they can be changed. I'm working off the assumption that all the broken tests at this stage are either venue queries or intersection queries? Could you please post any of your test cases which don't fall in one of those two buckets? But yeah, pretty good so far considering that it's pretty much a rewrite of autocomplete :)
Yes, most of them are venue queries. Here is a diff of the acceptance-test results, master vs the pelias_parser branch:

```diff
--- master        2019-05-21 17:14:45.528007350 +0200
+++ pelias_parser 2019-05-21 17:18:56.555871513 +0200
autocomplete admin areas
- ✔ [1-2] "{"text":"brook"}"
+ ✘ regression [1-2] "{"text":"brook"}": score 0 out of 1
+ diff:
+ label
+ expected: Brooklyn, New York, NY, USA
+ actual: Crystal Brook, NY, USA
...
autocomplete daly city
- ✔ [1] "{"focus.point.lat":"37.769316","focus.point.lon":"-122.484223","text":"dal"}"
+ ✘ regression [1] "{"focus.point.lat":"37.769316","focus.point.lon":"-122.484223","text":"dal"}": score 0 out of 1
+ diff:
+ name
+ expected: Daly City
+ actual: Berg en Dal
- ✔ [5] "{"focus.point.lat":"37.769316","focus.point.lon":"-122.484223","text":"daly ci"}"
+ ✘ regression [5] "{"focus.point.lat":"37.769316","focus.point.lon":"-122.484223","text":"daly ci"}": score 1 out of 2
+ diff:
+ priorityThresh is 1 but found at position 9
...
autocomplete focus
- ✔ [4] "{"focus.point.lat":52.507054,"focus.point.lon":13.321714,"text":"hard rock cafe"}"
- ✔ [5] "{"focus.point.lat":40.744243,"focus.point.lon":-73.990342,"text":"hard rock cafe"}"
+ ✘ regression [4] "{"focus.point.lat":52.507054,"focus.point.lon":13.321714,"text":"hard rock cafe"}": score 1 out of 2
+ diff:
+ priorityThresh is 1 but found at position 5
+ ✘ regression [5] "{"focus.point.lat":40.744243,"focus.point.lon":-73.990342,"text":"hard rock cafe"}": score 1 out of 2
+ diff:
+ priorityThresh is 1 but found at position 4
...
autocomplete street centroids
- ✔ [1] "{"sources":"osm","layers":"street","text":"rushendon furlong, england"}"
- ✔ [2] "{"sources":"osm","layers":"street","text":"grolmanstr, berlin"}"
+ ✘ regression [1] "{"sources":"osm","layers":"street","text":"rushendon furlong, england"}": no results returned
+ ✘ regression [2] "{"sources":"osm","layers":"street","text":"grolmanstr, berlin"}": no results returned
...
autocomplete street fallback
- ✔ [1] "{"text":"grolmanstr, berlin"}"
+ ✘ regression [1] "{"text":"grolmanstr, berlin"}": no results returned
...
autocomplete synonyms
- ✔ [3] "{"text":"ucsf mt zion, san francisco, ca"}"
+ ✘ regression [3] "{"text":"ucsf mt zion, san francisco, ca"}": score 2 out of 3
+ diff:
+ name
+ expected: UCSF Mount Zion Campus
+ actual: San Francisco
...
labels
- ✔ [24] "{"text":"national air and space museum, washington dc"}"
+ ✘ regression [24] "{"text":"national air and space museum, washington dc"}": score 0 out of 1
+ diff:
+ label
+ expected: National Air and Space Museum, Washington, DC, USA
+ actual: Frigid Air, Tacoma, WA, USA
...
landmarks
- ✔ [15] "{"text":"australian war memorial, canberra, australia"}"
+ ✘ regression [15] "{"text":"australian war memorial, canberra, australia"}": score 2 out of 4
+ diff:
+ name
+ expected: Australian War Memorial
+ actual: Australian Capital Territory
+ locality
+ expected: Campbell
+ actual:
...
locality geodisambiguation
- ✔ [22] "{"text":"Germa","lang":"ru"}"
+ ✘ regression [22] "{"text":"Germa","lang":"ru"}": score 0 out of 2
+ diff:
+ name
+ expected: Германия
+ actual: Germa
+ country_a
+ expected: DEU
+ actual: GRC
...
search
- ✔ [1426636804303:51] "{"text":"4th and King"}"
+ ✘ regression [1426636804303:51] "{"text":"4th and King"}": score 0 out of 6
+ diff:
+ name
+ expected: 4th & King
+ actual: ۴/۴
+ locality
+ expected: San Francisco
+ actual:
+ country_a
+ expected: USA
+ actual: IRN
+ name
+ expected: San Francisco 4th & King Street
+ actual: ۴/۴
+ locality
+ expected: San Francisco
+ actual:
+ country_a
+ expected: USA
+ actual: IRN
...
search_poi
- ✔ [9] "{"text":"Ohio State University"}"
- ✔ [10] "{"text":"Antioch University Seattle"}"
- ✔ [11] "{"text":"Union college, kentucky"}"
+ ✘ regression [9] "{"text":"Ohio State University"}": score 0 out of 2
+ diff:
+ name
+ expected: Ohio State University Lima
+ actual: Ohio
+ localadmin
+ expected: Bath Township
+ actual:
+ ✘ regression [10] "{"text":"Antioch University Seattle"}": score 1 out of 2
+ diff:
+ name
+ expected: Antioch University
+ actual: Seattle
+ ✘ regression [11] "{"text":"Union college, kentucky"}": score 1 out of 2
+ diff:
+ name
+ expected: Union College
+ actual: Kentucky
```
(force-pushed 07e0296 to e9b68ba)
This PR is showing really good results, especially for intersection queries, where it's a massive improvement over using addressit. Before merging, it would be great to get a handle on the performance implications for request time. Here are some things we can look at:
So many improvements since my last acceptance test 😄
I tested the heap usage and it's actually a lot lower than I expected!

```js
const mem = function () {
  let used = process.memoryUsage().heapUsed / 1024 / 1024
  console.log(`The script uses approximately ${used} MB\n`)
}

console.log('baseline')
mem()

let AddressParser = require('./parser/AddressParser')
console.log('require')
mem()

let parser = new AddressParser()
console.log('instantiate')
mem()
```
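Along the same lines, per-query latency (raised above as a pre-merge concern) could be spot-checked with a similar standalone script. This is only a sketch: it reuses the same `./parser/AddressParser` require path as above, and the `parse()` call is a hypothetical stand-in for whatever entry point the parser actually exposes.

```js
// Rough per-query latency check for the native parser (sketch only).
// The parse() call below is a hypothetical stand-in, not the real API.
const { performance } = require('perf_hooks')
const AddressParser = require('./parser/AddressParser')

const parser = new AddressParser()
const queries = [
  '20 boulevard saint germain, paris',
  'grolmanstr, berlin',
  'union college, kentucky'
]

for (const text of queries) {
  const start = performance.now()
  parser.parse(text) // hypothetical entry point, for illustration only
  console.log(`${text} -> ${(performance.now() - start).toFixed(2)} ms`)
}
```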
(force-pushed b7dfd00 to c411c5e)
Hi there, just to let you know that I'm using this (mixed with #1268) at @jawg, and I have some feedback about the analyzer.

If I use the full address, the parse is:

```json
{
  "subject": "20 boulevard saint germain",
  "housenumber": "20",
  "street": "boulevard saint germain",
  "locality": "paris",
  "country": "france",
  "admin": "paris france"
}
```

and we find the correct answer in first position. But when we do the same search using a synonym ("st" instead of "saint"), the parse is:

```json
{
  "subject": "20 boulevard st germain",
  "housenumber": "20",
  "street": "boulevard st germain",
  "locality": "paris",
  "country": "france",
  "admin": "paris france"
}
```

and we get no answers, because of the analyzer used on these fields. If I replace the …
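For context on why "st" vs "saint" can behave differently, this is the kind of variation an Elasticsearch synonym token filter normally absorbs at index or query time. The snippet below is a generic illustration of such a filter, not the actual Pelias schema; the filter and analyzer names are made up.

```js
// Generic Elasticsearch analysis settings with a synonym filter for
// street-name abbreviations; names here are illustrative only and do
// not correspond to the actual Pelias schema.
const analysis = {
  filter: {
    street_name_synonyms: {
      type: 'synonym',
      synonyms: [
        'saint, st',
        'boulevard, blvd'
      ]
    }
  },
  analyzer: {
    street_name_analyzer: {
      tokenizer: 'standard',
      filter: ['lowercase', 'street_name_synonyms']
    }
  }
}

module.exports = { settings: { analysis } }
```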
…ultimatch", clean up related code
(force-pushed 3dc0d8f to 88656e0)
merged! Thanks everyone for your patience. I think this is a big step forward in improving autocomplete and taking on more control of how we parse inputs. 🎉 🍾
👏 🚀 🎉
These all appear to pass now thanks to pelias/api#1287
Unclear why these fail, but it may be due to pelias/api#1287
Hi guys,
Hi @nilsnolde, there is no configuration change required.
Oh damn, blind... Thanks Peter. Good rant btw! Totally agreed. Might steal your wording maybe ;)
The issue described has been solved by the Pelias Parser in pelias/api#1287!
This PR replaces addressit with the new pelias/parser module. Ready for testing and review.
Functionally, I tried to keep them pretty similar, although the native parser is better at detecting addresses in general, and particularly in places like Germany and France 🎉
One major change is that we used to rely on the admin parsing to generate targeted subqueries (i.e. when state='CA' was matched we would produce a subquery for region='CA').
This had the disadvantage of not generating a scoring subquery for cases where 'CA' referred to the country of Canada.
So the change I made is that we consider everything from the first matched admin character to the end of the string as 'admin_parts' and generate subqueries for them to match on any admin field.
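To make that idea concrete, here is a rough sketch of what it could look like; the helper name, offset handling, and field list are illustrative, not the actual code added in this PR.

```js
// Sketch of the admin_parts approach described above (illustrative only,
// not the actual implementation). Everything from the first matched admin
// character to the end of the string is split into parts, and each part
// is scored against any admin field rather than one targeted field.
const ADMIN_FIELDS = [
  'parent.country',
  'parent.region',
  'parent.county',
  'parent.locality',
  'parent.neighbourhood'
]

function adminPartsSubqueries (text, firstAdminCharOffset) {
  const adminParts = text
    .slice(firstAdminCharOffset)
    .split(',')
    .map(part => part.trim())
    .filter(part => part.length > 0)

  // one scoring subquery per admin part, matched against all admin fields
  return adminParts.map(part => ({
    multi_match: {
      query: part,
      fields: ADMIN_FIELDS
    }
  }))
}

// e.g. adminPartsSubqueries('union college, kentucky', 15)
// -> [ { multi_match: { query: 'kentucky', fields: ADMIN_FIELDS } } ]
```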