Enable interpolation on autocomplete #131

Open
adelcasse opened this issue Apr 13, 2018 · 9 comments

@adelcasse

We would like to contribute to enabling interpolation for autocomplete. This would be a big step toward resolving issues with the use of Pelias in our product. As discussed with @orangejulius on Gitter, I am opening this issue to discuss how we could help implement it and how it should be done.

@missinglink
Member

missinglink commented Apr 13, 2018

Hi @adelcasse, this feature would need to be addressed in the pelias/api repository to enable autocomplete in Pelias.

The interpolation engine was designed to be a standalone service. I would prefer not to implement autocomplete here because it's a linguistics/syntax/natural language problem and not directly related to address schemas, geodesic math, etc.

I think it would be 'cleaner' if an outside system (eg. Pelias using an elasticsearch index, or any other system) was able to resolve partially completed address inputs into a complete 'street name' and desired 'house number' (we also need a lat/lon anywhere inside the bounding box of the street in order to disambiguate two streets with the same name in the same city).

Once this has been done you can send those three values (street name, house number, filter point) to the interpolation service and it will return a value, either an interpolation or an exact match.
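
To make that hand-off concrete, here is a minimal sketch of building such a request. The base URL, the `/search/geojson` path, and the `street`/`number`/`lat`/`lon` parameter names are assumptions for illustration, so check the interpolation README for the actual API. The example values are made up.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed for illustration: host, port, endpoint path and parameter names.
INTERPOLATION_BASE_URL = "http://localhost:3000"


def build_interpolation_url(street, house_number, lat, lon,
                            base_url=INTERPOLATION_BASE_URL):
    """Build a query URL carrying the three values: street name,
    house number, and a filter point (lat/lon) near the street."""
    params = {
        "street": street,
        "number": house_number,
        "lat": lat,
        "lon": lon,
    }
    return f"{base_url}/search/geojson?{urlencode(params)}"


# Hypothetical input values, not taken from a real query.
url = build_interpolation_url("boulevard", "410", 48.8566, 2.3522)
print(url)

# Round-trip the query string to confirm what would be sent.
sent = parse_qs(urlsplit(url).query)
print(sent)
```

The function only builds the URL; sending it and handling an empty response (no interpolation possible) is left to the caller.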

see: #112 (comment) regarding integration.

In that comment, I wrote a little about our current integration between 'Pelias proper' and the interpolation engine:

- If a user requests an address and elasticsearch returns a street, send a second request to the interpolation
  service for a result. If successful, use the interpolated result, otherwise use the street centroid.

This also only applies to the /v1/search endpoint and not to the autocomplete endpoint at this stage.

Let me explain a little more about how that currently works.

more info: [design doc] [relationship to pelias] [existing standards] [conflation]

indexing

  1. We join the road segments from OpenStreetMap into their longest contiguous linestring and export them as 'polylines' (our .0sv file format)
  2. We import the street data into Pelias in the layer named street, including the centroid (midpoint) of the linestring.
  3. We build the interpolation index. This is fairly complex, but you can find more info in the wiki links I posted above. We use the exact same .0sv file as above.

searching

  1. We parse the input text and check to see what constituent parts it contains. If the query is identified as containing a street name and a house number, then it's a candidate for an address search.
  2. We query elasticsearch for an exact address match (we have addresses there also) and fall back to the street if we don't have a match.
  3. There is some logic in the pelias/api codebase which is able to detect that the user requested an address and got back a street.
  4. In this case, we pass the name of the street, the requested house number and the street centroid to the interpolation engine, which returns an interpolated result.
  5. The street-level-accuracy result is substituted with the address-level-accuracy result and returned to the user.
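
The search steps above can be sketched roughly as follows. This is not the pelias/api code: `es_lookup` and `interpolate` are hypothetical callables standing in for the elasticsearch query and the interpolation service, and the result dictionaries are simplified.

```python
def resolve_address(street_name, house_number, es_lookup, interpolate):
    """Return the best available result for an address query.

    es_lookup(street_name, house_number) -> dict with 'layer' and
        'centroid' keys, or None (hypothetical stand-in).
    interpolate(street_name, house_number, centroid) -> dict or None
        (hypothetical stand-in for the interpolation service).
    """
    hit = es_lookup(street_name, house_number)
    if hit is None:
        return None
    if hit["layer"] == "address":
        # Step 2: exact address match, nothing more to do.
        return hit
    # Steps 3-4: the user asked for an address but got a street back,
    # so ask the interpolation engine using the street centroid.
    interpolated = interpolate(street_name, house_number, hit["centroid"])
    # Step 5: substitute the street with the interpolated result,
    # otherwise keep the street centroid.
    return interpolated if interpolated is not None else hit


# Stubs with made-up data, only to exercise the flow.
def fake_es(street, number):
    return {"layer": "street", "name": street, "centroid": (48.0, 2.0)}


def fake_interpolate(street, number, centroid):
    return {"layer": "address", "name": f"{number} {street}",
            "centroid": (48.0001, 2.0001), "source": "interpolated"}


result = resolve_address("main street", "10", fake_es, fake_interpolate)
fallback = resolve_address("main street", "10", fake_es, lambda s, n, c: None)
print(result["layer"], fallback["layer"])
```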

sorry for the wall of text :)
so... regarding autocomplete

If we want to enable this feature for autocomplete, then we need a parsing engine capable of (at minimum) parsing partially completed input text into a house number and a street name. It should really also be able to identify postal codes and administrative areas (I recently enabled autocomplete on https://github.com/pelias/placeholder, so that could probably handle the admin portion).

Writing a geographic text parser is not an easy undertaking, and one that is autocomplete-aware is even more difficult. We currently use three parsing engines:

  • libpostal is used by the /v1/search endpoint and is fairly accurate in most cases, but it lacks two features which we would like to have:
    • it does not support autocomplete
    • it does not handle ambiguities (like 'ontario, ca' being both Canada and California)
  • addressit is a simple parser based on regular expressions. We use it as a fallback parser; it's not very robust, but it can handle some very generic address formats.
  • placeholder is a library I wrote last year which supports autocomplete. Currently it only supports administrative areas (towns, cities, countries etc) but could potentially be expanded to include streets. It also supports languages, ambiguities and synonyms.

Have a look at the readme docs for those repos to get a better understanding of how they work.

The current obstacle to enabling interpolation on autocomplete is that none of these engines is sufficiently capable of parsing partially completed address input (eg. "1 Ma").

If they were able to do so, then an input like this would probably match 10,000+ street names globally starting with Ma.
Each of these streets would need to be queried against the interpolation index, which could cause performance issues at scale.

There may be some workarounds for this (like only using the top 10), but they would also need to be considered.
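
As a rough sketch of the 'top 10' workaround, the function below caps how many candidate streets are sent to the interpolation service. It assumes the candidates are already sorted by relevance; `interpolate` is a hypothetical stand-in for a call to the interpolation service, and keeping the street when interpolation fails is also an assumption.

```python
def interpolate_top_candidates(streets, house_number, interpolate, limit=10):
    """Query the interpolation service for at most `limit` streets.

    streets: list of dicts with 'name' and 'centroid', best match first.
    interpolate: hypothetical callable returning a result dict or None.
    Streets with no interpolated result are kept as street-level results.
    """
    results = []
    for street in streets[:limit]:
        hit = interpolate(street["name"], house_number, street["centroid"])
        results.append(hit if hit is not None else street)
    return results


# Count how many lookups happen for 25 made-up candidate streets.
calls = []


def fake_interpolate(name, number, centroid):
    calls.append(name)
    return None  # pretend no house numbers are known


streets = [{"name": f"Ma street {i}", "centroid": (0.0, 0.0)}
           for i in range(25)]
results = interpolate_top_candidates(streets, "1", fake_interpolate, limit=10)
print(len(calls), len(results))
```

The cost per keystroke is then bounded by `limit` lookups, at the price of missing a correct street that ranks lower than the cut-off.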

Again, sorry for the wall of text, hopefully that gives some background to the feature and an idea of its complexity.

If you're still interested in discussing further, we could set up a call. Depending on your timelines we might be available for consultancy work, if that interests you, or I can continue to help out for free on the issue tracker :)

@elsa-pato

Hi @missinglink ,

I work with @adelcasse and I would like to add this feature to Pelias.
I'm totally new to Pelias and I started looking into it last week, so I haven't gone too far yet :)

For now I've tried to do the following in pelias-api :

- if the basic autocomplete query did not succeed : 
-- call libpostal
-- if libpostal found a street & house number : 
--- repeat first query, without the house number, and filters to return streets only
- call interpolation

This works well for my test cases (french streets), but it might not work worldwide. That's why I'd like to have your opinion on how to proceed :)
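
One possible reading of the flow listed above, as a minimal sketch. The three callables (`autocomplete`, `parse`, `interpolate`) are hypothetical stand-ins for the autocomplete query, libpostal and the interpolation service. Keeping the street when interpolation returns nothing is an assumption, not something stated in the list.

```python
def autocomplete_with_interpolation(text, autocomplete, parse, interpolate):
    """Fallback flow: only parse and interpolate when the basic
    autocomplete query returns nothing."""
    results = autocomplete(text)
    if results:
        return results

    # Flatten the parser output (list of label/value pairs) into a dict.
    parsed = {part["label"]: part["value"] for part in parse(text)}
    if "road" not in parsed or "house_number" not in parsed:
        return []

    # Repeat the query without the house number, streets only.
    streets = autocomplete(parsed["road"], layers=["street"])

    output = []
    for street in streets:
        hit = interpolate(street["name"], parsed["house_number"],
                          street["centroid"])
        output.append(hit if hit is not None else street)
    return output


# Stubs with made-up data to exercise the flow.
def fake_autocomplete(text, layers=None):
    if layers == ["street"] and text == "rue":
        return [{"layer": "street", "name": "rue", "centroid": (48.0, 2.0)}]
    return []


def fake_parse(text):
    return [{"label": "house_number", "value": "410"},
            {"label": "road", "value": "rue"}]


def fake_interpolate(name, number, centroid):
    return {"layer": "address", "name": f"{number} {name}",
            "centroid": (48.001, 2.001)}


found = autocomplete_with_interpolation(
    "410 rue", fake_autocomplete, fake_parse, fake_interpolate)
print(found)
```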

I have a few questions in mind:

  • When should we trigger interpolation on autocomplete? In my scenario, I only trigger it if the basic autocomplete doesn't find any result. This should help with the performance issue, but it might not be the best for the user.
  • You say libpostal doesn't support autocomplete yet. What's missing exactly? In my test cases libpostal behaves well so I haven't really looked into that part yet.

@missinglink
Member

missinglink commented Jan 29, 2019

hi @elsa-pato, sorry for the late reply.

I'd suggest looking into your second point a bit more before you continue:

in my test cases libpostal behaves well so I haven't really looked into that part yet.

This hasn't been my experience; libpostal isn't designed to work with partially specified inputs.

Some basic examples:

http://localhost:4400/parse?address=Rue

[
  {
    "label": "city",
    "value": "rue"
  }
]

http://localhost:4400/parse?address=Champs-E

[
  {
    "label": "house",
    "value": "champs-e"
  }
]

http://localhost:4400/parse?address=Boulevard

[
  {
    "label": "suburb",
    "value": "boulevard"
  }
]

http://localhost:4400/parse?address=s

[
  {
    "label": "city_district",
    "value": "s"
  }
]

@missinglink
Member

missinglink commented Jan 29, 2019

In a lot of cases it also struggles with fully specified street names:

http://localhost:4400/parse?address=L’Esplanade des Invalides

[
  {
    "label": "house",
    "value": "l'esplanade des invalides"
  }
]

It really wasn't designed to work for anything less than a full postal address, and really must have a city or region specified in the input to work correctly.

@elsa-pato

elsa-pato commented Jan 29, 2019

Hi,
Thanks for your reply :)
The thing is that in this specific case, we are looking for a house number in order to interpolate, so the input address would look more like "76 rue ...", which works much better.

http://localhost:4400/parse?address=410%20Boulevard
[
  {
    "label": "house_number",
    "value": "410"
  },
  {
    "label": "road",
    "value": "boulevard"
  }
]

http://localhost:4400/parse?address=410%20Rue
[
  {
    "label": "house_number",
    "value": "410"
  },
  {
    "label": "road",
    "value": "rue"
  }
]

http://localhost:4400/parse?address=410%20s
[
  {
    "label": "house_number",
    "value": "410"
  },
  {
    "label": "road",
    "value": "s"
  }
]

But sure, it's not perfect yet:

http://localhost:4400/parse?address=410%20L%E2%80%99Esplanade%20des%20Invalides
[
  {
    "label": "house_number",
    "value": "410"
  },
  {
    "label": "house",
    "value": "l'esplanade des invalides"
  }
]

(This result is actually quite weird: as far as I know, and in France at least, an "esplanade" is rarely a house name, it's more like a square. But maybe it's different in other countries. I'm currently trying to get a planet build to run more tests.)

http://localhost:4400/parse?address=410%20Champs-E
[
  {
    "label": "postcode",
    "value": "410"
  },
  {
    "label": "country",
    "value": "champs-e"
  }
]

ok, this one really fails :p
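
To summarise the parses above, here is a small sketch of the check that decides whether a parse can be sent on to interpolation. The rule (both a `house_number` and a `road` label must be present) is an assumption about how such a check could look; the sample parses are copied from the libpostal responses above.

```python
def is_interpolation_candidate(parse):
    """True when the parse contains both a house number and a road."""
    labels = {component["label"] for component in parse}
    return {"house_number", "road"} <= labels


# Responses copied from the examples above.
parse_410_boulevard = [
    {"label": "house_number", "value": "410"},
    {"label": "road", "value": "boulevard"},
]
parse_410_esplanade = [
    {"label": "house_number", "value": "410"},
    {"label": "house", "value": "l'esplanade des invalides"},
]
parse_410_champs_e = [
    {"label": "postcode", "value": "410"},
    {"label": "country", "value": "champs-e"},
]

for name, parse in [("410 Boulevard", parse_410_boulevard),
                    ("410 L'Esplanade des Invalides", parse_410_esplanade),
                    ("410 Champs-E", parse_410_champs_e)]:
    print(name, is_interpolation_candidate(parse))
```

Under this rule only the first input qualifies; the other two would fall through without any interpolation.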

One more thing: as I implemented it, we call libpostal and interpolate only if the standard autocomplete search didn't return any result. This usually means that the user wrote quite a precise address, which I guess helps libpostal, and might help autocomplete stay performant.

You can check what I did here: https://gitlab.scity.coop/pelias-contrib/api/commit/d24f37121f4cb184b26c99934199c338cc8ddf56 (it's really just a quick & dirty solution to start with)

That said, I'll take a deeper look at libpostal and see what I can do :)

@missinglink
Member

We are hoping to merge pelias/api#1287 soon which replaces the addressit parser with https://github.com/pelias/parser.
Once that work is complete it will be possible to tackle this issue and enable interpolation for autocomplete.

@missinglink
Member

The core team are looking at this again right now.

I've spent some time making the interpolation service more performant, and it can now handle around 6k requests/s on a single thread, so that should be adequate to handle the load.

The problem still remains the logic for when to call the interpolation service when in autocomplete mode. Since this issue was opened we've completely refactored the parsing logic for autocomplete to use our own parser, which may help make this problem a little easier.

@adelcasse did you manage to find a solution that worked for you? Are you still interested in contributing to the discussion of how this might work?

@adelcasse
Author

@missinglink sorry, I didn't see your message before. We had something working on our side with libpostal, as @elsa-pato described. We would need to look at it again, but sure, we're interested in contributing to the discussion if useful (I don't know if there have been changes on that subject since 31 March).

@adelcasse
Author

adelcasse commented Jun 4, 2020

@missinglink I've run tests with the "compare" tool (to see differences between your geocode.earth servers and ours) and I see that your dev environment is "less strict" on housenumbers than your production one (it returns the street first when there is no matching housenumber). Is that the result of pelias/api#1432 or something else? Is your dev environment code available somewhere already?
