Store geometries for administrative areas #19

Open
riordan opened this issue Jan 11, 2016 · 10 comments

Comments

@riordan

riordan commented Jan 11, 2016

First subset of #1: Storing geometries for admin areas in the document.

@riordan
Author

riordan commented Jan 13, 2016

Still discussing ES vs disk vs S3

@trescube
Contributor

Current decision is to store in S3 and provide link in /place search.

@riordan
Author

riordan commented Jan 14, 2016

@thisisaaronland @heffergm: We're looking to re-serve the Who's on First geojson documents from S3 when someone requests the full details about a particular record that comes from Who's on First.

Today, all documents come from our Elasticsearch index on the /place endpoint, but we'd like to be able to pass along the complete, unadulterated WoF record, not just our version of it (also we'd rather store only the fields we use in ES).

What might that setup look like? What kind of implications would there be for folks looking to use Who's on First from their own setup if we do this?

@heffergm

Are we talking about serving the WOF record as an API response, or are you talking about something as a convenience for people to go look at the original geojson?

I think you're going to end up having to reverse proxy the data from S3 via the API. It's all here:

https://s3.amazonaws.com/whosonfirst.mapzen.com/

cc @baldur
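
For reference, a minimal sketch of how an API server might turn a Who's on First ID into an object URL in that bucket before proxying it. The bucket URL is the one linked above; the `data/` prefix, the three-digit directory split, and the function names are assumptions for illustration, not something confirmed in this thread.

```python
# Bucket URL as linked in the comment above.
BUCKET_URL = "https://s3.amazonaws.com/whosonfirst.mapzen.com/"


def wof_relative_path(wof_id: int) -> str:
    """Build a nested path by splitting the ID into groups of up to three digits.

    Assumption: records are laid out as 856/337/93/85633793.geojson.
    """
    digits = str(wof_id)
    parts = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "/".join(parts) + "/" + digits + ".geojson"


def wof_s3_url(wof_id: int, prefix: str = "data/") -> str:
    """Join bucket URL, an assumed key prefix, and the nested record path."""
    return BUCKET_URL + prefix + wof_relative_path(wof_id)


if __name__ == "__main__":
    # 85633793 is used purely as an example ID.
    print(wof_s3_url(85633793))
```

A reverse proxy in the API would request that URL server-side and relay (or reshape) the body, so clients never need to know the bucket layout.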

@riordan
Author

riordan commented Jan 14, 2016

Likely as an API response (since we didn't opt to build a hypermedia-style link structure into our v1), so we'll probably be reverse proxying.

I suppose if we're OK with non-Mapzenners requesting it directly from S3 (and I don't imagine it'll happen often), then having them point at the S3 bucket could do the trick.


@dianashk
Contributor

I think we need to maintain our response format which means we can't just serve the exact WOF record as-is. We'll need to fetch it from S3 and copy the parts we care about into our response object.

The alternative to S3 access from API is storing the WOF data locally and reading it from disk when needed. This could be a pain at deploy time because we'd need to copy all of WOF to each API server.
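
The "copy the parts we care about" step could be sketched roughly as below. This is only an illustration under stated assumptions: the choice of keys (`geometry`, `bbox`), the function name, and the sample records are all hypothetical, and the real response format may need something different.

```python
import copy
from typing import Any, Dict, Iterable

# Hypothetical selection of top-level GeoJSON keys to carry over from WOF.
WANTED_KEYS = ("geometry", "bbox")


def merge_wof_into_feature(
    api_feature: Dict[str, Any],
    wof_record: Dict[str, Any],
    wanted: Iterable[str] = WANTED_KEYS,
) -> Dict[str, Any]:
    """Return a copy of the API feature with selected WOF keys copied in.

    Everything else in the API feature keeps its existing shape, and
    neither input is mutated.
    """
    merged = copy.deepcopy(api_feature)
    for key in wanted:
        if key in wof_record:
            merged[key] = copy.deepcopy(wof_record[key])
    return merged


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    api_feature = {
        "type": "Feature",
        "properties": {"id": "example", "name": "Example Place"},
        "geometry": {"type": "Point", "coordinates": [0.5, 0.5]},
    }
    wof_record = {
        "type": "Feature",
        "properties": {"wof:name": "Example Place", "extra": "not copied"},
        "bbox": [0.0, 0.0, 1.0, 1.0],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
        },
    }
    print(merge_wof_into_feature(api_feature, wof_record))
```

Keeping the merge as a pure function means the same code works whether the WOF record came from S3 or from a local clone (plan B).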

@heffergm

S3 access from inside AWS is quite fast... I'd be inclined to at least suggest starting there, rather than deal with local clones of a large dataset just to deploy the api.

@dianashk
Contributor

Yup, we're all on the same page then. The local clones of data would be plan B. Sounds like plan A is good, though. 🙌

@riordan
Author

riordan commented Jan 20, 2016

Let's push this out of the milestone and approach it right afterwards.

@orangejulius
Member

orangejulius commented Mar 5, 2018

Hi everyone!
A long overdue update here. For some time we considered this issue low priority, since the Mapzen Places API was serving Who's on First geometries already. Now of course Mapzen has shut down, so we should discuss serving geometries again.

Serving them only from the /v1/place endpoint still makes sense, as they can be quite large (100 MB for New Zealand). My guess is they can be stored (as plain text) but not indexed in Elasticsearch via the whosonfirst importer. The geometries are probably tens of GB for the whole world, so taking advantage of the scalability of Elasticsearch makes sense here.

The /v1/place endpoint would then query for Who's on First records directly by ID, as it does now, and efficiently return the geometry in the response.
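
A rough sketch of what "stored but not indexed" could look like, not the importer's actual schema. The field name `geometry_json` and all helper names are hypothetical. `"index": false` on a `text` field is the Elasticsearch 5+ spelling (2.x used `"index": "no"` on `string` fields); in both cases the value stays retrievable from `_source`.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical field name, not taken from the Pelias schema.
GEOMETRY_FIELD = "geometry_json"


def geometry_mapping() -> Dict[str, Any]:
    """Mapping fragment: keep the value in _source, build no inverted index."""
    return {"properties": {GEOMETRY_FIELD: {"type": "text", "index": False}}}


def with_geometry(doc: Dict[str, Any], geometry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document with the geometry serialized as compact JSON text."""
    out = dict(doc)
    out[GEOMETRY_FIELD] = json.dumps(geometry, separators=(",", ":"))
    return out


def ids_query(doc_id: str) -> Dict[str, Any]:
    """Search body that looks a record up directly by its document ID."""
    return {"query": {"ids": {"values": [doc_id]}}}


def geometry_from_source(source: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Parse the stored text back into a GeoJSON geometry, if present."""
    raw = source.get(GEOMETRY_FIELD)
    return json.loads(raw) if raw is not None else None


if __name__ == "__main__":
    # Made-up geometry, for illustration only.
    geom = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
    doc = with_geometry({"name": "Example Place"}, geom)
    print(json.dumps(geometry_mapping()))
    print(json.dumps(ids_query("example-id")))
    print(geometry_from_source(doc) == geom)
```

Because the field is never searched, only fetched by ID, skipping the index avoids analyzing very large strings while still letting Elasticsearch handle distribution of the data.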


5 participants