Add back users.nearby column as a dummy #2543

Closed
migurski opened this issue Feb 18, 2020 · 26 comments

@migurski
Contributor

migurski commented Feb 18, 2020

We removed users.nearby in #2439, but the new API DB schema breaks the write capability of Osmosis. I opened a PR on Osmosis in openstreetmap/osmosis#54, where @simonpoole pointed out that Osmosis is no longer maintained. @mmd-osm adds that some other process could be developed to populate an API DB, but there’s no one true path at this time.

Can we add back a dummy users.nearby column to support the current best import path until a new one is decided upon? This issue duplicates parts of the discussion in #2449, which has been closed.

@tomhughes
Member

Well I was going to reply to your other comment but as you've decided to duplicate that with this ticket I'll reply here instead...

My opinion is that we can't be held hostage by a package like osmosis that is essentially dead.

Loading data into an API database is not something most people ever have any need to do so I don't see this as an urgent problem to solve.

Making changes to the user table in the production database is also a bit of a pain so I try and avoid doing unnecessary changes there.

@tomhughes
Member

Oh and you missed stateofthemap which is actually the slowest test now I think ;-)

@migurski
Contributor Author

I agree it’s a niche need, but it’s worth making a special effort to support people modifying OSM website code and using DB extracts to try out their changes. We want more people to be able to contribute here and leaving Osmosis incompatible with the website API DB is going to be a major missing stair for this user population.

@tomhughes
Member

The thing is that osmosis loading has never worked very well and tends to just lead to frustrated users which is why I try and discourage it.

@migurski
Contributor Author

Sorry to hear you've been unsuccessful! It remains one of the only mainstream API DB write paths that I’m aware of, so it’s likely that we are stuck with it.

@mmd-osm
Collaborator

mmd-osm commented Feb 18, 2020

Rewriting this part of osmosis based on libosmium could be a good GSoC project maybe? Needs someone with good c++ fu, though.

@migurski
Contributor Author

Could be interesting! That, or providing a supported piece of code inside this repository to write to the API DB.

@mmd-osm
Collaborator

mmd-osm commented Feb 18, 2020

We should also evaluate whether the new flex backend in osm2pgsql could support this. Then it would be a rather simple task of writing some Lua code.

@tomhughes
Member

Loading data into an API database is relatively easy if the database is empty - creating user records is the hardest thing to deal with.

Where it gets really problematic, and where things usually fell apart for people, is when they try and load data into a database that isn't empty...

@migurski
Contributor Author

Makes sense, thanks! We’re still going to have a pretty core compatibility problem here in the meantime. What do we think about re-introducing users.nearby for Osmosis import compatibility, then prioritizing the creation of an in-repo supported way to seed a local website copy?

@tomhughes
Member

You yourself described it as "niche" a few hours ago and now it has escalated to a "core compatibility problem" for some reason?

I have already explained why I don't particularly want to revert this.

I'm also not keen on adding a way to load data to this repository. I have no objection to a third party tool but adding it here will require us to maintain something that we're never going to use.

@mmd-osm
Collaborator

mmd-osm commented Feb 18, 2020

My recommendation for the time being would be to introduce an additional column for the import via plain old PostgreSQL command line tools:

  1. On psql, execute alter table users add column nearby integer default 50;
  2. Run osmosis osm api db import
  3. Again on psql execute: alter table users drop column nearby;

That would allow you to run osmosis as is, and avoids any additional trouble for production. Also, we don't need any Rails db migration. Trying to keep this simple.

We could add this as a hint in https://github.com/openstreetmap/openstreetmap-website/blob/34dd2293db85e28b7e5df0889b0b778a685306bb/CONFIGURE.md#populating-the-database, mentioning that osmosis doesn't reflect the latest db schema changes.
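The three steps above could be scripted roughly as follows. This is only a sketch: the function name `build_import_steps` is hypothetical, the database and user names are assumptions about a local setup, and the Osmosis arguments (including `validateSchemaVersion=no`) should be checked against your own installation before running anything.

```python
import shlex


def build_import_steps(pbf_path, database="openstreetmap", user="openstreetmap"):
    """Return the three commands of the workaround as argument lists.

    The database and user names are assumed defaults for a local
    development setup, not values taken from this thread.
    """
    add_column = "ALTER TABLE users ADD COLUMN nearby integer DEFAULT 50;"
    drop_column = "ALTER TABLE users DROP COLUMN nearby;"
    return [
        # 1. Temporarily add the column that older Osmosis versions expect.
        ["psql", "-d", database, "-c", add_column],
        # 2. Run the Osmosis API DB import (arguments are assumptions).
        ["osmosis", "--read-pbf", pbf_path,
         "--write-apidb", f"database={database}", f"user={user}",
         "validateSchemaVersion=no"],
        # 3. Drop the temporary column again.
        ["psql", "-d", database, "-c", drop_column],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed.
    for step in build_import_steps("extract.osm.pbf"):
        print(shlex.join(step))
```

Printing the commands rather than executing them keeps the sketch harmless; each list could be passed to `subprocess.run(step, check=True)` once the arguments match the local setup.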

@migurski
Contributor Author

Sorry, I should have been clearer: this concern is core to a specific niche that we claim to care a lot about. There aren’t a lot of people who will encounter this issue as potential contributors to OSM-website but we want it to be painless for 100% of those people who do.

@grigory-rechistov

I have actually hit exactly the same problem just now. The trick by @mmd-osm of temporarily adding the missing column before invoking osmosis helped. However, for some reason this manipulation (or something else happening at the same time) wiped the information about the only OSM user I had previously added to the DB (or at least prevented me from accessing it again). That was not a big deal, as I created a new account.

It is very sad that osmosis is out of support and no replacement is on the horizon. And I felt bad about the other "ancient" command-line applications I was still using (osmfilter, osmconvert, upload.py etc.). I thought that I should have switched to osmosis in the hope of better support in the future, but now I guess I'll stick to my rusty old guns.

A working method to import into the DB via a command-line interface is important for OSM contributors (and for the OSM project in the long run). Let me describe my collaboration use case.

I need the ability to import data in OSM format into a private instance of the OSM API database I've set up. I plan to share a bunch of vector features obtained from a CC0-licensed source with a number of contributors. On my private OSM API instance, these several people can collaborate on improving these features before uploading them to the public OSM.org database.

I am essentially recreating WFS-like functionality in the form of an OSM API instance. Using the same JOSM editor and two alternating API URLs, a user does the following steps.

  1. For the same bounding box, the user loads a chunk of vector data from the private API into layer 1, and from the OSM.org API into layer 2.
  2. User manually edits "new" data in layer 1, visually compares its correctness and merges individual map features into the layer 2 via JOSM "Merge" (key combination Ctrl-Shift-M by default).
  3. After all problems and conflicts are solved, the user uploads changes of layer 2 to the main OSM API.
  4. User deletes all migrated features from layer 1 and uploads this "removal" changeset back to the private API. This way the user informs other collaborators that this portion of import data is finished, so that they won't see already incorporated features any longer. This prevents everyone from double work and from adding duplicates of features to the main OSM database.
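The bookkeeping behind steps 2 to 4 can be modelled with plain Python containers. This is a toy sketch under stated assumptions: `migrate_reviewed` and the dict/list layer representation are hypothetical and do not correspond to any JOSM or OSM API call.

```python
def migrate_reviewed(private_layer, public_layer, reviewed_ids):
    """Move reviewed features from the private layer to the public one.

    private_layer: dict mapping a private feature id to its feature data
    public_layer:  list of features destined for the main OSM API
    reviewed_ids:  ids a user has inspected and merged

    Returns the sorted ids to delete on the private API, i.e. the
    "removal" changeset that tells collaborators the work is done.
    """
    removed = []
    for feature_id in sorted(reviewed_ids):
        if feature_id in private_layer:
            # "Moved", not "copied": the feature leaves the private layer.
            public_layer.append(private_layer.pop(feature_id))
            removed.append(feature_id)
    return removed
```

For example, with `private_layer = {1: "building", 2: "road"}` and `reviewed_ids = {1}`, the call returns `[1]`, leaves only feature 2 in the private layer, and appends `"building"` to the public layer.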

I hope my explanation is not overly complicated to grasp (it's a bit late here to think clearly...) There are a few remaining technicalities to solve (like negative IDs for "new" primitives etc.), but nothing unsolvable. osmosis is meant to help to perform the initial data import from an OSM XML into the private OSM API server.

@tomhughes
Member

You can describe the complicated things you are doing as much as you like, but the scope of this project is to develop the www.openstreetmap.org web site.

Some people choose to take the code and run their own clones of www.openstreetmap.org and we do our best to help them where we can but at the end of the day the primary goal takes priority.

Trying to do bidirectional data migration between the main database and a mirror is definitely way out of scope and not something we would want to encourage at all.

@mmd-osm
Collaborator

mmd-osm commented Feb 19, 2020

wiped information about the only OSM user I previously added to the DB

Probably an issue with colliding user ids. The documentation I mentioned assumes that you're only creating users after running osmosis. It even explicitly states: After installing but before creating any users or data, import an extract with Osmosis and the --write-apidb task.

What you describe sounds quite involved. You might want to check out RapiD editor for conflation: once a user copies over an element from an external vector source and uploads it to OSM, that feature would be automatically removed from the external data source without any need for user intervention. @migurski might be a good contact to talk to about this ;)

@grigory-rechistov

@tomhughes, thanks for replying!

we do our best to help them where we can but at the end of the day the primary goal takes priority.

Surely I understand the main goal here and value your contributions to it.
Let's also not forget that, besides "regular" map consumers and their well-established uses, there are "power" users a.k.a. developers/hackers and the new ideas they are constantly trying. All innovation depends on them and their comfort in working on the project. This comfort depends on the quality of the tools they have at their disposal. Of course, they can always write their own tools (or apply their own hacks to the DB for that matter), but the speed of innovation and thus the long-term survival of the project depends on them. Let's not forget about that while we cater to our main goals.

Trying to do bidirectional data migration between the main database and a mirror is definitely way out of scope

Data migration here is actually unidirectional:

open data source     →     OSM XML    →     private OSM DB      →      public OSM DB
                 extraction         osmosis                josm layers

What happens in the reverse direction is a cleanup to prevent double work. Essentially, primitives are "moved", not "copied" from private API to public API.

I go through all these troubles to try a new technique of collaboration over an open data source. Consider this in the context of the OSM project philosophy:

  1. The OSM project does not and will not support separate geospatial data layers with its API. Everything we have has to be in the single layer, and it immediately goes live.
  2. Data imports are prohibited unless they are well integrated/conflated with present DB contents. This basically means each and every new primitive (node, way, relation) has to be visually inspected/tuned by a person before signing it off for uploading. But to do such inspection, a person has to be able to compare "old" and "new" data, essentially working with two layers.

A single person can do the inspection locally in JOSM: the editor does support the notion of layers. However, typical data imports are large enough to be worth splitting among many people. So, a distributed second data "layer" is needed for them to collaborate. Traditionally, this is done via ad-hoc measures like a shared folder with a bunch of smaller OSM files for everyone to grab, inspect and upload. But shared folders and other filesystem-based collaboration workflows start to suck as soon as the amount of data grows. That is why databases were invented, and this is what I am trying to use here.

@grigory-rechistov

The documentation I mentioned assumes that you're only creating users after running osmosis. It even explicitly states: After installing but before creating any users or data, import an extract with Osmosis and the --write-apidb task.

Ah, silly me. Thanks for pointing that out! Anyway, it was a minor hurdle compared to other stuff I had to deal with, given how n00b I am with Ruby and Postgres. I am still surprised I managed to bring the API server up, and it works!

You might want to check out RapiD editor for conflation: once a user copies over an element from an external vector source and uploads it to OSM, that feature would be automatically removed from

Yes, this is exactly what I plan to achieve. Having the "removal" phase automated is what I currently lack. I thought that RapiD was primarily meant to import AI-traced roads, but maybe I should give it a deeper look. Thanks!

@gravitystorm
Collaborator

Data migration here is actually unidirectional:

@grigory-rechistov I think we've wandered a long way from the topic here. It's enough to know that you've used the osmosis apidb import, and that your use-case (as explained already by @tomhughes ) isn't one that we explicitly support. Beyond that, I'm afraid the rest of the details of your setup and future plans are off-topic.

@gravitystorm
Collaborator

My opinion is that we can't be held hostage by a package like osmosis that is essentially dead.

Loading data into an API database is not something most people ever have any need to do so I don't see this as an urgent problem to solve.

The thing is that osmosis loading has never worked very well and tends to just lead to frustrated users which is why I try and discourage it.

It remains one of the only mainstream API DB write paths that I’m aware of, so it’s likely that we are stuck with it.

I agree with all the above statements, which is frustrating since there's no obvious way forward. My magic wand (if I had one) would conjure up someone to spin an osmosis 0.47.1 release with just the patch in openstreetmap/osmosis#54 applied and then we can move on.

I really don't want to add an unused column to our database just to satisfy a broken client that should never have been using this column in the first place. The fact that it is unmaintained suggests that the correct approach is going to be to either find a new maintainer or find replacement software. And osmosis has never really worked very well for this anyway, due to both conflicts with existing data, and leaving sequences broken, and all that kind of stuff.

Given the existing need to run "fixup" SQL commands for the sequences when running osmosis, I think we should just update the documentation to add in the additional SQL commands (along with clearer warnings about the downsides, like messing up dbs with existing geodata, user accounts or both) and move on.
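As a rough illustration of what such "fixup" commands look like, the sketch below generates `setval` statements that reset each sequence to the highest id present after an import. The sequence, table and column names are assumptions and must be checked against the actual schema; `sequence_fixup_sql` is a hypothetical helper, not part of this repository.

```python
# Assumed mapping: sequence name -> (table, id column). Verify against
# the real API DB schema before using any of the generated SQL.
SEQUENCES = {
    "current_nodes_id_seq": ("nodes", "node_id"),
    "current_ways_id_seq": ("ways", "way_id"),
    "current_relations_id_seq": ("relations", "relation_id"),
}


def sequence_fixup_sql(sequences=SEQUENCES):
    """Build one setval statement per sequence, in mapping order."""
    return [
        f"SELECT setval('{seq}', (SELECT max({column}) FROM {table}));"
        for seq, (table, column) in sequences.items()
    ]


if __name__ == "__main__":
    # Print the statements so they can be reviewed and pasted into psql.
    for statement in sequence_fixup_sql():
        print(statement)
```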

That, or providing a supported piece of code inside this repository to write to the API DB.

The problem is that adding additional data to the db is simple only when the db is already empty. Any other situation becomes complex to the point of being a significant task. I suspect you need the user to decide what to do when e.g. node_id 10,003 already exists (overwrite it? or create a new node, and rewrite all references in the current import), or when user_id = 2 is already in the database - use this user for all the related data in the import, or create a new user for stuff coming from the import, and so on and so on. All these decisions have big implications for what the user is trying to do, when anyone wants to use it beyond just filling an empty db. So this suggests a non-trivial piece of software with many options, tests and complexity, that's not really suitable as a script in this repo.
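To make the node id decision concrete, here is a minimal sketch of just that one choice, assuming an in-memory model of ids. The function `plan_node_import` and its two policies are hypothetical; a real tool would also need to handle ways, relations, users, changesets and history.

```python
def plan_node_import(existing_ids, incoming_ids, way_refs, policy):
    """Decide what happens to incoming node ids that already exist.

    existing_ids: set of node ids already in the database
    incoming_ids: list of node ids in the import
    way_refs:     dict mapping an incoming way id to its list of node ids
    policy:       "overwrite" keeps colliding ids (existing nodes are replaced),
                  "remap" assigns fresh ids and rewrites all references

    Returns (id_map, rewritten_way_refs).
    """
    if policy not in ("overwrite", "remap"):
        raise ValueError(f"unknown policy: {policy}")

    # Fresh ids start above everything seen on either side.
    next_free = max(existing_ids | set(incoming_ids), default=0) + 1
    id_map = {}
    for node_id in incoming_ids:
        if policy == "remap" and node_id in existing_ids:
            id_map[node_id] = next_free
            next_free += 1
        else:
            id_map[node_id] = node_id

    # Every reference in the import has to follow the renumbering.
    rewritten = {
        way_id: [id_map.get(ref, ref) for ref in refs]
        for way_id, refs in way_refs.items()
    }
    return id_map, rewritten
```

Even this toy version needs a policy argument and a reference-rewriting pass, which hints at how quickly the option space grows once users and existing geodata are involved.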

@migurski
Contributor Author

Bummer, but okay.

What do you think would be required to release an Osmosis 0.47.1? I’m not current on the relationship that we may have had with distributors to get past versions out.

@mmd-osm
Collaborator

mmd-osm commented Feb 19, 2020

Last sign of life was in October 2018: https://lists.openstreetmap.org/pipermail/osmosis-dev/2018-October/001847.html

If anybody wants to take on release management duties, let me know, and I'll help them get access to some of the key infrastructure such as the dev server for hosting distribution binaries, Maven Central artefact upload process, and GitHub git repository.

Nobody volunteered, meaning it's dead now.

@caduguedess

My recommendation for the time being would be to introduce an additional column for the import via plain old Postgresql command line tools:

  1. On psql, execute alter table users add column nearby integer default 50;
  2. Run osmosis osm api db import
  3. Again on psql execute: alter table users drop column nearby;

That would allow you to run osmosis as is, and avoids any additional trouble for production. Also, we don't need any Rails db migration. Trying to keep this simple.
We could add this as a hint in https://github.com/openstreetmap/openstreetmap-website/blob/34dd2293db85e28b7e5df0889b0b778a685306bb/CONFIGURE.md#populating-the-database, mentioning that osmosis doesn't reflect the latest db schema changes.

I've done this but couldn't figure out the error "Unable to insert user with id = -1 into the database." when importing Geofabrik data into a local OSM website database.

@mmd-osm
Collaborator

mmd-osm commented Mar 15, 2020

Please try again with a Geofabrik extract that includes valid metadata (called "internal" on their website). You need to log on with your OSM user account to download this file.
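A quick way to see whether an OSM XML file carries user metadata is to count elements without a `uid` attribute. This sketch assumes that stripped metadata shows up as missing `uid` attributes on node, way and relation elements; `elements_missing_uid` is a hypothetical helper, and it reads the whole document into memory, so it only suits small samples.

```python
import xml.etree.ElementTree as ET


def elements_missing_uid(osm_xml):
    """Count top-level node/way/relation elements lacking a uid attribute."""
    root = ET.fromstring(osm_xml)
    return sum(
        1
        for element in root
        if element.tag in ("node", "way", "relation")
        and "uid" not in element.attrib
    )


if __name__ == "__main__":
    sample = (
        '<osm version="0.6">'
        '<node id="1" lat="0" lon="0" uid="42" user="alice"/>'
        '<node id="2" lat="1" lon="1"/>'
        "</osm>"
    )
    # One of the two nodes in this sample has no uid attribute.
    print(elements_missing_uid(sample))
```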

By the way, this has really nothing to do with the issue here, please create a new issue next time.

@caduguedess

It was an issue with the version. I just installed the latest one and everything ran smoothly.

@migurski
Contributor Author

As of version 0.47.1, Osmosis now correctly omits the users.nearby column: https://github.com/openstreetmap/osmosis/releases/tag/0.47.1
