Task: Script that imports oneway-data dump and makes it available via a web API #1022

Closed
westnordost opened this issue Apr 14, 2018 · 21 comments
Labels
help wanted help by contributors is appreciated; might be a good first contribution for first-timers

Comments

@westnordost
Member

westnordost commented Apr 14, 2018

I could use help with something that is not part of this app (i.e. not Java code). Anyone interested?

Introduction

Amongst other things, Telenav collects and aggregates data about the traffic flow direction of likely oneway roads that have not been tagged as oneway=yes in OSM yet. This data is available as a daily worldwide dump on http://missingroads.skobbler.net/dumps/OneWays/

This data can be used for the oneway-quest (#370). However, it must be made available to the app in the form of a web API that can run on my webspace (which supports Python, PHP, Ruby, Perl, perhaps more).

Mission

Your task is to write a script that

  1. downloads and imports the OneWays data dump into a MySQL database daily
  2. makes available a web API to be queried by StreetComplete to get the data within a certain bounding box

In detail:

1. Download and Import

  • The SQL table into which the data is imported needs to have the following columns: wayId, fromNodeId, toNodeId, latitude, longitude. Latitude and longitude should be the centroid of the given LINESTRING geometry. Only rows with a status of OPEN should be imported. Whether numberOfTrips should play a role there is TBD.
  • On each import, the previous data must be completely overwritten with the new data. During import, queries happening at the same time must not end in an error or return wrong data. (Possible solution: import the new data into a new SQL table, then delete the old table and move the new table into its place.)
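
A minimal sketch of the "import into a new table, then swap" idea from the bullet above. It uses Python with SQLite standing in for MySQL, purely so the example runs without a database server; the table names segments and segments_new are hypothetical. In MySQL, a single RENAME TABLE statement can rename several tables at once, which would be the natural way to do the swap there.

```python
import sqlite3

# Columns as listed in the task description.
COLUMNS = (
    "wayId INTEGER, fromNodeId INTEGER, toNodeId INTEGER, "
    "latitude REAL, longitude REAL"
)


def replace_segments(conn, rows):
    """Load rows into a staging table, then move it into place of the live one."""
    cur = conn.cursor()
    # 1. Build the new data set next to the live table.
    cur.execute("DROP TABLE IF EXISTS segments_new")
    cur.execute(f"CREATE TABLE segments_new ({COLUMNS})")
    cur.executemany("INSERT INTO segments_new VALUES (?, ?, ?, ?, ?)", rows)
    # 2. Drop the old table and rename the staging table to the live name.
    cur.execute("DROP TABLE IF EXISTS segments")
    cur.execute("ALTER TABLE segments_new RENAME TO segments")
    conn.commit()
```

The slow part (filling the staging table) never touches the table that the web API reads from; only the short swap at the end does.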

2. Web-API

  • If the query contains an invalid bounding box, an HTTP error should be returned.
  • On success, a query should ideally return JSON like this. If there is no data in the given bounding box, the segments array should simply be empty. It can also be CSV instead of JSON if you feel this would make more sense.
{
  "segments":[
    {"wayId":1, "fromNodeId":7, "toNodeId":8},
    {"wayId":1, "fromNodeId":10, "toNodeId":12},
    {"wayId":2, "fromNodeId":23, "toNodeId":42}
  ]
}
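
A rough sketch of how the two Web-API requirements above could fit together, in Python. Several things here are assumptions rather than part of the spec: the bounding box is taken as a comma-separated string in the order min longitude, min latitude, max longitude, max latitude; the function names and the shape of the error body are hypothetical; and the rows are plain dicts instead of database results.

```python
import json


def parse_bbox(text):
    """Parse 'minLon,minLat,maxLon,maxLat'; raise ValueError if it is invalid."""
    parts = text.split(",")
    if len(parts) != 4:
        raise ValueError("bbox needs exactly four comma-separated numbers")
    min_lon, min_lat, max_lon, max_lat = (float(p) for p in parts)
    # Chained comparisons also reject NaN and infinite values.
    if not (-180 <= min_lon <= max_lon <= 180 and -90 <= min_lat <= max_lat <= 90):
        raise ValueError("bbox is out of range or inverted")
    return min_lon, min_lat, max_lon, max_lat


def respond(bbox_text, rows):
    """Return (HTTP status, JSON body) for a bbox query over the given rows."""
    try:
        min_lon, min_lat, max_lon, max_lat = parse_bbox(bbox_text)
    except ValueError as error:
        return 400, json.dumps({"error": str(error)})
    segments = [
        {"wayId": r["wayId"], "fromNodeId": r["fromNodeId"], "toNodeId": r["toNodeId"]}
        for r in rows
        if min_lat <= r["latitude"] <= max_lat and min_lon <= r["longitude"] <= max_lon
    ]
    return 200, json.dumps({"segments": segments})
```

Note that this simple check rejects bounding boxes that cross the antimeridian; whether that matters for the app is not stated in the task.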

Resources

Distance in meters between two geo-points (assuming a spherical Earth), needed for the centroid calculation:

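// see https://en.wikipedia.org/wiki/Earth_radius#Mean_radius
final double EARTH_RADIUS = 6371000; // meters
// see https://en.wikipedia.org/wiki/Great-circle_navigation#cite_note-2
// φ = latitude, λ = longitude, both in radians (as Math.sin/Math.cos expect)
double distanceInMeters(double φ1, double λ1, double φ2, double λ2)
{
	double Δλ = λ2 - λ1;

	// y and x are the sine and cosine of the central angle between the two points
	double a = Math.cos(φ2) * Math.sin(Δλ);
	double b = Math.cos(φ1) * Math.sin(φ2) - Math.sin(φ1) * Math.cos(φ2) * Math.cos(Δλ);
	double y = Math.sqrt(a * a + b * b);
	double x = Math.sin(φ1) * Math.sin(φ2) + Math.cos(φ1) * Math.cos(φ2) * Math.cos(Δλ);
	return EARTH_RADIUS * Math.atan2(y, x);
}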
@westnordost westnordost added the help wanted help by contributors is appreciated; might be a good first contribution for first-timers label Apr 14, 2018
@ENT8R
Contributor

ENT8R commented Apr 14, 2018

Sounds like a cool project! Is there a programming language you prefer?

@westnordost
Member Author

The only requirement is that it runs on my webspace, the language is the implementer's choice.

@exploide
Member

exploide commented Apr 14, 2018

@ENT8R does this mean you picked this task?

Otherwise, three questions for @westnordost regarding the webspace capabilities:

  • Can you configure something like cron jobs? Populating the database should be independent of the web API, because triggering long-running tasks from a web server might cause problems such as timeouts. I would separate the two scripts.
  • Does your webspace support serving WSGI applications (for Python and others), or would it need to be a CGI interface?
  • Can your main webserver (Apache/nginx) be a reverse proxy for the API application? (Otherwise it would need to handle TLS and other stuff itself, which would be more cumbersome.)

(If you would rather not answer everything in detail, feel free to point me to your hoster's documentation.)

@ENT8R
Contributor

ENT8R commented Apr 14, 2018

> @ENT8R does this mean you picked this task?

Actually, I started some experiments and can already generate a CSV file containing only the ways with the status OPEN in just 30 lines of code... So I will take a few more steps and then decide whether I see this through, but it should be relatively easy to do...

@westnordost
Member Author

westnordost commented Apr 14, 2018

> Can you configure something like cron jobs?

Yes

I can't answer the other questions and I wouldn't know how to find this out. The hoster is wint.global, managed with Plesk.

@exploide
Member

Wow, they have pretty bad documentation. At least I found next to nothing. If I were to do it with Python, I would ask for your help in finding out which deployment options they offer. However, since @ENT8R is doing it right now (and I assume with PHP), this is no longer important for me. Thanks anyway.

@westnordost
Member Author

I am not sure he is doing it; he still has a couple of other things open.

@exploide
Member

That is why I asked for a commitment to the task. I don't like duplicating effort, and while the data crunching and web stuff is fun and pretty straightforward, I would need to read one or two Wikipedia articles to get that LINESTRING centroid right.

I prefer to let him do the steps he wanted to do, and wait for his decision. If he doesn't, I can give it a try. :)

@westnordost
Member Author

> I would need to read one or two Wikipedia articles to get that LINESTRING centroid right.

Here is a hint.
Though, the algorithm could be improved by not calculating the total length beforehand and instead iterating from the start and end of the list at the same time.
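
For illustration, a sketch of one reading of "centroid" that matches the remark about the total length: the point halfway along the line. It is written in Python and uses the straightforward variant that computes the total length first. Assumptions: the geometry is a WKT string of the form LINESTRING(lon lat, lon lat, ...), coordinates are in degrees, and the position inside a segment is interpolated linearly, which should be fine for short road segments. All function names are hypothetical.

```python
import math

EARTH_RADIUS = 6371000.0  # meters, mean radius


def distance_in_meters(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    d_lam = lam2 - lam1
    y = math.hypot(
        math.cos(phi2) * math.sin(d_lam),
        math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(d_lam),
    )
    x = math.sin(phi1) * math.sin(phi2) + math.cos(phi1) * math.cos(phi2) * math.cos(d_lam)
    return EARTH_RADIUS * math.atan2(y, x)


def parse_linestring(wkt):
    """Parse 'LINESTRING(lon lat, ...)' into a list of (lat, lon) tuples."""
    inner = wkt[wkt.index("(") + 1 : wkt.rindex(")")]
    points = []
    for pair in inner.split(","):
        lon, lat = (float(value) for value in pair.split())
        points.append((lat, lon))
    return points


def point_halfway_along(points):
    """Return the (lat, lon) at half the total length of a non-empty polyline."""
    pairs = list(zip(points, points[1:]))
    lengths = [distance_in_meters(*a, *b) for a, b in pairs]
    remaining = sum(lengths) / 2
    for (a, b), length in zip(pairs, lengths):
        if remaining <= length:
            t = remaining / length if length > 0 else 0.0
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= length
    return points[-1]  # single point, or rounding left a tiny remainder
```

For example, point_halfway_along(parse_linestring("LINESTRING(0 0, 1 0, 4 0)")) lands at roughly latitude 0, longitude 2.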

@ENT8R
Contributor

ENT8R commented Apr 15, 2018

I think I can show you a first working example in a few hours. I will let you know when the code is available on GitHub.

@ENT8R
Contributor

ENT8R commented Apr 15, 2018

Alright. The very first version is now available on GitHub: https://github.com/ENT8R/oneway-data-api It is written entirely in PHP.
If you want to update the list, just call the endpoint /update.php
To get the data for a specific bounding box (e.g. Cape Town):
/get.php?bbox=18.3072,-34.3583,19.0053,-33.4713

The main problem currently is that the data is not imported into a SQL database but only saved in a file, which has to be read every time a request is made. @exploide Do you have any experience with SQL databases?
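
To illustrate what moving from a file to a database buys here, a small Python sketch of a bounding box lookup as a single indexed SQL query. SQLite stands in for MySQL so the example is self-contained; the table name segments, the index name and the sample rows are made up for the demo.

```python
import sqlite3


def demo_connection():
    """Create an in-memory table with the columns from the task and two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments (wayId INTEGER, fromNodeId INTEGER, "
        "toNodeId INTEGER, latitude REAL, longitude REAL)"
    )
    # An index on the coordinates lets the database avoid scanning every row.
    conn.execute("CREATE INDEX segments_lat_lon ON segments (latitude, longitude)")
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?, ?, ?)",
        [(1, 7, 8, -33.9, 18.4), (2, 23, 42, 52.5, 13.4)],
    )
    conn.commit()
    return conn


def segments_in_bbox(conn, min_lon, min_lat, max_lon, max_lat):
    """Return the segments whose stored point lies inside the bounding box."""
    cursor = conn.execute(
        "SELECT wayId, fromNodeId, toNodeId FROM segments "
        "WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ? "
        "ORDER BY wayId, fromNodeId",
        (min_lat, max_lat, min_lon, max_lon),
    )
    keys = ("wayId", "fromNodeId", "toNodeId")
    return [dict(zip(keys, row)) for row in cursor]
```

With the Cape Town bounding box from the example request above, segments_in_bbox(demo_connection(), 18.3072, -34.3583, 19.0053, -33.4713) returns only the first sample row.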

@exploide
Member

exploide commented Apr 15, 2018

Yes, I have. Feel free to tell me how I can help.

Maybe I will have time to review later. Hopefully it's not too complex given the external dependencies you used. My PHP is a bit rusty, but we will see :P

The first thing I spotted is that your unprotected /update endpoint could be used to cause unnecessary load on the server. Additionally, if the update takes longer, e.g. because the data grows or database operations slow it down, it may fail due to HTTP timeouts. I still propose making this available only offline, triggered by a cron job.

@rugk
Contributor

rugk commented Apr 15, 2018

> endpoint might be used to cause unnecessary load on the server

So you want to introduce an API key? IMHO a good idea.

@exploide
Member

exploide commented Apr 15, 2018

No, I want to have this offline xD Working on it now. If @ENT8R or @westnordost really want the update functionality online, I will stop and you can go with it, but I don't see why this would be useful. Who would have a legitimate reason to trigger this update from the outside? A simple cron job invoking the PHP script, and it's up to date every day...
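
As a rough illustration of the cron approach, a crontab entry along these lines would run the update once a day. The path and the time of day are placeholders, and it assumes a php command-line binary is available on the host, which the thread does not confirm.

```
# m  h  dom mon dow  command
30   3  *   *   *    php /path/to/oneway-data-api/update.php
```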

@rugk
Contributor

rugk commented Apr 15, 2018

Ah, yes, of course, it may be done internally by a cron job. As for external connections, one could still introduce an API key. That could be useful to limit the server load.

@exploide
Member

I just submitted a WIP PR there. Maybe it's good to continue the discussion there and in the other issue tracker, so fewer emails and notifications are sent to the main project here.

@ENT8R
Contributor

ENT8R commented Apr 15, 2018

From my point of view, we now have a great working tool/API. If you (@westnordost) have more suggestions, feel free to open an issue in the other repo.

@exploide
Member

exploide commented Apr 15, 2018

Fine. @ENT8R are there any other open issues you are aware of? Otherwise we can let @westnordost shoot a glance.

I think one could check whether either the geophp library or the custom geometry code could be removed, but I leave this up to you. Regarding the bounds checks, I only adapted what you did there to the MySQL query. The JSON looked like yours before, but a second round of sanity checking would be good.

After spending the last few years in the ORM world, raw SQL looks ugly to me, but for such a small page, I decided not to use an additional ORM library and to keep the dependencies clean and simple.

So from my point of view, we are pretty much done?!

EDIT: OK, you commented at the very same moment :P Maybe update your testing site to the DB-enabled version?!

@westnordost
Member Author

I shot a glance

@westnordost
Member Author

:shipit:
I think this project is finished. Thank you @ENT8R and thank you @exploide :-)

Everything seems to work now: https://www.westnordost.de/streetcomplete/oneway-data-api/?bbox=18,-34,19,-33

@mnalis
Member

mnalis commented Jul 7, 2024

(Upstream ImproveOSM support has been shut down; this is now to be removed, ref.: #5725)
