⚡ 🚸 Re-enable populating addresses in trips and make sure that they are used on the phone #937
So concretely:
After those are done, we can move to the phone changes.
For the record,
I have spent most of my time on this project so far reading and testing existing functionality. I was initially unsure how to run the nominatim calls locally, since their documentation is not great. I was able to download a docker container from nominatim-docker that contains everything needed to run a localhost instance and make nominatim calls:
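The exact invocation wasn't captured above; for reference, here is a docker-compose sketch of running the nominatim-docker image against a small extract. The image tag and environment variable names are from my recollection of that project and may differ by version:

```yaml
services:
  nominatim:
    image: mediagis/nominatim:4.2
    environment:
      # OSM extract to download and import on first start
      PBF_URL: https://download.geofabrik.de/europe/monaco-latest.osm.pbf
    ports:
      - "8080:8080"
```

The import runs once at container start, so the first launch is slow; subsequent starts serve from the already-imported data if a volume is attached.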
I then modified
per this exchange. When trying to run, I kept getting the following error:
I decided to look through old files since @JGreenlee showed me how (thank you!!!) and found out that I could just use
I'm glad you figured this out - just to clarify:
so you didn't have to run the nominatim service locally, just make calls from your laptop to the free nominatim service. wrt the local container:
Do you know why it didn't work? Hint: Look at the data that you downloaded into your container.
Answer: because we loaded data from Monaco but were trying to access a location in Colorado.
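The container can only answer for points inside the extract it imported. A small illustration of the request being made (the helper name and port are illustrative, not from the codebase):

```python
from urllib.parse import urlencode

def build_reverse_url(base_url, lat, lon):
    """Build a Nominatim reverse-geocoding request URL. The server can
    only resolve (lat, lon) if it falls inside the OSM extract it loaded."""
    return f"{base_url}/reverse?{urlencode({'lat': lat, 'lon': lon, 'format': 'json'})}"

# A Monaco point: answerable by a container loaded with monaco-latest.osm.pbf
monaco_url = build_reverse_url("http://localhost:8080", 43.7384, 7.4246)
# A Colorado point: the same container has no data for it, so the response
# would be an "Unable to geocode" error rather than an address
colorado_url = build_reverse_url("http://localhost:8080", 39.7392, -104.9903)
```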
Our integration tests are currently in the integrationTests directory. Let's first write the test and then see whether we want to use the secrets or not. We might want to run the tests against both geofabrik and OSM nominatim initially to see if they are consistent. If they stay consistent for a while, we could consider removing one.
Right now, the URL for the nominatim service is in a config file. We could also configure it using an environment variable, but I think that doesn't make as much sense given that:
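For comparison, a sketch of what an environment-variable override on top of the config file could look like. The file path and JSON key are my assumptions about the config layout, not verified against the repo:

```python
import json
import os

def get_nominatim_url(config_path="conf/net/ext_service/nominatim.json"):
    """Return the nominatim base URL: an environment variable wins if set,
    otherwise fall back to the config file, then to the public OSM service."""
    override = os.environ.get("NOMINATIM_QUERY_URL")
    if override:
        return override
    try:
        with open(config_path) as f:
            return json.load(f)["query_url"]
    except (FileNotFoundError, KeyError):
        return "https://nominatim.openstreetmap.org"

os.environ["NOMINATIM_QUERY_URL"] = "http://localhost:8080"
url = get_nominatim_url()
```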
I decided to test the functionality of the docker container that I downloaded. I started the docker container and modified nominatim.json with this line:
I used a latitude and longitude in Monaco. I also modified nominatim.py. The output was as expected. So, if the correct dataset is downloaded, the docker container method with nominatim could be useful.
Looked heavily into nominatim's documentation to see if we can query data from previous dates. Unfortunately, this is not possible for any call in nominatim. As an alternative solution, tried to use overpy:
Resulting outputs were very large, and require a large amount of resources just to potentially get the display name out. You cannot query against addresses unless they are very specific, which would require a large amount of looping over data. Additionally, you cannot query against lat/lon directly, as it wants a bounding box as input.
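Concretely, Overpass QL takes a bounding box `(south, west, north, east)` rather than a single point, so a reverse-lookup workaround has to wrap the coordinate in a small box first. A sketch (the box size is an arbitrary choice):

```python
def overpass_query_around(lat, lon, delta=0.0005):
    """Build an Overpass QL query for named ways in a small bounding box
    around (lat, lon) -- a crude stand-in for true reverse geocoding."""
    south, west = lat - delta, lon - delta
    north, east = lat + delta, lon + delta
    return f'way["name"]({south},{west},{north},{east});out tags;'

query = overpass_query_around(43.7384, 7.4246)
```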
For 3 (a), we could actually use the overpy approach as a fallback if the location that we are looking up is older than a week or something. For 3 (b), it doesn't make sense to only test overpy to get the value if we are going to continue using nominatim as the default. If we are going to use nominatim, we should test nominatim. So for testing alone, I would suggest that we:
For docker:
For the overpy approach, I don't see what the issue is.
If we implement the overpass python backup option, we should test that as well.
to run integration tests, you would:
So similar to the existing docker-compose but with the addition of the nominatim service. Couple of caveats to note are:
During our meeting, we wanted to test that the original docker-compose command was working locally, so that the very basics of setting up a new container would not be an issue. While running it, I hit an error. Line 9 of runAllTests.sh reads as follows:
This resulted in a `python: command not found` error, because my machine uses python3. Changing the command to python3 resolved it.
Locally, running the following block of code successfully downloads the OSM data from Northern California and creates a container named norcal-nominatim:
It also creates a volume, so that the container can be started and stopped while maintaining the data. This command line call is great for local testing, but our current digital server uses docker-compose to test. So, I converted the commands passed into docker run -it into commands for the docker-compose file:
This way, we can integrate running the nominatim server in tandem with our mongoDB, so that we can make calls to both at the same time. This might be necessary, depending on whether the other integration tests will require calls to mongo. To run the tests, I added a directory to the integration tests called nominatimTests. I created a file called nominatim-docker-test.yml, which runs the test every Sunday at 4:05am. Most notably, it builds a Dockerfile in the integrationTests directory:
and runs docker-compose:
The Dockerfile runs a file called start_integration_tests.sh, which will initialize all of the integration tests.
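The workflow described above might look roughly like this as a GitHub Actions file. The job and step names, paths, and compose flags are illustrative placeholders; the cron line is the standard syntax for Sundays at 04:05:

```yaml
name: nominatim-docker-test
on:
  schedule:
    - cron: "5 4 * * 0"   # every Sunday at 04:05 UTC
jobs:
  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and run the test containers
        # --exit-code-from propagates the test container's exit status
        # to the workflow, so a failing test fails the job
        run: >
          docker compose -f emission/integrationTests/docker-compose.yml
          up --build --exit-code-from web-server
```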
On my test run, the servers were initializing (though they seemed to be dueling over the same port). I tried to query the local nominatim server, and it did not work. I decided to do a quick sanity check and make sure that my goal was possible by following the procedure on the nominatim-docker page for Monaco. I ran their docker run -it and called the server using their suggested command. I then decided to try a local test using the Northern California container, but modified the command by including the optional wikipedia import, while removing the creation of the volume. This way, it would be more similar to the command used in the successful Monaco example. This increased the time required to complete the docker run -it dramatically (almost 45 minutes!), but will hopefully allow a reverse geocoding call to succeed.
This makes no sense. The whole point of running in docker is that it is isolated from what is on your machine. Even on your machine, if you use the version of python from the e-mission environment, you should not have to use python3.
So we tried to run the container again. It did appear to launch the tests, although we don't fully know why. We execed into the container with
where is this code running from then? Are we mounting the source code of the test somewhere?
We tested out the docker-compose again and found that writing python3, while resolving the python command not found error initially, did not entirely resolve the problem.
The server is in
Running
@nataliejschultz Right, you need to activate
before running all tests
@nataliejschultz your issues with the local docker-compose are almost certainly a configuration issue on your side. As you can see, the CI test is running just fine, and I can verify that the tests are launched properly when I try to run.
I would encourage you to clear your environment and re-test, following the instructions in the CI until you get this to work successfully.
I tried a few different fixes, including running in my emission environment, but nothing was working to resolve the error
So, I added a single line: I noticed that the same single line of code is present in activate_tests.sh, which is executed after both setup_tests.sh and setup_conda.sh. Is there a reason that the environments are activated in the order that they are? Could activating the environment earlier cause problems down the line, or potentially be beneficial to avoid future configuration issues?
Have spent the past hour trying to figure out why my reverse geocoding call wasn't working locally, but would work using nominatim directly. I was using and getting
Turns out that you need to put quotes around a call if it contains an ampersand while using curl. Finally succeeded with this:
Now wondering if using Postman would be a good option, if it's allowed?
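On the quoting problem above: the `&` only needs escaping because the shell interprets it as a job separator. Building the query string in code sidesteps the issue entirely; a sketch:

```python
from urllib.parse import urlencode

params = {"lat": 43.7384, "lon": 7.4246, "format": "json"}
# In a shell, the '&' separators would background the command unless the
# URL is quoted; here they are just characters in a Python string that
# gets handed to an HTTP client.
url = "http://localhost:8080/reverse?" + urlencode(params)
```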
Currently working on creating a fake trip in Rhode Island since I think the Northern California data will take too long to copy every time we test. Using the docker container method described in synthpop.
I don't understand this comment. Where would we need to use postman?
Why do we need to use synthpop? We don't need to simulate a synthetic population. And there is no US Census data for Monaco anyway. Just pass in lat/lon points known to be in Monaco/Rhode Island/... Note previous comment:
You should not have to make any code changes to resolve locally. If you do, you are not running the local system correctly.
Because we need to set up before we activate. And once we have set up, we don't need to activate over and over. You need to see where the python error is coming from. Again, I would suggest running the startup script in the docker container step by step so you can see where the error is coming from. Think of unix/HPC: a program that runs locally but not on the remote server is not useful.
After our meeting yesterday, we found the source of the problem. Currently looking at how to address the comments on my commits from last night in the PR for this issue.
Haven't been logging progress very much as I was feeling very sick the past few days. Have been steadily working on addressing the comments, as well as issues I've encountered along the way. I spent time trying to understand how to get the environment variable for
Testing locally from docker-compose.yml. I also modified the workflow file nominatim-docker-test.yml to run on push to see how far it would get into the workflow. After looking into the error:
I ran
I had to increase the timeout a couple of times, but it eventually worked. Despite running properly with dockerize, the tests are still failing. I noticed that my method of checking that the place_id values are equal using unittest's assertEqual is failing.
However, this is apparently a known issue that might not be fixable. I might just have to change the assertion to expect a specific key (or multiple keys associated with that function). Additionally, test_get_json_geo appears to be returning an empty list:
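One way around the unstable `place_id` is to assert on only the fields that survive a re-import. Which keys count as volatile below is my judgment call, not something taken from the nominatim docs:

```python
VOLATILE_KEYS = {"place_id", "licence"}

def stable_subset(response):
    """Drop keys that legitimately differ between nominatim imports,
    so two responses for the same point compare equal."""
    return {k: v for k, v in response.items() if k not in VOLATILE_KEYS}

# place_id differs between imports, but the address itself should not
expected = {"place_id": 123, "display_name": "Bristol, Rhode Island, USA"}
actual = {"place_id": 456, "display_name": "Bristol, Rhode Island, USA"}
stable_match = stable_subset(expected) == stable_subset(actual)
```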
This has proven to be an issue. I have been trying to use
However, when using the emission kernel, or trying to call from within the environment inside the docker container, I get a 403 forbidden error.
I would just delete the
This is almost certainly due to our running afoul of nominatim's usage limits. We are not supposed to make too many calls. Given that it works on your laptop in the regular python3 but not in emission, I would think that this may just be random chance in the ordering.
Not really related to the python environment. Couple of things here:
We will not actually use nominatim for this in production because that will definitely violate their terms. We should use the geofabrik API to test against instead. Note that in that case, you cannot hardcode the URL since it needs to embed our API key, which we do not want to publish. Fortunately, we should already be using an environment variable to pass in the URL to call. So you can use an environment variable in docker-compose for local testing and, on GitHub actions, use GitHub actions secrets to pass in the variable with the API key embedded.
The module we are trying to test in the pipeline only works on a place. It does not need a trip. If you run it and it fails because it doesn't have a trip, we have to think about it, but we can keep that around as a negative test. But I don't think it needs a trip (90% certainty). So you create a fake place, and pass it in! The main difference between this and calling the module directly is that the module will return the result to you directly, while the pipeline stage will fill the result into the place and return the place. So calling the pipeline stage tests that the stage reads the lat/lon from the place, passes it to the module correctly, extracts the address, and fills in the response correctly. You should test with a blank data place, but expect it to fail.
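The fake-place idea might look roughly like this. The field layout below is illustrative only, not the real e-mission Entry schema:

```python
def make_fake_place(lat, lon):
    """Build a minimal stand-in for a cleaned place: just enough
    structure for a reverse-geocoding stage to read a coordinate.
    (Keys here are illustrative, not the actual Entry schema.)"""
    return {
        "metadata": {"key": "analysis/cleaned_place"},
        # GeoJSON convention: coordinates are [lon, lat]
        "data": {"location": {"type": "Point", "coordinates": [lon, lat]}},
    }

place = make_fake_place(41.74, -71.32)  # a point in Rhode Island
coords = place["data"]["location"]["coordinates"]
```

The pipeline stage would then read the coordinate out of the place, call the geocoding module, and write the address back into the same structure.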
I am back from my trip! I spent the first chunk of the day dealing with timesheet issues (very unfortunate timing with the end of the fiscal year), but am working on creating a fake place now.
I created a fake place using the
And it outputs an Entry class that looks like this:
The function fills in the metadata per instructions in metadata.py's
The reverse geocoding is successful!
I don't think you need
One of the issues you have been facing is that it takes a long time to start up the nominatim docker container. I know you have worked around it with timeouts, but we still don't want to abuse the github actions by making it do a lot of work. Since you are creating the container with the same data load every time, you can use docker commit to save a pre-loaded image. We can simply update the image every time we update the data file, roughly once a year.
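With a committed image, the compose service shrinks to just pulling and running it; the account name and tag below are placeholders:

```yaml
services:
  nominatim:
    # pre-imported image pushed to Docker Hub; pulling this replaces
    # the download + import that currently runs on every CI start
    image: <dockerhub-user>/norcal-nominatim:1.0
    ports:
      - "8080:8080"
```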
one test against the base nominatim API for community users who don't have a paid subscription (this should not trigger usage limits). Other tests against the geofabrik API, which is the one we use; it should be compatible with nominatim and should not have usage limits. Depending on test stability we can decide how to proceed.
I have been working on this for a very long time now, and have a lot of thoughts to share. I really wanted to get it done, but there are too many issues to deal with at the moment. The most important test is passing, as are the majority of the other 9 tests. I'm not supposed to work tomorrow according to my schedule, but I am very determined and want to get this done so I can move on!! I'll get online in the morning when I can collect my thoughts properly.
Here is the long update (pt 1): I removed the
Workflow Woes:
API Key Woes: I tried adding a few different environment variables to the test in nominatim-docker-test.yml:
Then, I attempted two different methods to access the secrets in my TestNominatim.py file:
The first method resulted in an error:
So I continued with the second method. It seems like the variables are not carried through to this file anyway, since my prints:
Yield the following result:
Additionally, I have not found any examples of other people using these environment variables outside of their yml files. I think at this point I will have to use AWS Secrets Manager.
Tests
It calls get_json_geo, which is supposed to pass in an address and return a list of a dictionary with nominatim data on that location. It might be failing because I skipped adding the wikipedia data to the container. However, I don't know if this test is completely necessary. The most important test, which is
This test is passing.
Thoughts
Big Progress! I tried many different ways to pass the test API key through to TestNominatim.py. I eventually found out that I have to pass the variable into the docker-compose through the docker-compose command, which can be done like this:
Then, the variable has to be set again in the docker-compose (specifically in the web-server service in the environment section) so that it can be accessed by the web server when running the test:
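Put together, the two hops might look like this; the variable and service names are placeholders, not necessarily the real ones. The shell prefix on the compose command supplies the value, and the `environment` entry forwards it into the container:

```yaml
# invoked as:
#   NOMINATIM_API_KEY=${{ secrets.NOMINATIM_API_KEY }} docker-compose up
services:
  web-server:
    environment:
      # ${NOMINATIM_API_KEY} is interpolated by docker-compose from the
      # shell environment at `up` time, then visible to the test process
      - NOMINATIM_API_KEY=${NOMINATIM_API_KEY}
```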
I wanted to make sure that the secret was still being recognized as a secret, and not just a regular key/value pair. So, I added a print to TestNominatim.py:
Which yields the following in GitHub Actions:
So it appears that the secret is still protected, even after passing through multiple files. I am going to test with the real key and see if I can get that test to work!
Indeed!
Great job digging through this and figuring out how these environment variables are passed through in GH actions. You should put that into your demo for others to learn!
Currently,
This is strange, because the same function is called earlier in the program and not producing this error.
I am glad that this showed up, because it means that the test is working in the way that we had hoped by identifying a change in the way that nominatim displays outputs. I'm not sure if I should modify the test to ignore this part of the result since we are now aware of the change.
It looks like the HTTP error was temporary, because the tests are finally PASSING!!! Moving the code to ready for review!
Since the tests were passing, I decided to do the docker commit work. I created a docker hub account, built my container locally by running docker compose, and created an image from the container using docker commit. When I looked through the logs, Actions appears to be going through the exact same process as before; essentially, it's still building the container from scratch:
It seems like pre-building the container and committing it as an image didn't make any difference. Despite still passing, the test took a minute longer.
@nataliejschultz I don't see the tests passing in the PR. Can you add a link to the tests passing on your fork? I can then know that this is because I haven't configured the secret yet.
This is weird. Can you confirm that when you "built my container locally by running docker compose", you waited for the container to start up before committing it? This may be because the entrypoint downloads and builds without checking to see if the data has already been loaded. In that case, you should override the entrypoint in the workflow file to only include the final "run" step (since the download + install is encapsulated in your committed image).
Here is a link to a successful run.
I waited for docker compose to finish completely (ie for the tests to run locally), exited the container, and ran these commands in terminal:
I just tried creating the image off of a running container and pushing that (see image tag 3.0), and it is the exact same size as the 2.0 image. Here is what it says in my terminal when I created and pushed the images:
Once I created my new image, I modified the docker compose for the nominatim service to look like this:
The workflow file itself only runs the compose; are you suggesting I should modify the workflow file to run the image and start a container (if possible), or modify the nominatim service in the compose?
modify the nominatim service in the compose. I used an admittedly imprecise description to try and align with what you had said before ("I then modified the workflow file to pull my newly created image.") in an attempt to make it easier to understand 😄 Will make sure to stay precise in the future!
As should I! My bad.
start.sh calls init.sh if the IMPORT_FINISHED file doesn't exist. I can think of three ways to fix this:
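One possible fix along those lines: create the marker file before start.sh runs, so its init.sh branch is skipped. The image name and the marker-file path below are placeholders; the real path would have to be read out of start.sh:

```yaml
services:
  nominatim:
    image: <dockerhub-user>/norcal-nominatim:1.0
    # create the IMPORT_FINISHED marker first so start.sh skips init.sh,
    # then hand off to the normal startup script
    entrypoint: ["/bin/sh", "-c", "touch <IMPORT_FINISHED-path> && /app/start.sh"]
```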
I spent time rooting around in one of the images I created using docker commit. I started by trying to run it directly. Starting over and working on a different image, I ran both commands. Running my workflow outright in another test gave an error right away. I wasn't sure if this was going to happen, but I had my suspicions. Now that I have an image to use that has the correct data loaded, I believe I can change the entrypoint and get it to run the way we want!
Okay, so it wasn't as simple as just changing the entrypoint. I have a suspicion about the file required to bypass the import.
ps: planned to get more done yesterday, but my home had a leak we had to address :/ working today to compensate.
I've edited all of the tests as much as I can without further input from @shankari :) I'm really excited for this test to be done!!
Today has been super busy!! I met with Abby about the Samsung phone and went over my many trips of different types that I've taken over the past few weeks. Also had the all-hands, during which I couldn't test locally. I've been addressing the comments in my PR, but haven't pushed changes yet.
Okay, I think the URL call issue is finally fixed. It took me a long time to figure out how to get it right. Here's an example of the current workflow:
Set environment variables for the web-server service in docker-compose:
TestNominatim.py gets these variables:
I left them this way because the URL string is used in a few different tests. The nominatim module sets NOMINATIM_QUERY_URL to one of the URLs, based on how it's called:
The function calls one of the nominatim services with a specified URL (container):
Nominatim.py gets environment variable that was set in the test, and uses it in whatever function is called:
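A sketch of what that selection logic might amount to; the GFBK_KEY variable name and the Geofabrik URL shape are my assumptions, not copied from nominatim.py:

```python
import os

OPEN_STREET_MAP_URL = "https://nominatim.openstreetmap.org"

def nominatim_query_url():
    """Use the Geofabrik endpoint (key embedded in the URL) when a key
    is configured; otherwise fall back to the public OSM service."""
    key = os.environ.get("GFBK_KEY")
    if key:
        return f"https://geocoding.geofabrik.de/{key}"
    return OPEN_STREET_MAP_URL

os.environ.pop("GFBK_KEY", None)
default_url = nominatim_query_url()   # no key configured
os.environ["GFBK_KEY"] = "dummy-key"
keyed_url = nominatim_query_url()     # key configured
```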
Right now, we don't fill in address information in the trips when we process them. Instead, the phone "fills in" the addresses when it receives the trips by making calls to OSM nominatim. This is annoying because we make a lot of potentially redundant calls to nominatim (although @JGreenlee is caching some information on the phone), and it is also annoying for users because the addresses fill in one by one very slowly and it uses up people's data plans.
Instead, while we are filling in place information, we should fill in the addresses so that they will be part of the trip and used in multiple places without having to keep re-looking up the information. It will also have the benefit that we can know the address at the time the trip was taken, rather than the time the trip was viewed, in case there are changes to the map in the middle.
There is existing code to fill this in using nominatim on the server, but it runs into nominatim usage limits https://github.com/e-mission/e-mission-server/blob/8761f4afce635bc4cc7ff1732d9f00956cb5c4ad/emission/analysis/intake/cleaning/clean_and_resample.py#L250
However, we are in the process of signing up for a paid nominatim service. To prepare for that, we should: