Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scripts/get-shapefiles.py #2137

Merged
merged 1 commit into from
Nov 29, 2016

Conversation

Ircama
Copy link
Contributor

@Ircama Ircama commented May 30, 2016

[Edited summary for this PR, left out on my first push.
Edited the command line options to reflect the merged commit]

The aim of this PR is to replace get-shapefiles.sh with a Python version named get-shapefiles.py and positioned in the scripts directory of openstreetmap-carto.

It generates and populates the data directory with all needed shapefiles, including indexing them through shapeindex. Related features are the same as the original shell script, with the addition of some command line options.

The code exploits an internal dictionary for easy adaptation to future changes, supports Windows, UNIX, Python 2.7, Python 3 and curl when installed (even if curl is not required, for more comfortable Windows support). Installation of shapeindex is a prerequisite to index files, but is not strictly required if -s option is used (old indexes are not automatically removed; it is better to manually remove indexes from the subfolders when updating the data directory without updating indexes).

The script can be run from the scripts directory of openstreetmap-carto, or from its base folder, or through absolute path from any directory. It should allow a default Web proxy and also the configuration of the HTTP_PROXY environment variable. With Windows, PAC and NTLM authentication are not supported.

Documentation has been updated accordingly.

Resolves #43
Resolves #66
Resolves #1233

Basic invocation: scripts/get-shapefiles.py

Command-line options: get-shapefiles.py [-h] [-c] [-d <directory name>] [-e] [-f] [-l] [-n]
                         [-p] [-r] [-s] [-u] [--world-boundaries]
                         [--simplified-land] [--ne-admin] [--land-polygons]
                         [--icesheet-polygons] [--icesheet-outlines]

optional arguments:
  -h, --help            show this help message and exit
  -c, --check           check whether the 'data' directory already exists
  -d <directory name>, --directory <directory name>
                        set the name of the data directory (default: 'data')
  -e, --no-extract      do not populate target directories with the expansion
                        of downloaded data
  -f, --force           force continuing even if project.mml does not exist
  -l, --no-curl         do not use 'curl' even if available
  -n, --no-download     do not download archive if already existing locally
  -p, --pause           pause before starting
  -r, --remove          remove each downloaded archive after its expansion
  -s, --no-shape        do not run shapeindex
  -u, --update          force downloading files even if not newer than the
                        locally existing ones
  --world-boundaries    only process world_boundaries
  --simplified-land     only process simplified-land-polygons-complete-3857
  --ne-admin            only process ne_110m_admin_0_boundary_lines_land
  --land-polygons       only process land-polygons-split-3857
  --icesheet-polygons   only process antarctica-icesheet-polygons-3857
  --icesheet-outlines   only process antarctica-icesheet-outlines-3857

@Ircama Ircama mentioned this pull request May 30, 2016

print "\nChecking shapeindex version:"
try:
err = subprocess.call("bin/shapeindex -V", shell=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't creating a bin directory, so this should just be shapeindex

@pnorman
Copy link
Collaborator

pnorman commented May 30, 2016

I wouldn't mind replacing get-shapefiles.sh with a python version, but not adding a second version which is to do the same thing. Switching to a python version would make it easier to download on Windows.

@HolgerJeromin
Copy link
Contributor

It would be nice to have a script which is 2.7 and 3 compatible.
But this can be done later.
http://python3porting.com/differences.html seems to be a good resource for that (But my last porting was a year ago).

@Ircama
Copy link
Contributor Author

Ircama commented May 31, 2016

@pnorman: In fact I am trying to manage implications of contributing to openstreetmap-carto through a Windows installation of Tilemill. One aspect that needed rewriting was get-shapefiles.sh

@pnorman
Copy link
Collaborator

pnorman commented Jun 1, 2016

One of the important features of get-shapefiles.sh is you can re-run it to update

@Ircama
Copy link
Contributor Author

Ircama commented Jun 2, 2016

Added command line arguments.

Run it with scripts\get-shapefiles.py

To allow re-run it to update, use get-shapefiles.py -qu
To get help, get-shapefiles.py -h

usage: get-shapefiles.py [-h] [-q] [-u] [-n] [-e] [-s] [-1] [-2] [-3] [-4]
                         [-5] [-6]

This script generates and populates the 'data' directory with all needed
shapefiles.

optional arguments:
  -h, --help        show this help message and exit
  -q, --quiet       don't print status messages to stdout
  -u, --update      allow updating an already existing 'data' directory
  -n, --nodownload  do not download files if already existing locally
  -e, --noextract   do not populate target directories with downloaded data
  -s, --noshape     do not run shapeindex
  -1                only process world_boundaries
  -2                only process simplified-water-polygons-complete-3857
  -3                only process ne_110m_admin_0_boundary_lines_land
  -4                only process water-polygons-split-3857
  -5                only process antarctica-icesheet-polygons-3857
  -6                only process antarctica-icesheet-outlines-3857

@gravitystorm
Copy link
Owner

How hard is it to make the script do the following:

  • Work on both python 2 and 3
  • Only use modules from the standard library

Also, I can see there is inconsistencies in both variable naming (all lower (nodownload), snakecase (update_mode), other (Owps)) and indentation. I'm not a python programmer but best to go with whatever the python conventions are.

Finally, if this is a replacement for get-shapefiles.sh, then the PR should include removing the old version.

@Ircama
Copy link
Contributor Author

Ircama commented Jun 6, 2016

@gravitystorm: OK, I'll try in the next days

help="do not populate target directories with downloaded data")
parser.add_argument('-s', "--noshape", dest='noshape', action='store_true',
help="do not run shapeindex")
parser.add_argument('-1', dest='Owb', action='store_true',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variables like --world-boundaries, --simplified-water, --ne-admin, etc would be easier to remember than numerics

@Ircama Ircama force-pushed the topo-extension-get-shapefiles branch from 121d4af to 4f73b92 Compare July 1, 2016 09:35
if args.option_wps:
subprocess.call("shapeindex --shape_files data/antarctica-icesheet-polygons-3857/icesheet_polygons.shp")
print()
if args.option_aip:
Copy link

@tilmanb tilmanb Jul 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

option_aip controls the call to icesheet-outlines.
option_aio (the next one) controls the ne110m admin boundaries.
I guess this is not intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tilmanb: Thanks, I will correct this and will also change the water acronyms to land

@pnorman
Copy link
Collaborator

pnorman commented Jul 2, 2016

I found the following issues running it

  • The documentation needs updating to refer to the new name
  • The script needs mode +x

I get the error

$ scripts/get-shapefiles.py

Shapeindex is found and has
   Error. [Errno 2] No such file or directory
  • Updating should be the default and not require confirmation
  • When I run the script with scripts/get-shapefiles.py -su and rerun it, it downloads all the files again. It should only download those which have changed. This takes about 15 seconds for me. Curl uses the -z option for this.
  • The status update is substantially less useful than curl's. This isn't essential, but it'd be nice.

Code and data are intermixed and there's a lot of repeated similar code. The shell script has this too, but it's only 86 lines. I suggest moving the definitions of the files and their locations to the top in an array, dict, or other structure and having code which loops over that.

@Ircama
Copy link
Contributor Author

Ircama commented Jul 2, 2016

@pnorman: thanks for the feedbacks, I will setup a UNIX VM and test it there.

OK for the updates you asked.

I did not use curl to simplify the Windows setup.

The documentation needs updating to refer to the new name

Would you refer to the Scripted download section in INSTALL.md or also something else?

@pnorman
Copy link
Collaborator

pnorman commented Jul 2, 2016

Would you refer to the Scripted download section in INSTALL.md or also something else?

Anywhere that get-shapefiles.sh is mentioned.

@matthijsmelissen
Copy link
Collaborator

@Ircama Is this ready for review?

@Ircama
Copy link
Contributor Author

Ircama commented Jul 25, 2016

@math1985 yes, thanks.

@matthijsmelissen
Copy link
Collaborator

@pnorman Do you have time to review this?

@pnorman
Copy link
Collaborator

pnorman commented Jul 25, 2016

Yes, later this week.

@Ircama Ircama force-pushed the topo-extension-get-shapefiles branch 2 times, most recently from 0f156aa to 5cdc3b9 Compare August 22, 2016 00:12
@Ircama Ircama mentioned this pull request Aug 28, 2016
@Ircama
Copy link
Contributor Author

Ircama commented Sep 1, 2016

Memo for me: there is some change to be done to comply to PEP8 style guide. Will do ASAP with a new commit. Also a documentation typo needs fixing.

@pnorman
Copy link
Collaborator

pnorman commented Sep 2, 2016

I still need to do a code review on this.

@Ircama Ircama force-pushed the topo-extension-get-shapefiles branch from 5cdc3b9 to c0ae573 Compare September 7, 2016 22:33
@Ircama
Copy link
Contributor Author

Ircama commented Sep 8, 2016

[Unwanted push, just rebased by now]

@Ircama Ircama force-pushed the topo-extension-get-shapefiles branch from c0ae573 to ca7277f Compare September 8, 2016 21:43
@Ircama Ircama changed the title Added scripts\get-shapefiles.py Add scripts/get-shapefiles.py Sep 8, 2016
@Ircama
Copy link
Contributor Author

Ircama commented Sep 8, 2016

@pnorman: code now ready for review, thanks

Ircama added a commit to Ircama/openstreetmap-carto that referenced this pull request Sep 17, 2016
Additional comments to get-shapefiles.py, moved from gravitystorm#2137

The aim of this PR is to propose a documentation for the scripts that are available in the *scripts* directory.

It updates the already existing README.md with some currently available information.

Notice that get-shapefiles.py is already mentioned here.
@Ircama
Copy link
Contributor Author

Ircama commented Sep 17, 2016

Rebased

Ircama added a commit to Ircama/openstreetmap-carto that referenced this pull request Sep 17, 2016
Fixes gravitystorm#2333 (comment)
reported by @kocio-pl (thanks!)

Rebased with the previous commit:

Add a description for all the scripts included in OpenStreetMap Carto

Additional comments to get-shapefiles.py, moved from gravitystorm#2137

The aim of this PR is to propose a documentation for the scripts that are available in the *scripts* directory.

It updates the already existing README.md with some currently available information.

Notice that get-shapefiles.py is already mentioned here.
@Ircama
Copy link
Contributor Author

Ircama commented Sep 21, 2016

Resolves #43

@kocio-pl
Copy link
Collaborator

kocio-pl commented Sep 24, 2016

I guess this script resolves also #66 and #1233?

@Ircama Ircama force-pushed the topo-extension-get-shapefiles branch from b8a49c8 to ba2a700 Compare September 25, 2016 08:06
@Ircama
Copy link
Contributor Author

Ircama commented Sep 25, 2016

Rebased so that it can be easily reviewed/merged

@Ircama Ircama force-pushed the topo-extension-get-shapefiles branch from ba2a700 to 45e92aa Compare October 14, 2016 23:05
@Ircama
Copy link
Contributor Author

Ircama commented Oct 14, 2016

Newly rebased

Resolves #66
Resolves #1233

@HolgerJeromin
Copy link
Contributor

If you place the "resolves" text in the first message of this PR the issues will be closed automatically.

@kocio-pl
Copy link
Collaborator

I think you should also delete get-shapefiles.sh script in this PR.

@Ircama
Copy link
Contributor Author

Ircama commented Oct 15, 2016

If you place the "resolves" text in the first message of this PR the issues will be closed automatically.

@HolgerJeromin, @kocio-pl: thanks, I thought that any place within the PR for a comment was enough for the auto closure.

I think you should also delete get-shapefiles.sh script in this PR.

Yes, thanks a lot. I have just fixed it with a new commit.

@Ircama Ircama force-pushed the topo-extension-get-shapefiles branch from 372345e to b9cdc68 Compare October 30, 2016 19:09
@kocio-pl
Copy link
Collaborator

@pnorman Any problems left with this PR?

@pnorman
Copy link
Collaborator

pnorman commented Nov 14, 2016

@pnorman Any problems left with this PR?

the -1 to -6 short options.

Aside from that, it seems to work.

I don't like the gain in code complexity, but it's probably worth it for windows support in the download script.

@Ircama
Copy link
Contributor Author

Ircama commented Nov 14, 2016

@pnorman - OK no problem; would you like to have the short options removed (keeping the long ones), or both the short and long options removed (with reference to -1 to -6)?

Just one note, as I see many sites mentioning the bash script, I’d like to double check that its removal is wanted.

@pnorman
Copy link
Collaborator

pnorman commented Nov 14, 2016

@pnorman - OK no problem; would you like to have the short options removed (keeping the long ones), or both the short and long options removed (with reference to -1 to -6)?

The long options are good to have.

Just one note, as I see many sites mentioning the bash script, I’d like to double check that its removal is wanted.

We don't want to maintain two download scripts.

@nebulon42 nebulon42 self-assigned this Nov 19, 2016
@nebulon42
Copy link
Contributor

I'm currently having problems with shapeindex on my Ubuntu 16.04. It tries to load the shared libraries of Ubuntu 14.04. So I can't test this right now.

I also think the short options -1 .. -6 are not the best choice.

@nebulon42 nebulon42 removed their assignment Nov 19, 2016
@kocio-pl
Copy link
Collaborator

@Ircama Double check is here, so do you plan to remove the short options? I would like to merge this PR soon, if possible.

- Removed short options -1 .. -6
- Rebased

Add scripts/get-shapefiles.py

Previous changes:
- Removed unneeded information from the documentation
- Compliance to PEP8 style guide, typo correction and some revision
- Coding revised to fully comply to PEP8 style guide
- A documentation typo has been fixed
- Additional documentation provided
- Some minor code improvements
- code compacted and exploiting a dictionary
- added execution flag
- tested with Windows, Ubuntu and Cygwin
- file dates are checked before downloading new files
- curl is used if available
- return codes are checked
- documentation updated
- now update is the default and does not require confirmation
- code improvements

$ ./get-shapefiles.py -h
usage: get-shapefiles.py [-h] [-c] [-d <directory name>] [-e] [-f] [-l] [-n]
                         [-p] [-r] [-s] [-u] [--world-boundaries]
                         [--simplified-land] [--ne-admin] [--land-polygons]
                         [--icesheet-polygons] [--icesheet-outlines]

optional arguments:
  -h, --help            show this help message and exit
  -c, --check           check whether the 'data' directory already exists
  -d <directory name>, --directory <directory name>
                        set the name of the data directory (default: 'data')
  -e, --no-extract      do not populate target directories with the expansion
                        of downloaded data
  -f, --force           force continuing even if project.yaml does not exist
  -l, --no-curl         do not use 'curl' even if available
  -n, --no-download     do not download archive if already existing locally
  -p, --pause           pause before starting
  -r, --remove          remove each downloaded archive after its expansion
  -s, --no-shape        do not run shapeindex
  -u, --update          force downloading files even if not newer than the
                        locally existing ones
  --world-boundaries    only process world_boundaries
  --simplified-land     only process simplified-land-polygons-complete-3857
  --ne-admin            only process ne_110m_admin_0_boundary_lines_land
  --land-polygons       only process land-polygons-split-3857
  --icesheet-polygons   only process antarctica-icesheet-polygons-3857
  --icesheet-outlines   only process antarctica-icesheet-outlines-3857

This script generates and populates the 'data' directory with all needed
shapefiles, including indexing them through shapeindex.
@Ircama Ircama force-pushed the topo-extension-get-shapefiles branch from b9cdc68 to 3fd7ddf Compare November 29, 2016 00:24
@Ircama
Copy link
Contributor Author

Ircama commented Nov 29, 2016

Code updated as requested.

I meanwhile noticed that, when using -s to disable shapeindex (e.g., useful in case shapeindex is not properly installed), old indexes are not automatically removed. I did not have time to add this purge operation in the code now (and perform again all testing with all OSs); could be part of a future update. By now, it is better to manually remove indexes from the subfolders when updating the data directory without updating indexes. (Notice that running shapeindex is the default mode and also the suggested one.)

@kocio-pl kocio-pl merged commit e77111f into gravitystorm:master Nov 29, 2016
@kocio-pl
Copy link
Collaborator

I just made some basic testing with Python 2/3 and shapeindex/--no-shape under Linux and it works.

I think that 6 months for one script is long enough and fine tuning can be done later. In the meantime we lack portability and the code waiting too long in the queue may need to be keep in sync, which is additional problem. Thanks for your code and patience, @Ircama!

@Ircama Ircama deleted the topo-extension-get-shapefiles branch December 15, 2016 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants