Strip long names from boundaries. #1700

Merged: zerebubuth merged 3 commits into master from zerebubuth/1683-strip-names-off-boundary-lines on Nov 6, 2018

zerebubuth
Member

This strips names off boundaries between zooms 8 and 10 when they're too long to be reasonably rendered in the tile, based on an assumed rendered width of roughly 11 pixels per character. This can help remove many names on very short segments of boundary that we'd never have been able to render anyway.

Additionally, this merges boundary lines. We weren't doing this previously, so we had a few fragmented boundaries which were duplicating properties and inflating the feature count.
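As a rough illustration of why merging helps, here is a minimal sketch that groups line fragments sharing identical properties into one feature. The function name `merge_by_properties` and the `(coords, props)` feature shape are assumptions made for this example, not the code in this PR, and the sketch only collects the fragments together rather than joining them end to end.

```python
from collections import defaultdict


def merge_by_properties(features):
    """Group line features that share identical properties.

    `features` is a list of (coords, props) pairs, where coords is a
    list of (x, y) tuples. Returns one (lines, props) pair per distinct
    set of properties, where lines is a list of coordinate lists.
    """
    grouped = defaultdict(list)
    for coords, props in features:
        # dicts are unhashable, so use a sorted tuple of items as the key
        key = tuple(sorted(props.items()))
        grouped[key].append(coords)
    return [(lines, dict(key)) for key, lines in grouped.items()]


# Hypothetical input: two fragments of the same boundary plus one other.
features = [
    ([(0, 0), (1, 0)], {'kind': 'region', 'name': 'A - B'}),
    ([(1, 0), (2, 0)], {'kind': 'region', 'name': 'A - B'}),
    ([(5, 5), (6, 5)], {'kind': 'country'}),
]
merged = merge_by_properties(features)
print(len(features), 'features before,', len(merged), 'after')
```

The properties of the two matching fragments are now stored once instead of twice, which is where the feature count and size savings come from.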

Some quick tests show we're reducing the GeoJSON boundaries layer size by around 23% on tile 8/134/89 and the MVT boundaries layer size by 9% (because MVT already deduplicates the name values, there's less opportunity for us to save).

One observation: Many of the names which are using a lot of bytes are not long in terms of number of rendered characters, but the characters take up more space encoded as UTF-8. It would be interesting to see how many of these we could compress away as transliterations, but that would require client-side support.
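To make the character count versus byte count point concrete, here is a small sketch. The sample names are illustrative only and are not taken from the tile data, and `char_and_byte_counts` is a helper invented for this example. Note that `len()` counts code points, which is only an approximation of rendered characters.

```python
def char_and_byte_counts(name):
    """Return (number of code points, number of UTF-8 bytes) for a label."""
    return len(name), len(name.encode('utf-8'))


# Illustrative names in Latin, Cyrillic and Japanese scripts.
for name in ['Ohio', 'Огайо', 'オハイオ州']:
    chars, utf8_bytes = char_and_byte_counts(name)
    print(name, chars, 'chars,', utf8_bytes, 'bytes')
```

Names of similar rendered length can differ by a factor of two or three in encoded size, so a limit based on characters alone does not bound the bytes spent on names.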

Connects to #1683.

@@ -16,9 +16,10 @@ def test_state_boundary(self):
'https://www.openstreetmap.org/relation/61320',
], clip=self.tile_bbox(9, 150, 192, padding=2))

# NOTE: might not have an ID if it has been merged
Member

👍 thanks for adding this note!


self.assert_has_feature(
8, 133, 89, 'boundaries',
Member

I think we should still keep the zoom 8 test, but for kind: region only, as that proves we are taking the OSM data correctly at zoom 8.

Member Author

Good idea. Fixed in fb5565c.

queries.yaml Outdated
source_layer: boundaries
start_zoom: 8
end_zoom: 11
factor: 11
Member

Please add a comment about factor: what it does, and its units.

Member Author

Added comment and changed to a more descriptive name in a75fd59.

return None


def _delete_labels_longer_than(max_label, props):
Member

Is max_label the same as max_length? Both terms are mentioned here, but I think only one is used in practice.

Member Author

Well spotted, thanks! I think that was a typo / thinko. Fixed in a75fd59.


# maximum number of characters we'll be able to print at this
# zoom.
max_label = int(shape_length / tolerance)
Member

Should this be called max_length (or even max_chars)?

Please add a comment about what units these are in: logical pixels? meters? characters?

Member Author

I've changed this to max_label_chars, added some comments and made other variable names more descriptive in a75fd59. Hopefully that makes it clear enough?

(end_zoom is not None and zoom >= end_zoom):
return None

tolerance = factor * tolerance_for_zoom(zoom)
Member

@nvkelso Nov 3, 2018

Please add a comment saying what units this results in. Is it "must fit exactly plus the factor", or something else?

Member Author

Changed this to meters_per_letter, since that's what it is: Mercator meters per letter / character / grapheme cluster in the string. The factor was roughly how many pixels to expect the width of each letter to be. I've found by experimentation that 11 is a pretty good value, but we should see what the results of a build look like and tweak it.
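For readers following the units discussion, here is a minimal sketch of the calculation as described in this thread. It is not the merged code: `meters_per_pixel` is a hypothetical stand-in for `tolerance_for_zoom` that assumes Web Mercator with 256-pixel tiles, and `drop_long_names`, `pixels_per_letter` and the `name` / `name:*` key convention are names assumed for this example.

```python
import math

# Assumption: Web Mercator, 256-pixel tiles, spherical radius 6378137 m.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137


def meters_per_pixel(zoom):
    """Mercator meters covered by one pixel at this zoom (hypothetical
    stand-in for tolerance_for_zoom)."""
    return EARTH_CIRCUMFERENCE_M / (256 * 2 ** zoom)


def is_name_key(key):
    return key == 'name' or key.startswith('name:')


def drop_long_names(shape_length, props, zoom,
                    pixels_per_letter=11, start_zoom=8, end_zoom=11):
    """Return props without names too long to fit along the line.

    shape_length is in Mercator meters. end_zoom is exclusive, so the
    defaults cover zooms 8 to 10.
    """
    if zoom < start_zoom or zoom >= end_zoom:
        return props

    # pixels per letter * meters per pixel = Mercator meters per letter
    meters_per_letter = pixels_per_letter * meters_per_pixel(zoom)

    # maximum number of characters we could print along the line
    max_label_chars = int(shape_length / meters_per_letter)

    return {key: value for key, value in props.items()
            if not is_name_key(key) or len(value) <= max_label_chars}


# Hypothetical 50 km boundary segment.
props = {'kind': 'region', 'name': 'Ohio - Pennsylvania', 'name:de': 'Ohio'}
print(drop_long_names(50000, props, 8))
print(drop_long_names(50000, props, 10))
```

At zoom 8 a pixel covers about 611 Mercator meters, so with 11 pixels per letter a 50 km line has room for 7 characters and the long name is dropped. At zoom 10 the same line has room for 29 characters and both names survive.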

Member

@nvkelso left a comment

a few nits but overall great!

@zerebubuth merged commit 38034e6 into master on Nov 6, 2018
@zerebubuth deleted the zerebubuth/1683-strip-names-off-boundary-lines branch on November 6, 2018 01:05
@nvkelso mentioned this pull request on Dec 3, 2018