Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Chinese tags from OSM #1960

Merged
merged 12 commits into from
Jul 29, 2021
Merged

Fix Chinese tags from OSM #1960

merged 12 commits into from
Jul 29, 2021

Conversation

peitili
Copy link
Contributor

@peitili peitili commented Jul 26, 2021

  • Update tests
  • Update docs

This is a follow-up PR for #1956 and it fixes an issue of Chinese labels in the vector tiles. There was a bug that if there is some name tags starts with "zh" such as "zh_pinyin", it would override the output "name:zh" feature, and our vector tiles would contain pinyin as the Chinese label for those sourced from OSM. However it is not ideal since pinyin is just a Romanisation for Chinese not actual Chinese characters.

@peitili peitili changed the title Debugchinese Fix Chinese tags from OSM Jul 26, 2021
Copy link
Member

@iandees iandees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me. Let's make sure the logic makes sense to @nvkelso though.

@@ -592,14 +592,18 @@ def _normalize_country_code(x):

osm_l10n_lookup = set([
'zh-min-nan',
'zh-yue',
])

# key is the name in OSM source; value is the Tilezen internal name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment here. Value is a tuple where the second field is a priority?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment here. Value is a tuple where the second field is a priority?

good point, will update in the next commit

])

# key is the name in OSM source; value is the Tilezen internal name

# key is the name in OSM source; value is a tuple of Tilezen internal name and its priority value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding more detail here, I was scratching my head earlier

@peitili
Copy link
Contributor Author

peitili commented Jul 27, 2021

@nvkelso also we only assume name:zh would contain "/" separated fields, but do we need to handle the case for all the name:zh* fields to only select one of them from the "/" separated list in the final result?

@nvkelso
Copy link
Member

nvkelso commented Jul 27, 2021

@nvkelso also we only assume name:zh would contain "/" separated fields, but do we need to handle the case for all the fields to only select one of them from the "/" separated list in the final result?

Since we see this is the case for Papua New Guiana on Friday in Slack then yes I think we need to do this for each. https://www.openstreetmap.org/relation/307866

@peitili The unit tests you added inside transform generally work... but since the error was seen on a specific feature in tiles, we should test that tile in the associated unit test to make sure it doesn't regress, too. So in https://github.com/tilezen/vector-datasource/blob/master/integration-test/1955-chinese-parser.py add a test for USA using the full property data for https://www.openstreetmap.org/node/424317935 (related to https://www.openstreetmap.org/relation/148838).

@peitili
Copy link
Contributor Author

peitili commented Jul 27, 2021

@nvkelso also we only assume name:zh would contain "/" separated fields, but do we need to handle the case for all the fields to only select one of them from the "/" separated list in the final result?

Since we see this is the case for Papua New Guiana on Friday in Slack then yes I think we need to do this for each. https://www.openstreetmap.org/relation/307866

@peitili The unit tests you added inside transform generally work... but since the error was seen on a specific feature in tiles, we should test that tile in the associated unit test to make sure it doesn't regress, too. So in https://github.com/tilezen/vector-datasource/blob/master/integration-test/1955-chinese-parser.py add a test for USA using the full property data for https://www.openstreetmap.org/node/424317935 (related to https://www.openstreetmap.org/relation/148838).

Sure, I tried to add a test to mimic the relation, however, it is non-trivial to figure out how the fixture work. For example. The test below would fail because it says layer 'places' was empty and I debugged that all the layers are empty for this fixture set up.

So instead I just create a OSM way in the integration test to represent United States.

def test_united_states_osm(self):
      # United States of America (osm city)
      z, x, y = (16, 0, 0)

      self.generate_fixtures(
          dsl.point(424317935, _tile_centre(z, x, y), {
              u'source': u'openstreetmap.org',
              u'name:zh': u'美國',
              u'name:yue': u'即美利堅合眾國',
              u'name:zh_pinyin': u'Měiguó',
          }),
          dsl.relation(148838, {
              'type': 'label'
          }, nodes=[424317935])
      )

      self.assert_has_feature(
          16, 0, 0, 'places',
          {'id': 424317935,
           'source': "openstreetmap.org",
           'name:zh': u'美國',
           'name:zht': u'美國'})

      self.assert_no_matching_feature(
          16, 0, 0, 'places',
          {'name:zh-default': u'美國'})

)
# should show up in zooms within the range 2-6
for zoom in xrange(2, 6):
x, y = deg2num(lat, lon, zoom)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat!

@@ -62,6 +206,132 @@ def test_san_francisco_osm(self):
16, 10482, 25330, 'places',
{'name:zh-default': u'旧金山/三藩市/舊金山'})

self.generate_fixtures(dsl.way(26819236, wkt_loads(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last 2¢: Please remove this fixture and later 2 tests as I think they duplicate the test_united_states which starts on line 16, and that test has the corrected coordinates while this one still tests for zoom 16 in error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense, I removed in the next commit.

Copy link
Member

@nvkelso nvkelso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching these edge cases!

I'll update my separate Natural Earth simplified and traditional Chinese logic in nvkelso/natural-earth-vector#533.

@peitili peitili merged commit 39b34fa into master Jul 29, 2021
@peitili peitili deleted the debugchinese branch July 29, 2021 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants