Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to Generate IDs for input features #97

Closed
asheemmamoowala opened this issue Jul 31, 2018 · 7 comments · Fixed by #121
Closed

Add option to Generate IDs for input features #97

asheemmamoowala opened this issue Jul 31, 2018 · 7 comments · Fixed by #121

Comments

@asheemmamoowala
Copy link
Contributor

#94 implements index-based ids for the returned cluster nodes. Some GeoJSON input may not have feature ids, or may use string feature ids which do not work with geojson-vt.

Proposal

generateLeafIds: enables generation of ids on the leaf nodes, ensuring uniqueness across leaf nodes and cluster nodes. Overwrites any existing id values.

Pros: Simple and efficient solution to create GeoJSON and VT spec compliant id attributes for all nodes.
Cons: id value for a feature can change between different invocations of the API, and depends on the input GeoJSON features.

@mourner
Copy link
Member

mourner commented Jul 31, 2018

Let's do generateIds to be consistent with geojson-vt, implying ids for the input data.
BTW what's our plan for avoiding collisions with cluster ids? We'd need to change the current cluster id scheme.

@asheemmamoowala
Copy link
Contributor Author

👍 generateIds

I tried to implement this when working on #94, but found that collisions would be likely with index of points in the input and the the encoded cluster id (i << 5) + (zoom + 1) where i is the index of the first point in the cluster.

The clusterId is a useful shortcut for getChildren, so that needs to be maintained.

clusterIds can be offset by points.length which would use 0...n ids for the leaf nodes and (n<< 5) + zoom + 1 + n to compute ids for cluster nodes. With this, the total number of points could be up to 719 billion before we run into issues with ids overlapping.

@clif-os
Copy link
Contributor

clif-os commented May 7, 2019

@asheemmamoowala so let me get this straight, we could simply use the normal calculation for cluster id + points.length, and assign points their index based ids per usual?

@clif-os
Copy link
Contributor

clif-os commented May 7, 2019

also are we against creating a function for generating near-random integers?

I was using this fn to place 100k features on my map originally:

const uuidInt = () =>
  Number(
    uuid(null, [], 6)
      .slice(-7, -1)
      .join('')
  )

@asheemmamoowala
Copy link
Contributor Author

we could simply use the normal calculation for cluster id + points.length, and assign points their index based ids per usual?

That was the proposal. This has a possibility of conflicting with existing ids in a large GeoJSON file. There is not a good suggestion so far of how to guarantee that cluster ids do not conflict with geojson feature ids if already present in the data.

also are we against creating a function for generating near-random integers?

Yes. Using random values for ids means that the same geojson will produce different results on different runs, preventing persisting ids between sessions, For example, consistent id generation will allow maintaining the id of "select" features across calls to setData that update the properties of features.

@clif-os
Copy link
Contributor

clif-os commented May 8, 2019

preventing persisting ids between sessions, For example, consistent id generation will allow maintaining the id of "select" features across calls to setData that update the properties of features.

Ok that makes a lot of sense.

This has a possibility of conflicting with existing ids in a large GeoJSON file.

I see. Are you worried that the feature.id will be set by the user incoming to mapbox gl, or that the id will be set via the generateId source specification?

If it is generateId you are worried about, couldn't we reverse the logic and set feature.id to index and offset cluster_ids by features.length?

@asheemmamoowala
Copy link
Contributor Author

I see. Are you worried that the feature.id will be set by the user incoming to mapbox gl, or that the id will be set via the generateId source specification?

Yes I am worried about a GeoJSON source with existing feature ids conflicting with the generated ids for clusters.

If it is generateId you are worried about, couldn't we reverse the logic and set feature.id to index and offset cluster_ids by features.length?

If the input data already has feature Ids, I don't think that we should be changing those. Developers should be able to rely on their individual feature nodes coming through this library un-changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants