Entity types as a higher-level concept #405
Comments
What would be an example scenario where this approach is the most sensible? For Gojek at least, I would imagine that project-based entities make more sense: one project per service type (food, ride, gopay), each having entities which might share the same name (customer id, driver id).
The example you are referring to would be for
It seems to provide a cleaner isolation, but it is also the case that "users" would have to define their own projects and feature sets from which they would reference these authoritative entities. So I am only seeing one option here, not two. The disadvantage comes from having to know whether to use either of these two projects.
Another possible solution would be a hybrid model between global and project level entities. I have added this as (3) in the comment above, titled
I am in favour of 3. Option 2 (unique global entity name) may lead to complicated entity management in some cases. For example, let's say we have drivers for different countries. Option 2 dictates that we cannot have the same entity for all countries (e.g. driver), but instead need multiple different entities (e.g. driver_vn, driver_th, driver_sg). It is likely that in an end-to-end machine learning workflow, the code involving the drivers will be similar regardless of country (e.g. extracting the driver entity value from a JSON request during the prediction step). So, for option 2, the pipeline will need to know that driver_vn, driver_sg and driver_th all belong to the same group and should be handled the same way, which leads to extra configuration on the user side.
Though, if we go for option 3, we might want to explore whether the concept of a default project should be extended to feature retrieval as well, for consistency. For example, if no project has been set and the project is not explicitly specified in the feature ref, then the fallback would be the 'default' project.
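The fallback order described here can be sketched as follows. This is only an illustration of the idea, not Feast's API: `resolve_ref`, `DEFAULT_PROJECT` and the `project/name` reference format are assumptions made for the example.

```python
from typing import Optional, Tuple

# Assumed name of the fallback project (hypothetical, taken from the proposal).
DEFAULT_PROJECT = "default"


def resolve_ref(ref: str, current_project: Optional[str] = None) -> Tuple[str, str]:
    """Split a reference into (project, name).

    Order of precedence:
      1. a project given explicitly in the reference ("my_company/customer"),
      2. the project the client has set, if any,
      3. the 'default' project.
    """
    if not ref:
        raise ValueError("empty reference")
    if "/" in ref:
        project, name = ref.split("/", 1)
        if not project or not name:
            raise ValueError(f"malformed reference: {ref!r}")
        return project, name
    # No explicit project: fall back to the client's project, then to 'default'.
    return (current_project or DEFAULT_PROJECT), ref
```

With this rule, `customer` and `default/customer` resolve to the same thing when no project is set, and the same function could in principle serve both entity and feature references.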
It's not clear what you mean here. What prevents you from having simply
Absolutely, that was my hope as well!
Actually, yeah, you are correct: I can just have driver in a global project instead of having the entity defined in each regional project. I was too entrenched in the code base that I am currently working on and didn't consider this possibility.
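To make the difference in this exchange concrete, here is a minimal sketch of the prediction-step extraction under both naming schemes. The function names, the request shape and the `ENTITY_BY_COUNTRY` mapping are hypothetical and only serve the illustration.

```python
import json

# Hypothetical per-country names that unique global entity names would force.
ENTITY_BY_COUNTRY = {"vn": "driver_vn", "th": "driver_th", "sg": "driver_sg"}


def driver_id_shared(request_body: str):
    """One shared 'driver' entity: no per-country configuration is needed."""
    return json.loads(request_body)["driver"]


def driver_id_per_country(request_body: str, country: str):
    """Per-country entity names: the caller must map country to entity name."""
    return json.loads(request_body)[ENTITY_BY_COUNTRY[country]]
```

The second function needs the extra mapping that the comment above calls "extra configuration on the user side"; a single driver entity in a shared project avoids it.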
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Moving this out of the 0.6 milestone because I think we can live without it for the time being.
Isn't 3. the same as 1. with just a special project called default? The fact that there is a special default project doesn't change the fact that all entities are scoped to a project, i.e. 1. Right?
Correct.
I believe entity as a construct is increasing complexity in the system. What I fail to understand is how the notion of an entity helps in grouping semantically related features together (as per the definition of entity in the documentation). It also introduces more problems, since joins happen at a later point in time while the entity is defined at the start of the user experience. A few questions:
Just want to get others' suggestions on this!
Introduction
Currently an entity, or more formally an entity type, is treated as a special type of field within a feature set. There has been an attempt to simplify the creation and management of entities and to keep them consistent with features; however, some challenges exist with our current approach.
Note: The terms entity and entity type will be used interchangeably in this issue.
How are entities created?
How are entities used?
What is the problem?
Proposals
1. Project-level entities
Functionality
gojek/customer).
Advantages
Disadvantages
2. Global-level entities
Functionality
Advantages
Disadvantages
float and another wants to use a string for an entity data type, then it would likely result in two entities being created. This would still be the case in the Project-level entity proposal, but at least in that proposal the unorthodox approach (maybe string) could be isolated to a specific project.
3. Default project entities
Functionality
default project. This would be similar to how Kubernetes does namespacing.
project level entities proposal, except users don't actually have to create an entity inside of a named project.
my_company/customer, it would be possible to refer to "global" entities by either using customer or default/customer.
Advantages
project-level entities.
global-level entities, except that this default project would still not be a true global namespace. There would still need to be an organizational process that informs users to use the entities in this project.
project-level sharing and isolation can be reused.
Disadvantages