Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Resources created in state index and de-provisioning mapping #136

Closed
amitgalitz opened this issue Oct 31, 2023 · 15 comments
Closed
Assignees
Labels
enhancement New feature or request

Comments

@amitgalitz
Copy link
Member

amitgalitz commented Oct 31, 2023

Is your feature request related to a problem?

Right now we have a resources created field in the state index but there are a few ways on how we want to store this information.

What solution would you like?

We can either map the ID of what was created to the type/name of the workflow step that created it or to the object name itself.

Option 1 [Preferred]:

"resources_created": [
        {
            "type": "create_index_step",
            "id": "FobagIsBjjuFSw4Sfl3c"
        },
        {
            "type": "register_model",
            "id": "FobagIsBjjuFSw4Sfl3c"
        }
    ]
    

Option 2:

"resources_created": [
       {
           "type": "index",
           "id": "FobagIsBjjuFSw4Sfl3c"
       },
       {
           "type": "model",
           "id": "FobagIsBjjuFSw4Sfl3c"
       }
   ]
   

I think it makes sense to map the created objected to the step that created it, in order to avoid adding another category of things like model/index/agent.

However two things we need to consider are:

  1. The order of the created objects in terms of de-provisioning order. Especially if some things depend on each other. For example do we need to delete the agent first before deleting the model that uses it. We can also potentially get this order from the use case template's nodes and edges themselves and delete in reverse order.

  2. How will we map each workflow step that created the resources to the workflow step that deletes the resources. We should probably have a way to match the create request to the equivalent delete request when we add workflow nodes/steps.

There could be a global delete map that looks something like this, this would be however another thing to maintain when adding new steps:
create_api:delete_api

 "deletion_map": {
            "create_index_step": "delete_index_step",
            "register_model": "delete_model",
        },

Current Solution:

"resources_created": [
        {
            "workflow_step_name": "create_connector",
            "resource_id": "fZTmz4sBVvQ80zEwkC2z"
        }
    ]
    

Additional Context

Right now we have some implementation of updating the resources created field in CreateConnectorStep. We should decide if we want to add more information here by fetching the resource name from some enum.

Additionally we can abstract the method in CreateConnectorStep to be used by all steps easily as mentioned in this comment: #148 (comment)

@amitgalitz amitgalitz added enhancement New feature or request untriaged labels Oct 31, 2023
@joshpalis
Copy link
Member

joshpalis commented Oct 31, 2023

The order of the created objects in terms of de-provisioning order. Especially if some things depend on each other. For example do we need to delete the agent first before deleting the model that uses it. We can also potentially get this order from the use case template's nodes and edges themselves and delete in reverse order.

Just a thought, if we rely on the deprovision API to ascertain the order of deletion from the nodes and edges provided by the use case template (rather than the order the resource appears in the resources_created field), then we'll probably need to include the workflow step id to the resources created along with the workflow step type. This would allow us to determine which specific step is responsible for creating the resource, and then determine what order this step appears in the graph. I think this would be helpful in cases where we have multiple instances of the same step type.

Perhaps something like this :

"resources_created": [
        {
            "workflow_step_id" : "workflow_step_1",
            "workflow_step_type": "create_index_step",
            "id": "FobagIsBjjuFSw4Sfl3c"
        },
        {
            "workflow_step_id" : "workflow_step_2",
            "workflow_step_type": "register_model_step",
            "id": "FobagIsBjjuFSw4Sfl3c"
        }
    ]

@dbwiddis
Copy link
Member

A few thoughts here:

  1. I think trying to recreate the workflow step in reverse order is over-complicating things:
    • not all steps will create a resource
    • all we really want to do is delete all of them
    • unless the DELETE API fails on a dependency then order won't matter (follow up Q: do any DELETE APIs fail?)
  2. More information is better, again the goal is to eventually call a DELETE API
  3. Conceptually speaking you can think that each workflow node that creates a resource has an imaginary "delete node" following it that can take its output as input and do its job. e.g., "CreateIndexStep" should have a corresponding "DeleteIndexStep"
  4. So I wonder if we can't just create a "delete workflow template" on-the-fly as we're creating the resources. When I call CreateIndexStep, then a DeleteIndexStep gets added to the deletion template. When I create an Agent, the DeleteAgent step gets added to the deletion template. We can add the nodes sequentially in the forward direction (order doesn't matter in JSON) but add the edges to set the predecessors in the opposite direction.

WDYT?

@ohltyler
Copy link
Member

ohltyler commented Nov 15, 2023

I think it is important to be transparent to the user regarding what this plugin will be creating behind the scenes when the user creates a workflow. This is critical for users to know what is going on and to keep track of how the different resources are connected and interacting with one another, as well as what the implications are should they choose to de-provision it.

I think it is almost equally important to show the associated resources, regardless of if they were created from provision, or just hooking up some existing resource. For example, a user may re-purpose some knn index, or external model connector. It would still be useful to provide that information so the user understands the impacts when those resources change or be deleted, it may cause problems with the workflow (not feasible to dynamically warn the user, but by statically providing the associated resources, it gives the user more transparency).

Perhaps we have 2 categories of resources within the state index:

  1. Created resources - created via provision, will be deleted when de-provisioning
  2. Associated resources - created externally and just used within the workflow somewhere, won't be deleted when de-provisioning.

This allows us to both 1/ show everything (created from or just associated with) a workflow, and 2/ have validation logic when de-provisioning to let users be aware of what will be deleted should they proceed.

Maybe these could be configured within the step type itself - e.g., create connector step would have a connector created resource tied to it somehow, query index step would have a knn index associated resource tied to it somehow

@amitgalitz
Copy link
Member Author

Perhaps we have 2 categories of resources within the state index:

Created resources - created via provision, will be deleted when de-provisioning
Associated resources - created externally and just used within the workflow somewhere, won't be deleted when de-provisioning.

Maybe these could be configured within the step type itself - e.g., create connector step would have a connector created resource tied to it somehow, query index step would have a knn index associated resource tied to it somehow

Do you suggest having something like adding a variable to each step that highlights the resource it creates (create_connector -> connector) (register_model -> model) and so on?

@ohltyler
Copy link
Member

Perhaps we have 2 categories of resources within the state index:
Created resources - created via provision, will be deleted when de-provisioning
Associated resources - created externally and just used within the workflow somewhere, won't be deleted when de-provisioning.

Maybe these could be configured within the step type itself - e.g., create connector step would have a connector created resource tied to it somehow, query index step would have a knn index associated resource tied to it somehow

Do you suggest having something like adding a variable to each step that highlights the resource it creates (create_connector -> connector) (register_model -> model) and so on?

Either the resource it creates or an associated resource if it is just utilizing some existing one.

@amitgalitz
Copy link
Member Author

A few thoughts here:

  1. I think trying to recreate the workflow step in reverse order is over-complicating things:

    • not all steps will create a resource
    • all we really want to do is delete all of them
    • unless the DELETE API fails on a dependency then order won't matter (follow up Q: do any DELETE APIs fail?)

There could be cases where we can't delete a model because some agent is using it (not 100% sure on current API response in those cases but its possible. Also for example if you delete a running detector before stopping it, it will fail, so you have to have some order of stopping then deleting). Keeping some order of deletion could be helpful so we don't have to think about edge cases, but not an absolute must if we can handle those separately.

  1. More information is better, again the goal is to eventually call a DELETE API
  2. Conceptually speaking you can think that each workflow node that creates a resource has an imaginary "delete node" following it that can take its output as input and do its job. e.g., "CreateIndexStep" should have a corresponding "DeleteIndexStep"
  3. So I wonder if we can't just create a "delete workflow template" on-the-fly as we're creating the resources. When I call CreateIndexStep, then a DeleteIndexStep gets added to the deletion template. When I create an Agent, the DeleteAgent step gets added to the deletion template. We can add the nodes sequentially in the forward direction (order doesn't matter in JSON) but add the edges to set the predecessors in the opposite direction.

WDYT?

Making some internal deletion template makes sense to me so we keep an order of what we want to delete, for adding more detail like you mention we might want to add some variable in each step with the resource name. so its easy for the user to understand and read connector like Tyler mentions above vs just create_connector_step

@amitgalitz
Copy link
Member Author

Perhaps we have 2 categories of resources within the state index:
Created resources - created via provision, will be deleted when de-provisioning
Associated resources - created externally and just used within the workflow somewhere, won't be deleted when de-provisioning.

Maybe these could be configured within the step type itself - e.g., create connector step would have a connector created resource tied to it somehow, query index step would have a knn index associated resource tied to it somehow

Do you suggest having something like adding a variable to each step that highlights the resource it creates (create_connector -> connector) (register_model -> model) and so on?

Either the resource it creates or an associated resource if it is just utilizing some existing one.

I am okay with that, just wanted to keep to the tenant of as little work as possible for onboarding new steps, this is super small though so probably not a big deal, just meant the less domain knowledge we need to add on each new step is the best but adding a variable for the resource name is very low effort if thats what we go with

@ohltyler
Copy link
Member

Perhaps we have 2 categories of resources within the state index:
Created resources - created via provision, will be deleted when de-provisioning
Associated resources - created externally and just used within the workflow somewhere, won't be deleted when de-provisioning.

Maybe these could be configured within the step type itself - e.g., create connector step would have a connector created resource tied to it somehow, query index step would have a knn index associated resource tied to it somehow

Do you suggest having something like adding a variable to each step that highlights the resource it creates (create_connector -> connector) (register_model -> model) and so on?

Either the resource it creates or an associated resource if it is just utilizing some existing one.

I am okay with that, just wanted to keep to the tenant of as little work as possible for onboarding new steps, this is super small though so probably not a big deal, just meant the less domain knowledge we need to add on each new step is the best but adding a variable for the resource name is very low effort if thats what we go with

Understood - I think its reasonable that for any new steps onboarded, adding resource-related details (if applicable) is a requirement. I still think separating into 2 parts (created or just associated) will be needed, or at least some way to discern the two so the de-provision / delete APIs can work as expected.

@owaiskazi19
Copy link
Member

owaiskazi19 commented Nov 16, 2023

@amitgalitz In terms of having another id in the resources_created field, I think we are over complicating the API response here. We can simply add the response received and the same response which we are storing in the WorkflowData here.

So, the status API should return

"resources_created": [
        {
            "type": "create_index_step",
            "index_name": "xyz"
        },
        {
            "type": "register_model",
            "model_id": "abc",
            "model_status": "CREATED"
        }
    ]

We already have a workflowID to map the resources created for that specific provisioning and user can use the same to retrieve the information.

@amitgalitz
Copy link
Member Author

@amitgalitz In terms of having another id in the resources_created field, I think we are over complicating the API response here. We can simply add the response received and the same response which we are storing in the WorkflowData here.

So, the status API should return

"resources_created": [
        {
            "type": "create_index_step",
            "index_name": "xyz"
        },
        {
            "type": "register_model",
            "model_id": "abc",
            "model_status": "CREATED"
        }
    ]

We already have a workflowID to map the resources created for that specific provisioning and user can use the same to retrieve the information.

Are you okay with doing something like:
We can do something like you mentioned I just think its always easier to query an index and parse an API response when the key's are constant and not different for every item in the array.
thoughts @ohltyler @dbwiddis @owaiskazi19 ?

"resources_created": [
        {
            "workflow_step_name": "register_model",
            "resource_name": "model_id",
            "id": "L85p1IsBbfF"
        }
    ]
    

@owaiskazi19
Copy link
Member

owaiskazi19 commented Nov 16, 2023

Are you okay with doing something like: We can do something like you mentioned I just think its always easier to query an index and parse an API response when the key's are constant and not different for every item in the array. thoughts @ohltyler @dbwiddis @owaiskazi19 ?

"resources_created": [
        {
            "workflow_step_name": "register_model",
            "resource_name": "model_id",
            "id": "L85p1IsBbfF"
        }
    ]
    

We can just query the resources_created field to get all the responses of the provisioning field or for making it more simpler we can have it like:

"resources_created": [
        "create_index_step": {
            "index_name": "xyz"
        },
        "register_model": {
            "model_id": "abc",
            "model_status": "CREATED"
        }
    ]

@ohltyler
Copy link
Member

No strong opinion on that part. My only suggestion is we need a separate field in the state index to persist associated resources - ones not created from provision, but associated with some existing resource.

I think however we are generating the resources_created field, we can do the same for a resources_associated field. WDYT?

@dbwiddis
Copy link
Member

Are you okay with doing something like:

"resources_created": [
        {
            "workflow_step_name": "register_model",
            "resource_name": "model_id",
            "id": "L85p1IsBbfF"
        }
    ]

I'll say yes in general, as I need two pieces of information to properly deprovision these resources:

  • the type of resource
  • the id of the resource

It's nice to have the step that created it as well.

Might need another entry for "dependent resource" such as if the model ID is associated with a connector, I need to delete the model first and then the connector (I think?).

For the specific example with a model_id, I'm not sure that's the correct resource name/type. Is it a connector-based model ID? Local? Remote?

More information is better.

@amitgalitz
Copy link
Member Author

After some additional conversation the current solution I'll go ahead with is having a mapping more similar to what @owaiskazi19 suggested:
The first key will be the workflow_step_name and the second will be the resource_type that was created.

I do agree about potentially having associated resources as a separate field and dependent resources but this wont be apart of the first implemented solution.

"resources_created": [
        "create_index_step": {
            "index_name": "xyz"
        },
        "register_model": {
            "model_id": "abc",
        }
    ]

@dbwiddis
Copy link
Member

@amitgalitz I think this can probably be closed given work in #231 and related PRs. I know we are still considering adding a timestamp for creation time but otherwise the existing implementation is sufficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants