Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IO-1196][internal] Workflow data models #622

Merged
merged 12 commits into from
Jul 3, 2023
Merged

[IO-1196][internal] Workflow data models #622

merged 12 commits into from
Jul 3, 2023

Conversation

owencjones
Copy link
Contributor

Problem

Introducing workflows will require data transfer of workflows data structures. We don't have these

Solution

Have taken the JSON objects the workflows endpoints return (provided by @Nathanjp91) and created pydantic objects of all applicable objects, for validation and parsing

Changelog

Introduced 5 objects in darwin/future/data_objects and relevant tests

@linear
Copy link

linear bot commented Jun 30, 2023

@@ -0,0 +1,172 @@
from datetime import datetime
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ease of review, I'm going to mark those of these that are pretty much what they need to be, and those which are prone to change.

from darwin.future.pydantic_base import DefaultDarwin


class WFDataset(DefaultDarwin):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WFDataset is a simple model, and pretty much exactly what we need it to be already.

name: str
instructions: str = Field(min_length=0)

def __int__(self) -> int:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these methods for ease, they may prove useful, they are better that running the Pydantic initial dunders in this case, and if they don't prove useful, they took very little time to do.



class WFEdge(DefaultDarwin):
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WFEdge seems largely complete, we may find more to add to it later, but it works as a graph node already.

Was considering the validity of a graph solver function. E.g. a function that confirms the graph is valid within our parameters, this would be simple (it's similar to linked list traversal), but until we need it, lets not bother.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would constitute an unsolvable system? disconnected stages that can't be visited? That would seem an easy check with recursion and a 'visited' array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a vague reference, but for instance a workflow with unconnected nodes, or nodes connected in one infinite loop, two loops, etc - these are essentially a graph, but we have imposed logic on that.

I don't know if it's necessary for us to do anything with it, as the backend may already validate it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation exists somewhere, but might only be on the frontend. If you do try to create disconnected graphs it complains (only when you click save though, so I assume it's on the backend). It might not be that relevant to double up validation.



class WFType(Enum):
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is complete, but we may find that we need to add an extra stage for a hidden type that isn't apparent, or that I've not thought of.



class WFUser(DefaultDarwin):
stage_id: UUID
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems complete by the data structures in the JSON, which are concise and clear in this case



class WFStageConfig(DefaultDarwin):
# ! NB: We may be able to remove many of these attributes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included this to allow validation with completeness, but many of these values had no assigned type at any stage in any of the data I could get.

I suspect we don't need many of these for backend functionality, but validating them won't hurt, and we can evolve this object more freely as the comments reflect



class WFStage(DefaultDarwin):
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confident in this one, we may find some are either unnecessary or we need extra info added.



class Workflow(DefaultDarwin):
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complex extraneous items made the final workflow object simple, and I'm confident in this one.

@@ -1,8 +1,9 @@
import unittest
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidental tidying, unrelated to this

@@ -0,0 +1,7 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example file used for test - NB: all IDs are randomly generated/madeup, unrelated to production system IDs

@@ -0,0 +1,6 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example file used for test - NB: all IDs are randomly generated/madeup, unrelated to production system IDs

@@ -0,0 +1,45 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example file used for test - NB: all IDs are randomly generated/madeup, unrelated to production system IDs

@@ -0,0 +1,26 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example file used for test - NB: all IDs are randomly generated/madeup, unrelated to production system IDs

@@ -0,0 +1,4 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example file used for test - NB: all IDs are randomly generated/madeup, unrelated to production system IDs

@@ -0,0 +1,75 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example file used for test - NB: all IDs are randomly generated/madeup, unrelated to production system IDs

"type": "annotate"
}
],
"team_id": 1337,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Told you they were made up



def test_sad_paths() -> None:
dataset = WFDataset.parse_file(validate_dataset_json)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sad path testing is largely unnecessary for pydantic, and when successful conversely ends up E2E testing Pydantic itself.

It was handy in writing the objects correctly with the behaviour I expected, and I used sad path tests less for successive objects.

assert str(parsed_stage.id) == "e69d3ebe-6ab9-4159-b44f-2bf84d29bb20"


def test_raises_with_invalid_uuid() -> None:
Copy link
Contributor Author

@owencjones owencjones Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally a social test for a UUID validator that I wrote, before finding out Pydantic had introduced proper support for this.

Leaving it in place provides both a sanity test and a test that the pydantic support remains for all the objects that use this, so I left it in.

def test_Workflow_validates_from_valid_json() -> None:
parsed_set = Workflow.parse_file(validate_json)

assert isinstance(parsed_set, Workflow)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included a significant amount of testing here, as the Workflow object is the object that encompasses all the other objects into one place.

This test passing ensures essentially that the other objects also parse correctly, at least in the main.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the amount of testing, although with pydantic objects I've just been trusting that they validate correctly.

@owencjones owencjones marked this pull request as ready for review June 30, 2023 09:57
@owencjones owencjones changed the title Io 1196 [IO-1196][internal] Workflow data models Jun 30, 2023
Copy link
Contributor

@Nathanjp91 Nathanjp91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks pretty good, no changes required I think. Data objects are very fleshed out.

def test_Workflow_validates_from_valid_json() -> None:
parsed_set = Workflow.parse_file(validate_json)

assert isinstance(parsed_set, Workflow)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the amount of testing, although with pydantic objects I've just been trusting that they validate correctly.

@owencjones owencjones merged commit 3ff6c03 into master Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants