Additional metadata (notes, comments, etc.) #71

Open
makkus opened this issue Feb 29, 2024 · 10 comments

Comments

@makkus
Contributor

makkus commented Feb 29, 2024

We want to store additional metadata alongside the computations that happen within kiara. It's unclear how that should happen, what this metadata should be attached to, and what the API endpoints for this should look like.

@makkus
Contributor Author

makkus commented Feb 29, 2024

As I've said before, I'm really not sure how to do this, so I'll need everyone's help defining the context in which this should happen. In my mind this depends very much on how a specific frontend wants to guide the user through a 'workflow session', but I might be wrong and overestimating the complexity of this; it wouldn't be the first time.

So, ideally, someone who has a clearer idea of this feature would describe the 'ideal-world API endpoints' they would want to use for it, coming from a frontend (Jupyter or interactive GUI), including all the data they want to store and (more importantly) retrieve again later. I'll be happy to implement those endpoints; I just don't have a good idea of what they should look like.

The one thing that is not clear to me at all is how to manage the frontend-side temporal aspect that, in my mind, is connected to all of this. Again, I might be wrong.

A good exercise for everyone would be to think about a user working in the frontend you are interested in: they do a computation, then want to store a note connected to it. That's the easy part. Then think about the situation (or situations) where that user wants to access that note again. When does that happen, and in which circumstances? What does the user need to do to see the note again, and what inputs do they need to provide to identify the specific note they want (whether they know it or not)? If you could come up with descriptions of those user interactions, it would help me out a lot.

For Jupyter, that could just be mock code that describes how you see a user using those imaginary API endpoints; for anything graphical, very rough wireframes or even written descriptions would be good enough. The important thing is that these are concrete descriptions, and that they include not just the storing of notes, but also when and how to get them back. Once we have that, we can talk about whether this is possible from a backend perspective, and if not, why not; hopefully at some stage we'll arrive at a shared understanding of how all this can work.

@makkus
Contributor Author

makkus commented Feb 29, 2024

Some context that may/may not be useful: DHARPA-Project/kiara-website#36

@makkus
Contributor Author

makkus commented Mar 7, 2024

Ok, so, according to our meeting today, the answer to my question above is:

  • the run_job / queue_job endpoint gets an additional required argument 'comment', which takes a string which is stored internally, and can be looked up via the job id for that particular job
  • every time run_job / queue_job are used, the results are stored in the kiara data store (along with all input/intermediate values)
  • the get_job API endpoint can be used to retrieve the comment associated with the job (along with other job metadata like submission time, status, etc.)
  • an additional list_jobs API endpoint will be created, which will return a list of job ids that were run in the past, sorted from earliest to latest
  • an additional API endpoint will be created (name tbd) that lets users find the job id for any particular value_id
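A minimal, in-memory sketch of the endpoint semantics listed above might look like the following. Everything here is hypothetical: `JobRegistry`, `JobRecord` and `find_job_for_value` (standing in for the 'name tbd' endpoint) are illustrative names, not kiara's API, and the module is passed as a plain Python callable rather than looked up by name.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


@dataclass
class JobRecord:
    """Metadata kept for each job (hypothetical field names)."""
    job_id: str
    module: str
    inputs: Dict[str, Any]
    outputs: Dict[str, str]  # output field name -> value id
    comment: str
    submitted: datetime


class JobRegistry:
    """Hypothetical in-memory stand-in for run_job / get_job / list_jobs."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobRecord] = {}
        self.values: Dict[str, Any] = {}     # value id -> stored value
        self._producer: Dict[str, str] = {}  # value id -> id of producing job

    def run_job(self, module: str, func: Callable[..., Dict[str, Any]],
                inputs: Dict[str, Any], comment: str) -> Dict[str, str]:
        """Run `func`, store every output value, and record the job.

        `comment` has no default, so omitting it raises a TypeError.
        """
        job_id = str(uuid.uuid4())
        submitted = datetime.now(timezone.utc)
        outputs: Dict[str, str] = {}
        for name, value in func(**inputs).items():
            value_id = str(uuid.uuid4())
            self.values[value_id] = value
            self._producer[value_id] = job_id
            outputs[name] = value_id
        self._jobs[job_id] = JobRecord(job_id, module, dict(inputs), outputs,
                                       comment, submitted)
        return outputs

    def get_job(self, job_id: str) -> JobRecord:
        """Return the full record, including comment and submission time."""
        return self._jobs[job_id]

    def list_jobs(self) -> List[str]:
        """Return job ids sorted from earliest to latest submission."""
        # sorted() is stable, so identical timestamps keep insertion order.
        return [job.job_id for job in
                sorted(self._jobs.values(), key=lambda job: job.submitted)]

    def find_job_for_value(self, value_id: str) -> Optional[str]:
        """Return the id of the job that produced `value_id`, or None."""
        return self._producer.get(value_id)
```

Looking up a comment from a result then goes value id → job id → job record, e.g. `registry.get_job(registry.find_job_for_value(outputs["y"])).comment`.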

@CBurge95

CBurge95 commented Mar 7, 2024

Just to add to this, in a 'visual' sense: how this might work, and how I, as a user, would want to see and use these notes and logs.

We give the run_job function an extra mandatory argument, so that people have to enter notes (as a string) every time they use it. We won't validate the content: if they choose to write nothing, that's fine (we won't define a minimum number of characters), but they have to actively write a blank note in that case.

Each time run_job is used, all information associated with this run is automatically stored, inc. input variables, notes, and timestamps. This may look something like this:

run_job(job_name, module_inputs, notes)

run_job(job_name, new_module_inputs, notes)

run_job(other_job_name, module_inputs, notes)

and so on, for as many variations as needed, through to the end of the research, where (alongside exporting outputs, datasets, etc.) the user will do:

final_output.lineage --> to get the data lineage of the 'final' object, as currently implemented

project_name.job_log --> will return a list / table as below that lists all the jobs run with associated metadata

| Timestamp | Module Name | Module Inputs | Module Outputs | Notes | Job Runtime |
|---|---|---|---|---|---|
| 11.00 | job name | module inputs | module outputs | notes | time |
| 11.05 | same job name | new module inputs | new module outputs | notes | time |
| 11.10 | different job name | module inputs | module outputs | notes | time |
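Assuming each stored job carries the fields in the table above, building the job log is essentially a sort by timestamp followed by one row per job. The sketch below is hypothetical (`LoggedJob`, `job_log` and `format_table` are illustrative names, not kiara API) and renders inputs and outputs with `repr` for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Dict, List

COLUMNS = ["Timestamp", "Module Name", "Module Inputs",
           "Module Outputs", "Notes", "Job Runtime"]


@dataclass
class LoggedJob:
    """One stored job run (hypothetical structure)."""
    timestamp: datetime
    module_name: str
    module_inputs: Dict[str, Any]
    module_outputs: Dict[str, Any]
    notes: str
    runtime: timedelta


def job_log(jobs: List[LoggedJob]) -> List[List[str]]:
    """Return a header row plus one row per job, earliest job first."""
    rows = [list(COLUMNS)]
    for job in sorted(jobs, key=lambda j: j.timestamp):
        rows.append([
            job.timestamp.strftime("%H.%M"),  # same style as the table above
            job.module_name,
            repr(job.module_inputs),
            repr(job.module_outputs),
            job.notes,
            f"{job.runtime.total_seconds():.2f}s",
        ])
    return rows


def format_table(rows: List[List[str]]) -> str:
    """Left-align every column to its widest cell."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```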

We will probably create visualisations at a later date, but for the moment this is the output information we would need. The log will be associated with the context/project, so the user documentation (particularly for Jupyter notebooks) will need to make clear how users can name their project, or look up its name, so that they can call up the job log.

This 'job log' might (or should) also include extra information, such as a preview of the module code, the version of the plugin package that provided the module, and anything else we as researchers consider important (that I can't think of at the moment).

@makkus
Contributor Author

makkus commented Mar 13, 2024

Ok, so kiara version 0.5.10rc8 now has an initial version of this. Here's what it looks like in Python:

from kiara.api import KiaraAPI
kiara = KiaraAPI.instance()

inputs = {
    "a": True,
    "b": False
}
result = kiara.run_job("logic.and", inputs, comment="A comment")
result_val = result["y"]

comment = kiara.get_job_comment(result_val.job_id)
print(f"The comment for the job that produced '{result_val.value_id}' is:")
print(comment)

job_records = kiara.list_job_records()

print()
print("All job records:")
print()

for job_id, job in job_records.items():
    print(f"Job '{job_id}', submitted: {job.job_submitted}")
    comment = kiara.get_job_comment(job_id)
    if comment is not None:
        print(f"Comment for job '{job_id}': ", comment)
    else:
        print(f"No comment for job '{job_id}'")

    print("All job details:")
    dbg(job.model_dump())  # dbg is just a helper method that should be available globally whenever you use kiara

Check the API endpoints used in that code for more details, and as always let me know if the docs for those endpoints are missing information or are unclear. There are also a few other (mostly convenience) endpoints related to jobs in the API, so have a look at those too.
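If you want the records from `list_job_records()` in submission order, a small helper along these lines might work. It assumes, as the loop above suggests, that `list_job_records()` returns a mapping of job id to record and that each record has a comparable `job_submitted` timestamp; `records_by_submission` itself is a hypothetical helper, not part of kiara, and the stand-in records are made up for illustration.

```python
from datetime import datetime
from types import SimpleNamespace
from typing import Any, List, Mapping, Tuple


def records_by_submission(job_records: Mapping[Any, Any]) -> List[Tuple[Any, Any]]:
    """Return (job_id, record) pairs, earliest submission first."""
    return sorted(job_records.items(), key=lambda item: item[1].job_submitted)


# Stand-in records for illustration; with kiara you would pass
# kiara.list_job_records() instead.
example = {
    "job-b": SimpleNamespace(job_submitted=datetime(2024, 3, 13, 11, 5)),
    "job-a": SimpleNamespace(job_submitted=datetime(2024, 3, 13, 11, 0)),
}

for job_id, job in records_by_submission(example):
    print(f"Job '{job_id}', submitted: {job.job_submitted}")
```

With kiara this would become `for job_id, job in records_by_submission(kiara.list_job_records()):`.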

I won't be releasing a 'production' kiara version for a while, because the archive/store format is still changing quite a bit, and breakage from one kiara version to the next would be guaranteed. This also means that you can expect breakage between rc versions (I probably should have named them 'beta', but it's too late now and not that important anyway).

As before, using your existing context with the new version should work (if not, it's a bug and you need to tell me), but you can't downgrade anymore after you have used the new version.
Up-/downgrading the kiara package should work in a virtualenv, but the context configurations will be incompatible, so either use a separate kiara context, or delete the (default) context whenever you downgrade to the old 0.5.9 stable kiara version.

@stakats
Contributor

stakats commented Mar 21, 2024

FYI, it seems like it's necessary to delete the context manually, by actually deleting the directory that contains the context.

@makkus
Contributor Author

makkus commented Mar 21, 2024

Ok, I thought I had that fixed, but maybe there are edge-cases I haven't considered. I will make sure that doesn't happen for the release, but for testing it's probably good enough.

To find the folders kiara is using, there is a kiara --runtime-info command in this new version, which should make it easier to find the data path that needs to be deleted.

@stakats
Copy link
Contributor

stakats commented Mar 22, 2024

Per Slack discussion, let's add a method to set a comment on a job that has run, e.g. something like kiara.set_job_comment(job_id, comment="A comment"). This would overwrite any existing comment.
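The proposed 'overwrite' semantics amount to keeping exactly one comment per job id. A minimal, hypothetical sketch (the `JobComments` class is illustrative, not kiara's implementation):

```python
from typing import Dict, Optional


class JobComments:
    """Hypothetical store holding at most one comment per job id."""

    def __init__(self) -> None:
        self._comments: Dict[str, str] = {}

    def set_job_comment(self, job_id: str, comment: str) -> None:
        # Plain assignment: any existing comment for this job is replaced.
        self._comments[job_id] = comment

    def get_job_comment(self, job_id: str) -> Optional[str]:
        # None means no comment has been set for this job.
        return self._comments.get(job_id)
```

Keeping earlier comments as well would mean storing a list per job id; the proposal above opts for simple replacement.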

@makkus
Copy link
Contributor Author

makkus commented Apr 2, 2024

Ok, updating a comment will work with 0.5.10rc9:

comment = kiara.get_job_comment(result_val.job_id)
print(comment)

kiara.set_job_comment(result_val.job_id, "This is an updated comment.")
comment = kiara.get_job_comment(result_val.job_id)
print(comment)

@stakats
Copy link
Contributor

stakats commented May 23, 2024

After looking at some of the (minor, solvable) issues that have cropped up around this, I am wondering whether we want to make comments optional. Otherwise running jobs (especially when testing/debugging) is a bit of a drag.
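Making the comment optional could be as simple as giving it a default of `None` and only storing a comment when one is supplied. In this hypothetical sketch (`Jobs` is an illustrative stand-in, not kiara's API), `get_job_comment` returns `None` for uncommented jobs, which the `if comment is not None` check in the earlier example already handles.

```python
import itertools
from typing import Any, Callable, Dict, Optional, Tuple


class Jobs:
    """Hypothetical job runner where the comment argument is optional."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._comments: Dict[int, str] = {}

    def run_job(self, func: Callable[..., Any], inputs: Dict[str, Any],
                comment: Optional[str] = None) -> Tuple[int, Any]:
        job_id = next(self._ids)
        result = func(**inputs)
        if comment is not None:
            # Only store a comment if the caller actually provided one.
            self._comments[job_id] = comment
        return job_id, result

    def get_job_comment(self, job_id: int) -> Optional[str]:
        # None signals "no comment" for this job.
        return self._comments.get(job_id)
```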
