Additional metadata (notes, comments, etc.) #71

makkus · 2024-02-29T13:39:19Z

We want to store additional metadata alongside the computations that happen within kiara. It's unclear how that should happen, what this metadata should be attached to, and what the API endpoints for it should look like.

makkus · 2024-02-29T13:53:50Z

As I've said before, I'm really not sure how to do this, so I'll need everyone's help defining the context in which this should happen. In my mind this depends very much on how a specific frontend wants to guide the user through a 'workflow session', but I might be wrong and overestimating the complexity of this; it wouldn't be the first time.

So, ideally, someone who has a clearer idea of this feature would describe the 'ideal-world-API-endpoints' they would want to use for it, coming from a frontend (Jupyter or interactive GUI), including all the data they want to store and (more importantly) retrieve again later. I'll be happy to implement those endpoints, I just don't have a good idea of what they should look like.

The one thing that is not clear to me at all is how to manage the frontend-side temporal aspect that in my mind is connected to all of this. Again, I might be wrong.

A good exercise for everyone would be to think about a user using the frontend you are interested in, how they do a computation, then want to store a note that is connected to it. That's the easy part. Then think about the situation (or situations) where that user wants to access that note again. When does that happen, in which circumstances? What does the user need to do to see the note again, what inputs do they need to provide to identify the specific note they want (whether they know it or not)? If you could come up with descriptions of those user interactions, it would help me out a lot.

For Jupyter, that could be just mock code that describes how you see a user using those imaginary API endpoints; for anything graphical, very rough wireframes or even written descriptions would be good enough. The important thing is that those are concrete descriptions, and that they include not just the storing of notes, but when/how to get them back. Once we have that, we can talk about whether this is possible from a backend perspective and, if not, why. Hopefully at some stage we'll arrive at the same understanding of how all this can work out.

makkus · 2024-02-29T14:06:23Z

Some context that may/may not be useful: DHARPA-Project/kiara-website#36

makkus · 2024-03-07T12:09:12Z

Ok, so, according to our meeting today the answer to my above question is:

- the run_job / queue_job endpoint gets an additional required argument 'comment', which takes a string that is stored internally and can be looked up via the job id for that particular job
- every time run_job / queue_job are used, the results are stored into the kiara data store (along with all input/intermediate values)
- the get_job API endpoint can be used to retrieve the comment associated with the job (along with other job metadata like submission time, status, etc.)
- an additional list_jobs API endpoint will be created, which will return a list of job ids that were run in the past, sorted from earliest to latest
- an additional API endpoint will be created (name tbd) that lets users find the job id for any particular value_id
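
To make the shape of those endpoints concrete, here is a minimal in-memory sketch. It is not kiara code: MockJobStore, JobRecord and find_job_for_value are hypothetical names chosen for illustration (the real name of the last endpoint is still tbd), and the single "result" output is a placeholder.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class JobRecord:
    """Hypothetical job metadata record (not kiara's real model)."""

    job_id: str
    operation: str
    inputs: Dict[str, Any]
    comment: str
    submitted: datetime
    outputs: Dict[str, str] = field(default_factory=dict)  # output name -> value id


class MockJobStore:
    """Toy stand-in for the backend, only to illustrate the endpoint shapes."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobRecord] = {}
        self._value_to_job: Dict[str, str] = {}

    def run_job(self, operation: str, inputs: Dict[str, Any], comment: str) -> JobRecord:
        # 'comment' has no default value, so callers are forced to pass one.
        job_id = str(uuid.uuid4())
        record = JobRecord(job_id, operation, dict(inputs), comment,
                           datetime.now(timezone.utc))
        # Pretend the job produced one output value, and remember which job made it.
        value_id = str(uuid.uuid4())
        record.outputs["result"] = value_id
        self._jobs[job_id] = record
        self._value_to_job[value_id] = job_id
        return record

    def get_job(self, job_id: str) -> JobRecord:
        # Returns the comment together with the other job metadata.
        return self._jobs[job_id]

    def list_jobs(self) -> List[str]:
        # Dicts preserve insertion order, so this is earliest to latest.
        return list(self._jobs)

    def find_job_for_value(self, value_id: str) -> str:
        # Reverse lookup: which job produced this value?
        return self._value_to_job[value_id]
```

With this mock, a frontend would call run_job(...) with a comment, keep the value id of the result, and later get back to the comment via find_job_for_value(...) followed by get_job(...).comment.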

CBurge95 · 2024-03-07T14:27:26Z

Just to add to this in a 'visual' sense of how this might work / how I as a user would want to see & use these notes and logs.

The run_job function gets an extra mandatory argument, so that people have to supply a note (as a string) every time they run a job. We won't validate its content: there is no minimum length, so someone who chooses to write nothing can pass a blank note, but they have to do so actively.

Each time run_job is used, all information associated with this run is automatically stored, inc. input variables, notes, and timestamps. This may look something like this:

run_job(job name, module inputs, notes)

run_job(same job name, new module inputs, notes)

run_job(different job name, module inputs, notes)

etc. for as many variations as needed through to the end of the research, where (alongside exporting outputs, datasets etc.) the user will do:

final_output.lineage --> to get the data lineage of the 'final' object, as currently exists

project_name.job_log --> will return a list / table as below that lists all the jobs run with associated metadata

Timestamp	Module Name	Module Inputs	Module Outputs	Notes	Job Runtime
11.00	job name	module inputs	module outputs	notes	time
11.05	same job name	new module inputs	new module outputs	notes	time
11.10	different job name	module inputs	module outputs	notes	time

We will probably create visualisations at a later date, but for the moment this is the outgoing information that would be needed. It will be associated with the context / project, so in the user documentation (particularly for Jupyter notebooks) we will need to make clear how users can name their project, or look up its name, so that they can call the job log.

This 'job log' might / should also include extra information such as a preview of the module code, the plugin package version of the module that was run, and other things we as researchers might consider important (that I can't think of at the moment).
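
As a rough sketch of the job log output described above: the column names are taken from the table, while render_job_log and the dictionary-per-row layout are assumptions for illustration, not an existing kiara feature.

```python
from typing import Dict, List

# Column names copied from the job log table above.
COLUMNS = [
    "Timestamp",
    "Module Name",
    "Module Inputs",
    "Module Outputs",
    "Notes",
    "Job Runtime",
]


def render_job_log(rows: List[Dict[str, str]]) -> str:
    """Render job log rows as a tab-separated table with a header line."""
    lines = ["\t".join(COLUMNS)]
    for row in rows:
        # Missing fields are rendered as empty cells.
        lines.append("\t".join(str(row.get(col, "")) for col in COLUMNS))
    return "\n".join(lines)


# Placeholder rows, mirroring the table above.
example_rows = [
    {"Timestamp": "11.00", "Module Name": "job name", "Notes": "notes"},
    {"Timestamp": "11.05", "Module Name": "same job name", "Notes": "notes"},
]
print(render_job_log(example_rows))
```

Extra columns such as the plugin package version could be added by extending COLUMNS.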

makkus · 2024-03-13T21:37:38Z

Ok, so kiara version 0.5.10rc8 now has an initial version of this. Here's what it looks like in Python:

from kiara.api import KiaraAPI
kiara = KiaraAPI.instance()

inputs = {
    "a": True,
    "b": False
}
result = kiara.run_job("logic.and", inputs, comment="A comment")
result_val = result["y"]

comment = kiara.get_job_comment(result_val.job_id)
print(f"The comment for the job that produced '{result_val.value_id}' is:")
print(comment)

job_records = kiara.list_job_records()

print()
print("All job records:")
print()

for job_id, job in job_records.items():
    print(f"Job '{job_id}', submitted: {job.job_submitted}")
    comment = kiara.get_job_comment(job_id)
    if comment is not None:
        print(f"Comment for job '{job_id}': ", comment)
    else:
        print(f"No comment for job '{job_id}'")

    print("All job details:")
    dbg(job.model_dump())  # dbg is just a helper method that should be available globally whenever you use kiara

Check the API endpoints used in that code for more details, and as always let me know if the docs for those endpoints are missing information or are unclear. There are also a few other (mostly convenience) endpoints related to jobs in the API, so have a look at those too.

I won't be releasing a 'production' new kiara version for a while, because the archive/store format is still changing quite a bit, and breakage from one kiara version to the next one would be guaranteed. That means also that you can expect breakage between rc versions (probably should have named them 'beta', but too late now and not that important anyway).

As before, using your existing context with the new version should work (if not, it's a bug and you need to tell me), but you can't downgrade anymore after you have used the new version.
Up-/downgrading the kiara package should work in a virtualenv, but the context configurations will be incompatible, so either use a separate kiara context, or delete the (default) context whenever you downgrade to the old 0.5.9 stable kiara version.

stakats · 2024-03-21T12:54:33Z

FYI, it seems like it's necessary to delete the context manually, by actually deleting the directory that contains the context.

makkus · 2024-03-21T13:06:57Z

Ok, I thought I had that fixed, but maybe there are edge-cases I haven't considered. I will make sure that doesn't happen for the release, but for testing it's probably good enough.

To find the folders kiara is using, there is a kiara --runtime-info command in this new version, that should make it easier to find the data path that needs to be deleted.

stakats · 2024-03-22T10:28:04Z

Per Slack discussion, let's add a method to set a comment on a job that has run, e.g. something like kiara.set_job_comment(job_id, comment="A comment"). This would overwrite any existing comment.

makkus · 2024-04-02T08:31:32Z

Ok, updating a comment will work with 0.5.10rc9:

comment = kiara.get_job_comment(result_val.job_id)
print(comment)

kiara.set_job_comment(result_val.job_id, "This is an updated comment.")
comment = kiara.get_job_comment(result_val.job_id)
print(comment)

stakats · 2024-05-23T08:02:12Z

After looking at some of the (minor, solvable) issues that have cropped up around this, I am wondering whether we want to make comments optional. Otherwise running jobs (especially when testing/debugging) is a bit of a drag.
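
One way this could look from the caller's side, as a sketch only: run_job_optional_comment is a hypothetical helper, and falling back to an empty string when no comment is given is an assumption, not a decision made in this thread. The run_job callable is passed in, so the sketch does not depend on any particular kiara version.

```python
from typing import Any, Callable, Dict, Optional


def run_job_optional_comment(
    run_job: Callable[..., Any],
    operation: str,
    inputs: Dict[str, Any],
    comment: Optional[str] = None,
) -> Any:
    """Call a run_job-like function, supplying a blank comment if none was given."""
    # Assumption: a blank comment is acceptable to the backend.
    effective_comment = "" if comment is None else comment
    return run_job(operation, inputs, comment=effective_comment)
```

Called as run_job_optional_comment(kiara.run_job, "logic.and", inputs), this would keep the required argument in the API while sparing the user from typing a comment during testing/debugging.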

makkus mentioned this issue Jun 5, 2024

API kiara.list_all_values endpoint takes several minutes to process #73
