RFC: DE<>GIS Dataset Review Process + Internal Publishing Versions #819
Replies: 19 comments
-
Nice write-up! Thank you for doing it. Quick q: we can still use
-
I'm also a bit unclear about GH issues... It sounds like the workflow will be as follows:
Does that sound right?
-
Thank you, and yes.
-
@sf-dcp Yes, thank you; I was a little light on the details around the GIS workflow. Your distillation is very clear, though, and I'll copy it over to the main description above. That's exactly what I'd envisioned.
-
@sf-dcp Ah, the one difference I'd envisioned is that GIS would note required changes in the child issue, not the parent. I've updated the main text with your example flow.
-
I have basically no notes on this write-up. It all sounds great and makes complete sense for how we generally interact with these once they're in this review state. My main note is that GIS support for data access is a bit of a footnote here, and it really needs to be a key part of this: we can't end up with GIS having to struggle manually with a new folder structure. The counterpoint would be to just keep "latest" for now. But while we're touching all of this, it seems like the best time to work on a longer-term solution.
-
@alexrichey
I guess if these don't live in the DE-only
-
It seems like the verb "publish" here means "DE declares a certain build is ready for QA." So once a build passes QA, which verbs come next? I guess "package" and "distribute"? I really like "stage" as a verb for "DE declares a certain build is ready for QA", but wouldn't mind "publish" if we're happy with it only meaning that.
-
Yeah, here "publish" means something like "internally published by DE." I'll need to think more about this terminology; suggestions welcome. "Staged" is sensible, especially in its implication that not everything that's been staged will make it into production. It does mostly have connotations with other systems (i.e., Git, CI/CD), though, and I'm not sure how I feel about that.
-
It might also be good to distance ourselves from "staging", since it has a very specific meaning for datasets in
-
Unless we want to align those!
-
I hate to say it, but I think the term we're actually looking for here is
-
very down to use when it passes QA, do we then promote a
-
I also hate to agree, but I think you're right. We've sort of circled back to the original "staging" idea; we're just adding something before "draft" instead of after. So in this scheme, "builds" are very disposable (good), but what about "drafts"? To both of your last points, do we clean them out after promoting a draft? (to There's something nice about drafts being more permanent: we can always look back at what happened over the course of QA, at least for some period of time (6 months? A year? 2 further publications? Maybe just forever; our S3 costs are pretty cheap). And while part of me likes the simplicity of the final draft being the real "final draft", promoting in some way still has a clarity that makes things like GIS endpoints simple. A counterpoint, though, is this latest "republishing" of PLUTO: I'm not sure (in general) how we'd best capture a republishing (and redistribution) in this scheme.
-
I guess if our act of promoting a "final draft" generates metadata somewhere, and we later have to promote a new draft and overwrite the thing at the final endpoint, we'll have a record of the republishing/redistribution. It's hard to say where the best place for that record would be, though (ignoring a DB option for now, since we use JSON files, which I still really like). It probably shouldn't be a file in a Maybe a file in the
-
I was thinking no - a
I think you're right. The But using db-pluto/drafts/24v2
A month later:
So we end up with every intermediate in
-
That all sounds great to me.
-
when there are multiple subfolders in and sounds like these
-
@damonmcc In this scheme, we're still going to keep the packaging folder under the product. So to package, you'll specify a version (and potentially a sub-version) of a dataset (e.g., PLUTO 24v2 version 2) that'll get dropped into a And yes, this does mean we get rid of edm-distributions. One other consideration is that, in the case of republishing, we'd want to add some versioning scheme, maybe something like 24v2-r2 (not something we have to figure out here).
-
Problem Statement
When DE has finished a build, we've often encountered some combination of the following problems:
Proposed Solution
For all of our products, we should add a subfolder under the version to indicate the draft publication version. The current state looks like this:
dataset files
The proposed state adds the draft publication version:
draft publication version/dataset files
The draft publication version will be composed of an integer version and a summary describing the build, similar to the summary line of a git commit. A list of build versions could look like this:
I suggest an integer version instead of a timestamp because we don't really care when the draft was published, whereas the integer corresponds to something we do care about. For example, if we're in round three of PLUTO publishing and you see that the last draft publication is 6-fix-the-issue, you immediately know something is wrong.
Draft Publication Github Issues
Our Publishing Github Action will create a Github Issue for every published build version. Decisions, discussions, etc. should be documented on that issue. Each of these issues should be linked back to a parent Issue for the build of a dataset.
The Issue for the draft publication should use Github Labels to indicate the status. A list of statuses might be:
Perhaps we can auto-add all of GIS as Assignees.
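A minimal sketch of the issue the Publishing Github Action might open for each draft publication, assuming it uses GitHub's REST "create an issue" endpoint (which accepts `title`, `body`, `labels`, and `assignees`). The function name, the title format, and linking to the parent by mentioning `#<parent number>` in the body are illustrative assumptions; "Ready" is the starting label from the example GIS workflow, and the full list of statuses is still open.

```python
from __future__ import annotations

import re

# Assumed draft-name format: "<integer>-<summary slug>", e.g. "6-fix-the-issue".
DRAFT_VERSION_RE = re.compile(r"\d+-[a-z0-9]+(?:-[a-z0-9]+)*")


def draft_issue_payload(
    product: str,
    version: str,
    draft_version: str,
    parent_issue: int,
    gis_assignees: list[str],
) -> dict:
    """Hypothetical helper: build the JSON body for a POST to
    /repos/{owner}/{repo}/issues for one draft publication."""
    if not DRAFT_VERSION_RE.fullmatch(draft_version):
        raise ValueError(f"not a draft publication version: {draft_version!r}")
    return {
        "title": f"{product} {version}: draft {draft_version}",
        # Mentioning "#<number>" makes GitHub cross-reference the parent issue.
        "body": (
            f"Draft publication {draft_version} of {product} {version}.\n\n"
            f"Parent issue: #{parent_issue}"
        ),
        # Every new draft starts out labeled "Ready" for GIS QA.
        "labels": ["Ready"],
        "assignees": list(gis_assignees),
    }
```

The Action would then send this as the JSON body of an authenticated POST; that call is left out to keep the sketch self-contained.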
Implementation Details (Technical)
publish folder to this new scheme.
Publish action to accept a changes summary field, which will be used to generate the draft publication version. The integer part will be inferred from existing versions on DO.
dcpy: The draft publication version concept needs to be added to the edm publish connector.
Publish functionality should refuse to overwrite existing data.
Implementation Details (Nontechnical)
draft folder should be considered deletable.
Other Considerations
latest folders. It's a convenient hack, but has the liability of potentially being out of sync with the actual latest versions. As part of this, we could help GIS migrate off. We could either supply them Python code to infer the last build version, or add a REST endpoint to the QAQC app that redirects to the DO location.
edm-publishing/db-pluto/23v2, which should (presumably) be under the draft folder.
draft publications? publication drafts? They really are just drafts of what we'll eventually publish.
Example Workflow for GIS (Copied from @sf-dcp's comment)
Suppose we're building PLUTO v24.1.
Ready and the GIS team is added to the issue.
Their QA review looks good --> They put the tag Passed on the child issue.
Their QA review asks for changes --> They put the tag Failed on the child issue --> They note required changes in the child issue --> we repeat the process, generating consecutive child issues.
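The "repeat the process" step, combined with the draft publication version scheme from the Proposed Solution (an integer plus a commit-style summary, with the integer inferred from existing versions on DO), can be sketched in Python. This is a minimal sketch under stated assumptions: draft folders are named `<integer>-<summary slug>` like `6-fix-the-issue`, the caller already has the names of existing drafts for the version (listing them from DO is left out), and `slugify`, `next_draft_version`, and `latest_draft` are hypothetical helpers, not existing dcpy functions. `latest_draft` is the kind of code we could hand GIS to infer the last build instead of relying on latest folders.

```python
from __future__ import annotations

import re

# Assumed draft-name format: "<integer>-<summary slug>", e.g. "6-fix-the-issue".
DRAFT_RE = re.compile(r"(\d+)-[a-z0-9]+(?:-[a-z0-9]+)*")


def slugify(summary: str) -> str:
    """Turn a commit-style changes summary into a lowercase, hyphenated slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    if not slug:
        raise ValueError("changes summary needs at least one letter or digit")
    return slug


def _numbered(existing: list[str]) -> list[tuple[int, str]]:
    """Pair each well-formed draft name with its integer; skip anything else."""
    return [
        (int(m.group(1)), name)
        for name in existing
        if (m := DRAFT_RE.fullmatch(name))
    ]


def next_draft_version(existing: list[str], summary: str) -> str:
    """Name the next draft: one more than the highest existing integer."""
    highest = max((n for n, _ in _numbered(existing)), default=0)
    return f"{highest + 1}-{slugify(summary)}"


def latest_draft(existing: list[str]) -> str | None:
    """Most recent draft by integer (not string) order, or None if there are none."""
    drafts = _numbered(existing)
    return max(drafts)[1] if drafts else None
```

Ordering by the parsed integer rather than by folder name matters once a version reaches ten drafts, since "10-..." sorts before "9-..." as a plain string.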
Whiteboarding