Use architecture decision records #2048

michaelfig · 2020-11-23T18:34:42Z

Adopting this PR is intended to raise the bar for documenting significant architectural decisions across the Agoric SDK.
It is based on a very lightweight process addition.

@Agoric/internal PTAL and add review comments as desired.

dckc · 2020-11-23T18:57:57Z

-1

The team already has a healthy habit of documenting design issues. You could use one or more labels to help find more significant ones if you want.

counter-point: issues don't go into the git history
counter-counter-point: issue discussion is / can be still decentralized to some extent, since github sends and receives comments by email. (hint: it defaults to not sending comments to the author, but it's straightforward to flip the preference the other way. I sleep easier knowing I have copies of all my github comments in email.)
also: quite a bit of design rationale is in the source repository

The one way that I would like the issue documentation habit to get more healthy is for folks to get a habit of every time they say something like "we'll eventually have to X" or "we plan to X" to ensure that there's an open issue representing X and, ideally, cite / link to it. By way of example, @warner recently said "There will be an arbitrary number of cranks in each block (the quantity will depend upon some sort of gas usage that we haven't figured out yet ... )". But that can get out of hand too; spending too much time documenting issues can take away from higher priority work. But saying something like "IOU an issue" can acknowledge that the norm is for all "known unknowns" to be represented as issues.

As I say, the team already has pretty healthy habits in this area (though it relies a little more heavily on @warner 's super-human ability to retrospectively document design discussions than might be ideal).

Chris-Hibbert

This seems like a fine idea. A couple of suggestions:

Even at the scale of Agoric, we have different projects. The template should explicitly ask for it. If developers writing about Zoe or marshalling or the registrar don't explicitly give the decision context it will be harder to understand in the future.
Decisions will often have a context that includes previous decisions. There should be a section for saying what other ADRs provide context, are updated, or are overruled by a new one.

michaelfig · 2020-11-23T19:26:27Z

The team already has a healthy habit of documenting design issues. You could use one or more labels to help find more significant ones if you want.

I appreciate your comments. Yes, documenting things in terms of issues does make sense. Let's see how the rest of the team weighs in.

warner · 2020-11-23T19:39:01Z

I think I'm generally in favor.

So far, we've generally used GitHub Issues for this.. they do a fine (IMHO) job of hosting discussions and conclusions. The shortcoming that I keep running into is their intended lifetime. I want to be able to close tickets: the "steady state" of the project, if (haha) we ever finish implementing and fixing everything, should be zero open issues. So using Issues for long-term storage of important design discussion is troublesome. It's not like the Issue vanishes when it is closed, the text is still nominally available, but without some sort of index that tells a future reader which ones contain important discussion and which ones are just trivial bugfixes, it lacks discovery.

As an example, @FUDCo and I were just talking about how we retire Promise IDs after their Promise has been resolved. Six months ago we did a bunch of analysis and identified some of the constraints that a solution needed to meet. The notes on that went into a ticket (#1049) which was originally named "Mutually referential promises in arrays causes infinite kernel recursion" because that's the immediate bug which prompted the analysis. It was then renamed "Discussion of garbage collection of resolved kernel promises" because that's what really made up the bulk of the text. It was then closed in favor of a "what steps do we need to take to implement the proposed design" ticket (#1124), which is more obviously closeable when we finish taking those steps.

Yesterday, when this topic came up again and I was trying to remember what design we had come up with (and what were the constraints), my research approach was to page through all the closed tickets with label=SwingSet until I saw one with a familiar title. Another approach I sometimes use (especially in unfamiliar codebases) is to find the relevant code file, look at it's git history, identify the PRs which changed it, and then study any issues associated with those PRs.

The ADR scheme seems like it might be a way to put a summary of these discussions in a more central place. Keeping them next to the code feels like a good idea, although we need to decide on some practice to follow for when we update them. I'm a fan of requiring PRs to update both docs and code simultaneously (and I often practice writing the docs change first, under the philosophy that if you can't explain how it works or how to use it, then maybe you shouldn't implement it). But I know we don't all work the same way, and these ADRs would be a sort of third category of in-tree documents (1= how to use a given system or API, 2= how that system works internally, 3=ADR=rationale of why it works that way). So it's not as immediately obvious when the ADR should be updated relative to the code it justifies.

I suppose another option would be to include ticket URLs in our in-tree "how does it work" documents, as a form of index. We do something similar with release notes (or used to, or maybe still do depending upon how well our automation is working), where a downstream user who wants to upgrade and needs to know what has changed will look at a single well-known document for that list of changes, and each change will point to an Issue number or some PRs with which they might be able to find the discussion (this is usually more important in the other direction: I have a problem with some library, I search around for somebody talking about that problem, hopefully that discussion happens in a bug ticket, I find a linked PR, then I look in the ChangeLog to see whether the latest release would fix it).

That would lack the summary aspect, though (on the plus side, we aren't adding new work for developers to spend time writing up conclusions of a long discussion, on the minus side we are adding an archaeological task to the future readers of those discussions). So I think I'd prefer explicit ADRs over just adding URLs.

dckc · 2020-11-23T19:56:59Z

if @warner is in favor, that's good enough for me.

(I guess I should have voted -0)

FUDCo · 2020-11-23T20:08:19Z

This seems like it's just reinventing the RFC with slightly different terminology. That said, RFCs are good, and this seems good too. Like @warner described above, I have been frustrated by using issue tracking to record design history on account of issue search being terrible and also things falling out of the record when issues are closed.

The one (very, very minor) thing I didn't like in the writeup was the opening line:

ADR 1: Record architecture decisions

which makes my OCD hurt.

katelynsills · 2020-11-30T21:47:39Z

I think this is a great idea. To make this function smoothly, we might also want to:

Implement a light-weight design review process. For instance, ensure that every line of code in agoric-sdk, SES, and dapps has a team of at least two people associated with it (i.e. teams for Zoe, the wallet, etc). Then, whenever a proposed change would affect that code in a substantial way, an ADR doc needs to be written and the appropriate team needs to approve it. This would help ensure that there's agreement on the approach, and if there is disagreement, at least allow a clear step of Disagree and Commit that would allow the implementers of the design to go forward freely and peacefully without interruption or goal-post moving.
Figure out the threshold for requiring ADRs. How big does the change need to be?
Figure out the process of who develops something once it is decided to be a good design. For Zoe/ERTP tasks, we have a (mostly) clear process: work is coordinated during the weekly meeting. However, we don't have anything like that for wallet or cosmic-swingset work, as far as I know.

michaelfig added the documentation Improvements or additions to documentation label Nov 23, 2020

michaelfig self-assigned this Nov 23, 2020

Chris-Hibbert reviewed Nov 23, 2020

View reviewed changes

michaelfig added this to the MetaMask Agoric-Ethereum Bridge Demo milestone Nov 23, 2020

rowgraus removed this from the MetaMask Agoric-Ethereum Bridge Demo milestone Jan 21, 2021

michaelfig added 5 commits January 30, 2021 14:15

docs: create ADR-0001

706331c

docs: update status section

a8d2acb

docs: add adr-template.md

032119f

docs: explain the consequence of a simple starting point

d1b99f8

docs: add project and other relevant ADRs to Context

20fe6c9

michaelfig force-pushed the mfig/adr-1 branch from da60043 to 20fe6c9 Compare January 30, 2021 20:21

michaelfig merged commit e628351 into master Jan 30, 2021

michaelfig deleted the mfig/adr-1 branch January 30, 2021 20:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use architecture decision records #2048

Use architecture decision records #2048

michaelfig commented Nov 23, 2020

dckc commented Nov 23, 2020

Chris-Hibbert left a comment

michaelfig commented Nov 23, 2020

warner commented Nov 23, 2020

dckc commented Nov 23, 2020

FUDCo commented Nov 23, 2020

katelynsills commented Nov 30, 2020

Use architecture decision records #2048

Use architecture decision records #2048

Conversation

michaelfig commented Nov 23, 2020

dckc commented Nov 23, 2020

Chris-Hibbert left a comment

Choose a reason for hiding this comment

michaelfig commented Nov 23, 2020

warner commented Nov 23, 2020

dckc commented Nov 23, 2020

FUDCo commented Nov 23, 2020

katelynsills commented Nov 30, 2020