Why are TX inputs treated as a Set? #4296
Comments
I don't know what you're referring to. Inputs are a list in TxInfo.
Yeah, they are a list in the TxInfo, but they come from an internal set...
I think what you are saying is that you cannot predict the ordering of the list, because it's based on the set in the real transaction.
That's right; this leads to some redundant checks or filter searches that we could avoid if we were sure that the order in the validator would be the same as in the TX.
So the inputs aren't sorted or anything like that? The order just matches the order in the transaction, which is determined by the submitter, I assume?
No, it does not match the order in the original serialization, but rather the internal sort order used by the ledger.
Do you have any tips for where to look for this sort order? Is it defined in https://github.com/input-output-hk/cardano-ledger or do I have to look through the node implementation to figure it out?
@L-as the ordering is the lexicographic ordering on (transaction id, output index) pairs.
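As an illustration only (a minimal Haskell sketch with assumed type and field names, not the ledger's actual definitions), lexicographic ordering on (transaction id, output index) pairs looks like this:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical stand-ins for the ledger's transaction input type.
newtype TxId = TxId String
  deriving (Eq, Ord, Show)

data TxIn = TxIn
  { txInId :: TxId  -- hash of the transaction that created the output
  , txInIx :: Int   -- index of the output within that transaction
  } deriving (Eq, Show)

-- Lexicographic ordering: compare transaction ids first,
-- and fall back to the output index to break ties.
instance Ord TxIn where
  compare = comparing (\i -> (txInId i, txInIx i))

-- The inputs a script sees are in this order, regardless of the
-- order in which they were serialized on the wire.
sortInputs :: [TxIn] -> [TxIn]
sortInputs = sortBy compare
```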
I guess it would be in the spec?
Sounds like this is the expected behavior of Plutus then. @michaelpj, however, given the confusion, maybe it's worth clarifying the alignment in the Plutus docs too? Or was it already done at some point?
I think this should be in the part of the ledger spec that explains how the script context is converted into Data, which doesn't currently exist.
OK, I've opened a ticket on the ledger side and I'm closing this one.
If we specify (as is currently true) that the transaction inputs in the script context are in lex order, then we are essentially promising that Plutus scripts can depend on this, i.e. that we will never change it, with the advantage of making scripts more efficient. A bug in the ledger code (accidentally breaking this promise) would then lead to problems in Plutus scripts. If we make it clear that we make no promises, then we give ourselves flexibility, at the cost of this filtering. In general, it's unclear to me where to draw the line on what ledger internals we should let Plutus scripts depend on.
Well, we should specify what happens, because it's part of the observable behaviour of the chain. You could not write a competing node without reproducing this behaviour, therefore it should be specified. I don't think we have the liberty to declare that it's undefined behaviour.
In this instance, I think it is fair, but in general, are we promising that everything the ledger does for the script context is a promise forever? I.e. is the code the true spec? I guess if, say, another implementation sent the inputs in a different order, we can never control what assumptions scripts might make, and such a divergence would lead to problems (even if the spec itself made it clear that it was undefined...).
FWIW scripts rely on this behaviour to optimise operations on maps.
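For example (a sketch with assumed names, not any particular project's code), a script that knows both its list of expected output references and the transaction's inputs are sorted the same way can pair them up in one linear merge pass instead of filtering the whole input list per reference:

```haskell
-- Pair each wanted reference with its matching input in a single pass.
-- Both lists are assumed to be sorted ascending by the reference;
-- without that guarantee the script would have to filter per reference.
lookupAll :: Ord ref => [ref] -> [(ref, a)] -> [(ref, a)]
lookupAll _ [] = []
lookupAll [] _ = []
lookupAll wanted@(w:ws) inputs@((r, x):rest)
  | r == w    = (r, x) : lookupAll ws rest   -- found the next wanted ref
  | r <  w    = lookupAll wanted rest        -- skip an input we don't care about
  | otherwise = lookupAll ws inputs          -- wanted ref not present; move on
```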
Yes, we are. The script context object is observable by scripts, and so any changes to it can affect at least e.g. the budget and therefore whether or not scripts validate. This is why it's such a problem that we haven't specified it properly.
Anytime we've discussed a change and noticed that it would be observable by scripts, we've gone out of our way to do something different which was not observable, even when it seemed extremely unlikely to ever manifest, so I'm absolutely sympathetic to the concern. But promising that we will never make a change observable by scripts is very daunting, for two reasons.
The alternative that I see is this: write down an explicit list of assumptions that script authors can make about the script context, write property tests for this list, and declare everything not on the list as undefined behavior that should not be relied upon.
I think the alternative is to fully specify the creation of the script context. Note that this doesn't mean specifying everything about the ledger, just how the script context is created. Since it mostly re-presents information from the transaction, I don't expect that to be so difficult, apart from the odd thing like ordering. I also just cannot see how we can ever say anything is "undefined behaviour". If behaviour is observable by scripts, and there are scripts in the history of the chain that depend on it to validate, then changing it breaks history. It's just like everything else.
Maybe it's not so bad, and my time is better spent attempting the task than chatting 😆
My concern is not about breaking history; indeed we cannot do that. I am worried about what we can and cannot change with a hard fork.
FWIW Matthias is already annoyed that it's not specified, and I basically promised that I'd specify it for the new SOP version: cardano-foundation/CIPs#455 (comment). Let's talk about how best to do the work, but I think the Plutus team can chip in here.
Assuming we keep the old behaviour for old scripts, we can change whatever we like! This is covered by the ledger-script interface part of CIP-35, and I think it's fine to do in a new Plutus ledger language. We could do it on a protocol version change, but given that we need to retain the old behaviour anyway, I think we should generally do the usual thing and do it with a new Plutus ledger language.
You make it sound so easy :lolsob:
We are brothers in backwards-compatibility-suffering 😂
Imagine all the features we might want someday. If we promise to maintain every currently observable fact, then we are promising that any such future idea must preserve all of them. Of course I'm happy to make promises about the facts we can think of today. On the one hand, I can hear "maybe it's not so bad".
But on the other hand, it sounds like exactly the kind of thing that usually bites us. And I think what you are proposing is something like a "reference implementation" of the script context creation, so that all of the observable behaviour is pinned down explicitly.
I think we should get together and figure out what we can and can't promise for the script context; I have some examples in mind.
I think the introduction of new features might be less problematic (we can always disallow old Plutus versions in scripts with new features), but I think that if we wanted to make the strongest possible promise here, large parts of the ledger would be completely unchangeable forever.
Ah, I think I see. Let me make it concrete: let's talk about the sorting order of the inputs. Suppose that they are initially sorted using ordering O1, and we want to make a change to O2. I think:
Scripts can't tell how much Ada is delegated to a pool, but I take your point. I didn't quite follow the other example, but I agree it would be good to get together and think it through.
I definitely appreciate you making a concrete example. But just to be clear to everyone following along, I do think that the ordering of the transaction inputs in the script context is a promise that we could make about the V1 and V2 context for eternity. Unfortunately I don't have a better example in mind atm (hence the unknown unknowns).
Was it an intentional decision somewhere to sort transaction inputs in this way? I.e. is there a good reason why this was done, or is it just an accident of the data structures used? Last I dug into it, there was an explicit step when building the script context to sort these inputs (though I'm no good at navigating Haskell codebases, so I can't find it again), so it didn't feel like it was an artifact of "this is how the ledger stores things internally".
For reference, this sort order has caused some consternation for at least two of the major DEXs on Cardano (SundaeSwap / MinSwap), and is limiting throughput. If you try to batch transactions, and do so in a way that preserves the on-chain order, this (quite surprising) sorting destroys that. The only solution is to add an extra field in the redeemer and sort them in the script (costly), or to segment the batching into monotonic subsequences, which hurts protocol throughput.
I understand that the ship has likely sailed for PlutusV1/V2 (as, beyond even just execution units, the actual outcome of the transaction depends on this sort order now); but it'd be good to document somewhere not just that this is the behavior, but why: i.e. an accident of history, or xyz good reason. That way, when designing future Plutus versions, we can either revisit that decision or continue to intentionally benefit from xyz good reason.
This is mostly an accident caused by a design decision made for Shelley (order didn't matter, so we put things in a set instead of a list). However, I still maintain that it's actually better for the ecosystem this way. There has been a bunch of discussion here: cardano-foundation/CIPs#231
It actually is how the ledger stores things internally. There are a few things going on here; I will try to explain. On the wire, using CBOR, the transaction inputs are just in a list.
(Arguably, we should have used the CBOR set tag here.) This wire list does not have to be sorted, but some services (like the CLI) might sort it anyway. Now, inside the transaction body, we store the transaction inputs as a Set, so from that point on the only ordering available is the one given by the Set's Ord instance.
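A minimal sketch of that shape (simplified types, not the actual cardano-ledger decoder): the wire carries a list, but the body stores a `Data.Set`, at which point the submitter's order is gone and only the Ord-based order remains:

```haskell
import qualified Data.Set as Set

-- Simplified stand-in for a transaction input: (transaction id, index).
-- The real types live in cardano-ledger; this is only for illustration.
type TxIn = (String, Int)

-- On the wire the inputs arrive as a list, in whatever order the
-- submitter serialized them ...
wireInputs :: [TxIn]
wireInputs = [("ff00", 0), ("aa00", 1), ("aa00", 0)]

-- ... but the transaction body stores them in a Set, so the only
-- order left afterwards is the Ord instance's (lexicographic) order.
bodyInputs :: Set.Set TxIn
bodyInputs = Set.fromList wireInputs

-- Set.toList bodyInputs == [("aa00",0), ("aa00",1), ("ff00",0)]
```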
Do you have a minimal example of this? I feel like I might have read a post about this a while back... 🤔
Relevant: #4753
Right; but there was also an explicit sort step when building the script context?
Consider SundaeSwap: users place individual swap orders on-chain, which are later collected into batches by our off-chain "scoopers".
Our pool contract then enforces that, when transactions are batched together, the appropriate amounts are paid out to the user. Since the price shifts in between each execution, the order in which these are executed matters: the user who executes second will get slightly less SUNDAE, because the first purchase of SUNDAE has pushed the price up. Additionally, for fairness it's important that these be executed in the order they settled on-chain. So, suppose we build a batch transaction whose inputs are the orders in their on-chain order.
Those inputs then get sorted and executed out of order, essentially letting a later order settle ahead of an earlier one. To solve this, scoopers break the stream of transactions into monotonically increasing subsequences: grab as many transactions as you can in their on-chain order, until one txID is less than the previous one. This results in less-than-optimal batching (on average splitting the batch size in half).
This was such a subtle and non-obvious behavior that neither MinSwap nor we noticed it until after we had launched, when it was impossible to replace the contracts. WingRiders documented a solution where the order is provided in the redeemer, but this increases execution-unit costs, as you have to perform an additional sort over the inputs to "undo" the sorting that the node did when building the script context.
In my opinion, though it seems the ship has already sailed on this, we should move this sorting behavior, if desirable, to the person building the transaction. If the protocol doesn't care about order, they can sort inputs by TxHash for the map-building efficiency gains mentioned above. The argument that "scripts have to check every input anyway" doesn't hold much water for me, as it ignores the fact that the ordering contains useful information that is otherwise impossible to access on-chain; and it doesn't materially stop people from doing stupid things, since someone can just as easily do them some other way.
Perhaps as execution-unit budgets increase and other languages with more efficient compilers come online, this will become a moot point, as sorting a few inputs by redeemer will become cheap; but it very much adds this annoying parabolic arc where many contracts have to undo work that the node is doing.
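To illustrate the batching workaround described above (a sketch with assumed types, not SundaeSwap's actual scooper code): split the stream of order references into monotonically increasing runs, so that within each batch the ledger's lexicographic sort coincides with the on-chain arrival order:

```haskell
-- Split a stream of (already arrival-ordered) references into
-- monotonically increasing runs. Within each run, lexicographic
-- sorting leaves the arrival order untouched, so a batch built from
-- one run survives the ledger's re-sorting intact.
monotonicRuns :: Ord ref => [ref] -> [[ref]]
monotonicRuns []     = []
monotonicRuns (x:xs) = go [x] xs
  where
    -- 'run' holds the current batch in reverse order.
    go run (y:ys)
      | y > head run = go (y : run) ys          -- still increasing: extend the batch
      | otherwise    = reverse run : go [y] ys  -- order would break: start a new batch
    go run []        = [reverse run]

-- monotonicRuns [1, 3, 2, 5, 4] == [[1,3], [2,5], [4]]
```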
Nope, it's implicit due to being put into a Set. I don't think the minor efficiency improvement is worth the extra complexity.
Do you consider calling that sorting? Note that by this point in the ledger, the user's wire order is long gone (of course we could have chosen to memoize it, but we don't; we supply the ordering given by the Set, which is part of the internal structure. We don't go out of our way to sort it).
Thank you for the example! Indeed I just assumed that the logic would be present in the redeemer. But I myself have never had to worry about writing optimized Plutus contracts and am probably ignorant of a lot of the struggles.
It's not insurmountable, of course, but it would be a much more invasive change than perhaps folks realize. (And indeed a lot of interesting discussion is in the issues that @WhatisRT mentioned.)
@JaredCorduan no; I guess I'm mistaken. I'm not a Haskell developer, so maybe I misread the code or was looking in some other place. It's probably not relevant now.
Re: invasive change; for sure, that's fair. All I can do is provide my limited perspective :) and from the outside, it seems like a violation of the general philosophy that's been taken elsewhere of bending over backwards to make sure the script context accurately represents the transaction on chain; for example, the incompatibility of PlutusV1 with even the mere presence of a reference input for the purpose of reference scripts, which you and I have talked about before 😅 So, not necessarily pushing strongly for a change, just putting our experience out there so it can feed into the complex decision-making process you have to be responsible for ;)
I can see how it could feel that way! We never considered the ordering of the transaction inputs on the wire to be semantically meaningful, which is why they were placed into a set right upon deserialization. It was defined as a set even before the first line of Haskell was ever written. But obviously I see now that the original ordering is meaningful to folks building on Cardano, and I wish we could have taken that into account.
As for the script contexts, we felt the decision was about things that really did matter a lot. In circumstances where you have a square peg and an intentionally round hole, you have to pick between getting out the saw and rounding off the edges or giving up. With V1 we did the former and later regretted it.
In my two-year journey of building on Cardano using Plutus and the ledger - writing scripts that are sensitive to the sequence of inputs, modifying transactions and repairing redeemer pointers through these modifications - I often felt that the choice of a Set for the inputs was unfortunate.
IMO what actually matters is that inputs are unique in a transaction, but their order should not matter for the EUTxO ledger semantics. And, as the example of @Quantumplation shows, the order is even important in some cases.
But, constraints liberate! Today no script or user can rely on the order of the inputs being as originally constructed, because they might have been lex ordered by the ledger. Doesn't that mean that if we remove the ordering constraint, existing usages will be fine?
EDIT: When typing out the last paragraph I realize - no. Users could rely on the fact that inputs are lex ordered, of course. But it would be less breakage than changing it in the other direction 🤷
Just to comment on this: the script context is supposed to accurately represent transactions, not the serialized form of transactions. Transaction inputs are a set. They're serialized in an order because they have to be, but that's an artifact of serialization. I think we are being consistent here, just perhaps not in the way you would like :)
@michaelpj I guess that's fair :)
That's true. But who said that the order of that set needs to be lex order and not a transaction-creator-defined one? The order of transaction inputs has meaning if an application developer wants (or needs) to give it meaning.
To me, that we re-order inputs from what we are given (from the wire) seems to be the artifact here: something implicitly happening because we used a Set internally.
Sets don't have an order; that's what distinguishes them from unique lists. So from the perspective of the ledger there's no re-ordering going on: we just have a set of inputs. Note that there's also a difference between an actual (mathematical) set and a Haskell Set, which does carry an ordering via its Ord instance.
I agree that a set in math does not have an order. But I disagree that the ledger is not affected by the order. In fact, the ledger spec defines "the set of inputs is an ordered set"; only that way can it index the inputs (e.g. for redeemer pointers).
However, this issue is about users of the EUTxO model and its implementation on Cardano voicing a (IMO legitimate) request for preserving the order of inputs, as it conveys meaning. The question I care about is: why are inputs not a unique list in the spec and implementation, as @anton-k originally asked? This seems to have been discussed in cardano-foundation/CIPs#231, but to no conclusive outcome - or was it just discarded because nobody wants to change it (it's likely quite involved)? In general, this makes me a sad user.
I'd call this a misleading explanation. The intent there wasn't to suggest that there is some extra structure on this set; it's just to explain how this function works. Maybe I should go and fix that language.
The reason for this is historic. For accounting you really only need a set of inputs, and that's still true today. From that point of view, an ordering on the inputs really is just metadata about the inputs, and it makes sense that such extra information should be provided by the redeemers. In fact, if you imagine two scripts in a single transaction that each require an order on the inputs, why should those orders be forced to be equal? If you want to support a use case like that, you'd have to re-implement a permutation in the redeemer anyway, but now the scripts have to fight over which one gets preferential treatment.
From my perspective, it's really about complexity. There are many ways in which we can make writing scripts easier that add complexity, and almost all ledger features that increase complexity look like mistakes to me today. It's either a few script authors needing ordered inputs who have to pay the price, or the ledger, which would have to support this forever, and now everyone has to pay the unknown price of a locked-in feature. To me this seems short-sighted, but I'm not a script author.
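For completeness, a sketch (assumed representation, not anyone's production code) of the redeemer-as-metadata approach described above, which is also the WingRiders workaround mentioned earlier: the transaction author encodes the intended order as a list of indices into the lexicographically sorted inputs, and the script re-applies it:

```haskell
-- Re-apply an author-supplied ordering: 'perm' is a list of indices
-- into the lex-sorted inputs the script receives, carried in the
-- redeemer. Each (!!) is linear, so this is roughly O(n^2) on-chain,
-- which is the extra execution cost mentioned above.
applyOrder :: [Int] -> [a] -> [a]
applyOrder perm inputs = map (inputs !!) perm

-- applyOrder [2, 0, 1] ["in-a", "in-b", "in-c"] == ["in-c", "in-a", "in-b"]
```

A script using this would still want to check that the supplied permutation is valid (each index in range and used exactly once) before trusting it.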
Are transaction inputs provided to the script context as a set type, or as a list type? If as a list, maybe that's where the breakdown (such as it was; I'm not trying to be overly critical or annoying, just trying to increase empathy where I can) was, and why the behavior was so surprising to us and others.
Scripts don't get to decide. They only verify whether the transaction is as they expect or not. The transaction creator (should) get to decide on the order, and needs to do so in a way that satisfies the scripts they want to redeem. If two scripts require ordered inputs, one ascending lex and the other descending lex, then obviously they are not compatible and will never be spent in the same transaction. This does not work today, nor will it ever work - but it does not need to.
What we (or I at least) care about is that the order of inputs is an interface between the transaction author and the script verifying the transaction. Note that the ledger is completely out of the picture here. It would just represent the transaction as it was created, validating it by doing its own checks and also calling the scripts. And as you say @WhatisRT, the ledger rules do not need ordered inputs; they merely need them to be unique - for example, to check that they sum up to a balanced transaction. That's great! So the ledger would not care if we just kept the order as it was in the transaction?
@Quantumplation To my understanding:
We don't have a native set type - in that respect it's similar to the transaction wire format, where the set of inputs is represented as a list (which is the source of all this confusion!).
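Concretely, in plutus-ledger-api (module and field names as in recent versions; older releases expose the same shape under different module paths) the script context hands the inputs to the validator as a plain list field:

```haskell
import PlutusLedgerApi.V2 (TxInInfo (..), TxInfo (..), TxOutRef)

-- The "set" of inputs reaches the script as a list field
-- (txInfoInputs :: [TxInInfo]); a validator can only observe that
-- list, already in the ledger's lexicographic order.
inputRefs :: TxInfo -> [TxOutRef]
inputRefs info = map txInInfoOutRef (txInfoInputs info)
```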
Just to try and convey the conceptual difference that I think I see here. My interpretation of Sebastian's position: the order in which the creator listed the inputs is part of "the transaction", so by re-ordering them the ledger is discarding information that belongs to the transaction.
My interpretation of my own/Jared/Andre's position: a transaction semantically contains a *set* of inputs; the wire order is an artifact of serialization, not a feature of "the transaction" itself.
I think this is the crux: otherwise we just keep talking past each other saying "why are you discarding features of 'the transaction'" and "those are not features of 'the transaction'".
Great summary @michaelpj. To add my own position, which is somewhere between the two.
For sure. Above in this thread, you will see that @effectfully created an issue in the ledger repo for exactly this (and more!). The hard part for me is deciding a clean place to draw the line around what promises we can make about the script context. And all of these wire concerns pale in comparison to the semantic ones, such as those @WhatisRT mentioned above...
@Quantumplation I think you make some good points about documenting things more cleanly. I don't actually know where script authors get their documentation from, and I know that the specs can be quite dense. One thing that I'd like to have one day is a 'simplified spec' that only contains things script authors would care about, together with a proof that scripts that don't abuse certain features behave exactly the same on the real ledger as on the simplified ledger. That should make it easier to understand the ledger and to reason about the correctness of scripts. Do you think that would help? (Only issue is that this is really a stretch goal and I'm currently pretty busy with all the Voltaire stuff, so it's unlikely to be started any time soon.)
@JaredCorduan I really like your point that even the fields in the transaction itself could be reordered to convey some meaning; I didn't even think about that. There are just so many places where the serialization can be changed without doing anything, and we have to decide what to show to Plutus and what not. And once we show something to Plutus we're basically committed to doing that forever, and we have to be incredibly careful with changing its semantics (since a change in semantics means possibly breaking assumptions a script made). The less we show to scripts, the freer we are to make changes in the future.
@WhatisRT thank you! And, actually, I chose my wording very carefully: emphasizing them in ways that are discoverable to script authors. Because the big myth that everyone falls for about documentation is thinking you can just write the be-all-end-all documentation and solve people's confusion... but then at the end you just have yet another thing for people to read that's slightly different from the rest. It's like that xkcd comic about standards.
That's not to say that both efforts you describe wouldn't be Good Things ™️. But I'm suggesting there are ways to be more subtly pervasive than that too: integrate it into our APIs / SDKs, our tooling, etc. If Plutus V1 had passed it to us as a set; if our testing framework randomized the order or sorted them like the node does; if we had tools to visualize the transaction that somehow emphasized this, etc. Another example of this kind of surprising behavior is the mint-0-ada phenomenon 😅
Sundae was cursed to be early, so it's 1 billion percent understandable that those things weren't in place back then with everything IOG had going on, and there are definitely ways we could have stumbled into this on our own. I'm not trying to play the blame game, just trying to emphasize how surprise can be bad, and to help find ways to stop it from cropping up in future work (especially as we're about to make things way more complicated with input endorsers, heh).
I absolutely agree that discoverability is important, but I just make the rules and write them down; I have no idea what happens afterwards 😅 In fact, at some point in time I expected people to ask me whether some documentation they wrote is correct, but that never happened and at some point that expectation went away. So my approach to this is really to try and always make things as simple as possible (for many reasons, but including fewer misunderstandings). But even that isn't the answer, and I've been horrified a bunch of times over the years by issues like the 0-ada thing you mentioned.
I think this is particularly hard because the ledger isn't what people directly write their code against. The more layers you pile on top (or next to it), the harder it is to be sure that everything is right. So maybe the goal is to make the source of truth as attractive as possible, so people will want to sit close to it?
Summary
If inputs are a set, it is impossible to rely on the order of inputs.
This forces us to do redundant filter-like searches instead of just pattern matching on a list of inputs.
In the case of outputs it's convenient that outputs are a list, so we can pattern match on them directly
instead of doing more resource-heavy filtering,
but because inputs form not a list but a set, the same cannot be done for inputs.
This leads to redundant filtering.
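As a rough sketch of what the summary above describes (hypothetical validator fragments using plutus-ledger-api types; the actual code the author had in mind may differ): outputs can be pattern matched positionally because the author controls their order, while inputs have to be searched for:

```haskell
import PlutusLedgerApi.V2 (TxInInfo (..), TxInfo (..), TxOut, TxOutRef)

-- Outputs: the transaction author controls their order, so a check can
-- simply pattern match on positions.
checkOutputs :: TxInfo -> Bool
checkOutputs info =
  case txInfoOutputs info of
    (scriptOut : changeOut : _) ->
      validateScriptOut scriptOut && validateChange changeOut
    _ -> False

-- Inputs: the order is the ledger's lexicographic order, so the script
-- has to search (filter) for the input it cares about instead.
findInput :: TxOutRef -> TxInfo -> Maybe TxInInfo
findInput ref info =
  case filter (\i -> txInInfoOutRef i == ref) (txInfoInputs info) of
    (i : _) -> Just i
    _       -> Nothing

-- Hypothetical per-output checks, only here to make the sketch complete.
validateScriptOut, validateChange :: TxOut -> Bool
validateScriptOut _ = True
validateChange    _ = True
```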
Steps to reproduce the behavior
Actual Result
I'd like to be able to pattern match on the list of inputs inside a validator and rely on their order.
It's true that spend inputs are unpredictable, but we could move all arbitrary user inputs to the end of the list
and have easy-to-use access to the inputs that spend scripts, which are useful for checks inside the validator.
Expected Result
Describe the approach you would take to fix this
No response
System info