Why inputs for TX are treated as Set? #4296

Closed
anton-k opened this issue Dec 17, 2021 · 51 comments

Comments

@anton-k
Contributor

anton-k commented Dec 17, 2021

Summary

If inputs are a set, it is impossible to rely on the order of inputs.
This forces us to do redundant filter-like searches instead of just pattern matching on the list of inputs.

In the case of outputs it's convenient that they are a list, so we can write

fooOut : barOut : bazOut : _ = txInfoOutputs info

instead of the more resource-heavy filtering:

fooOut = head $ filter fooPred (txInfoOutputs info)
barOut = head $ filter barPred (txInfoOutputs info)
bazOut = head $ filter bazPred (txInfoOutputs info)

But because the inputs form a set rather than a list, this cannot be done for inputs,
which leads to redundant filtering.
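
For illustration, here is a minimal sketch of the filter-style lookup this forces on scripts. It assumes plutus-ledger-api's TxInfo/TxInInfo/TxOut types (module path may differ by version); findInputAt and addr are hypothetical names:

-- Sketch only: the module path and the helper name are assumptions, not from the issue.
import Data.List (find)
import PlutusLedgerApi.V2 (Address, TxInInfo (..), TxInfo (..), TxOut (..))

-- Because input order is unpredictable, a script has to search for the input
-- it cares about instead of pattern matching on its position.
findInputAt :: Address -> TxInfo -> Maybe TxInInfo
findInputAt addr info =
  find (\i -> txOutAddress (txInInfoResolved i) == addr) (txInfoInputs info)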

Steps to reproduce the behavior


Actual Result

I'd like to be able to pattern match on the list of inputs inside a validator and rely on their order.
It's true that spend inputs are unpredictable, but we could move all arbitrary user inputs to the end of the list
and have easy access to the inputs that spend scripts, which are useful for checks inside the validator.

Expected Result


Describe the approach you would take to fix this

No response

System info


@anton-k anton-k added the bug label Dec 17, 2021
@michaelpj
Contributor

I don't know what you're referring to. Inputs are a list in TxInfo. If you're referring to the fact that inputs in transactions themselves are a set, then you need to open an issue with the ledger, which I think is highly unlikely to change.

@anton-k
Contributor Author

anton-k commented Dec 17, 2021

Yeah, they are a list in TxInfo, but they go through an internal set...

@michaelpj
Contributor

I think what you are saying is that you cannot predict the ordering of the list, because it's based on the set in the real transaction.

@anton-k
Contributor Author

anton-k commented Dec 17, 2021

That's right; this leads to some redundant checks or filter searches that we could avoid if we were sure the order in the validator will be the same as in the TX.

@L-as
Contributor

L-as commented Dec 29, 2021

I think what you are saying is that you cannot predict the ordering of the list, because it's based on the set in the real transaction.

So the inputs aren't sorted or anything like that? The order just matches the order in the transaction, which is determined by the submitter I assume?

@michaelpj
Contributor

No, it does not match the order in the original serialization, but rather the internal sort order used by the ledger.

@L-as
Contributor

L-as commented Jan 4, 2022

Do you have any tips for where to look for this sort order? Is it defined in https://github.com/input-output-hk/cardano-ledger or do I have to look through the node implementation to figure it out?

@JaredCorduan
Contributor

@L-as the ordering is the lexicographic ordering on pairs (TransactionID, TransactionIndex). @michaelpj do you know if there is a place in our documentation where it would be helpful to include this detail?
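
To illustrate (a sketch, not the ledger's actual code), that ordering is just the lexicographic comparison on the pair, comparing the transaction id first and only then the index; TxId below is a stand-in type assumed to have an Ord instance:

-- Sketch only: lexicographic ordering on (transaction id, output index).
newtype TxId = TxId String deriving (Eq, Ord, Show)  -- stand-in for the real type

compareTxIn :: (TxId, Integer) -> (TxId, Integer) -> Ordering
compareTxIn (tid1, ix1) (tid2, ix2) =
  compare tid1 tid2 <> compare ix1 ix2  -- Ordering's Semigroup falls through to the index only on EQ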

@michaelpj
Contributor

I guess it would be in the spec?

@effectfully
Contributor

No, it does not match the order in the original serialization, but rather the internal sort order used by the ledger.

Sounds like this is the expected behavior of Plutus then.

@michaelpj however given the confusion, maybe it's worth clarifying the alignment in Plutus docs too? Or was it already done at some point?

@michaelpj
Contributor

I think this should be in the part of the ledger spec that explains how the script context is converted into Data, which doesn't currently exist.

@effectfully
Contributor

effectfully commented Feb 3, 2023

I think this should be in the part of the ledger spec that explains how the script context is converted into Data, which doesn't currently exist.

OK, I've opened a ticket on the ledger side and I'm closing this one.

@JaredCorduan
Contributor

If we specify (as is currently true) that the transaction inputs in the script context are in lex order, then we are essentially promising that Plutus scripts can depend on this, i.e. that we will never change it, with the advantage of making scripts more efficient. And a bug in the ledger code (accidentally breaking this promise) would lead to problems in Plutus scripts.

If we make it clear that we make no promises, then we give ourselves flexibility, at the cost of this filtering.

In general, it's unclear to me where to draw the line on what ledger internals we should let Plutus scripts depend on.

@michaelpj
Contributor

Well, we should specify what happens, because it's part of the observable behaviour of the chain. You could not write a competing node without reproducing this behaviour, therefore it should be specified. I don't think we have the liberty to declare that it's undefined behaviour.

@JaredCorduan
Contributor

In this instance, I think it is fair, but in general, are we promising that everything that the ledger does for the script context is a promise forever? I.e. the code is the true spec? I guess if, say, another implementation sent the inputs in a different order, we can never control what assumptions scripts might make, and such a divergence would lead to problems (even if the spec itself made it clear that it was undefined...).

@L-as
Contributor

L-as commented Feb 4, 2023

FWIW scripts rely on this behaviour to optimise operations on maps.

@michaelpj
Contributor

In this instance, I think it is fair, but in general, are we promising that everything that the ledger does for the script context is a promise forever? I.e. the code is the true spec?

Yes, we are. The script context object is observable by scripts, and so any changes to it can affect at least e.g. the budget and therefore whether or not scripts validate. This is why it's such a problem that we haven't specified it properly.

@JaredCorduan
Contributor

Anytime we've discussed a change and noticed that it would be observable by scripts, we've gone out of our way to do something different which was not observable, even when it seemed extremely unlikely to ever manifest, so I'm absolutely sympathetic to the concern.

But promising that we will never make a change observable by scripts is very daunting. It is daunting for two reasons:

  • The ledger comprises many lines of code; what human could say with the utmost confidence that they could list out every observable fact about the script context? How would anyone ever know if they've exhausted the list? I do think I've got most (nearly all?) of it in my head (Peter and I came up with a pretty good list), but there will always be a lingering doubt that we've missed something.
  • We are in the Rumsfeld "unknown unknowns" territory, making promises about a future that is hard to predict while ruling out future possibilities. (I do realize that we can have new assumptions in new versions of Plutus, but it can still be extremely difficult to maintain invariants for the existing languages that have no expiration).

The alternative that I see is this: write down an explicit list of assumptions that script authors can assume about the script context, write property tests for this list, and declare everything not on the list as undefined behavior that should not be relied upon.

@michaelpj
Contributor

The alternative that I see is this: write down an explicit list of assumptions that script authors can assume about the script context, write property tests for this list, and declare everything not on the list as undefined behavior that should not be relied upon.

I think the alternative is to fully specify the creation of the script context. Note that doesn't mean specifying everything about the ledger, just how the script context is created. Since it mostly re-presents information from the transaction, I don't expect that to be so difficult, apart from the odd thing like ordering.

I also just cannot see how we can ever say anything is "undefined behaviour". If behaviour is observable by scripts, and there are scripts in the history of the chain that depend on that to validate, then if we change that then we break history. It's just like everything else.

@JaredCorduan
Contributor

I think the alternative is to fully specify the creation of the script context. Note that doesn't mean specifying everything about the ledger, just how the script context is created. Since it mostly re-presents information from the transaction, I don't expect that to be so difficult, apart from the odd thing like ordering.

Maybe it's not so bad, and my time is better spent attempting the task than chatting 😆

I also just cannot see how we can ever say anything is "undefined behaviour". If behaviour is observable by scripts, and there are scripts in the history of the chain that depend on that to validate, then if we change that then we break history. It's just like everything else.

My concern is not about breaking history, indeed we cannot do that. I am worried about what we can and cannot change with a hard fork.

@michaelpj
Contributor

Maybe it's not so bad, and my time is better spent attempting the task than chatting

FWIW Matthias is already annoyed that it's not specified, and I basically promised that I'd specify it for the new SOP version: cardano-foundation/CIPs#455 (comment)

Let's talk about how to do the work best, but I think the Plutus team can chip in here.

I am worried about what we can and cannot change with a hard fork.

Assuming we keep the old behaviour for old scripts then we can change whatever we like! This is covered by the ledger-script interface part of CIP-35, and I think it's fine to do in a new Plutus ledger language. We could do it on a protocol version change, but given that we need to retain the old behaviour anyway, I think we should generally do the usual thing and do it with a new Plutus ledger language.

@JaredCorduan
Contributor

Assuming we keep the old behaviour for old scripts then we can change whatever we like!

you make it sound so easy :lolsob:

@michaelpj
Contributor

We are brothers in backwards-compatibility-suffering 😂

@JaredCorduan
Contributor

Assuming we keep the old behaviour for old scripts then we can change whatever we like!

Imagine we have features a, b, and c in the ledger that make their way into the script context. There are all kinds of properties P(a, b, c) that might hold for Plutus VX scripts. Imagine now adding feature d to the ledger. Even if d is not used anywhere in some Plutus VX script, P(a, b, c) could fail in the presence of d in the ledger if we do not go out of our way to preserve it.

If we promise to maintain all the current observable facts, then we are promising: "for any idea d that we ever come up with in the future, and any property P(a, b, c) of the current ledger, P(a, b, c) will still hold with the introduction of d." (I should probably use some notation like a(d) or something that makes a now depend on d...)

Of course I'm happy to make promises about all the Ps that I can think of (assuming they are not too ridiculous), but promises about the ones I've yet to think of are worrisome.

On the one hand, I can hear:

yeesh, Jared, you are making this way more complicated than it needs to be, it's just a simple translation.

But on the other hand, it sounds like exactly the kind of thing that usually bites us.

And I think what you are proposing is something like a "reference implementation" of the script context creation, so that all the Ps become implicit (reducing the infinite to the finite). But even then I worry that this could be too much of a simplification of the real story, and that tendrils of the provenance will find their way all the way back to the wire...

@WhatisRT

WhatisRT commented Feb 7, 2023

I think we should get together and figure out what we can and can't promise for the script context. Here are some examples of Ps where weird things happen:

  • We want to remove pointer addresses by turning them into enterprise addresses, but we still have to support them because some script might rely on them. But now we've changed semantics: Let's say a script allows minting some tokens once per epoch proportional to the amount of Ada delegated to some pool. If a pointer is allowed here, this change has now changed the semantics: you'll get the tokens, but the pool owner doesn't get your stake.

  • Let's say we didn't need to keep track of registered reward accounts anymore, and we decide to get rid of the certificates. Well, now some script contexts can't be constructed anymore, so maybe we can't do that; we still have to be able to make them, but just as dummies. But now there are some Plutus scripts that might have more steps they could take, since they can track parts of the state of this registry internally. So, if we want to preserve the semantics, we have to keep the current system in place forever.

I think the introduction of new features might be less problematic (we can always disallow old Plutus versions in scripts with new features), but I think that if we wanted to make the strongest possible promise here large parts of the ledger would be completely unchangeable forever.

@michaelpj
Contributor

Ah, I think I see. Let me make it concrete: let's talk about the sorting order of the inputs. Suppose that they initially use ordering O1, and we want to make a change to O2.

I think:

  • We cannot say "haha, the ordering is unspecified, we're just going to change it silently and break literally everything"
  • We can say "we're going to change the ordering at a particular point using our usual mechanism for backwards compatibility"
  • We don't have to say "we'll maintain that particular ordering forever"

Let's say a script allows minting some tokens once per epoch proportional to the amount of Ada delegated to some pool

Scripts can't tell how much Ada is delegated to a pool, but I take your point. I didn't quite follow the other example but I agree it would be good to get together and think it through.

@JaredCorduan
Contributor

Let me make it concrete: let's talk about the sorting order of the inputs. Suppose that they are initially in using ordering O1, and we want to make a change to O2.

I definitely appreciate you making a concrete example. But just to be clear to everyone following along, I do think that the ordering of the transaction inputs in the script context is a promise that we could make about the V1 and V2 context for eternity. Unfortunately I don't have a better example in mind atm (hence the unknown unknowns).

@Quantumplation
Contributor

Was it an intentional decision somewhere to sort transaction inputs in this way? I.e. is there a good reason why this was done, or is it just an accident of the data structures used?

Last I dug into it, there was an explicit step when building the script context to sort these inputs (though I'm no good at navigating haskell codebases, so I can't find it again), so it didn't feel like it was an artifact of "this is how the ledger stores things internally".

For reference, this sort order has caused some consternation for at least two of the major DEXes on Cardano (SundaeSwap / MinSwap), and is limiting throughput. If you try to batch transactions, and do so in a way that preserves the on-chain order, this (quite surprising) sorting destroys that.

The only solution is to add an extra field in the redeemer and sort them in the script (costly), or to segment the batching into monotonic subsequences, which hurts protocol throughput.

I understand that the ship has likely sailed for PlutusV1/V2 (as, beyond even just execution units, the actual outcome of the transaction depends on this sort order now); but it'd be good to document somewhere not just that this is the behavior, but why: i.e. an accident of history, or xyz good reason.

That way when designing future Plutus versions, we can either revisit that decision, or continue to intentionally benefit from xyz good reason.

@WhatisRT

WhatisRT commented Feb 8, 2023

This is mostly an accident caused by a design decision made for Shelley (order didn't matter, so we put things in a set instead of a list). However, I still maintain that it's actually better for the ecosystem this way. There has been a bunch of discussion here: cardano-foundation/CIPs#231

@JaredCorduan
Contributor

Last I dug into it, there was an explicit step when building the script context to sort these inputs (though I'm no good at navigating haskell codebases, so I can't find it again), so it didn't feel like it was an artifact of "this is how the ledger stores things internally".

It actually is how the ledger stores things internally.

There are a few things going on here, I will try to explain.

On the wire, using CBOR, the transaction inputs are just in a list.

(Arguably, we should have used the CBOR set tag, #258, but we didn't.)

This list does not have to be sorted, but some services (like the CLI) might sort them anyway.

Now, inside the transaction body, we store the transaction inputs as a Set. We think of them semantically as sets, and we also use Haskell's Data.Set. I (we?) don't think of sets as inherently ordered, so in order to provide an index for the transaction input we needed something, and lex ordering seemed a fine choice (though with 20/20 hindsight we should have at least more carefully considered the serialization ordering; or more like hindsight 2018 😆). The code for placing new outputs in the UTxO has been basically unchanged since Shelley. Moreover, this is the same order that they are now passed to the script context. And, very importantly, this is the order that the redeemers need to use.


If you try to batch transactions, and do so in a way that preserves the on-chain order, this (quite surprising) sorting destroys that.

Do you have a minimal example of this? I feel like I might have read a post about this a while back... 🤔

@L-as
Contributor

L-as commented Feb 8, 2023

relevant: #4753

@Quantumplation
Contributor

@JaredCorduan
It actually is how the ledger stores things internally.

Right; but there was also an explicit sort call when building the ScriptContext, IIRC.

Do you have a minimal example of this? I feel like I might have read a post about this a while back... thinking

Consider SundaeSwap:

  • User A pays funds to swap 100 ADA to SUNDAE into our "escrow" contract at TxOut 9bd...#0
  • User B pays funds to swap 100 ADA to SUNDAE into our "escrow" contract at TxOut 1ef...#0

Our pool contract then enforces that, when transactions are batched together, the appropriate amounts are paid out to the user; since the price shifts in between each execution, the order that these are executed matters: the user who executes second will get slightly less SUNDAE, because the purchase of SUNDAE has pushed the price up.

Additionally, for fairness it's important that these be executed in the order they settled on-chain.

So, if we build a transaction that has the inputs:

 - `9bd...#0`
 - `1ef...#0`

These then get sorted and executed out of order, essentially letting 1ef...#0 skip the line (lexicographically it sorts before 9bd...#0). This opens the protocol up to a sort of "proof of work" attack to enable frontrunning.

To solve this, scoopers break the stream of transactions into monotonically increasing subsequences: grab as many transactions as you can in their on-chain order, until one txID is less than the previous one. This results in less-than-optimal batching (on average splitting the batch size in half).
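
For concreteness, a rough off-chain sketch of that splitting step (hypothetical names, not SundaeSwap's actual code):

-- Split a list into maximal non-decreasing runs; each run can then be batched
-- into one transaction without the ledger's lexicographic sort reordering it.
monotoneRuns :: Ord a => [a] -> [[a]]
monotoneRuns = foldr step []
  where
    step x []                 = [[x]]
    step x (run@(y : _) : rs)
      | x <= y                = (x : run) : rs
      | otherwise             = [x] : run : rs
    step x ([] : rs)          = [x] : rs  -- runs are never empty; kept for totality

-- e.g. monotoneRuns ["9bd#0", "1ef#0", "a01#1"] == [["9bd#0"], ["1ef#0", "a01#1"]]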

This was such a subtle and non-obvious behavior that both MinSwap and we didn't notice it until after we had launched, when it was impossible to replace the contracts. WingRiders documented a solution where the order is provided in the redeemer, but this increases execution unit costs, as you have to perform an additional sort over the inputs to "undo" the sorting that the node did when building the script context.
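
A bare sketch of that redeemer-based workaround (illustrative only, not WingRiders' actual implementation): the builder supplies the intended order as indices into the lex-sorted input list, and the script re-applies it on-chain at extra cost:

-- The indexing below is the extra on-chain work (and execution cost) discussed above.
reorderWithRedeemer :: [Int] -> [a] -> [a]
reorderWithRedeemer perm xs = map (xs !!) perm

-- e.g. reorderWithRedeemer [1, 0] ["1ef#0", "9bd#0"] == ["9bd#0", "1ef#0"]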

In my opinion, though it seems the ship has already sailed on this, we should move this sorting behavior, if desirable, to the person building the transaction. If the protocol doesn't care about order, they can sort them by TxHash for the map-building efficiency gains mentioned above. The argument that "scripts have to check every input anyway" doesn't hold much water for me, as it ignores the fact that the ordering contains useful information that is otherwise impossible to access on-chain; and it doesn't materially stop people from doing stupid things: someone can just as easily find the first input with a token as they could depend on the first input.

Perhaps as execution unit budgets increase and other languages with more efficient compilers come online, this will become a moot point, as sorting a few inputs by redeemer will become cheap; but for now it very much adds this annoying parabolic arc where many contracts have to undo work that the node is doing.

@L-as
Contributor

L-as commented Feb 8, 2023

Right; but there was also an explicit sort call when building the ScriptContext, IIRC.

Nope, it's implicit due to being put into a Map, see https://github.com/input-output-hk/cardano-ledger/blob/ac405a977557a7c58ce1cf69d3c2a0bf148cf19f/eras/alonzo/impl/src/Cardano/Ledger/Alonzo/TxInfo.hs#L371

I don't think the minor efficiency improvement is worth the extra complexity.

@JaredCorduan
Contributor

JaredCorduan commented Feb 8, 2023

Right; but there was also an explicit sort call when building the ScriptContext, IIRC.

Do you consider calling fromList on Set an explicit sort?

https://github.com/input-output-hk/cardano-ledger/blob/ac405a977557a7c58ce1cf69d3c2a0bf148cf19f/eras/babbage/impl/src/Cardano/Ledger/Babbage/TxInfo.hs#L203

Note that by this point in the ledger, the user's wire order is long gone (of course we could have chosen to memoize it, but we don't; we supply the ordering given by Set, which is part of the internal structure, and we don't go out of our way to sort it).
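
To make the point concrete, a tiny standalone example (not ledger code) of how round-tripping through Data.Set already discards the original wire order:

import qualified Data.Set as Set

-- Whatever order the elements arrive in, Set.toList hands them back in
-- ascending Ord order; no separate sort call is needed.
wireOrderGone :: [Int]
wireOrderGone = Set.toList (Set.fromList [9, 1, 5])  -- == [1, 5, 9]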


WingRiders documented a solution where the order is provided in the redeemer, but this increases execution unit costs as you have to perform an additional sort over the inputs to "undo" the sorting that the node did when building the script context.

Thank you for the example! Indeed I just assumed that the logic would be present in the redeemer. But I myself have never had to worry about writing optimized Plutus contracts and am probably ignorant of a lot of the struggles.

In my opinion, though it seems the ship has already sailed on this, we should move this sorting behavior, if desirable, to the person building the transaction;

It's not insurmountable, of course, but it would be a much more invasive change than perhaps folks realize (and indeed a lot of interesting discussion is in the issues that @WhatisRT mentioned).

@Quantumplation
Contributor

@JaredCorduan no; I guess I'm mistaken. I'm not a Haskell developer so maybe I misread the code, or was looking in some other place. It's probably not relevant now.

Re: invasive change; for sure, that's fair. All I can do is provide my limited perspective :) and from the outside, it seems like a violation of the general philosophy that's been taken elsewhere to bend over backwards to make sure the script context accurately represents the transaction on chain; for example, the incompatibility of PlutusV1 with even the mere presence of a reference input for the purpose of reference scripts, which you and I have talked about before 😅

So, not necessarily pushing strongly for a change, just putting our experience out there so it can feed into the complex decision making process you have to be responsible for ;)

@JaredCorduan
Contributor

JaredCorduan commented Feb 9, 2023

Re: invasive change; for sure, that's fair. All I can do is provide my limited perspective :) and from the outside, it seems like a violation of the general philosophy that's been taken elsewhere to bend over backwards to make sure the script context accurately represents the transaction on chain; for example, the incompatibility of PlutusV1 with even the mere presence of a reference input for the purpose of reference scripts, which you and I have talked about before

I can see how it could feel that way! We never considered the ordering of the transaction inputs on the wire to be semantically meaningful, which is why it was placed into a set right upon deserialization. It was defined as a set even before the first line of Haskell was ever written. But obviously I see now that the original ordering is meaningful to folks building on Cardano, and I wish we could have taken that into account.

As for the script contexts, we felt the decision was about things that really did matter a lot. In the circumstances where you have a square peg and an intentionally round hole, you have to pick between getting out the saw and rounding off the edges or giving up. With V1 we did the former and later regretted it.

@ch1bo

ch1bo commented Feb 9, 2023

In my two-year journey of building on Cardano using Plutus and the ledger - writing scripts that are sensitive to the sequence of inputs, modifying transactions, and repairing redeemer pointers through these modifications - I have often felt that the choice of Set was made with the right intention, but that the consequences are too big and frankly a real pain in the neck.

IMO what actually matters is that inputs are unique in a transaction, but their order should not matter for the EUTxO ledger semantics. And as the example from @Quantumplation shows, the order is even important in some cases.

But, constraints liberate! Today no script, or users in general, can rely on the order of the inputs being as originally constructed, because they might have been lex-ordered by the ledger. Doesn't that mean that if we remove the ordering constraint, existing usages will be fine?

EDIT: When typing out the last paragraph I realize - no. Users could of course rely on the fact that inputs are lex ordered. But it would be less breakage than changing it in the other direction 🤷

@michaelpj
Contributor

it seems like a violation of the general philosophy that's been taken elsewhere to bend over backwards to make sure the script context accurately represents the transaction on chain

Just to comment on this: the script context is supposed to accurately represent transactions, not the serialized form of transactions. Transaction inputs are a set. They're serialized in an order because they have to be, but that's an artifact of serialization. I think we are being consistent here, just perhaps not in the way you would like :)

@Quantumplation
Contributor

@michaelpj I guess that's fair :)

@ch1bo

ch1bo commented Feb 14, 2023

Transaction inputs are a set.

That's true. But who said that the order of that set needs to be lex order and not a transaction-creator defined one?

The order of transaction inputs has meaning if an application developer wants (or needs) to give it meaning.

that's an artifact of serialization

To me, that we re-order inputs from what we are given (from the wire) seems to be the artifact here: something implicitly happening because we used Set TxIn with some Ord TxIn instead of a nubbed [TxIn].

@WhatisRT

WhatisRT commented Feb 14, 2023

Transaction inputs are a set.

That's true. But who said that the order of that set needs to be lex order and not a transaction-creator defined one?

Sets don't have an order; that's what distinguishes them from unique lists. So from the perspective of the ledger there's no re-ordering going on: we have a set of TxIn, and we need to give it to Plutus via (effectively) some serialization that's inherently ordered, so we need to pick one.

Note that there's also a difference between an actual set and a Haskell Set (which is inherently ordered since Haskell doesn't have quotients), and when we're talking about sets we mean the regular kind that you'd encounter in math. Everything else is just implementation details.

@ch1bo

ch1bo commented Feb 14, 2023

Sets don't have an order, that's what distinguishes them from unique lists. So from the perspective of the ledger there's no re-ordering going on...

I agree that a set in math does not have an order. But I disagree that the ledger is not affected by the order.

In fact, the ledger spec defines "the set of inputs is an ordered set". Only that way can it use indexof on the set of tx inputs to determine / resolve redeemer pointers. See for example subsection 4.1 in https://github.com/input-output-hk/cardano-ledger/releases/download/cardano-ledger-spec-2023-01-18/alonzo-ledger.pdf

However, this issue is about users of the EUTxO model and its implementation on Cardano voicing an (IMO legitimate) request for preserving the order of inputs, as it conveys meaning.

The question I care about is: why are inputs not a unique list in the spec and implementation, as @anton-k originally asked?

This seems to have been discussed in cardano-foundation/CIPs#231, but with no conclusive outcome; or was it just discarded because nobody wants to change it (it's likely quite involved)?

In general, this makes me a sad user.

@WhatisRT

In fact, the ledger spec defines "the set of inputs is an ordered set".

I'd call this a misleading explanation. The intent here wasn't to suggest that there is some extra structure on this set, it's just to explain how this function works. Maybe I should go and fix that language.

The reason for this is historic. For accounting you really only need a set of inputs, and that's still true today. From that point of view, an ordering on the inputs really is just metadata about the inputs, and it makes sense that such extra information should be provided by the redeemers. In fact, if you imagine two scripts in a single transaction that would require an order on the inputs, why should those orders be forced to be equal? If you want to support a use-case like that, you'd then have to re-implement a permutation in the redeemer anyway, but now the scripts have to fight over which one gets preferential treatment.

From my perspective, it's really about complexity. There are many ways in which we can make writing scripts easier that add complexity, and almost all ledger features that increase complexity look like mistakes to me today. It's either a few script authors needing ordered inputs who have to pay the price, or the ledger, which would have to support this forever, so that everyone has to pay the unknown price of a locked-in feature. To me this seems short-sighted, but I'm not a script author.

@Quantumplation
Contributor

Are transaction inputs provided to the script context as a set type, or as a list type? If as a list, maybe that's where the breakdown (such as it was, I'm not trying to be overly critical or annoying, just trying to increase empathy where I can) was, and why the behavior was so surprising to us and others.

@ch1bo

ch1bo commented Feb 15, 2023

In fact, if you imagine two scripts in a single transaction that would require an order on the inputs, why should those orders be forced to be equal?

Scripts don't get to decide. They only verify whether the transaction is as they expect or not. The transaction creator (should) get to decide on the order, and needs to do so in a way that satisfies the scripts they want to redeem. If two scripts require ordered inputs, one ascending lex and the other descending lex, then obviously they are not compatible and will never be spent in the same transaction. This does not work today, nor will it ever work - but it does not need to. What we (or I, at least) care about is that the order of inputs is an interface between the transaction author and the script verifying the transaction.

Note that the ledger is completely out of the picture here. It would just represent the transaction as it was created, validating it by doing its own checks and also calling the scripts. And as you say @WhatisRT, the ledger rules do not need ordered inputs; they merely need to be unique - for example, to check they sum up to a balanced transaction. That's great! So the ledger would not care if we just keep the order as it was in the transaction?

@Quantumplation To my understanding:

@michaelpj
Contributor

Are transaction inputs provided to the script context as a set type, or as a list type?

We don't have a native set type - in that respect it's similar to the transaction wire format, where the set of inputs is represented as a list (which is the source of all this confusion!).

Note, that the ledger is completely out of the picture here. It would just represent the transaction as it was created, validating it by doing it's own checks and also calling the scripts.

Just to try and convey the conceptual difference that I think I see here.

My interpretation of Sebastian's position:

  1. A transaction is the wire format.
  2. The author of a transaction is the creator of the wire format, and all the details represent their intention.
  3. Therefore, all the details that were specified in the wire format should be preserved, as they represent the intention of the transaction author.

My interpretation of my own/Jared/Andre's position:

  1. A transaction is the structure used by the ledger. The wire format is a serialization format for transactions, and has no special status other than that. Other serialization formats could be used and would be equally valid.
  2. The author of a transaction can/should construct a value that corresponds to a ledger transaction, e.g. with a set of inputs. It is not their concern how it is serialised, that should be done by a library, and the details of serialization do not represent their intention. Authors who are constructing the wire format themselves are weird.
  3. Therefore, details that are specified only in the wire format are incidental and can (should) be discarded, since they're just an artifact of serialization.

I think this is the crux: otherwise we just keep talking past each other saying "why are you discarding features of 'the transaction'" and "those are not features of 'the transaction'".

@Quantumplation
Contributor

Great summary @michaelpj

To add my own position, which is somewhere between the two:

  • I think that it's fair to draw the distinction between wire format and conceptual data structures, as you do
  • However, I'd emphasize that the on-chain format is the primary interface for external parties like dApp developers
  • those external users are rarely engaging directly with the exact specification or data structures used in the ledger code
  • Thus, any differences between the wire format and the conceptual format can be extra surprising
  • some acknowledgement of / empathy for that fact, and extra care in identifying those differences in the future and emphasizing them in ways that are discoverable to those external consumers, would go a long way
  • otherwise those external parties can come away feeling a little gaslit 😅
  • additionally and separately, I think Sebastian and I are both arguing that it would be a useful change to the specification to document that, after some point in time, the order of the inputs becomes one of the explicit properties of the ledger state; it might not be useful for checking the accounting of the ledger, but I've brought you at least one example where it's useful for checking the overall transaction validity

@JaredCorduan
Contributor

some acknowledgement of / empathy for that fact, and extra care in identifying those differences in the future and emphasizing them in ways that are discoverable to those external consumers, would go a long way

For sure. Above in this thread, you will see that @effectfully created an issue in the ledger repo for exactly this (and more!). The hard part for me is deciding a clean place to draw the line around what promises we can make about the script context. Notice that txInfoInputs is not the only thing that has a CBOR ordering which is not preserved: there are also txInfoReferenceInputs and txInfoSignatories ("well, if the Principal Investigator signed off on this, that's all I care about", etc 😁) and all the maps (which are association lists in both CBOR and in Plutus, but are partial functions in the spec). Notice also that the validity window is actually two separate fields on the wire, the ttl and the start time, so in theory (where I am putting on my mathematician hat that cares about strict definitions, not likelihood) someone could hope to derive meaning from which field is placed first in the transaction body, etc.

Of course we are not trying to be obnoxious here, but clean guiding principles are our survival strategy against death by complexity. If we did want to move the inputs to a unique list, we'd need a clear migration strategy for the redeemer pointers, so that we are not silently changing their semantics (certainly doable, this is just a small example of the complexity creep).

And all of these wire concerns pale in comparison to the semantic ones, such as those @WhatisRT mentioned above...

@WhatisRT

@Quantumplation I think you make some good points about documenting things more cleanly. I don't actually know where script authors get their documentation from and I know that the specs can be quite dense. One thing that I'd like to have one day is a 'simplified spec' that only contains things script authors would care about, together with a proof that scripts that don't abuse certain features behave exactly the same on the real ledger as on the simplified ledger. That should make it easier to understand the ledger and to reason about correctness of scripts. Do you think that would help? (Only issue is that this is really a stretch goal and I'm currently pretty busy with all the Voltaire stuff, so it's unlikely to be started any time soon)

@JaredCorduan I really like your point that even the fields in the transaction itself could be reordered to intend some meaning, I didn't even think about that.

There are just so many places where the serialization can be changed without doing anything, and we have to decide what to show to Plutus and what not. And once we show something to Plutus we're basically committed to doing that forever, and we have to be incredibly careful with changing its semantics (since a change in semantics means possibly breaking assumptions a script made). The less we show to scripts, the freer we are to make changes in the future.

@Quantumplation
Contributor

@WhatisRT thank you! And, actually, I chose my wording very carefully: emphasizing them in ways that are discoverable to those external consumers. Because the big myth that everyone falls for about documentation is thinking you can just write the be-all-end-all documentation and solve people's confusion... but then at the end you just have yet another thing for people to read that's slightly different from the rest. It's like that xkcd comic about standards. That's not to say that both efforts you describe wouldn't be Good Things ™️. But I'm suggesting there are ways to be more subtly pervasive than that too: integrate it into our APIs / SDKs, our tooling, etc. If Plutus V1 had passed it to us as a set; if our testing framework randomized the order or sorted them like the node does; if we had tools to visualize the transaction that somehow emphasized this; etc.

Another example of this kind of surprising behavior is the mint-0-ada phenomenon 😅

Sundae was cursed to be early, so it's 1 Billion percent understandable that those things weren't in place back then with everything IOG had going on, and there are definitely ways we could have stumbled into this on our own. I'm not trying to play the blame game, just trying to emphasize how surprise can be bad, and help find ways to stop it from cropping up in future work (especially as we're about to make things way more complicated with input endorsers heh)

@WhatisRT

I absolutely agree that discoverability is important, but I just make the rules and write them down, I have no idea what happens afterwards 😅 In fact, at some point in time I expected people to ask me whether some documentation they wrote is correct, but that never happened and at some point that expectation went away. So my approach to this is really to try and always make things as simple as possible (for many reasons, but including fewer misunderstandings). But even that isn't the answer and I've been horrified a bunch of times over the years by issues like the 0-ada thing you mentioned.

I think this is particularly hard because the ledger isn't what people directly write their code against. The more layers you pile on top (or next to it), the harder it is to be sure that everything is right. So maybe the goal is to make the source of truth as attractive as possible, so people will want to sit close to it?
