---
CIP: ?
Title: Merkelised Plutus Scripts
Authors: Las Safin <las@mlabs.city>
Status: Draft
Type: Standards
Created: 2022-11-29
License: <CC-BY-4.0>
---
Currently, the hash of a script is simply the hash of its serialisation. This CIP proposes changing this such that the hash of a script (term) is a function of its immediate children's hashes, forming a Merkle Tree from the AST. This allows one to shallowly verify a script's hash, and is useful on Cardano, because it allows scripts to check that a script hash is an instantiation of a parameterised script.
In addition, a blake2b_224
built-in function must be added.
This is inspired by BIP-144, but the motivations are very different.
Given some core logic expressible as a script, it is common to have parameters in the form of constants, e.g. fees, references to other scripts, magical numbers.
These parameters can either be put in a datum somewhere, or can be put into the script itself, either by inlining them, or applying the unapplied script to the constants.
On-chain it is currently hard to check that one script is an applied form of another script. In cases where that is necessary, datums are instead used.
By Merkelising the hashing, we make this possible, which unlocks checking that a script is an application of another script to some parameter.
Example reasons to apply the parameters to the script:
- Staking validators currently don't support datums, and all staking validators share a single rewards account. Allowing checking applied parameters makes staking validators much more powerful. (More about this below)
- Constants can be included in reference script, leading to less CPU and memory used, since they don't have to be parsed from the adjacent datum (somewhat cheap) or the script context (very expensive).
- A script address + datum can't fit in an address, if you want that you also need this (or need to change what an address is).
The hash of a script will be derived directly from the AST, rather than its serialisation. Currently, its formed by hashing the serialisation prefixed with a byte that represents its version, e.g. 0x02 for Plutus V2.
The hash of a script becomes the hash of the prefix version annotation prepended to the hash of the term.
Term
currently has 8 constructors. On-chain, annotations are always the unit type,
and are hence ignored for this algorithm. Each case/constructor is generally handled by
hashing the concatenation of a prefix (single byte corresponding to the
constructor index) along with the hashes of the arguments passed to the constructor.
Similar code can be found in Plutarch (for a slightly different AST).
To avoid giving a single script two hashes, this system must be used (exclusively) since at least a version after Plutus V2.
The algorithm for checking a script hash against a supplied script (of a new version) in the ledger will change slightly: rather than hashing the supplied serialised script directly, the decoding of the serialised script must be hashed. (NB: the hashing and decoding can be fused to avoid intermediary structures.)
To allow computing the hash in scripts, we must support blake2b-224
in Plutus scripts
as it's what is currently used. This algorithm used might change in the future, but that is
not relevant for this CIP.
Since there are no children, the hash of the Error
term is the
hash of the prefix byte for the Error
constructor.
You could theoretically choose any random number as the hash,
but it has to be proven to be random, hence hashing the prefix byte
is the best option.
In pseudocode: hash prefix
The hash of a Builtin
is the hash of the prefix prepended to the base-256 encoded
(i.e. serialised to bytestring) index of the built-in function.
Because there are less than 256 built-ins, this is currently the same
as hashing the prefix byte prepended to the byte containing the index of
the built-in.
Var
is handled the exact same way (with a different prefix),
but it's in this case feasible for the index to be more than 255.
In pseudocode: hash $ prefix <> serialiseBase256 index
These are hashed by hashing the result of prepending the prefix byte to the concatenation of the hashes of the children.
In pseudocode: hash $ foldl' (<>) prefix (hash <$> children)
This works the exact same way as above, notably, the name is excluded as it's a constant in the de-Bruijn encoding.
In pseudocode: hash $ prefix <> hash body
The universe of types used on-chain is always DefaultUni
.
Each possible data type is handled differently, with each having
a different prefix. The total number of prefixes does not exceed
255. If it did, the prefix would have to be increased to two bytes.
In addition: Negative integers and non-negative integers have separate prefixes. False and True also have separate prefixes.
The serialisation according to CIP-58, prefixed with the two-byte prefix, is hashed.
In pseudocode: hash $ prefix <> prefix' <> serialiseCIP58 n
The same algorithm as above is used, but the number hashed is 1 - n
.
In pseudocode: hash $ prefix <> prefix' <> serialiseCIP58 (1 - n)
The bytestring is hashed as-is. We use the blake2b-256 hash here, such that we can usefully check that the script refers to a bytestring that we know only the hash of.
In pseudocode: hash $ prefix <> blake2b_256 bs
The flat-encoding is hashed.
In pseudocode: hash $ prefix <> flat x
Lists and pairs are hashed like a Merkle tree, much the same way that terms are. The children have a known type, and are hashed according to how that type should be hashed, i.e. with the correct algorithm and prefix.
In pseudocode: hash $ foldl' (<>) prefix (hash <$> children)
Each has its own separate prefix, like Error
, hence:
In pseudocode: hash prefix
The CBOR
encoding is used, notably, it must be compatible with the serialiseData
built-in to be useful on-chain.
We use the blake2b-256 hash here, such that we can usefully check that
the script refers to a datum that we know only the hash of.
If the hashing algorithm for data changes, we must also change it here.
In pseudocode: hash $ prefix <> blake2b_256 (serialiseData d)
Given this minor change, we can now check that one script is the application of another script.
Concretely, given hash script
, hash original
, parameter d
(as data),
intermediate hashes h0
, h1
, hashing prefixes ver_prefix
, app_prefix
, const_prefix
, we check:
script == blake2b_224 $ ver_prefix <> h0
h0 == blake2b_224 $ app_prefix <> original <> h1
h1 == blake2b_224 $ const_data_prefix <> blake2b_256 (serialiseData d)
We essentially open the Merkle tree commitment partially and check that the supplied path is correct.
This CIP does not depend on CIP-58, but to hash integers on-chain the way it's done here, CIP-58's integer-to-bytestring serialisation built-in must be available in Plutus.
BIP144 uses this trick to avoid submitting the parts of the script that aren't used. Given that reference scripts are common in Haskell, this isn't a big win for efficiency, but it might be worth implementing for the sake of scripts used only once. This CIP however doesn't require that that be implemented.
The argument for privacy doesn't apply, private smart contracts can be achieved through the use of non-interactive zero-knowledge probabilistic proofs.
Currently, different instances of the same script will need their own reference inputs since their hashes are different. It seems feasible to allow sharing of a single reference script, given the parameters and language version as witnesses, but given the complexity involved, it is not specified in this CIP.
This makes staking validators much more powerful, since a single protocol can now manage many rewards accounts (by instantiating the script with a numeric identifier). However, it is arguably not the optimal solution due to the reference script problem described above. Even if the reference script problem is solved as described above, it seems logical to allow supplying a datum to a staking validator, or somehow combining the payment address and staking address for scripts, and using the same datum for both, while somehow solving the separate accounts problem.
Given the heavy complexity of fixing staking validators, Merkelising script hashing seems much more feasible.
Seemingly, Merkelisation is a less invasive and possibly cleaner change.
The hashing of constants might not have a clear best solution, but most notably,
it is not clear how much/less to Merkelise the hashing.
E.g., the hashing of data itself could be Merkelised. This is not done in this CIP.
The hashing of a Data
constant could also prepend the prefix directly to the serialisation,
rather than to the hash of the Data
. It is not clear what is best.
Strings are not very useful in Plutus. Hence, the hashing algorithm for them isn't optimised such that they can be easily verified.
Strings have essentially no purpose on-chain, since they're only used for tracing, which should not be used in production.
In the context of checking applied parameters, it is likely that only
Data
, Integer
, Bool
, ByteString
, will be used as parameters,
since they cover all useful behaviour in an efficient way.
If you want to parameterise your script by a pair of integers,
it is likely best to unwrap that into two separate integer parameters
for the sake of efficiency of running the script, which is likely
to be more common that checking the parameters.
Built-in lists and pairs are not commonly used as parameters, but it's plausible that they might still be the most efficient method in some scenarios. Hence, they have been included. They use Merkle-tree hashing since that's the simplest and most useful in this case.
Las Safin will implement this if IOG don't have time.
This CIP is licensed under CC-BY-4.0.