Refactor inscription parsing #2461

casey · 2023-09-20T20:09:24Z

This PR refactors inscription parsing by disentangling envelope recognition and inscription parsing.

Envelope recognition recognizes envelopes of the form OP_FALSE OP_IF "ord" <data pushes> OP_ENDIF in tapscripts. Inscription parsing converts envelopes into inscriptions.
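
A minimal sketch of the recognition step, using only the standard library. The `Instr` enum is a hypothetical stand-in for decoded script instructions, not a type from ord or the `bitcoin` crate. Rejecting an envelope that contains any non-push opcode is an assumption of this sketch, not something stated in this PR.

```rust
/// Hypothetical, simplified script instruction (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    OpIf,
    OpEndIf,
    /// Any other opcode that pushes no data.
    OtherOp,
    /// A data push. OP_FALSE is modelled as an empty push.
    Push(Vec<u8>),
}

/// The data pushes between `OP_FALSE OP_IF "ord"` and `OP_ENDIF`.
pub type Payload = Vec<Vec<u8>>;

/// Tries to read one envelope starting at `start`. On success returns the
/// payload and the index of the instruction after `OP_ENDIF`.
fn envelope_at(script: &[Instr], start: usize) -> Option<(Payload, usize)> {
    let header = [
        Instr::Push(Vec::new()),      // OP_FALSE
        Instr::OpIf,                  // OP_IF
        Instr::Push(b"ord".to_vec()), // "ord"
    ];
    if !script[start..].starts_with(&header) {
        return None;
    }
    let mut payload = Vec::new();
    for (offset, instr) in script[start + 3..].iter().enumerate() {
        match instr {
            Instr::Push(data) => payload.push(data.clone()),
            Instr::OpEndIf => return Some((payload, start + 3 + offset + 1)),
            // Assumption: a non-push opcode disqualifies the envelope.
            Instr::OpIf | Instr::OtherOp => return None,
        }
    }
    None // the script ended before OP_ENDIF
}

/// Scans a script and returns the payload of every envelope found.
pub fn recognize_envelopes(script: &[Instr]) -> Vec<Payload> {
    let mut envelopes = Vec::new();
    let mut i = 0;
    while i < script.len() {
        match envelope_at(script, i) {
            Some((payload, next)) => {
                envelopes.push(payload);
                i = next;
            }
            None => i += 1,
        }
    }
    envelopes
}
```

Recognition here only collects raw pushes; it does not interpret them as fields, which keeps the two stages separate in the way the description outlines.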

The goal is that all valid envelopes are recognized, and that all valid envelopes result in inscriptions. This can give us some confidence in the stability of inscription numbers, since previously, failures in the inscription parsing code would result in an inscription not being assigned an inscription number.

Changes in behavior:

Inscriptions with duplicate fields are now recognized and assigned inscription IDs, but are cursed.
Inscriptions with incomplete fields, i.e., a field tag followed by the end of the envelope, are recognized and assigned inscription IDs, but are cursed.
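
The two behavior changes can be sketched as an infallible parser: every envelope payload yields an inscription, and malformed fields only set flags that mark it as cursed. This is an illustration, not the code in this PR. The `Inscription` struct, the content-type tag value `[1]`, the empty push as body separator, and keeping the first value of a duplicated field are all assumptions of the sketch.

```rust
use std::collections::btree_map::{BTreeMap, Entry};

/// Hypothetical parsed inscription (illustration only).
#[derive(Debug, Default, PartialEq)]
pub struct Inscription {
    pub content_type: Option<Vec<u8>>,
    pub body: Option<Vec<u8>>,
    pub duplicate_field: bool,
    pub incomplete_field: bool,
}

impl Inscription {
    /// Duplicate or incomplete fields curse the inscription.
    pub fn is_cursed(&self) -> bool {
        self.duplicate_field || self.incomplete_field
    }
}

/// Assumed tag for the content-type field.
const CONTENT_TYPE_TAG: &[u8] = &[1];

/// Converts an envelope payload into an inscription. Never fails.
pub fn parse(payload: &[Vec<u8>]) -> Inscription {
    let mut inscription = Inscription::default();
    let mut fields: BTreeMap<&[u8], &[u8]> = BTreeMap::new();
    let mut pushes = payload.iter();

    while let Some(tag) = pushes.next() {
        if tag.is_empty() {
            // Assumed body separator: all remaining pushes form the body.
            let body: Vec<u8> = pushes.by_ref().flatten().copied().collect();
            inscription.body = Some(body);
            break;
        }
        match pushes.next() {
            Some(value) => match fields.entry(tag.as_slice()) {
                // Same tag seen twice: keep the first value, set the flag.
                Entry::Occupied(_) => inscription.duplicate_field = true,
                Entry::Vacant(slot) => {
                    slot.insert(value.as_slice());
                }
            },
            // A tag followed by the end of the envelope.
            None => inscription.incomplete_field = true,
        }
    }

    inscription.content_type = fields.remove(CONTENT_TYPE_TAG).map(|v| v.to_vec());
    inscription
}
```

Because `parse` returns an `Inscription` rather than an `Option` or `Result`, a parsing quirk can no longer cause an envelope to be skipped during numbering.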

cbspears · 2023-09-21T01:21:35Z

I support this PR; however, I do think this still requires a clear definition of what a "valid" ord envelope is. Because if it is defined as OP_FALSE OP_IF "ord" <data pushes> OP_ENDIF, wouldn't that mean #2139 is a valid Inscription envelope? Apologies if I'm missing something, as I'm not nearly as technical as you are on this.

casey · 2023-09-21T01:27:12Z

I updated the PR description to clarify that this still only recognizes envelopes in tapscripts.

satoshi0770 · 2023-09-21T01:41:39Z

Finally, an expressive open envelope!
Question about multiples: which sat would the 2nd envelope in the same reveal be assigned to?
Say you have more than one envelope in 1 reveal.
Support this move big time! Hopefully it will help get others on board; it should ease the worry about ongoing envelope validity changes.

Psifour · 2023-09-21T02:01:37Z

> I updated the PR description to clarify that this still only recognizes envelopes in tapscripts.

While this PR doesn't extend to alternate locations in transactions where the envelope can be stored, is it an indication that we intend to not allow those alternatives or simply a matter of the expedient solution first while still open to further refinements if developers are willing to commit to writing it up and getting the PR to you and Raph?

Psifour

Huge fan of this change! I am biased as it will allow us to rewind a lot of bad engineering we did with gord to maintain inscription number parity. The core of our logic is very similar to this optimized pipeline.

ACK

casey · 2023-09-21T03:00:30Z

> While this PR doesn't extend to alternate locations in transactions where the envelope can be stored, is it an indication that we intend to not allow those alternatives or simply a matter of the expedient solution first while still open to further refinements if developers are willing to commit to writing it up and getting the PR to you and Raph?

Are you thinking of envelopes in P2SH and P2WSH scripts? I'm personally not convinced that there's utility in recognizing envelopes in anything but tapscripts, in order to keep things simple. Storing envelopes in the signature annex is a desirable future extension, since that would allow inscribing with a single transaction, but that's a longer-term change.

cbspears · 2023-09-21T03:33:06Z

While I personally share your views expressed on ICP as an unconvincing reason for recognizing P2WSH scripts, I do think there are probably reasons for utility in other parts of Bitcoin data, including P2WSH. A discussion should probably be had on whether an Inscription needs to have a utilitarian purpose to be recognized by ord, or if we simply allow as broad of a definition of a "valid envelope" as possible.

I don't consider this topic super urgent (and it should have its own Discussion) as I think this PR and the sequence # PR should get through ASAP. But I do think the discussion on valid ord envelopes should happen because it could have important implications.

Psifour · 2023-09-21T03:48:56Z

> Are you thinking of envelopes in P2SH and P2WSH scripts? I'm personally not convinced that there's utility in recognizing envelopes in anything but tapscripts, in order to keep things simple. Storing envelopes in the signature annex is a desirable future extension, since that would allow inscribing with a single transaction, but that's a longer-term change.

We are both very aligned on annexes. I really wanted to do this with our inscription tool, but couldn't get the sign-off from Luxor (yet) to inject non-standard transactions into our pool. If we get that, it will likely be a massive boon, as I implemented inscription from scratch for us and skipping the commit would be better for financial and organizational reasons.

As for alternative locations, I would say P2WSH is a pretty straightforward one (lower dust limit), but I could see opposition to it from both a protocol design AND Bitcoin ethos standpoint. A few novel ones exist as well, such as op_return with a minimal inscription envelope. I would agree with Charlie that NONE of these are worth holding this change up for, but would love to continue this as a discussion and/or even assist in implementation if we reach a point that we can agree on for them.

casey · 2023-09-21T15:29:29Z

Regarding inscriptions in the annex, if we do this, we should ideally come up with a different encoding, since using the script-based encoding in the annex doesn't make much sense, and is inefficient and overly complex compared to other possible encodings. We could either use a simple custom encoding which amounts to lists of byte strings, where each inscription is a list of byte strings, or something like CBOR.

Psifour · 2023-09-21T16:16:47Z

> Regarding inscriptions in the annex, if we do this, we should ideally come up with a different encoding, since using the script-based encoding in the annex doesn't make much sense, and is inefficient and overly complex compared to other possible encodings. We could either use a simple custom encoding which amounts to lists of byte strings, where each inscription is a list of byte strings, or something like CBOR.

Is the 11-byte savings worth the added complexity of having a second indexing pipeline? Or do you more mean inclusion of some form of compression for the fields that currently rely on UTF-encoded plaintext?

elocremarc · 2023-09-22T03:01:37Z

What happens if there is gibberish in the envelope that isn't clear about how to parse it? Do we still index it and maybe just display the unparsed data?

elocremarc · 2023-09-22T03:26:21Z

Should the mime-type be optional when parsing too? Some executable files don't need a mime type and could get by with just a shebang. This is what I recommend in my "dapp" metaprotocol to inscribe dapp code.
See inscriptionId c5942064bea79672efc5d8331d84171ad2ebb086873e4eb24f7184b159702b87i0

casey · 2023-09-22T03:48:50Z

> Is the 11-byte savings worth the added complexity of having a second indexing pipeline?

I think it's probably more than 11 bytes. But the idea of putting script in the annex is just sad. The alternate encoding could be something like CBOR, which is standard and widely implemented, but complicated. Or a simplified encoding of Vec<Vec<Vec<u8>>>, the output of which gets turned directly into a Vec<Envelope> using the parsing code in this PR.
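
To make the "simplified encoding of Vec<Vec<Vec<u8>>>" idea concrete, here is one possible shape as a sketch: every list and byte string is preceded by a 4-byte little-endian length. The wire format, the `encode`/`decode` names, and the choice of `u32` prefixes are all hypothetical; nothing in this thread specifies them.

```rust
/// A list of envelopes, each a list of byte strings.
pub type Envelopes = Vec<Vec<Vec<u8>>>;

/// Appends `len` as a 4-byte little-endian prefix (assumes it fits in u32).
fn put_len(out: &mut Vec<u8>, len: usize) {
    out.extend_from_slice(&(len as u32).to_le_bytes());
}

/// Hypothetical encoding: count-prefixed lists of length-prefixed byte strings.
pub fn encode(envelopes: &Envelopes) -> Vec<u8> {
    let mut out = Vec::new();
    put_len(&mut out, envelopes.len());
    for envelope in envelopes {
        put_len(&mut out, envelope.len());
        for push in envelope {
            put_len(&mut out, push.len());
            out.extend_from_slice(push);
        }
    }
    out
}

/// Splits `len` bytes off the front of `input`, or fails if too short.
fn take_bytes<'a>(input: &mut &'a [u8], len: usize) -> Option<&'a [u8]> {
    if input.len() < len {
        return None;
    }
    let (head, rest) = input.split_at(len);
    *input = rest;
    Some(head)
}

/// Reads one 4-byte little-endian length prefix.
fn take_len(input: &mut &[u8]) -> Option<usize> {
    let b = take_bytes(input, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
}

/// Decodes the format written by `encode`. Returns `None` on truncated or
/// trailing data. Nothing is pre-allocated from untrusted lengths.
pub fn decode(mut input: &[u8]) -> Option<Envelopes> {
    let mut envelopes = Vec::new();
    for _ in 0..take_len(&mut input)? {
        let mut envelope = Vec::new();
        for _ in 0..take_len(&mut input)? {
            let len = take_len(&mut input)?;
            envelope.push(take_bytes(&mut input, len)?.to_vec());
        }
        envelopes.push(envelope);
    }
    if input.is_empty() {
        Some(envelopes)
    } else {
        None
    }
}
```

The decoded value has the same shape as the payloads produced by script-based recognition, so in principle the same field-parsing step could run on either source.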

> Or do you more mean inclusion of some form of compression for the fields that currently rely on UTF-encoded plaintext?

I don't think so. I've been thinking of this recently, and I think compression of anything that ord would need to decompress must be avoided, since you can create pathological payloads, for example a mime type that un-zips to multiple gigs. So compression for bodies only, which ord can serve to the user compressed.

casey · 2023-09-22T03:49:37Z

> What happens if there is gibberish in the envelope that isn't clear about how to parse it? Do we still index it and maybe just display the unparsed data?

Yah, it'll be indexed, but any gibberish is ignored.

> Should the mime-type be optional when parsing too? Some executable files don't need a mime type and could get by with just a shebang. This is what I recommend in my "dapp" metaprotocol to inscribe dapp code. See inscriptionId c5942064bea79672efc5d8331d84171ad2ebb086873e4eb24f7184b159702b87i0

The mime type is already optional; inscriptions can omit it.

lifofifoX · 2023-09-22T22:34:59Z

Sorry if this is not the right place to ask, but I'm trying to understand why unrecognized even fields are the only unbound inscriptions, while duplicate/incomplete fields are bound. Would appreciate any insights.

elocremarc · 2023-09-22T23:12:35Z

> Sorry if this is not the right place to ask, but I'm trying to understand why unrecognized even fields are the only unbound inscriptions, while duplicate/incomplete fields are bound. Would appreciate any insights.

like op_66? #2113

lifofifoX · 2023-09-22T23:25:22Z

> > Sorry if this is not the right place to ask, but I'm trying to understand why unrecognized even fields are the only unbound inscriptions, while duplicate/incomplete fields are bound. Would appreciate any insights.
>
> like op_66? #2113

Makes sense 👍 #2109 (comment) provides some additional context as well. Curious to know if the recent discussion changes anything here, now that we might recognize/bind inscriptions with gibberish in the envelope.

casey · 2023-09-22T23:59:56Z

> Sorry if this is not the right place to ask, but I'm trying to understand why unrecognized even fields are the only unbound inscriptions, while duplicate/incomplete fields are bound. Would appreciate any insights.

Even tags are intended to be used for fields which change how an inscription is first assigned to a sat, or how it subsequently transfers. For example, the proposed offset field #2383, specifies an offset to the sat the inscription is first assigned to. So, if you see an unrecognized even field, you can't make any assumptions about where the inscription is, so it's better to show it as unbound than at an erroneous location.
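
The rule can be sketched as a small decision function. It is a simplified illustration: tags are modelled as single bytes, the body separator is assumed to have been stripped already, and `RECOGNIZED_TAGS` is a hypothetical list containing only an assumed content-type tag. Treating unrecognized odd tags as ignorable is likewise an assumption of the sketch.

```rust
/// Whether an inscription can be shown at a location.
#[derive(Debug, PartialEq)]
pub enum Binding {
    Bound,
    Unbound,
}

/// Hypothetical set of tags this indexer understands (assumed: content type).
const RECOGNIZED_TAGS: &[u8] = &[1];

/// An unrecognized even tag may change where the inscription lives, so the
/// location cannot be trusted and the inscription is reported as unbound.
pub fn binding_for(tags: &[u8]) -> Binding {
    let unrecognized_even = tags
        .iter()
        .any(|tag| tag % 2 == 0 && !RECOGNIZED_TAGS.contains(tag));
    if unrecognized_even {
        Binding::Unbound
    } else {
        Binding::Bound
    }
}
```

Under these assumptions, `binding_for(&[1, 3])` is `Bound` because the unknown tag is odd, while `binding_for(&[1, 2])` is `Unbound`.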

lifofifoX · 2023-09-23T00:14:10Z

@casey That makes sense. Thanks for explaining 🙏

casey · 2023-09-24T02:17:02Z

Just confirmed that this doesn't change any existing blessed inscriptions, and recognizes four additional cursed inscriptions.

DrJingLee · 2023-09-24T03:36:25Z

> Just confirmed that this doesn't change any existing blessed inscriptions, and recognizes four additional cursed inscriptions.

Any details of the four cursed?
What are the blessed numbers vs their previous cursed numbers?

casey · 2023-09-24T03:45:56Z

> Any details of the four cursed? What are the blessed numbers vs their previous cursed numbers?

No blessed inscriptions have changed. I just checked out the four new cursed inscriptions, which are:

2a81839ae676e6955c87102d4d798d64ed367cd35d52476cdba6854a7eee3b68i0
8a419f2e770a00820474699fda57e4767e27684dc28136bd7b06a62eebe9e8d0i0
74f11a182ec96f1f49c7870c5ddc535b46e6faa0879e6b6f94cc2b5a1bd7d358i0
0b71bd09c848be66334c0cdaa32686e98dffa8a212af694f59165cdbb588e587i0

The first three have duplicate fields, the last has an incomplete field. (The envelope ends where the previous parser would be expecting a field value.)

@Psifour Are there any other inscriptions you would expect the new envelope parser to find?

raphjaph

So much cleaner than before! Lesgo

raphjaph added 2 commits September 20, 2023 21:45

Add sequence number

f84e803

rename number to inscription_number

fb19723

casey force-pushed the envelope branch from 398314d to b9525eb Compare September 20, 2023 20:12

Refactor inscription parsing

e5450f2

casey force-pushed the envelope branch from b9525eb to e5450f2 Compare September 20, 2023 20:14

raphjaph and others added 6 commits September 20, 2023 22:24

quick fix

8eef2df

quick fix

574fa44

Fixy fixy!

d4896b9

Remove dbg!

9d8c829

placate clippy

e5f0d83

Rename data to payload

1f684aa

Psifour approved these changes Sep 21, 2023

casey and others added 6 commits September 21, 2023 10:54

Use empty array instead of body_tag

8814c65

tweak some tests and function

172b498

get rid of reinscription id to sequence number table

19c396b

remove sequence number from front-end

58aa9b4

address nits

9e7b3e4

Merge branch 'sequence-number' into envelope

7fc459a

casey added 5 commits September 21, 2023 23:14

Add note

b4669c9

Make infallible

78ed00d

Add tests for incomplete fields

e80cdf4

Factor out field removal

ba5659b

new_inscriptions -> envelopes

ebfe2e6

Remove note

9989606

casey marked this pull request as ready for review September 24, 2023 02:17

Merge remote-tracking branch 'upstream/master' into envelope

fe3233e

raphjaph approved these changes Sep 25, 2023

casey merged commit 35ccc84 into ordinals:master Sep 25, 2023
6 checks passed

casey deleted the envelope branch September 25, 2023 19:11

cbspears mentioned this pull request Nov 10, 2023

The Jubilee #2495

Closed
