Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor inscription parsing #2461

Merged
merged 22 commits into from
Sep 25, 2023
Merged

Refactor inscription parsing #2461

merged 22 commits into from
Sep 25, 2023

Conversation

casey
Copy link
Collaborator

@casey casey commented Sep 20, 2023

This PR refactors inscription parsing by disentangling envelope recognition and inscription parsing.

Envelope recognition recognizes envelopes of the form OP_FALSE OP_IF "ord" <data pushes> OP_ENDIF in tapscripts. Inscription parsing converts envelopes into inscriptions.

The goal is that all valid envelopes are recognized, and that all valid envelopes result in inscriptions. This can give us some confidence in the stability of inscription numbers, since previously, failures in the inscription parsing code would result in an inscription not being assigned an inscription number.

Changes in behavior:

  • Inscriptions with duplicate fields are now recognized and assigned inscription IDs, but cursed.
  • Inscriptions with incomplete fields, i.e., a field tag followed by the end of the envelope, are recognized and assigned inscription IDs, but are cursed.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@cbspears
Copy link
Contributor

I support this PR, however I do think this still requires a clear definition on what a "valid" ord envelope is. Because if it defined as OP_FALSE OP_IF "ord" <data pushes> OP_ENDIF wouldn't that mean #2139 is a valid Inscription envelope? Apologies if I'm missing something as I'm not nearly as technical as you are on this.

@casey
Copy link
Collaborator Author

casey commented Sep 21, 2023

I updated the the PR description to clarify that this still only recognizes envelopes in tapscripts.

@satoshi0770
Copy link

finally expressive open envelope!
question about the multiple in the field, which sat would the 2nd envelope in the same reveal be assigned to?
say you have more then one envelope in 1 reveal.
support this move big time! hopefully it would help others be onboard, it should help the worry of ongoing envelope validity changes.

@Psifour
Copy link
Contributor

Psifour commented Sep 21, 2023

I updated the the PR description to clarify that this still only recognizes envelopes in tapscripts.

While this PR doesn't extend to alternate locations in transactions where the envelope can be stored, is it an indication that we intend to not allow those alternatives or simply a matter of the expedient solution first while still open to further refinements if developers are willing to commit to writing it up and getting the PR to you and Raph?

Copy link
Contributor

@Psifour Psifour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge fan of this change! I am biased as it will allow us to rewind a lot of bad engineering we did with gord to maintain inscription number parity. The core of our logic is very similar to this optimized pipeline.

ACK

@casey
Copy link
Collaborator Author

casey commented Sep 21, 2023

While this PR doesn't extend to alternate locations in transactions where the envelope can be stored, is it an indication that we intend to not allow those alternatives or simply a matter of the expedient solution first while still open to further refinements if developers are willing to commit to writing it up and getting the PR to you and Raph?

Are you think of envelopes in P2SH and P2WSH scripts? I'm personally not convinced that there's utility in recognizing envelopes in anything but tapscripts, in order to keep things simple. Storing envelopes in the signature annex is a desirable future extension, since that would allow inscribing with a single transaction, but that's a longer term change.

@cbspears
Copy link
Contributor

cbspears commented Sep 21, 2023

While I personally share your views expressed on ICP as an unconvincing reason for recognizing P2WSH scripts, I do think there are probably reasons for utility in other parts of Bitcoin data, including P2WSH. A discussion should probably be had on whether an Inscription needs to have a utilitarian purpose to be recognized by ord, or if we simply allow as broad of a definition of a "valid envelope" as possible.

I don't consider this topic super urgent (and it should have its own Discussion) as I think this PR and the sequence # PR should get through ASAP. But I do think the discussion on valid ord envelopes should happen because it could have important implications.

@Psifour
Copy link
Contributor

Psifour commented Sep 21, 2023

Are you think of envelopes in P2SH and P2WSH scripts? I'm personally not convinced that there's utility in recognizing envelopes in anything but tapscripts, in order to keep things simple. Storing envelopes in the signature annex is a desirable future extension, since that would allow inscribing with a single transaction, but that's a longer term change.

We both are very aligned on annexes.. I really wanted to do this with our inscription tool, but couldn't get the sign off from Luxor (yet) to inject non-standard transactions to our pool. If we get that it will likely be a massive boon as I implemented inscription from scratch for us and skipping the commit would be better for financial and organizational reasons.

As for alternative locations, I would say P2WSH is a pretty straight forward one (lower dust limit), but I could see opposition to it from both a protocol design AND Bitcoin ethos standpoint. A few novel ones exist as well, such as op_return with a minimal inscription envelope. I would agree with Charlie that NONE of these are worth holding this change up for, but would love to continue this as a discussion and/or even assist in implementation if we reach a point that we can agree on for them.

@casey
Copy link
Collaborator Author

casey commented Sep 21, 2023

Regarding inscription in the annex, if we do this, we should ideally come up with a different encoding, since using the script-based encoding in the annex doesn't make much sense, and is inefficient and overly complex compared to other possible encodings. We could either use a simple custom encoding which amounts to lists of byte strings, where each inscription is a list of bye strings, or something like CBOR.

@Psifour
Copy link
Contributor

Psifour commented Sep 21, 2023

Regarding inscription in the annex, if we do this, we should ideally come up with a different encoding, since using the script-based encoding in the annex doesn't make much sense, and is inefficient and overly complex compared to other possible encodings. We could either use a simple custom encoding which amounts to lists of byte strings, where each inscription is a list of bye strings, or something like CBOR.

Is the 11-byte savings worth the added complexity of having a second indexing pipeline? Or do you more mean inclusion of some form of compression for the fields that currently rely on UTF-encoded plaintext?

@elocremarc
Copy link
Contributor

What happens if there is gibberish in the envelope that isn't clear about how to parse it? Do we still index it and maybe just display the unparsed data?

@elocremarc
Copy link
Contributor

Should the mime-type be optional when parsing too? Some executable files don't need a mime type and could get by with just a shebang. This is what I recommend in my "dapp" metaprotocol to inscribe dapp code.
See inscriptionId c5942064bea79672efc5d8331d84171ad2ebb086873e4eb24f7184b159702b87i0

@casey
Copy link
Collaborator Author

casey commented Sep 22, 2023

Is the 11-byte savings worth the added complexity of having a second indexing pipeline?

I think it's probably more than 11 bytes. But the idea of putting script in the annex is just sad. The alternate encoding could be something like CBOR, which is standard and widely implemented, but complicated. Or a simplified encoding of Vec<Vec<Vec<u8>>>, the output of which gets turned directly into a Vec<Envelope> using the parsing code in this PR.

Or do you more mean inclusion of some form of compression for the fields that currently rely on UTF-encoded plaintext?

I don't think so. I've been thinking of this recently, and I think compression of anything that ord would need to decompress must be avoided, since you can create pathological payloads, for example a mime type that un-zips to multiple gigs. So compression for bodies only, which ord can serve to the user compressed.

@casey
Copy link
Collaborator Author

casey commented Sep 22, 2023

What happens if there is gibberish in the envelope that isn't clear about how to parse it? Do we still index it and maybe just display the unparsed data?

Yah, it'll be indexed, but any gibberish is ignored.

Should the mime-type be optional when parsing too? Some executable files don't need a mime type and could get by with just a shebang. This is what I recommend in my "dapp" metaprotocol to inscribe dapp code. See inscriptionId c5942064bea79672efc5d8331d84171ad2ebb086873e4eb24f7184b159702b87i0

The mime type is already optional, inscriptions can omit them.

@lifofifoX
Copy link
Collaborator

Sorry if this is not the right place to ask, but I'm trying to understand why unrecognized even fields are the only unbound inscriptions, while duplicate/incomplete fields are bound. Would appreciate any insights.

@elocremarc
Copy link
Contributor

elocremarc commented Sep 22, 2023

Sorry if this is not the right place to ask, but I'm trying to understand why unrecognized even fields are the only unbound inscriptions, while duplicate/incomplete fields are bound. Would appreciate any insights.

like op_66? #2113

@lifofifoX
Copy link
Collaborator

Sorry if this is not the right place to ask, but I'm trying to understand why unrecognized even fields are the only unbound inscriptions, while duplicate/incomplete fields are bound. Would appreciate any insights.

like op_66? #2113

Makes sense 👍 #2109 (comment) provides some additional context as well. Curious to know if the recent discussion changes anything here, now that we might recognize/bind inscriptions with gibberish in the envelope.

@casey
Copy link
Collaborator Author

casey commented Sep 22, 2023

Sorry if this is not the right place to ask, but I'm trying to understand why unrecognized even fields are the only unbound inscriptions, while duplicate/incomplete fields are bound. Would appreciate any insights.

Even tags are intended to be used for fields which change how an inscription is first assigned to a sat, or how it subsequently transfers. For example, the proposed offset field #2383, specifies an offset to the sat the inscription is first assigned to. So, if you see an unrecognized even field, you can't make any assumptions about where the inscription is, so it's better to show it as unbound than at an erroneous location.

@lifofifoX
Copy link
Collaborator

@casey That makes sense. Thanks explaining 🙏

@casey
Copy link
Collaborator Author

casey commented Sep 24, 2023

Just confirmed that this doesn't change any existing blessed inscriptions, and recognizes four additional cursed inscriptions.

@casey casey marked this pull request as ready for review September 24, 2023 02:17
@DrJingLee
Copy link
Contributor

Just confirmed that this doesn't change any existing blessed inscriptions, and recognizes four additional cursed inscriptions.

Any details of the four cursed?
What are the blessed numbers vs their previous cursed numbers ?

@casey
Copy link
Collaborator Author

casey commented Sep 24, 2023

Any details of the four cursed? What are the blessed numbers vs their previous cursed numbers ?

No blessed inscriptions have changed. I just checked out the four new cursed inscriptions, which are:

2a81839ae676e6955c87102d4d798d64ed367cd35d52476cdba6854a7eee3b68i0
8a419f2e770a00820474699fda57e4767e27684dc28136bd7b06a62eebe9e8d0i0
74f11a182ec96f1f49c7870c5ddc535b46e6faa0879e6b6f94cc2b5a1bd7d358i0
0b71bd09c848be66334c0cdaa32686e98dffa8a212af694f59165cdbb588e587i0

The first three have duplicate fields, the last has an incomplete field. (The envelope ends where the previous parser would be expecting a field value.)

@Psifour Are there any other inscriptions you would expect the new envelope parser to find?

Copy link
Collaborator

@raphjaph raphjaph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much cleaner than before! Lesgo

@casey casey merged commit 35ccc84 into ordinals:master Sep 25, 2023
6 checks passed
@casey casey deleted the envelope branch September 25, 2023 19:11
@cbspears cbspears mentioned this pull request Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants