WIP feat: new asm parser #1064
Let's start with concrete examples of asm-functions we would like to write and then try to develop some grammar for it. As of now, we should start with instruction sequences and comments (let's use the Tact/Fift syntax for comments).
i.e. no
definitely not using this syntax :)
we might allow asm-blocks to avoid introducing high-level constructions into the assembly language, so the user might be able to insert those into loops, etc.
Hmm, then something like this, which we've shown earlier (and have in our tests), won't work anymore:
asm fun isIntAnInt(x: Int): Bool {
    <{
        TRY:<{
            0 PUSHINT ADD DROP -1 PUSHINT
        }>CATCH<{
            2DROP 0 PUSHINT
        }>
    }>CONT 1 1 CALLXARGS
}
Proceed with removing that? This kinda goes against POLA.
UPD: Edited the original post, saying that the full range of Fift+TVM syntax there was highly experimental and subject to change in next releases. Ok, now we can proceed.
we haven't documented it, so there is no backwards compatibility violation
and Tact Kitchen posts are about raw immature things, sneak peek
]);

// NOTE: ok, I might need some help here...
export const ppAstAsmExpressionList: Printer<A.AstAsmExpressionList> =
c.braced(node.exprs.map(ppAstAsmExpression))
Depending on the order PRs get merged, you might not have this problem here at all :)
Thanks, but I've already removed the asm expression lists from the grammar, such that `{ ... }` are prohibited. At least until we may or may not make sense of them ourselves and without Fift's aid.
Didn't remove it there in pretty printer yet, but will.
// 1. Instructions cannot contain braces
// 2. Cannot be braces themselves
// 3. Cannot contain lowercase letters, except for the very last position
asmInstruction (TVM instruction) = ~(("{" | "}") ~asmWord) asmWord
Unfortunately all newline characters get parsed as spaces and are completely erased from AST. There is no way to pretty-print this at all.
I can imagine only one way to make this work without adding `;` or some other explicit way to terminate lines: put this into a separate grammar with its own `spaces` rule.
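To illustrate what keeping the line information would buy, here is a minimal sketch in TypeScript. It is not the Ohm grammar from this PR: `AsmToken` and `tokenizeWithLines` are hypothetical names, and the sketch simply assumes tokens are separated by whitespace.

```typescript
// Hypothetical token shape, not taken from the PR's ast.ts.
type AsmToken = { text: string; line: number };

// Split an asm body into whitespace-separated tokens and remember the
// zero-based source line of each one, so a printer can restore line breaks.
function tokenizeWithLines(body: string): AsmToken[] {
    const tokens: AsmToken[] = [];
    body.split(/\r?\n/).forEach((lineText, line) => {
        for (const text of lineText.trim().split(/\s+/)) {
            // An empty or blank line yields a single empty string: skip it.
            if (text !== "") tokens.push({ text, line });
        }
    });
    return tokens;
}

const lineTokens = tokenizeWithLines("s0 s1 s2 XCHG3\n\nDROP2");
```

With the line number attached to every token, the author's original line breaks survive parsing and no explicit terminator is needed.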
Hmm, I'm thinking of Prettier's approach: try to fit as much as allowed within 80, 120 or other column limits, and put everything in excess on new lines. Or simply put a newline character after each instruction, keeping all the primitives it requires on the same line.
The second option sounds best from a legibility perspective as well:
asm fun showcase(a: Int, b: Int, c: Int): Int {
s0 s1 s2 XCHG3
s1 s2 s0 XCHG3
DROP2
}
Oh, and it would be nice to also check if the instructions are written on the same line with the opening and closing braces `{ ... }`, i.e. when the start row of the function body equals its end row. Because if that's the case, we should respect the author's intent and inline everything there: add spaces after instructions, not newlines.
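The two printing rules discussed here (a newline after each instruction, or spaces only when the body sits on one line) could be sketched roughly as follows. This is not the PR's `prettyPrinter.ts`: `isInstruction` and `formatAsmBody` are hypothetical helpers, the instruction test is an assumption based on the "no lowercase letters, except for the very last position" rule, and the `inline` flag stands in for "start row equals end row".

```typescript
// Assumption: an instruction has no lowercase letters except possibly in the
// last position (XCHG3_l) and is not a plain decimal number like 42 or -42.
function isInstruction(token: string): boolean {
    const isNumber = /^-?\d[\d_]*$/.test(token);
    const lowercaseOnlyAtEnd = /^[^a-z]*[a-z]?$/.test(token);
    return !isNumber && lowercaseOnlyAtEnd;
}

// Print tokens either on one line (inline) or with a line break after each
// instruction, keeping the primitives that precede it on the same line.
function formatAsmBody(tokens: string[], inline: boolean): string {
    if (inline) return tokens.join(" ");
    const lines: string[] = [];
    let current: string[] = [];
    for (const token of tokens) {
        current.push(token);
        if (isInstruction(token)) {
            lines.push(current.join(" "));
            current = [];
        }
    }
    // Primitives left without a following instruction go on a final line.
    if (current.length > 0) lines.push(current.join(" "));
    return lines.join("\n");
}

const showcase = ["s0", "s1", "s2", "XCHG3", "s1", "s2", "s0", "XCHG3", "DROP2"];
const multiline = formatAsmBody(showcase, false);
const inlined = formatAsmBody(showcase, true);
```

Here `multiline` reproduces the three-line layout of the `showcase` example, while `inlined` keeps everything on one line separated by spaces.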
I don't know if this should be rewritten before or after the non-Ohm parser. That said, I'll start describing examples of TVM instructions and arguments that they expect, such that we can come up with a number of primitives to express those arguments.
The things people are used to are:
- `42 -42`: decimal number literals, with in-between underscores `_` allowed for readability (underscores `_` aren't allowed in Fift, but may be allowed in Tact)
- `x{babecafe_}`: hexadecimal bitstrings (up to 1023 bits) with optional padding via the `_` at the end.
- `b{010101}`: binary bitstrings (up to 1023 bits) without padding.
- `s0`, `s1`, ..., `s15`: stack registers
- `c0`, `c1`, ..., `c15`: control registers
- `MYCODE ADDRSHIFT#MOD IF: XCHG3_l 2SWAP -ROT`: TVM instructions themselves. By the way, the ones ending with `_l` are useless and just legacy; most of those starting with a number have an alias where they end with that number (`2DROP` or `DROP2`, etc.); and `IF:` and related control flow-ish ones are used with continuations. Don't confuse the latter with `IF:<{`, which are purely Fift-guided and aren't TVM instructions per se.
The new-ish syntax proposals:
- `c{...}`, a 1-to-1 replacement of Fift's `B{...} B>boc`, which are hex-encoded BoCs with no limit on bitsize, except for the account state limits of course. Quite important for PUSHREF and similar instructions.
- Instead of the `i s()` syntax, users can now use any of `s0`, `s1`, ..., up to `s255`
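As a rough illustration of how these primitives could be told apart, here is a sketch of a token classifier. The names `AsmPrimitiveKind` and `classifyAsmToken` are hypothetical, the regular expressions are assumptions derived only from the list above, and the 1023-bit limit is not checked.

```typescript
type AsmPrimitiveKind =
    | "number"
    | "hexBitstring"
    | "binBitstring"
    | "bocLiteral"
    | "stackRegister"
    | "controlRegister"
    | "instruction"
    | "unknown";

function classifyAsmToken(token: string): AsmPrimitiveKind {
    // Decimal literal with optional in-between underscores: 42, -42, 1_000
    if (/^-?\d+(_\d+)*$/.test(token)) return "number";
    // x{babecafe_}: hex digits with an optional trailing padding marker
    if (/^x\{[0-9a-fA-F]*_?\}$/.test(token)) return "hexBitstring";
    // b{010101}: binary digits, no padding
    if (/^b\{[01]*\}$/.test(token)) return "binBitstring";
    // c{...}: proposed hex-encoded BoC literal
    if (/^c\{[0-9a-fA-F]*\}$/.test(token)) return "bocLiteral";
    // s0..s255 under the proposal above
    const stack = /^s(\d+)$/.exec(token);
    if (stack !== null) {
        return Number(stack[1]) <= 255 ? "stackRegister" : "unknown";
    }
    // c0..c15
    const control = /^c(\d+)$/.exec(token);
    if (control !== null) {
        return Number(control[1]) <= 15 ? "controlRegister" : "unknown";
    }
    // No braces, no lowercase letters except possibly the last character
    if (/^[^a-z{}]+[a-z]?$/.test(token)) return "instruction";
    return "unknown";
}

const kinds = ["-42", "x{babecafe_}", "s255", "c7", "XCHG3_l", "IF:<{"].map(classifyAsmToken);
```

Ordering matters in this sketch: numbers are tested before instructions so that `42` is a literal while `2SWAP` still falls through to the instruction rule.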
The important bit is to parse deprecated things or recognize instructions inside strings "" to provide clear error messages for users, with simple migration from the old syntax of Fift to the new one of Tact.
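One possible shape for such migration hints is a small table of patterns checked against the raw asm body. This is only a sketch: `LegacyRule`, `legacyRules`, `findLegacySyntax` and the wording of every hint are hypothetical and not part of the PR.

```typescript
// Hypothetical rule shape: a pattern for deprecated syntax plus a migration hint.
type LegacyRule = { pattern: RegExp; hint: string };

const legacyRules: LegacyRule[] = [
    { pattern: /"[^"]*"/, hint: "strings are not allowed: write instructions directly" },
    { pattern: /<\{|\}>/, hint: "Fift blocks <{ ... }> are not supported" },
    { pattern: /\bs\(\)/, hint: "write sN instead of N s()" },
    { pattern: /B\{[^}]*\}\s*B>boc/, hint: "write c{...} instead of B{...} B>boc" },
];

// Collect the hints of all rules whose pattern occurs in the asm body.
function findLegacySyntax(source: string): string[] {
    return legacyRules
        .filter((rule) => rule.pattern.test(source))
        .map((rule) => rule.hint);
}

const fiftHints = findLegacySyntax("<{ 0 PUSHINT }>CONT 1 1 CALLXARGS");
const cleanHints = findLegacySyntax("s0 s1 s2 XCHG3");
```

A real implementation would likely report source locations as well; the sketch only shows the idea of mapping each deprecated form to one clear message.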
This is another attempt at making the `asm` parser: now we're deliberately limiting ourselves to TVM instructions. In the current iteration it's still possible to write code in a very limited subset of Fift, but only if we don't restrict the list of supported TVM instructions in `grammar.ts`, i.e. during semantic actions on the lexed tokens.
TODO:
- `grammar.ohm`
- `ast.ts`
- `grammar.ts`
- `prettyPrinter.ts` → `writeFunction.ts` (to actually have asm instructions written)
- `hash.ts`
- `compare.ts`
Issue
Towards #837.
Towards #1030 (once the `grammar.ts` is written, there will be a small check for such gotchas).
Checklist