Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust the calleeSavedRegs on top frame for LoongArch64/RISCV64 #100962

Merged
merged 4 commits into from
Apr 23, 2024

Conversation

shushanhf
Copy link
Contributor

@shushanhf shushanhf commented Apr 12, 2024

Adjust the calleeSavedRegs on top frame for LoongArch64/RISCV64 to support the GSCookie.

The frame layout:
  |                       |
  |-----------------------|
  |  incoming arguments   |
  +=======================+ <---- Caller's SP
  |  Varargs regs space   | // Only for varargs main functions; not used for LA64.
  |-----------------------|
  |    MonitorAcquired    | // 8 bytes; for synchronized methods
  |-----------------------|
  |        PSP slot       | // 8 bytes (omitted in NativeAOT ABI)
  |-----------------------|
  |Callee saved registers | // multiple of 8 bytes, not including FP/RA
  |-----------------------|
  |      Saved RA         | // 8 bytes
  |-----------------------|
  |      Saved FP         | // 8 bytes
  |-----------------------|
  |  possible GS cookie   |
  |-----------------------|
  | locals, temps, etc.   |
  |-----------------------|
  |  possible GS cookie   |
  |-----------------------|
  |   Outgoing arg space  | // multiple of 8 bytes; if required (i.e., #outsz != 0)
  |-----------------------| <---- Ambient SP
  |       |               |
  ~       | Stack grows   ~
  |       | downward      |

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 12, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 12, 2024
@shushanhf shushanhf force-pushed the adjust_calleeSavedRegs_offset branch 5 times, most recently from 6ec45e3 to 855fdb3 Compare April 13, 2024 08:01
@am11 am11 added arch-loongarch64 arch-riscv Related to the RISC-V architecture labels Apr 13, 2024
@shushanhf shushanhf force-pushed the adjust_calleeSavedRegs_offset branch 6 times, most recently from f94cc0d to cf69c78 Compare April 15, 2024 10:06
@shushanhf shushanhf force-pushed the adjust_calleeSavedRegs_offset branch from cf69c78 to 8601cfc Compare April 16, 2024 00:44
@shushanhf
Copy link
Contributor Author

@jakobbotsch @clamp03
Could you please review this PR?
Thanks

@clamp03
Copy link
Member

clamp03 commented Apr 16, 2024

@bartlomiejko Can you review and test this PR?
cc @dotnet/samsung

@jakobbotsch
Copy link
Member

How is the stack frame layout changed compared to previously? What was the layout before this change? Why didn't it support GS cookie?
Can you show some codegen diffs?

@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 16, 2024

How is the stack frame layout changed compared to previously? What was the layout before this change?

Only changed the calledSavedRegs, especially the FP/RA for LA64/RV64. The layout liking this PR's first comment description #100962 (comment) .
Before this PR, the FP/RA at bottom of frame which under the GSCookie, the FP/RA can be overwritten.

Why didn't it support GS cookie? Can you show some codegen diffs?

Before this PR, the FP/RA at bottom of frame which under the GSCookie, the FP/RA can be overwritten.

@jakobbotsch
Copy link
Member

Only changed the calledSavedRegs, especially the FP/RA for LA64/RV64. The layout liking this PR's first comment description #100962 (comment) .
Before this PR, the FP/RA at bottom of frame which under the GSCookie, the FP/RA can be overwritten.

Where does FP point to in your picture? Where did it point to before?

@shushanhf
Copy link
Contributor Author

Only changed the calledSavedRegs, especially the FP/RA for LA64/RV64. The layout liking this PR's first comment description #100962 (comment) .
Before this PR, the FP/RA at bottom of frame which under the GSCookie, the FP/RA can be overwritten.

Where does FP point to in your picture? Where did it point to before?

The new FP is pointing to the the old FP saved slot.

Can you show some codegen diffs?

/// this is the Old one.
G_M22522_IG01:              ;; offset=0x0000
  0xff74a50030  02FDC063  addi.d       sp, sp, -144
  0xff74a50034  29C00061  st.d         ra, sp, 0
  0xff74a50038  29C02076  st.d         fp, sp, 8
  0xff74a5003c  02C02076  addi.d       fp, sp, 8
  0xff74a50040  02C062CC  addi.d       t0, fp, 24 
  0xff74a50044  02801407  addi.w       a3, zero, 5
  0xff74a50048  29C02180  st.d         zero, t0, 8
  0xff74a5004c  29C00180  st.d         zero, t0, 0
  0xff74a50050  02FFFCE7  addi.d       a3, a3, -1
  0xff74a50054  02C0418C  addi.d       t0, t0, 16 
  0xff74a50058  5FFFF0E0  bne          a3, zero, 0xff74a50048
  0xff74a5005c  29C00180  st.d         zero, t0, 0
  0xff74a50060  29C202C4  st.d         a0, fp, 128
  0xff74a50064  29C1E2C5  st.d         a1, fp, 120
  0xff74a50068  29C1C2C6  st.d         a2, fp, 112
                        ;; size=60 bbWeight=1 PerfScore 0.00
G_M22522_IG02:              ;; offset=0x003C
/// This is the new ins.
G_M22522_IG01:              ;; offset=0x0000
  0xff77790030  02FDC063  addi.d       sp, sp, -144
  0xff77790034  29C20076  st.d         fp, sp, 128
  0xff77790038  29C22061  st.d         ra, sp, 136
  0xff7779003c  02C20076  addi.d       fp, sp, 128
  0xff77790040  02FE42CC  addi.d       t0, fp, -112
  0xff77790044  02801407  addi.w       a3, zero, 5
  0xff77790048  29C02180  st.d         zero, t0, 8
  0xff7779004c  29C00180  st.d         zero, t0, 0
  0xff77790050  02FFFCE7  addi.d       a3, a3, -1
  0xff77790054  02C0418C  addi.d       t0, t0, 16 
  0xff77790058  5FFFF0E0  bne          a3, zero, 0xff77790048  /// the upstream's disasm format had been updated.
  0xff7779005c  29C00180  st.d         zero, t0, 0
  0xff77790060  29FFE2C4  st.d         a0, fp, -8
  0xff77790064  29FFC2C5  st.d         a1, fp, -16
  0xff77790068  29FFA2C6  st.d         a2, fp, -24
                        ;; size=60 bbWeight=1 PerfScore 0.00

@jakobbotsch
Copy link
Member

Before this PR, the FP/RA at bottom of frame which under the GSCookie, the FP/RA can be overwritten.

Before this change, the comment in codegenloongarch64.cpp gives the following layout:

* The LoongArch64's frame layout is liking:
 *
 *      |                       |
 *      |-----------------------|
 *      |  incoming arguments   |
 *      +=======================+ <---- Caller's SP
 *      |     Arguments  Or     | // if needed.
 *      |  Varargs regs space   | // Only for varargs functions; (varargs not implemented for LoongArch64)
 *      |-----------------------|
 *      |    MonitorAcquired    | // 8 bytes; for synchronized methods
 *      |-----------------------|
 *      |        PSP slot       | // 8 bytes (omitted in NativeAOT ABI)
 *      |-----------------------|
 *      | locals, temps, etc.   |
 *      |-----------------------|
 *      |  possible GS cookie   |
 *      |-----------------------|
 *      |      Saved FP         | // 8 bytes
 *      |-----------------------|
 *      |      Saved RA         | // 8 bytes
 *      |-----------------------|
 *      |Callee saved registers | // not including FP/RA; multiple of 8 bytes
 *      |-----------------------|
 *      |   Outgoing arg space  | // multiple of 8 bytes; if required (i.e., #outsz != 0)
 *      |-----------------------| <---- Ambient SP
 *      |       |               |
 *      ~       | Stack grows   ~
 *      |       | downward      |
 *              V

The GS cookie exists between the locals and FP/RA pair. I do not see how FP/RA can be overwritten without hitting the GS cookie. Was the comment wrong?

Do you mean that the FP/RA of caller's frame can be overwritten?

@jakobbotsch
Copy link
Member

If I understand correctly, the change here means you have to use negative offsets from FP to access locals more often. Does LA64/RV64 allow encoding as many negative offsets as it allows positive offsets?

@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 16, 2024

  •  |Callee saved registers | // not including FP/RA; multiple of 8 bytes
    

The GS cookie exists between the locals and FP/RA pair. I do not see how FP/RA can be overwritten without hitting the GS cookie. Was the comment wrong?
Do you mean that the FP/RA of caller's frame can be overwritten?

The GS Cookie is only used liking the localalloc. After finished the frame allocation, when using the localalloc to allocate a new stack space, the new space maybe overwritten the new real size allocated while the GSCookie is used to recognize this case.
Before this PR which the FP/RA under GS cookie, the FP/RA maybe overwritten while the GS cookie may not recognize this.

@shushanhf
Copy link
Contributor Author

If I understand correctly, the change here means you have to use negative offsets from FP to access locals more often. Does LA64/RV64 allow encoding as many negative offsets as it allows positive offsets?

yes

@shushanhf shushanhf force-pushed the adjust_calleeSavedRegs_offset branch from 8601cfc to 23fc354 Compare April 16, 2024 09:22
@shushanhf shushanhf force-pushed the adjust_calleeSavedRegs_offset branch 2 times, most recently from b78f2e0 to c237762 Compare April 19, 2024 06:39
@shushanhf
Copy link
Contributor Author

I think the CI went wrong. e.g.

@azure-pipelines
runtime (Build Formatting linux x64) Failing after 4m — Build Formatting linux x64 failed
[Details](https://github.com/dotnet/runtime/pull/100962/checks?check_run_id=24011496326)
@azure-pipelines
runtime (Build Formatting windows x64) Failing after 7m — Build Formatting windows x64 failed

but format-patch is empty.

@shushanhf shushanhf force-pushed the adjust_calleeSavedRegs_offset branch from c237762 to feaa635 Compare April 19, 2024 09:46
@jakobbotsch
Copy link
Member

I think the CI went wrong. e.g.

@azure-pipelines
runtime (Build Formatting linux x64) Failing after 4m — Build Formatting linux x64 failed
[Details](https://github.com/dotnet/runtime/pull/100962/checks?check_run_id=24011496326)
@azure-pipelines
runtime (Build Formatting windows x64) Failing after 7m — Build Formatting windows x64 failed

but format-patch is empty.

You can run jit-format locally, see the docs at https://github.com/dotnet/jitutils and https://github.com/dotnet/jitutils/blob/main/doc/formatting.md.

@am11
Copy link
Member

am11 commented Apr 19, 2024

The underlying error shows up on Windows leg:

D:\a_work\1\s\venv\Scripts\python.exe D:\a_work\1\s/src/coreclr/scripts/jitformat.py -r D:\a_work\1\s -o windows -a x64
...
2024-04-19T09:58:48.1997764Z No cdac-build-tool set or does not exist

started happening after #100650 merge. cc @lambdageek

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me now. Thanks for addressing all the feedback! I will give RISC-V folks some time to review as well.

Copy link
Contributor

@tomeksowi tomeksowi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sirntar, you did more work with stack frames/OSR, could you review when you return on Monday?

@shushanhf shushanhf force-pushed the adjust_calleeSavedRegs_offset branch from feaa635 to 3092323 Compare April 20, 2024 01:14
src/coreclr/jit/codegenriscv64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/codegenriscv64.cpp Outdated Show resolved Hide resolved
to support the GSCookie.
The frame layout:
   |                       |
   |-----------------------|
   |  incoming arguments   |
   +=======================+ <---- Caller's SP
   |  Varargs regs space   | // Only for varargs main functions; not used for LA64.
   |-----------------------|
   |    MonitorAcquired    | // 8 bytes; for synchronized methods
   |-----------------------|
   |        PSP slot       | // 8 bytes (omitted in NativeAOT ABI)
   |-----------------------|
   |Callee saved registers | // multiple of 8 bytes, not includting FP/RA
   |-----------------------|
   |      Saved RA         | // 8 bytes
   |-----------------------|
   |      Saved FP         | // 8 bytes
   |-----------------------|
   |  possible GS cookie   |
   |-----------------------|
   | locals, temps, etc.   |
   |-----------------------|
   |  possible GS cookie   |
   |-----------------------|
   |   Outgoing arg space  | // multiple of 8 bytes; if required (i.e., #outsz != 0)
   |-----------------------| <---- Ambient SP
   |       |               |
   ~       | Stack grows   ~
   |       | downward      |
@shushanhf shushanhf force-pushed the adjust_calleeSavedRegs_offset branch from 3092323 to e254dd6 Compare April 22, 2024 07:26
Copy link
Member

@sirntar sirntar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jakobbotsch
Copy link
Member

@shushanhf Can you please avoid amending/force pushing to PRs? It makes it hard for us to see exactly what changed and to review the new changes. If you can push them as new commits that makes it much easier. We squash every PR on merge, so the history will be cleaned up anyway.

@rzsc
Copy link
Contributor

rzsc commented Apr 22, 2024

PR tested on riscv64 architecture and ready to be merged. Thanks!

@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 23, 2024

@shushanhf Can you please avoid amending/force pushing to PRs?

Ok, thanks, I will.

It makes it hard for us to see exactly what changed and to review the new changes. If you can push them as new commits that makes it much easier. We squash every PR on merge, so the history will be cleaned up anyway.

Sorry, I didn't know this.
I thought the github has a new button Update branch which I thought I had to update the PR's base as soon as possible.
And I thought the github reviewing is liking the gerrit which different patches within the same PR are easy to diff where independed of the base.
Maybe the github should add the similar feature.

@jakobbotsch jakobbotsch merged commit edb6e7c into dotnet:main Apr 23, 2024
108 of 110 checks passed
@jakobbotsch
Copy link
Member

Thanks for helping me understand the motivation for the change, and for making LA64/RV64 more similar to ARM64 around OSR.

@jakobbotsch
Copy link
Member

@BruceForstall If you have any feedback around this then please feel free to leave it. To sum up the motivation of the change you can see #100962 (comment).

@shushanhf shushanhf deleted the adjust_calleeSavedRegs_offset branch April 23, 2024 11:53
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
…et#100962)

The frame layout:
   |                       |
   |-----------------------|
   |  incoming arguments   |
   +=======================+ <---- Caller's SP
   |  Varargs regs space   | // Only for varargs main functions; not used for LA64.
   |-----------------------|
   |    MonitorAcquired    | // 8 bytes; for synchronized methods
   |-----------------------|
   |        PSP slot       | // 8 bytes (omitted in NativeAOT ABI)
   |-----------------------|
   |Callee saved registers | // multiple of 8 bytes, not includting FP/RA
   |-----------------------|
   |      Saved RA         | // 8 bytes
   |-----------------------|
   |      Saved FP         | // 8 bytes
   |-----------------------|
   |  possible GS cookie   |
   |-----------------------|
   | locals, temps, etc.   |
   |-----------------------|
   |  possible GS cookie   |
   |-----------------------|
   |   Outgoing arg space  | // multiple of 8 bytes; if required (i.e., #outsz != 0)
   |-----------------------| <---- Ambient SP
   |       |               |
   ~       | Stack grows   ~
   |       | downward      |
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
…et#100962)

The frame layout:
   |                       |
   |-----------------------|
   |  incoming arguments   |
   +=======================+ <---- Caller's SP
   |  Varargs regs space   | // Only for varargs main functions; not used for LA64.
   |-----------------------|
   |    MonitorAcquired    | // 8 bytes; for synchronized methods
   |-----------------------|
   |        PSP slot       | // 8 bytes (omitted in NativeAOT ABI)
   |-----------------------|
   |Callee saved registers | // multiple of 8 bytes, not includting FP/RA
   |-----------------------|
   |      Saved RA         | // 8 bytes
   |-----------------------|
   |      Saved FP         | // 8 bytes
   |-----------------------|
   |  possible GS cookie   |
   |-----------------------|
   | locals, temps, etc.   |
   |-----------------------|
   |  possible GS cookie   |
   |-----------------------|
   |   Outgoing arg space  | // multiple of 8 bytes; if required (i.e., #outsz != 0)
   |-----------------------| <---- Ambient SP
   |       |               |
   ~       | Stack grows   ~
   |       | downward      |
@github-actions github-actions bot locked and limited conversation to collaborators May 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-loongarch64 arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants