Target request: x86_64 without ADX #143

Open
andres-erbsen opened this issue May 2, 2023 · 2 comments

andres-erbsen commented May 2, 2023

It would be nice to use CryptOpt to generate plain x86_64 code that does not depend on the ADX extension, to serve as a fallback from CryptOpt-optimized fast assembly in distributed binaries. This is a requirement for deployment in BoringSSL, and I hear it may be relevant to adoption of mit-plv/fiat-crypto#1582 as well.

I see the use of CryptOpt in this context primarily as an assurance benefit, though if the result is still decently fast, even better.

I would be happy to do the work for adapting CryptOpt here if you think that this would be a good first project to hack on in the CryptOpt codebase.

Collaborator

dderjoel commented May 2, 2023

There are a couple of things here:

  • We'd need switches '--no-adx' and '--no-bmi2'; the former would forbid the 'mulx', 'adcx' and 'adox' instructions, and the latter would forbid the 'bzhi' and 'shlx' instructions.
  • Not using mulx means using mul or imul instead, which implicitly use rdx and rax. Currently, as only mulx is supported, the register allocator has an option "one source argument must be in rdx", which is already a big (complex) block of code. (I have wanted to refactor this for ages.) To support mul, we'd need an option 'one source argument must be in rax, and rdx will be overwritten'. (Preferably, the optimizer could then later choose between mul and mulx depending on where the arguments are and whether flags are alive.) (Assemblyline also currently does not support mul, but that should be simple to add.)
  • Currently, the optimizer chooses between add-with-carry and add-with-overflow if both flags are in the same state (i.e. alive / killed / zero). Otherwise it uses the usable flag (e.g. it chooses adcx if OF is alive). This could easily be changed to always use add-with-carry. However, I have not looked into the details yet. There are many, many templates, one for each combination of where the arguments are, and I'm unsure how complex it would be to use only add/adc combinations.
  • The bzhi is used as an alternative to and rxx, 0xfffff; it can easily be turned off by not allowing alternatives in &-operations.
  • The shlx can read from / write to different locations with the shift amount in a register, whereas shl operates in place with an imm. I think that could be a rather simple change in the shift templates.
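
To make the switches and the add-with-carry choice from the list above more concrete, here is a rough Python model. It is a sketch only, not CryptOpt code: the names `FORBIDDEN_BY_SWITCH`, `FlagState`, `forbidden_instructions` and `pick_add_with_carry` are invented for illustration. The grouping of instructions under each switch follows the proposal in this comment (architecturally, mulx is part of BMI2 rather than ADX). The tie-break when both flags are in the same state, and the handling of mixed states where neither flag is alive, are arbitrary assumptions.

```python
from enum import Enum

# Hypothetical mapping from the proposed switches to the mnemonics they forbid,
# using the grouping proposed in the comment above.
FORBIDDEN_BY_SWITCH = {
    "--no-adx": {"mulx", "adcx", "adox"},
    "--no-bmi2": {"bzhi", "shlx"},
}


class FlagState(Enum):
    ALIVE = "alive"
    KILLED = "killed"
    ZERO = "zero"


def forbidden_instructions(switches):
    """Return the union of all mnemonics forbidden by the given switches."""
    forbidden = set()
    for switch in switches:
        forbidden |= FORBIDDEN_BY_SWITCH[switch]
    return forbidden


def pick_add_with_carry(cf, of, switches=()):
    """Pick a mnemonic for an addition with carry-in, given the CF/OF states."""
    if {"adcx", "adox"} & forbidden_instructions(switches):
        # Without ADX, only the classic CF-based chain is available.
        return "adc"
    if cf == of:
        # Same state: either instruction works. Picking adcx is an assumption.
        return "adcx"
    if of is FlagState.ALIVE:
        # OF holds a live value, so use the instruction that only touches CF.
        return "adcx"
    if cf is FlagState.ALIVE:
        # Mirror case: CF is live, so use the instruction that only touches OF.
        return "adox"
    # Mixed killed/zero states: arbitrary choice in this sketch.
    return "adcx"
```

For example, `pick_add_with_carry(FlagState.KILLED, FlagState.ALIVE)` picks adcx, while the same call with `switches=["--no-adx"]` falls back to adc.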

> I would be happy to do the work for adapting CryptOpt here if you think that this would be a good first project to hack on in the CryptOpt codebase.

Hard to tell. I wonder if it would make sense to dive into that now or refactor beforehand to have some sort of capability system, based on which CryptOpt can emit code constructs. (Thinking of bringing this to Go-Assembly or ARM).
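
One way to picture such a capability system is the Python sketch below. It is purely hypothetical and does not reflect how CryptOpt is structured: `MulTemplate`, `MULX`, `MUL` and `select_mul` are invented names. The operand constraints are the ones mentioned earlier in this comment (mulx needs one source in rdx; mul needs one source in rax and overwrites rdx). The sketch additionally assumes that mul is rejected whenever flags are alive, since mul writes flags while mulx leaves them untouched, and it keys mulx on a "bmi2" capability, which is where the instruction lives architecturally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MulTemplate:
    mnemonic: str
    implicit_source: str   # register that must hold one source operand
    clobbers: frozenset    # registers overwritten implicitly
    writes_flags: bool     # True if the instruction modifies the flags
    requires: frozenset    # capabilities the target must provide


# mulx: one source in rdx, explicit destinations, flags untouched.
MULX = MulTemplate("mulx", "rdx", frozenset(), False, frozenset({"bmi2"}))
# mul: one source in rax, result in rdx:rax, flags modified.
MUL = MulTemplate("mul", "rax", frozenset({"rax", "rdx"}), True, frozenset())


def select_mul(capabilities, flags_alive, templates=(MULX, MUL)):
    """Return the first template the target supports that keeps live flags intact."""
    for template in templates:
        if not template.requires <= capabilities:
            continue  # the target lacks a required extension
        if template.writes_flags and flags_alive:
            continue  # would destroy a flag that is still needed
        return template
    raise ValueError("no usable multiplication template; live flags would need saving")
```

With `capabilities=frozenset({"bmi2", "adx"})` the selection yields mulx; with an empty capability set and no live flags it yields mul. A different back end (for instance ARM or Go assembly) would supply its own template list in the same shape.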

Author

andres-erbsen commented May 3, 2023

Ok, thank you for the overview! Adding support for more constrained register allocation seems to be the main challenge here, and I don't feel up to tackling it right now.
