Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bitblasting of multipliers differs wildly between Cryptol and SAW #1788

Open
weaversa opened this issue Dec 15, 2022 · 13 comments
Open

Bitblasting of multipliers differs wildly between Cryptol and SAW #1788

weaversa opened this issue Dec 15, 2022 · 13 comments
Labels
subsystem: hardware Issues related to verification of hardware type: bug Issues reporting bugs or unexpected/unwanted behavior
Milestone

Comments

@weaversa
Copy link
Contributor

Consider the following C function:

#include <stdint.h>

uint32_t mult(uint32_t x) {
  return x * 0x85EBCA77U;
}

We can compile the code to LLVM like so:

$ clang -emit-llvm -c mult.c -o mult.bc

We then create AIGs from this and an identical function in Cryptol like so:

$saw
sawscript> m <- llvm_load_module "mult.bc"
sawscript> mult <- llvm_extract m "mult"
sawscript> write_aig "saw_mult" {{ mult }}
sawscript> write_aig "cryptol_mult" {{ \x -> x * 0x85EBCA77 }}

These two AIGs are wildly different, making them very difficult to prove equivalent. We can see their different structure using abc:

$ abc -c "&r saw_mult; &if -K 3; &ps -n"
ABC command line: "&r saw_mult; &if -K 3; &ps -n".

saw_mult : i/o =     32/     32  and =    5476  lev =  176 (114.56)  mem = 0.06 MB
Mapping (K=3)  :  lut =   1093  edge =    2862  lev =   56 (33.47)  mem = 0.04 MB
NPN CLASS STATISTICS (for 1093 LUT4 present in the current mapping):
  1: Class   2 :  Count =      3   (   0.27 %)   F = (!d*(!c*!b))
  2: Class   5 :  Count =    376   (  34.37 %)   F = (!d*!c)
  3: Class  15 :  Count =     11   (   1.01 %)   F = (!d*!(c*b))
  4: Class  21 :  Count =    254   (  23.22 %)   F = (!d)
  5: Class  81 :  Count =      1   (   0.09 %)   F = 16(b,c,d)
  6: Class  83 :  Count =      1   (   0.09 %)   F = 17(b,c,d)
  7: Class 103 :  Count =      1   (   0.09 %)   F = 18(b,c,d)
  8: Class 105 :  Count =      8   (   0.73 %)   F = 19(b,c,d)
  9: Class 120 :  Count =    408   (  37.29 %)   F = (d+!(!c*!b))
 10: Class 180 :  Count =     30   (   2.74 %)   F = (d+c)
Other     :  Count =      0   (   0.00 %)
Approximate number of 4:1 MUX structures: All =      0  (   0.00 %)  MFFC =      0  (   0.00 %)
$ abc -c "&r cryptol_mult; &if -K 3; &ps -n"
ABC command line: "&r cryptol_mult; &if -K 3; &ps -n".

cryptol_mult : i/o =     32/     32  and =    3790  lev =  128 (64.81)  mem = 0.05 MB
Mapping (K=3)  :  lut =    799  edge =    2158  lev =   46 (23.44)  mem = 0.03 MB
NPN CLASS STATISTICS (for 799 LUT4 present in the current mapping):
  1: Class   5 :  Count =      2   (   0.25 %)   F = (!d*!c)
  2: Class  13 :  Count =     14   (   1.75 %)   F = (!d*(c+b))
  3: Class  15 :  Count =      1   (   0.12 %)   F = (!d*!(c*b))
  4: Class  83 :  Count =    274   (  34.25 %)   F = 17(b,c,d)
  5: Class 120 :  Count =      4   (   0.50 %)   F = (d+!(!c*!b))
  6: Class 180 :  Count =    238   (  29.75 %)   F = (d+c)
  7: Class 220 :  Count =    266   (  33.25 %)   F = (d+(c+!b))
Other     :  Count =      0   (   0.00 %)
Approximate number of 4:1 MUX structures: All =      0  (   0.00 %)  MFFC =      0  (   0.00 %)

The bitblasted AIG from SAW is multiplying via some process I can't identify, whereas the Cryptol AIG seems to be using standard full adder chain (the 17(b,c,d) is a MUX, and + is XOR).

When I look at the sawscript representation of both I get:

sawscript> show mult
[17:31:20.235] "\\(arg_0 : Vec 32 Bool) -> bvMul 32 (bvNat 32 2246822519) arg_0"
sawscript> show {{ \x -> x * 0x85EBCA77 }}
[17:31:46.314] "let { x@1 = Vec 32 Bool\n      x@2 = TCNum 32\n    }\n in \\(x : x@1) ->\n      ecMul x@1 (PRingSeqBool x@2) x\n        (ecNumber (TCNum 2246822519) x@1 (PLiteralSeqBool x@2))"

Could it be that bvMul and ecMul are using different bitblasting methods?

To reach farther - let me strongly suggest bitblasting methods between Cryptol primitives and SAW primitives be synced to make property testing between spec and implementation easier for backend solvers.

@eddywestbrook
Copy link
Contributor

ecMul on bitvectors unfolds to bvMul in SAW core. If you want to see how this works, it is controlled by these definitions in Cryptol.sawcore:

PRingWord : (n : Nat) -> PRing (Vec n Bool);
PRingWord n =
  { ringZero = bvNat n 0
  , add = bvAdd n
  , sub = bvSub n
  , mul = bvMul n
  , neg = bvNeg n
  , int = intToBv n
  };

ecMul : (a : sort 0) -> PRing a -> a -> a -> a;
ecMul a pa = pa.mul;

Cryptol multiplication gets translated to ecMul applied to an object of type PRing (Vec n Bool), which is a typeclass instance of the ring methods on the bitvector type Vec n Bool. That typeclass instance is supplied by PRingWord, which defined the ring methods to just be the standard bitvector operations, including bvMul. If you unfold ecMul and PRingWord in the SAW core translation, it should reduce to an application of bvMul.

@weaversa
Copy link
Contributor Author

I think I found the issue.

Even though the C and Cryptol agree on order of arguments (see here):

x * 0x85EBCA77U;
x * 0x85EBCA77

The transformation from C -> LLVM -> SAWCore swaps the order to 2246822519 * x.

sawscript> show {{ mult }}
"\\(arg_0 : Vec 32 Bool) -> bvMul 32 (bvNat 32 2246822519) arg_0"
sawscript> show (normalize_term {{ \x -> x * 0x85EBCA77 }})
"\\(x : Vec 32 Bool) -> bvMul 32 x (bvNat 32 2246822519)"

I don't have llvm-dis installed so I can't see if the swap is happening in LLVM or SAW. Would someone check to see where the reordering is happening?

@weaversa
Copy link
Contributor Author

I wonder if the symbolic execution of bvMul (and other 2-input bv intrinsics) can be made to produce identical results in cases where one side or the other is a constant?

@RyanGlScott
Copy link
Contributor

One can inspect that LLVM bitcode by compiling it with clang -emit-llvm -S:

$ clang -emit-llvm -S mult.c
$ cat mult.ll
; ModuleID = 'mult.c'
source_filename = "mult.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @mult(i32 %0) #0 {
  %2 = alloca i32, align 4
  store i32 %0, i32* %2, align 4
  %3 = load i32, i32* %2, align 4
  %4 = mul i32 %3, -2048144777
  ret i32 %4
}

attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0-4ubuntu1 "}

From this, we know that the multiplication still puts the 0x85EBCA77U constant on the right.

Your hunch about the reordering occurring during the C -> LLVM -> SAWCore pipeline. More specifically, it happens during the translation from LLVM bitcode to What4 to SAWCore. What4 aggressively optimizes intermediate expressions to put them into a sort of normal form. (Usually, this normal form is amenable to easier treatment by SMT solvers.) This can be seen in this What4 program:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeApplications #-}
module Main where

import qualified Data.BitVector.Sized as BVS
import Data.Foldable (forM_)
import System.IO (FilePath)

import Data.Parameterized.Nonce (newIONonceGenerator)
import Data.Parameterized.Some (Some(..))

import What4.Config
import What4.Expr
import What4.Interface
import What4.Solver
import What4.Protocol.SMTLib2


main :: IO ()
main = do
  Some ng <- newIONonceGenerator
  sym <- newExprBuilder FloatIEEERepr EmptyExprBuilderState ng
  let w = knownNat @32
  x <- freshConstant sym (safeSymbol "x") (BaseBVRepr w)
  let cBV = BVS.mkBV w 0x85EBCA77
  putStrLn $ "c = " ++ show cBV
  c <- bvLit sym w cBV

  putStrLn "c on the left"
  r1 <- bvMul sym c x
  print $ printSymExpr r1

  putStrLn "c on the right"
  r2 <- bvMul sym x c
  print $ printSymExpr r2

Regardless of which side of the bvMul you put the constant on, What4 will optimize both expressions to the same thing:

c = BV 2246822519
c on the left
bvSum (bvMul 0x85ebca77:[32] cx@0:bv)
c on the right
bvSum (bvMul 0x85ebca77:[32] cx@0:bv)

On the other hand, {{ \x -> x * 0x85EBCA77 }} goes from Cryptol -> SAWCore -> AIG, without ever using What4 as an intermediate format. This explains why cryptol_mult never rearranges the position of the constant in the multiplication.


Having never really used write_aig much myself, I'm not sure if I have a strong recommendation about the right course of action here. I will note that write_aig_external does go through What4, so that is one way to make the output consistent:

$ ~/Software/saw-0.9/bin/saw
sawscript> m <- llvm_load_module "mult.bc"
sawscript> mult <- llvm_extract m "mult"
sawscript> write_aig_external "saw_mult_external" {{ mult }}
sawscript> write_aig_external "cryptol_mult_external" {{ \x -> x * 0x85EBCA77 }}
$ abc -c "&r saw_mult_external; &if -K 3; &ps -n"
ABC command line: "&r saw_mult_external; &if -K 3; &ps -n".

saw_mult_external : i/o =     32/     32  and =    2929  lev =   76 (38.81)  mem = 0.04 MB
Mapping (K=3)  :  lut =    554  edge =    1612  lev =   30 (14.56)  mem = 0.02 MB
NPN CLASS STATISTICS (for 554 LUT4 present in the current mapping):
  1: Class   5 :  Count =      2   (   0.36 %)   F = (!d*!c)
  2: Class  15 :  Count =      2   (   0.36 %)   F = (!d*!(c*b))
  3: Class  83 :  Count =    247   (  44.50 %)   F = 17(b,c,d)
  4: Class 120 :  Count =      2   (   0.36 %)   F = (d+!(!c*!b))
  5: Class 180 :  Count =     48   (   8.65 %)   F = (d+c)
  6: Class 220 :  Count =    253   (  45.59 %)   F = (d+(c+!b))
Other     :  Count =      0   (   0.00 %)
Approximate number of 4:1 MUX structures: All =      0  (   0.00 %)  MFFC =      0  (   0.00 %)
$ abc -c "&r cryptol_mult_external; &if -K 3; &ps -n"
ABC command line: "&r cryptol_mult_external; &if -K 3; &ps -n".

cryptol_mult_external : i/o =     32/     32  and =    2929  lev =   76 (38.81)  mem = 0.04 MB
Mapping (K=3)  :  lut =    554  edge =    1612  lev =   30 (14.56)  mem = 0.02 MB
NPN CLASS STATISTICS (for 554 LUT4 present in the current mapping):
  1: Class   5 :  Count =      2   (   0.36 %)   F = (!d*!c)
  2: Class  15 :  Count =      2   (   0.36 %)   F = (!d*!(c*b))
  3: Class  83 :  Count =    247   (  44.50 %)   F = 17(b,c,d)
  4: Class 120 :  Count =      2   (   0.36 %)   F = (d+!(!c*!b))
  5: Class 180 :  Count =     48   (   8.65 %)   F = (d+c)
  6: Class 220 :  Count =    253   (  45.59 %)   F = (d+(c+!b))
Other     :  Count =      0   (   0.00 %)
Approximate number of 4:1 MUX structures: All =      0  (   0.00 %)  MFFC =      0  (   0.00 %)

@weaversa
Copy link
Contributor Author

Thanks for delving into this.

Having never really used write_aig much myself

It's not just write_aig that's vulnerable, it's any call to a non-SMT solver such as abc or a CNF solver. This issue (that w4 is used to optimize only some terms) makes any command that passes a property from Cryptol or otherwise (not through w4) less useful. What would you think about a pair of new saw commands that rewrite a Term/ProofScript using w4?

Also, I'm not 100% sure it's just w4 that's the culprit -- proving the two equivalent goes instantly using w4_abc_verilog, but takes a very long time using w4_abc_aiger (which I assume passes through w4).

sawscript> m <- llvm_load_module "mult.bc"
sawscript> mult <- llvm_extract m "mult"
sawscript> prove w4_abc_verilog {{ \x -> mult x == x * 0x85EBCA77 }}
[16:24:35.962] Valid
sawscript> prove w4_abc_aiger {{ \x -> mult x == x * 0x85EBCA77 }}
...waiting...

@weaversa
Copy link
Contributor Author

More specifically, it happens during the translation from LLVM bitcode to What4 to SAWCore. What4 aggressively optimizes intermediate expressions to put them into a sort of normal form.

If that's the case, then why do write_aig and write_aig_external produce drastically different results for saw_mult? I would expect instead for w4 to change cryptol_mult to look more like saw_mult and not the other way around. There's not much structural difference between write_aig and write_aig_external for cryptol_mult except the commands produce circuits with different input/output orderings...why? Is that a result of w4?

sawscript> write_aig_external "cryptol_mult_external.aig" {{ \x -> x * 0x85EBCA77 }}
sawscript> write_aig "cryptol_mult.aig" {{ \x -> x * 0x85EBCA77 }}
$ abc -c "&r cryptol_mult.aig; &if -K 3; &ps -n"
ABC command line: "&r cryptol_mult.aig; &if -K 3; &ps -n".

cryptol_mult : i/o =     32/     32  and =    3790  lev =  128 (64.81)  mem = 0.05 MB
Mapping (K=3)  :  lut =    799  edge =    2158  lev =   46 (23.44)  mem = 0.03 MB
NPN CLASS STATISTICS (for 799 LUT4 present in the current mapping):
  1: Class   5 :  Count =      2   (   0.25 %)   F = (!d*!c)
  2: Class  13 :  Count =     14   (   1.75 %)   F = (!d*(c+b))
  3: Class  15 :  Count =      1   (   0.12 %)   F = (!d*!(c*b))
  4: Class  83 :  Count =    274   (  34.25 %)   F = 17(b,c,d)
  5: Class 120 :  Count =      4   (   0.50 %)   F = (d+!(!c*!b))
  6: Class 180 :  Count =    238   (  29.75 %)   F = (d+c)
  7: Class 220 :  Count =    266   (  33.25 %)   F = (d+(c+!b))
Other     :  Count =      0   (   0.00 %)
Approximate number of 4:1 MUX structures: All =      0  (   0.00 %)  MFFC =      0  (   0.00 %)
$ abc -c "&r cryptol_mult_external.aig; &if -K 3; &ps -n"
ABC command line: "&r cryptol_mult_external.aig; &if -K 3; &ps -n".

cryptol_mult_external : i/o =     32/     32  and =    2929  lev =   76 (38.81)  mem = 0.04 MB
Mapping (K=3)  :  lut =    554  edge =    1612  lev =   30 (14.56)  mem = 0.02 MB
NPN CLASS STATISTICS (for 554 LUT4 present in the current mapping):
  1: Class   5 :  Count =      2   (   0.36 %)   F = (!d*!c)
  2: Class  15 :  Count =      2   (   0.36 %)   F = (!d*!(c*b))
  3: Class  83 :  Count =    247   (  44.50 %)   F = 17(b,c,d)
  4: Class 120 :  Count =      2   (   0.36 %)   F = (d+!(!c*!b))
  5: Class 180 :  Count =     48   (   8.65 %)   F = (d+c)
  6: Class 220 :  Count =    253   (  45.59 %)   F = (d+(c+!b))
Other     :  Count =      0   (   0.00 %)
Approximate number of 4:1 MUX structures: All =      0  (   0.00 %)  MFFC =      0  (   0.00 %)
$ abc -c "&cec cryptol_mult.aig cryptol_mult_external.aig"
ABC command line: "&cec cryptol_mult.aig cryptol_mult_external.aig".

Disproved at least one output of the miter (zero-based number 0).
Networks are NOT EQUIVALENT.  Time =     0.00 sec

Here we use ABC's Boolean Matching command that does PP-equivalence (http://www.eecs.umich.edu/~imarkov/pubs/conf/date10-match.pdf)

$ abc -c "bm cryptol_mult.aig cryptol_mult_external.aig"
ABC command line: "bm cryptol_mult.aig cryptol_mult_external.aig".

...
*** Circuits are equivalent ***
Init Time = 0.14
Simulation Time = 0.02
SAT Time = 0.00
Overall Time = 0.16

@RyanGlScott
Copy link
Contributor

What would you think about a pair of new saw commands that rewrite a Term/ProofScript using w4?

Yes, having commands like w4_rewrite : Term -> Term and w4_simplify : ProofGoal () is probably a good idea. We could also consider introducing sbv_rewrite and sbv_simplify for consistency's sake, although the it's unclear if one would ever need them.

Also, I'm not 100% sure it's just w4 that's the culprit -- proving the two equivalent goes instantly using w4_abc_verilog, but takes a very long time using w4_abc_aiger (which I assume passes through w4).

Very confusingly, I'm not even sure w4_abc_aiger actually uses What4 at all. At least, judging from a quick look over the series of function calls that w4_abc_aiger makes, you eventually wind up in the saw-core-aig library, which converts SAWCore terms directly to AIG without going through What4. If that is the case, then we ought to reconsider the name w4_abc_aiger, as the w4_ prefix would be misleading.

In contrast, I am reasonably certain that w4_abc_verilog does use What4. It eventually calls into the What4.Protocol.VerilogWriter module, which handles converts a What4 term to Verilog.

If that's the case, then why do write_aig and write_aig_external produce drastically different results for saw_mult? I would expect instead for w4 to change cryptol_mult to look more like saw_mult and not the other way around. There's not much structural difference between write_aig and write_aig_external for cryptol_mult except the commands produce circuits with different input/output orderings...why? Is that a result of w4?

Hah, this turns out to be yet another case of misleadingly named functions! write_aig, like w4_abc_aiger, converts directly from SAWCore to AIG. write_aig_external, on the other hand, uses Verilog as an intermediate representation, and the conversion to Verilog happens through... What4! In general, it appears that anything in SAW that involves Verilog will go through What4 at some point. Alas, we don't consistently signpost all of the Verilog commands with w4_ to make this obvious, as evidenced by the fact that we have both w4_abc_verilog (which uses a w4_ prefix) and write_aig_external (which doesn't use a w4_ prefix).

So yes, I do expect that the use of What4 explains much of the difference between the outputs of write_aig and write_aig_external. I'm not sure if it explains everything (perhaps the use of Verilog also matters), but What4 certainly plays a contributing role.

@robdockins
Copy link
Contributor

Note that goal_eval and goal_eval_uninit already do the round-trip via What4 for proof goals. The corresponding term rewriting functions are in #927

@RyanGlScott
Copy link
Contributor

#1828 is a related issue that discusses the issue of write_aig producing differently structured AIG files depending on which side of a multiplication a constant appears, which is independent of What4.

RyanGlScott added a commit that referenced this issue Mar 9, 2023
The fix for #1828 also addresses many of the symptoms of #1788, so let's add
a test case to ensure that those symptoms remain cured.
RyanGlScott added a commit that referenced this issue Mar 9, 2023
The fix for #1828 also addresses many of the symptoms of #1788, so let's add
a test case to ensure that those symptoms remain cured.
@RyanGlScott
Copy link
Contributor

Most of this issue has been addressed in #1788. I will leave this issue open as a reminder to land #927, which adds commands for round-tripping via What4.

@sauclovian-g
Copy link
Contributor

also see #906, which is somewhat related, and I would have sworn there was another issue talking about using what4 to simplify terms but I can't find it now.

@sauclovian-g sauclovian-g added type: bug Issues reporting bugs or unexpected/unwanted behavior subsystem: hardware Issues related to verification of hardware labels Nov 5, 2024
@sauclovian-g sauclovian-g added this to the Someday milestone Nov 5, 2024
@RyanGlScott
Copy link
Contributor

Perhaps you are thinking of #1436? In particular, #927 would also address part (2) of this comment in that issue.

@sauclovian-g
Copy link
Contributor

Yes, I think that must be the one I was thinking of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
subsystem: hardware Issues related to verification of hardware type: bug Issues reporting bugs or unexpected/unwanted behavior
Projects
None yet
Development

No branches or pull requests

5 participants