Improve performance of anomaEncode / anomaDecode in the Core evaluator #2975

paulcadman · 2024-08-27T20:27:04Z

This PR:

Adds a new implementation of {decode, encode}ByteString functions, used by anomaEncode and anomaDecode in the Core evaluator
Adds property tests for roundtripping and benchmarks for the new functions.

The old implementation used bitvec to manipulate the ByteString. This was far too slow. The new implementation uses bit operations directly on the input integer and ByteArray.

It's now possible to run anoma-app-patterns:Tests/Swap.juvix to completion.

For encoding, if the size of the output integer exceeds 64 bits (and therefore a BigInt must be used) then the new implementation has quadratic time complexity in the number of input bytes if an implementation of ByteString -> Integer is used as follows:

byteStringToIntegerLE :: ByteString -> Integer
byteStringToIntegerLE = BS.foldr (\b acc -> acc `shiftL` 8 .|. fromIntegral b) 0

byteStringToInteger' :: ByteString -> Integer
byteStringToInteger' = BS.foldl' (\acc b -> acc `shiftL` 8 .|. fromIntegral b) 0

I think this is because shiftL is expensive for large Integers. To mitigate this I'm splitting the input ByteString into 1024 byte chunks and processing each separately. Using this we get 100x speed up at ~0.25Mb input over the non-chunked approach and linear time-complexity thereafter.

Benchmarks

The benchmarks for encoding and decoding 250000 bytes:

 ByteString Encoding to/from integer
      encode bytes to integer:   OK
        59.1 ms ± 5.3 ms
      decode bytes from integer: OK
        338  ms ±  16 ms

The previous implementation would never complete for this input.

Benchmarks for encoding and decoding 2 * 250000 bytes:

    ByteString Encoding to/from integer
      encode bytes to integer:   OK
        121  ms ± 8.3 ms
      decode bytes from integer: OK
        651  ms ±  27 ms

Benchmarks for encoding and decoding 4 * 250000 bytes:

    ByteString Encoding to/from integer
      encode bytes to integer:   OK
        249  ms ±  17 ms
      decode bytes from integer: OK
        1.317 s ±  16 ms

lukaszcz · 2024-08-28T09:16:21Z

Can't we solve this just by eliminating the use of shifts on Integers? They are the culprits here, but just bit-testing should be constant time. I changed two functions in Nockma.Encoding.Base to:

-- | Binary encode an integer to a vector of bits, ordered from least to most significant bits.
-- NB: 0 is encoded as the empty bit vector is specified by the Hoon serialization spec
writeIntegral :: forall a r. (Integral a, Member BitWriter r) => a -> Sem r ()
writeIntegral x
  | x < 0 = error "integerToVectorBits: negative integers are not supported in this implementation"
  | otherwise = unfoldBits 0 (fromIntegral x)
  where
    len = bitLength x

    unfoldBits :: Int -> Integer -> Sem r ()
    unfoldBits idx n
      | idx == len = return ()
      | otherwise = writeBit (Bit (testBit n idx)) <> unfoldBits (idx + 1) n

-- | Computes the number of bits required to store the argument in binary
-- NB: 0 is encoded to the empty bit vector (as specified by the Hoon serialization spec), so 0 has bit length 0.
bitLength :: (Integral a) => a -> Int
bitLength n
  | n == 0 = 0
  | otherwise = fromIntegral (integerLog2 (abs (fromIntegral n))) + 1

and afterwards Tests/Swap.juvix in anoma-app-patterns finishes in reasonable time. The decodes seem to be more-or-less instantenous (as they should), encodes still take time, because I didn't change anything there.

I think this chunk-division thing might be complicating things too much.

Perhaps the most efficient way of doing this conversion would be to use one of the low-level library functions, e.g., integerToMutableByteArray (https://hackage.haskell.org/package/base-4.20.0.1/docs/GHC-Num.html#v:integerToMutableByteArray). There's also integerFromByteArray. Unfortunately, my search with Hoogle didn't turn up any higher-level wrappers around those functions.

paulcadman · 2024-08-28T11:10:15Z

Can't we solve this just by eliminating the use of shifts on Integers? They are the culprits here, but just bit-testing should be constant time. I changed two functions in Nockma.Encoding.Base to:

Thank you, this is a better approach!

I also looked at the low-level library functions integerToMutableByteArray et al. They seem quite complicated to use as they involve doing memory allocations and pointer manipulation.

paulcadman · 2024-08-28T17:17:23Z

Can't we solve this just by eliminating the use of shifts on Integers? They are the culprits here, but just bit-testing should be constant time. I changed two functions in Nockma.Encoding.Base to:

Thank you, this is a better approach!

@lukaszcz I've reverted to the bitvec implementation using your suggestion for decoding. For me the juvix eval Tests/Swap.juvix test is still too slow without the chunking of the bytestring when encoding to an Integer (1m45s without chunking vs 20s with chunking). So I have kept the chunking bytestring -> integer function for encoding.

lukaszcz · 2024-08-29T14:55:23Z

The encoding function using shiftL is still quadratic, only the constant factor is reduced by chunking. This is not long-term sustainable -- the encoding should just be linear.

Ultimately, we should implement this in linear time using low-level bit/byte manipulation. Maybe it's possible to just do it "by hand" given the definition of Integer: https://hackage.haskell.org/package/base-4.20.0.1/docs/GHC-Num.html#t:Integer
One can just pattern match on an Integer, box the values, convert to ByteArray, prepend one byte indicating which constructor was used. Decode by reading the first byte (tag), converting the rest to ByteArray, matching on it to get the unboxed value, and creating appropriate Integer using the constructors.

lukaszcz · 2024-08-29T15:26:52Z

Or maybe we could just use Data.Serialize? The serialization/deserialization implementation for Integer should be linear there.

paulcadman · 2024-08-29T15:49:06Z

Or maybe we could just use Data.Serialize? The serialization/deserialization implementation for Integer should be linear there.

anomaEncode 'serializes' an arbitrary ByteString to an Integer. So I'm not sure we can use Data.Serialize because serialize : Integer -> ByteString is not surjective?

This reduces the number of shiftL performed on a large Integer

…n `byteStringToIntegerBE`

@lukaszcz

For decoding, use an implementation from @lukaszcz to avoid calling shiftR when writing the bits of an Integer. For encoding I continue to use the chunked encoding of ByteString to Integer. Co-authored-by: Lukasz Czajka <lukasz@heliax.dev>

paulcadman requested review from janmasrovira and lukaszcz August 27, 2024 20:28

paulcadman self-assigned this Aug 27, 2024

paulcadman added this to the 0.6.6 milestone Aug 27, 2024

paulcadman added the anoma label Aug 27, 2024

paulcadman marked this pull request as draft August 28, 2024 07:54

paulcadman force-pushed the bytestring-integer-encoding-optimized-2 branch from fe59ade to 2a60339 Compare August 28, 2024 17:09

paulcadman marked this pull request as ready for review August 28, 2024 17:17

paulcadman force-pushed the bytestring-integer-encoding-optimized-2 branch from 2a60339 to 04a042a Compare August 28, 2024 17:17

lukaszcz force-pushed the bytestring-integer-encoding-optimized-2 branch from 04a042a to 0d2758f Compare August 29, 2024 14:55

lukaszcz approved these changes Aug 29, 2024

View reviewed changes

paulcadman force-pushed the bytestring-integer-encoding-optimized-2 branch 2 times, most recently from 4cd4f8a to 0005f1c Compare August 30, 2024 13:41

paulcadman and others added 5 commits August 30, 2024 17:05

Use a more efficient version of encode/decoding to ByteString

4d74796

Add benchmark for bytestring encoding to integer

30abf6e

Use a chunked version of byteStringToIntegerBE

e146f0a

This reduces the number of shiftL performed on a large Integer

The encoding/decoding size range must be greater than the chunkSize i…

1b7df9d

…n `byteStringToIntegerBE`

paulcadman force-pushed the bytestring-integer-encoding-optimized-2 branch from 0005f1c to b725f2e Compare August 30, 2024 16:05

paulcadman merged commit 87c5f0a into main Aug 30, 2024
4 checks passed

paulcadman deleted the bytestring-integer-encoding-optimized-2 branch August 30, 2024 17:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of anomaEncode / anomaDecode in the Core evaluator #2975

Improve performance of anomaEncode / anomaDecode in the Core evaluator #2975

paulcadman commented Aug 27, 2024 •

edited

Loading

lukaszcz commented Aug 28, 2024 •

edited

Loading

paulcadman commented Aug 28, 2024

paulcadman commented Aug 28, 2024

lukaszcz commented Aug 29, 2024

lukaszcz commented Aug 29, 2024

paulcadman commented Aug 29, 2024

Improve performance of anomaEncode / anomaDecode in the Core evaluator #2975

Improve performance of anomaEncode / anomaDecode in the Core evaluator #2975

Conversation

paulcadman commented Aug 27, 2024 • edited Loading

Benchmarks

lukaszcz commented Aug 28, 2024 • edited Loading

paulcadman commented Aug 28, 2024

paulcadman commented Aug 28, 2024

lukaszcz commented Aug 29, 2024

lukaszcz commented Aug 29, 2024

paulcadman commented Aug 29, 2024

paulcadman commented Aug 27, 2024 •

edited

Loading

lukaszcz commented Aug 28, 2024 •

edited

Loading