
Add Base64 library to utils #2884

Merged · 11 commits · Dec 29, 2021

Conversation

ernestognw (Member) commented Sep 25, 2021

Fixes #2859

Description

As previously discussed in #2859, it seems Base64 has been used as a way to construct URL-safe tokenURIs for ERC721. This PR aims to provide a standard library to perform those operations through OpenZeppelin utils.
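For context, a hypothetical sketch of the kind of usage driving this (not code from this PR; the contract and function names below are made up, and it assumes the library ends up at utils/Base64.sol with an encode(bytes) function):

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import "@openzeppelin/contracts/utils/Strings.sol";
import "@openzeppelin/contracts/utils/Base64.sol"; // assumed final location of the library

// Hypothetical example: serving ERC721 metadata fully on-chain as a base64 data URI.
contract OnChainMetadataExample {
    function tokenURI(uint256 tokenId) public pure returns (string memory) {
        bytes memory metadata = abi.encodePacked(
            '{"name":"Token #', Strings.toString(tokenId), '"}'
        );
        return string(
            abi.encodePacked("data:application/json;base64,", Base64.encode(metadata))
        );
    }
}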

PR Checklist

  • Tests
  • Documentation
  • Changelog entry
Notes

I haven't been able to run the docs locally to start writing the documentation for this, and I don't know how to update the changelog entry mentioned in the checklist. It would be helpful to have some guidance from one of the team members.

ernestognw force-pushed the feature/add-base64-library-#2859 branch from 73b6c96 to 00e6097 on September 25, 2021 20:44

import "../utils/Base64.sol";

contract Base64Mock {


just my 2 cents but as an alternative, you can have the following line at the top of the contract:
using Base64 for bytes;

and then in your encode:
value.encode();
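For illustration, a minimal hypothetical sketch of what that alternative would look like in the mock (assuming the library exposes encode(bytes), as in this PR):

import "../utils/Base64.sol";

contract Base64Mock {
    using Base64 for bytes;

    // The attached library function is resolved on the bytes type, so
    // value.encode() forwards to Base64.encode(value).
    function encode(bytes memory value) public pure returns (string memory) {
        return value.encode();
    }
}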

ernestognw (Member Author)


I agree it looks better to just use .encode, but I'm following the same pattern used for the Strings mock. I'll wait to see if somebody from the OZ team has input on this and on why to use one approach or the other.

ernestognw force-pushed the feature/add-base64-library-#2859 branch from e87d3b7 to 184adb8 on October 8, 2021 06:14
ernestognw force-pushed the feature/add-base64-library-#2859 branch from 184adb8 to 7c187e0 on October 8, 2021 06:16
Amxx (Collaborator) commented Nov 24, 2021

Hello @ernestognw, thank you very much for this PR, and sorry we didn't answer it sooner.

This is really something interesting that we would love to merge. In addition to actually providing code and tests, you have done a great job documenting the changes.

One of the reasons for the late answer is that we have been reluctant to dive into the assembly code. The tests show it works, but we are still not really comfortable with it. Our expectation is that this code would mostly be used as part of "view" operations, where it is not actually paid for. Thus it might not be necessary to make it as efficient as possible, and maybe we would prefer more easily understandable code.

Have you tried writing it in pure Solidity, without any assembly?

cameel (Contributor) commented Nov 28, 2021

Have you tried writing it in pure Solidity, without any assembly?

I just stumbled upon this PR and decided to give it a try myself to check how the new optimizer in the compiler would fare against that hand-optimized code and see what we could improve.

Here's a version of the library in pure Solidity (WARNING: not tested at all, might be buggy):

library Base64 {
    bytes internal constant _TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    function encode(bytes memory data) internal pure returns (string memory) {
        if (data.length == 0) return "";

        bytes memory table = _TABLE;
        uint256 encodedLen = 4 * ((data.length + 2) / 3);

        bytes memory result = new bytes(encodedLen);

        unchecked {
            uint resultPtr = 0;

            for (uint dataPtr = 0; dataPtr + 3 <= data.length; dataPtr += 3) {
                uint input =
                    uint(uint8(data[dataPtr    ])) << 16 +
                    uint(uint8(data[dataPtr + 1])) <<  8 +
                    uint(uint8(data[dataPtr + 2]));

                result[resultPtr++] = table[((input >> 18) & 0x3f)];
                result[resultPtr++] = table[((input >> 12) & 0x3f)];
                result[resultPtr++] = table[((input >>  6) & 0x3f)];
                result[resultPtr++] = table[( input        & 0x3f)];
            }

            if (data.length % 3 == 2) {
                result[resultPtr + 1] = 0x3d;
                result[resultPtr + 2] = 0x3d;
            }
            else if (data.length % 3 == 1)
                result[resultPtr + 1] = 0x3d;
        }

        return string(result);
    }
}

You can see the resulting assembly at the end of this post. Unfortunately, it looks like there are still a few things that the optimizer could do but does not:

  • Inlining. The most visible difference is that there's a bunch of extra functions compared to the hand-optimized assembly. The inliner is currently a little too conservative due to the risk of running into "stack too deep" errors (inlining usually results in more local variables). That's actually something I'm working on right now - we have recently improved the EVM code generation from Yul and now we're looking into relaxing these rules a little.
    • In addition to removing jumps if all these functions got inlined, it would allow many other optimizations to be applied more broadly. For example there's some unnecessary shifting and masking due to conversions and cleaning of unused bits that could be removed if the code was not in separate functions.
  • Array length checks. It looks like they were not optimized out in this case.
  • Redundant mload instructions. The inline assembly version can read the 3 input bytes it needs in each loop cycle with a single mload instruction; the pure Solidity version does 3 mloads instead.
  • Redundant mstore8 instructions. In the final switch, the hand-optimized assembly is a tiny bit more efficient because it can write two bytes using a single mstore instruction, while the compiler emits two mstore8 instead. Note however that the inline assembly version pays for this by wasting 32 bytes of memory in the result array (the Solidity version does not).

For now, as you can see, the hand-optimized version should be cheaper. I think I'll add this code as a test case for the optimizer, though, and it should improve in future versions of the compiler.

Pure Solidity version passed through Yul optimizer

This is the result of running solc --optimize --ir-optimized --debug-info none <input file> on the library and the Base64Mock contract from the PR. I included only the functions that are different or not present in the inline assembly version; the whole code was a bit longer.

function memory_array_index_access_bytes(baseRef, index) -> addr
{
    if iszero(lt(index, mload(baseRef)))
    {
        mstore(0, shl(224, 0x4e487b71))
        mstore(4, 0x32)
        revert(0, 0x24)
    }
    addr := add(add(baseRef, index), 32)
}
function read_from_memoryt_bytes1(ptr) -> returnValue
{
    returnValue := and(mload(ptr), shl(248, 255))
}
function convert_bytes1_to_uint8(value) -> converted
{ converted := shr(248, value) }
function convert_uint8_to_uint256(value) -> converted
{ converted := and(value, 0xff) }
function fun_encode(var_data_mpos) -> var__mpos
{
    if iszero(mload(var_data_mpos))
    {
        var__mpos := allocate_memory_array_string()
        leave
    }
    let expr_35_mpos := copy_literal_to_memory_84d8a590de33e00cbdc16e1f28c3506f5ec15c599fab9a6a4bcd575cc2f110ce()
    let expr_mpos := allocate_and_zero_memory_array_bytes(checked_mul_uint256(checked_div_uint256(checked_add_uint256(mload(var_data_mpos)))))
    let var_resultPtr := 0x00
    let var_dataPtr := var_resultPtr
    for { }
    0x01
    {
        var_dataPtr := add(var_dataPtr, 0x03)
    }
    {
        let _1 := 0x03
        if gt(add(var_dataPtr, _1), mload(var_data_mpos)) { break }
        let _2 := convert_uint8_to_uint256(convert_bytes1_to_uint8(read_from_memoryt_bytes1(memory_array_index_access_bytes(var_data_mpos, var_dataPtr))))
        let _3 := 0x01
        let _4 := shl(add(0x10, convert_uint8_to_uint256(convert_bytes1_to_uint8(read_from_memoryt_bytes1(memory_array_index_access_bytes(var_data_mpos, add(var_dataPtr, _3)))))), _2)
        let _5 := 0x02
        let result := shl(add(0x08, convert_uint8_to_uint256(convert_bytes1_to_uint8(read_from_memoryt_bytes1(memory_array_index_access_bytes(var_data_mpos, add(var_dataPtr, _5)))))), _4)
        let _6 := 0x3f
        mstore8(memory_array_index_access_bytes(expr_mpos, var_resultPtr), byte(0x00, read_from_memoryt_bytes1(memory_array_index_access_bytes(expr_35_mpos, and(shr(18, result), _6)))))
        mstore8(memory_array_index_access_bytes(expr_mpos, add(var_resultPtr, _3)), byte(0x00, read_from_memoryt_bytes1(memory_array_index_access_bytes(expr_35_mpos, and(shr(12, result), _6)))))
        mstore8(memory_array_index_access_bytes(expr_mpos, add(var_resultPtr, _5)), byte(0x00, read_from_memoryt_bytes1(memory_array_index_access_bytes(expr_35_mpos, and(shr(6, result), _6)))))
        let _7 := read_from_memoryt_bytes1(memory_array_index_access_bytes(expr_35_mpos, and(result, _6)))
        let _8 := add(var_resultPtr, _1)
        var_resultPtr := add(var_resultPtr, 0x04)
        mstore8(memory_array_index_access_bytes(expr_mpos, _8), byte(0x00, _7))
    }
    let _9 := mod(mload(var_data_mpos), 0x03)
    switch eq(_9, 0x02)
    case 0 {
        if eq(_9, 0x01)
        {
            mstore8(memory_array_index_access_bytes(expr_mpos, add(var_resultPtr, 0x01)), 0x3d)
        }
    }
    default {
        mstore8(memory_array_index_access_bytes(expr_mpos, add(var_resultPtr, 0x01)), 0x3d)
        mstore8(memory_array_index_access_bytes(expr_mpos, add(var_resultPtr, 0x02)), 0x3d)
    }
    var__mpos := expr_mpos
}

Inline assembly version passed through Yul optimizer

function checked_add_uint256_794(x) -> sum
{
    if gt(x, not(32)) { panic_error_0x11() }
    sum := add(x, 0x20)
}
function fun_encode(var_data_mpos) -> var_mpos
{
    if iszero(mload(var_data_mpos))
    {
        var_mpos := allocate_memory_array_string()
        leave
    }
    let expr_37_mpos := copy_literal_to_memory_84d8a590de33e00cbdc16e1f28c3506f5ec15c599fab9a6a4bcd575cc2f110ce()
    let expr := checked_mul_uint256(checked_div_uint256(checked_add_uint256(mload(var_data_mpos))))
    let expr_mpos := allocate_and_zero_memory_array_string(checked_add_uint256_794(expr))
    mstore(expr_mpos, expr)
    let usr$dataPtr := var_data_mpos
    let usr$endPtr := add(var_data_mpos, mload(var_data_mpos))
    let usr$resultPtr := add(expr_mpos, 0x20)
    for { } lt(usr$dataPtr, usr$endPtr) { }
    {
        let _1 := 0x03
        usr$dataPtr := add(usr$dataPtr, _1)
        let usr$input := mload(usr$dataPtr)
        let _2 := 1
        let _3 := 0x3F
        let _4 := mload(add(add(expr_37_mpos, and(shr(18, usr$input), _3)), _2))
        let _5 := 248
        mstore(usr$resultPtr, shl(_5, _4))
        mstore(add(usr$resultPtr, _2), shl(_5, mload(add(add(expr_37_mpos, and(shr(12, usr$input), _3)), _2))))
        mstore(add(usr$resultPtr, 0x02), shl(_5, mload(add(add(expr_37_mpos, and(shr(6, usr$input), _3)), _2))))
        mstore(add(usr$resultPtr, _1), shl(_5, mload(add(add(expr_37_mpos, and(usr$input, _3)), _2))))
        usr$resultPtr := add(usr$resultPtr, 0x04)
    }
    switch mod(mload(var_data_mpos), 0x03)
    case 1 {
        mstore(add(usr$resultPtr, not(1)), shl(240, 15677))
    }
    case 2 {
        mstore(add(usr$resultPtr, not(0)), shl(248, 61))
    }
    var_mpos := expr_mpos
}

Amxx (Collaborator) commented Dec 21, 2021

Note: I'm trying to evaluate the different implementations. It's been long overdue, but we want this to be part of our next release.

I'm afraid to say that @cameel's implementation is not correct. It just doesn't return the expected value. I'll be trying to fix it so I can then accurately measure the impact on gas cost (if it were ever called as part of a transaction) and deployment cost.

Amxx (Collaborator) commented Dec 21, 2021

There is a major issue with the Solidity implementation: if the data length is not a multiple of 3, the last iteration of the loop will fail. Reading data[dataPtr + 1] or data[dataPtr + 2] falls outside of the array, and even with unchecked, this causes a failure.
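As a side note, unchecked only disables arithmetic overflow checks, not array bounds checks. A minimal hypothetical sketch (not from the thread) illustrating why the unchecked block does not help here:

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo: the unchecked block removes the overflow check on i + 2,
// but the bounds check on data[i + 2] remains and reverts with Panic(0x32)
// whenever i + 2 >= data.length.
contract BoundsDemo {
    function readThird(bytes memory data, uint256 i) external pure returns (uint8) {
        unchecked {
            return uint8(data[i + 2]);
        }
    }
}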

Amxx (Collaborator) commented Dec 21, 2021

Here is a working Solidity implementation. Even with optimization, it is SIGNIFICANTLY more expensive to run:

library Base64V2 {
    bytes private constant TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    function encode(bytes memory data) public pure returns (string memory) {
        if (data.length == 0) return "";

        bytes memory table     = TABLE;
        bytes memory result    = new bytes(4 * ((data.length + 2) / 3));
        uint256      resultPtr = 0;

        for (uint256 dataPtr = 0; dataPtr < data.length; dataPtr += 3) {
            uint24 chunk = (                            (uint24(uint8(data[dataPtr + 0])) << 16))
                         + (dataPtr + 1 < data.length ? (uint24(uint8(data[dataPtr + 1])) <<  8) : 0)
                         + (dataPtr + 2 < data.length ? (uint24(uint8(data[dataPtr + 2]))      ) : 0);
            
            result[resultPtr++] = table[uint8(chunk >> 18) & 0x3f];
            result[resultPtr++] = table[uint8(chunk >> 12) & 0x3f];
            result[resultPtr++] = table[uint8(chunk >>  6) & 0x3f];
            result[resultPtr++] = table[uint8(chunk      ) & 0x3f];
        }

        if (data.length % 3 == 1) {
            result[--resultPtr] = 0x3d;
            result[--resultPtr] = 0x3d;
        }
        else if (data.length % 3 == 2) {
            result[--resultPtr] = 0x3d;
        }

        return (string(result));
    }
}
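As a quick sanity check (a hypothetical sketch, not part of the thread, assuming the Base64V2 library above is in scope), the output should match standard Base64 test vectors, e.g. encoding "Hello" yields "SGVsbG8=":

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sanity check; deploying it requires linking Base64V2 since encode() is public.
contract Base64V2Check {
    function check() external pure returns (bool) {
        string memory encoded = Base64V2.encode(bytes("Hello"));
        return keccak256(bytes(encoded)) == keccak256(bytes("SGVsbG8="));
    }
}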

Amxx (Collaborator) commented Dec 21, 2021

The bytecode for the Solidity version is 77.2% larger. This increases the deployment cost by 156k gas.

I'll go for the assembly any day. It's actually pretty clear and simple once you know how Base64 encoding works.
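For readers following along, a quick worked example of that encoding (standard Base64, not specific to this PR): the bytes of "Man" (0x4D 0x61 0x6E, i.e. 01001101 01100001 01101110) are regrouped into four 6-bit values 010011 010110 000101 101110 (19, 22, 5, 46), which index the table to give "TWFu". When the input length is not a multiple of 3, the output is padded with "=" characters, which is what the final switch handles.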

uint256 encodedLen = 4 * ((data.length + 2) / 3);

// Add some extra buffer at the end required for the writing
string memory result = new string(encodedLen + 32);
Collaborator

Why do we need the additional space? The padding equals signs already fit into encodedLen.

Contributor

I think it's because the code operates on whole 32-byte slots rather than single bytes. It uses mstore instead of mstore8, so when setting the value for a byte it also writes over the 31 bytes after it.
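A minimal hypothetical sketch (not from the PR) of the difference being discussed:

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// mstore8 writes exactly one byte; mstore writes a full 32-byte word and therefore
// clobbers the 31 bytes that follow the target address, which is why the assembly
// version allocates encodedLen + 32 as scratch space.
contract PaddingDemo {
    function writePadding() external pure returns (bytes memory buf) {
        buf = new bytes(4);
        assembly {
            let ptr := add(buf, 32)        // skip the 32-byte length slot
            mstore8(ptr, 0x3d)             // writes "=" to a single byte
            // mstore(ptr, shl(248, 0x3d)) // would set the same byte but also zero
            //                             // the next 31 bytes of memory
        }
    }
}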

Collaborator

then let's use mstore8!

cameel (Contributor) commented Dec 21, 2021

I'm afraid to say that @cameel's implementation is not correct.

Thanks for fixing it. Like I said, my version was not tested at all. I was mostly interested in a rough comparison of the generated bytecode, to get a good picture of whether going to assembly here is worth it or not, and for that purpose it was good enough.

Even with optimization, it is SIGNIFICANTLY more expensive to run:

Some of it is because your version does not have the unchecked block. It's not something I'd recommend in general but might be a better middle-ground before going straight to assembly.

But yeah, it's not the only thing that makes it more expensive. I'm pretty disappointed by stuff that was not optimized out too. In the current version of the compiler the Solidity code is much more expensive. Just wanted to let you know that many of the missed optimization opportunities here are things we're actively working on right now so it's not going to stay like this indefinitely.

Amxx previously approved these changes Dec 21, 2021
Amxx mentioned this pull request Dec 21, 2021
Drjacky commented Dec 29, 2021

Do we still need something to get this PR merged? 🤔

Amxx (Collaborator) commented Dec 29, 2021

@Drjacky We still need some internal review. This should make it into 4.5.

frangio (Contributor) left a comment

This looks good to me.

ernestognw (Member Author)

Hello @ernestognw, thank you very much for this PR, and sorry we didn't answer it sooner.

I'm happy it's getting merged soon. To be honest, the optimization discussion went beyond my scope, so thanks @cameel for providing extra examples and doing some tests.

Let me know if there's anything else I can do. Otherwise here's a lovely meme:

[image: simpsons-leonardo-nimoy]
