Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringLowering #6271

Merged
merged 47 commits into from
Feb 5, 2024
Merged

StringLowering #6271

merged 47 commits into from
Feb 5, 2024

Conversation

kripken
Copy link
Member

@kripken kripken commented Feb 2, 2024

This extends StringGathering by replacing the gathered string globals to imported
globals. It adds a custom section with the strings that the imports are expected to
provide. It also replaces the string type with extern.

This is a complete lowering of strings, except for string operations that are a TODO.

After running this, no strings remain in the wasm, and the outside JS is expected
to provide the proper imports, which it can do by processing the JSON of the
strings in the custom section "string.consts", which looks like

["foo", "bar", ..]

That is, an array of strings, which are imported as

(import "string.const" "0" (global $string.const_foo (ref extern))) ;; foo
(import "string.const" "1" (global $string.const_bar (ref extern))) ;; bar
...

cc @gkdn This is the pass that you will want to run manually at the end of the
pipeline, to get imported strings.

@kripken kripken requested a review from tlively February 2, 2024 00:41
@tlively
Copy link
Member

tlively commented Feb 3, 2024

Would you be open to treating this custom section format as a POC? There are some changes that I think we should consider for the productionized version, but that shouldn't have to block experimentation.

importIndex++;
global->init = nullptr;

auto str = json::Value::make(std::string(c->string.str).c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use string_view rather than c_str to properly handle nul bytes. Maybe we can fix the json API in a separate PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's worth fixing separately, I agree.


void updateTypes(Module* module) {
TypeMapper::TypeUpdates updates;
updates[HeapType::string] = HeapType::ext;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a module ever uses the fact that string <: any, this is going to cause problems, but that's probably ok for experimentation.

Copy link
Member Author

@kripken kripken Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. If that is ever an issue it seems like we'd need to internalize/externalize in a lot of places...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I think we should make our internal string type be a subtype of extern instead of any, but of course that would break compatibility with the stringref proposal. This is a change we should make once we don't need to support stringref directly anymore and can get away with only supporting imported strings. Hopefully that will be soon?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yeah, that could help. But implementing a not-quite-stringref has downsides too. I'm not sure what's best in the long term, but we don't need to decide now.

@kripken
Copy link
Member Author

kripken commented Feb 5, 2024

Would you be open to treating this custom section format as a POC? There are some changes that I think we should consider for the productionized version, but that shouldn't have to block experimentation.

Definitely, yeah, this is just a first version to start iterating.

The format here is iirc the best idea @jakobkummerow had in the past, and I can't think of anything better - the section is just JSON that JS can quickly parse into an array of strings and then provide as a single import to wasm. But as we experiment we can explore more.

@kripken
Copy link
Member Author

kripken commented Feb 5, 2024

@gkdn does this look good as a first step?

Also please let me know which string instructions you use, that I should implement lowering for (the full stringref proposal is large, but I'm sure we need only a small part of it).

@gkdn
Copy link
Contributor

gkdn commented Feb 5, 2024

Seems reasonable. I won't be able to test it out until we have the instruction lowering in place since types are converted to externref but can try it when we have other parts in place.

The instructions we are currently using:
string.as_wtf16
string.from_code_point
string.new_wtf16_array
string.encode_wtf16_array
string.compare
string.eq
stringview_wtf16.length
stringview_wtf16.get_codeunit
stringview_wtf16.slice

@kripken
Copy link
Member Author

kripken commented Feb 5, 2024

Sounds good, thanks @gkdn , landing.

@kripken kripken merged commit d490318 into WebAssembly:main Feb 5, 2024
13 checks passed
@kripken kripken deleted the string.lowering branch February 5, 2024 23:50
kripken added a commit that referenced this pull request Feb 8, 2024
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
This extends StringGathering by replacing the gathered string globals to imported
globals. It adds a custom section with the strings that the imports are expected to
provide. It also replaces the string type with extern.

This is a complete lowering of strings, except for string operations that are a TODO.

After running this, no strings remain in the wasm, and the outside JS is expected
to provide the proper imports, which it can do by processing the JSON of the
strings in the custom section "string.consts", which looks like

["foo", "bar", ..]

That is, an array of strings, which are imported as

(import "string.const" "0" (global $string.const_foo (ref extern))) ;; foo
(import "string.const" "1" (global $string.const_bar (ref extern))) ;; bar
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
@gkdn gkdn mentioned this pull request Aug 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants