Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for TruffleRuby in the gem #35

Merged
merged 8 commits into from
May 1, 2022
Merged

Conversation

eregon
Copy link
Member

@eregon eregon commented Apr 21, 2022

  • This is necessary since lib/strscan.rb was added (Import JRuby's strscan #25), as the
    TruffleRuby stdlib strscan.rb is no longer found first by require "strscan".
  • Write conditions in a way that any other Ruby implementation would
    simply use its stdlib until it is added explicit support in this gem.

Fixes oracle/truffleruby#2420

eregon added 3 commits April 21, 2022 17:28
* This is necessary since lib/strscan.rb was added (ruby#25), as the
  TruffleRuby stdlib strscan.rb is no longer found first by `require "strscan"`.
* Write conditions in a way that any other Ruby implementation would
  simply use its stdlib until it is added explicit support in this gem.
@eregon eregon requested review from hsbt and kou April 21, 2022 16:08
This was referenced Apr 21, 2022
@eregon
Copy link
Member Author

eregon commented Apr 21, 2022

FYI, this is the strscan implementation in TruffleRuby: https://github.com/oracle/truffleruby/blob/master/lib/truffle/strscan.rb
So it's mostly Ruby code + some Primitive calls and some internal helpers.
I don't think it is a good idea to move that code in this gem (at this time at least), because e.g. the TruffleRuby team wants the freedom to rename these primitives, rename internal helpers like StringValue/match_in_region, refactor it, optimize it, use new internals, etc, without breaking other TruffleRuby releases.
If it was pure-Ruby, no Primitives, no internal helpers, then it would make sense to move it here but it's not the case.

@headius
Copy link
Contributor

headius commented Apr 21, 2022

Why is the special case for TruffleRuby not guarded with RUBY_ENGINE == "truffleruby"?

The default case for this gem is the "ruby" engine (CRuby), which should build and load the C extension. JRuby is an exception because we load a jar file and then boot the extension class. TruffleRuby is an exception because it is not directly supported by this gem and this gem's code will not be loaded.

Only the exceptions should be guarded, falling back on the default (CRuby, engine == "ruby").

@headius
Copy link
Contributor

headius commented Apr 21, 2022

I should also mention, strscan.rb was explicitly excluded from the gem because its exception only applied to JRuby (platform = "java"). This will mean CRuby starts shipping both strscan.rb and strscan.so whenever this is released.

@headius
Copy link
Contributor

headius commented Apr 21, 2022

It is also worth pointing out that this change makes the strscan gem basically a no-op on TruffleRuby without anyone knowing. Users installing new versions of strscan to get new features or bug fixes (or security fixes, should that be necessary) will not get those changes and will not be informed why that is the case.

I understand you do not want to ship your strscan.rb outside of TruffleRuby, but perhaps there's a better alternative than making this gem silently do nothing at all when running on TruffleRuby.

@eregon
Copy link
Member Author

eregon commented Apr 21, 2022

Why is the special case for TruffleRuby not guarded with RUBY_ENGINE == "truffleruby"?

It's written in the PR description, and explained in more details here: https://bugs.ruby-lang.org/issues/18567#note-30.
Suppose there is another Ruby implementation, the logic I added will just work while they have strscan in their stdlib (as they had before, since it was not a gem with its own repo before).
Once/if they want to move their code for strscan here, they can of course do it, but until then it just works instead of breaking.
This is a general theme for my PRs adding TruffleRuby support, as much as possible I try to write code that works for any Ruby implementation vs just fixing for TruffleRuby.

I don't think any argument could counter the advantage of it works instead of breaking for non-CRuby implementations for new default extension gems. Hence the "else" case must be "any other Ruby implementation".

Regarding keeping strscan.rb in the truffleruby repository, ultimately this is my decision and I ask you to respect it. I already explained my reasoning and while I understand your arguments it does not change my decision (security issue is extremely unlikely for that Ruby code, the API is fairly stable, etc).

It is also worth pointing out that this change makes the strscan gem basically a no-op on TruffleRuby without anyone knowing.

I think users care little about that. They care that it works and that it's secure, and in fact TruffleRuby strscan.rb is more secure than a C extension.

@headius
Copy link
Contributor

headius commented Apr 21, 2022

I don't think any argument could counter the advantage of it works instead of breaking for non-CRuby implementations

Except that this has been the standard for all gems with C extensions due to the default target being the "ruby" engine using C extensions. This changes the fallback case from the C extension to "just try to load it from stdlib and pray". I cannot speak for the other gem maintainers as to whether this is acceptable.]

Regarding keeping strscan.rb in the truffleruby repository, ultimately this is my decision and I ask you to respect it.

I said I understand that you don't want to ship it. That's your prerogative, but it require adding another exception to the current default of using a C extension.

I think users care little about that.

I can speak from experience that when users install updated gems, expecting updated functionality, and then those gems don't actually update anything, they get confused and file issues. Those issues may end up in this repository, and we will have to explain that the strscan gem doesn't really ship support for TruffleRuby, and upgrading the gem will not upgrade strscan functionality in TruffleRuby.

Perhaps we can at least add an install-time message pointing out that the strscan gem does not actually ship any code for "any other Ruby implementation" and just passes through to stdlib?

@headius
Copy link
Contributor

headius commented Apr 21, 2022

Another point to discuss: you pend some tests that are not supported by TruffleRuby's strscan, but you are masking them in this repository. Any time a new feature is added or a bug is fixed here, the related tests will have to be masked until TruffleRuby ships its updated support.

JRuby currently passes all tests except for one on Windows that I will fix on the JRuby side before release. Any new bug fixes or features will be added in parallel to the JRuby extension.

@headius
Copy link
Contributor

headius commented Apr 21, 2022

I also want to make clear I'm not opposed to this pull request, but it is adding several exceptional changes specifically to allow truffle ruby to keep its implementation out of the gem. That's the decision that you have made, and I sympathize with the challenge of getting these standard gems to work properly. My concerns are primarily that the gem will now silently do nothing on implementations other than JRuby and CRuby, and we may have to mask out tests for future changes just for TruffleRuby.

@eregon
Copy link
Member Author

eregon commented Apr 21, 2022

"just try to load it from stdlib and pray"

You make it sound like this is unreliable but absolutely not, it's fully supported. That is exactly what happens if there is no other version of the gem installed, and the shipped-with-myruby in-stdlib default gem is used, it behaves exactly like a stdlib, and that will very often be the case for strscan.
I'd bet the vast majority of usages of require 'strscan' will never add strscan to their Gemfile, and IMHO for good reason, it avoids downloading&installing a newer/duplicate version when the stdlib version works just fine.

I can speak from experience that when users install updated gems, expecting updated functionality, and then those gems don't actually update anything, they get confused and file issues.

It sounds rather unlikely to me for strscan, but it is indeed a possibility. In general TruffleRuby stays pretty close feature-wise to such gems which delegate to stdlib, and I already have experience with FFI for this.
I will watch this repo to handle any TruffleRuby-related issue, I think that mostly handles that concern.

you pend some tests that are not supported by TruffleRuby's strscan, but you are masking them in this repository.

Seems of little concern given the few pended tests, and it's useful to actually note the differences.
Note that some of the differences are intentional, notably StringScanner#[] seems inconsistent currently on CRuby (s[:c] raises if last match was with a Regexp but doesn't if last match was with a String).
Again I have experience with FFI for this and it's no big problem in practice.

@eregon
Copy link
Member Author

eregon commented Apr 21, 2022

My concerns are primarily that the gem will now silently do nothing on implementations other than JRuby and CRuby

Instead of not working at all, or by luck already using the stdlib version (if Gemfile.lock version = default gem/stdlib version, or if strscan.rb takes precedence over strscan.so when there was no strscan.rb in this gem), so from that point of view it can't be any worse.

strscan.gemspec Outdated
@@ -24,7 +24,7 @@ Gem::Specification.new do |s|
s.files = %w{lib/strscan.jar lib/strscan.rb}
s.platform = "java"
else
s.files = %w{ext/strscan/extconf.rb ext/strscan/strscan.c}
s.files = %w{ext/strscan/extconf.rb ext/strscan/strscan.c lib/strscan.rb}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why we need this.
If we don't install lib/strscan.rb, strscan.rb in TruffleRuby will be loaded. Because there are no files that match require "strscan" in the installed gem.

Copy link
Member Author

@eregon eregon Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One practical reason is otherwise bundle exec rake test can't work without this (because the file exists in the repo, regardless of the current gem platform).
If the tests were actually done only on an installed gem (would be good, but maybe as an extra CI step) that wouldn't be an issue, but that's uncommon and arguably impractical in development.

I think it is good to have thisgem/lib/strscan.rb in $LOADED_FEATURES, as a way to show the gem was loaded (and for TruffleRuby then delegated to stdlib). In fact just looking at $LOADED_FEATURES shows what happened, which I find a very good thing. Example:

$ ruby -e 'require "strscan"; puts $" 
...
/home/eregon/.rubies/truffleruby-dev/lib/mri/rubygems/path_support.rb
/home/eregon/.rubies/truffleruby-dev/lib/truffle/strscan.rb
/home/eregon/.rubies/truffleruby-dev/lib/gems/gems/strscan-3.0.3/lib/strscan.rb

Another reason is I feel is it's extremely confusing to only include some lib/*.rb on some gem platforms and not on others, I think nobody expects that and I'd think it also wouldn't work e.g. with gem 'strscan', github: 'ruby/strscan' or gem 'strscan', path: '...' in a Gemfile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a CI step to run tests on the installed gem, that seems a really good addition because it will be as close as we can to an actual usage of the gem.
That works fine except for 2.4 which doesn't have strscan as a default gem, so just skipped on 2.4.
(we can't use bundle exec ruby run-test.rb for 2.4 as that would add $PWD/lib to $LOAD_PATH and not load the gem, due to I think the gemspec in Gemfile)

Copy link
Member Author

@eregon eregon Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One practical reason is otherwise bundle exec rake test can't work without this (because the file exists in the repo, regardless of the current gem platform).

One precision, for bundle exec rake test we need changes to lib/strscan.rb (as in this PR).
That itself doesn't force us to add to lib/strscan.rb so s.files.
But I find it very dangerous and brittle to have such a large difference between testing the gem locally with bundle exec rake test and an actual real world usage of the gem. lib/strscan.rb should exist in both or neither, otherwise we're testing completely different things, and the other points I mentioned above apply as well.

Also this would be required for a Ruby implementation which wants to delegate to the strscan.so in stdlib (i.e., doesn't implement in a strscan.rb file but some strscan.someotherext).

Copy link
Member

@kou kou Apr 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests were actually done only on an installed gem (would be good, but maybe as an extra CI step) that wouldn't be an issue, but that's uncommon and arguably impractical in development.

csv gem does this https://github.com/ruby/csv/blob/master/.github/workflows/test.yml#L74 because it can detect packaging problem. And it's useful to detect packaging problem before we release a gem.

I think it is good to have thisgem/lib/strscan.rb in $LOADED_FEATURES, as a way to show the gem was loaded (and for TruffleRuby then delegated to stdlib).

Why is the information needed? It will just confuse users: "I want a new feature introduced by X.Y.Z but can't use the feature even when strscan X.Y.Z is loaded." Because TruffleRuby's bulit-in strscan may not be implemented the new feature yet. If thisgem/lib/strscan.rb isn't in $LOADED_FEATURES, users can think that "I can't use the new feature because strscan X.Y.Z isn't used".

Another reason is I feel is it's extremely confusing to only include some lib/*.rb on some gem platforms and not on others, I think nobody expects that and I'd think it also wouldn't work e.g. with gem 'strscan', github: 'ruby/strscan' or gem 'strscan', path: '...' in a Gemfile.

I don't think so. (BTW, I feel that it's extremely confusing gem "strscan", "= X.Y.Z" is just ignored on TruffleRuby.)
I think that nobody doesn't care about it.

gem 'strscan', path: '...' doesn't work only on TruffleRuby. But I don't think it's a problem because TruffleRuby users don't need to use it. TruffleRuby always uses built-in strscan.

How about moving lib/strscan.rb to lib/jruby/strscan.rb and using strscan.rb only with JRuby? a2b7f6f

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

csv gem does this https://github.com/ruby/csv/blob/master/.github/workflows/test.yml#L74 because it can detect packaging problem. And it's useful to detect packaging problem before we release a gem.

That looks good, it's very similar to what I did but has a stronger guarantee the lib/ isn't accidentally on $LOAD_PATH for that step.

Why is the information needed? [...] users can think that "I can't use the new feature because strscan X.Y.Z isn't used".

Because users/developers might want to ensure the strscan gem is loaded/activated (e.g. I and other TruffleRuby devs certainly want to see that, it's very helpful for debugging what gets loaded, and e.g., the puts $".grep(/\/strscan\./) output would be unclear without this), otherwise they might think it's a bug of Bundler or of their usage (but it's not). The fact strscan-1.x.y/lib/strscan.rb is loaded, as well as truffleruby/lib/truffle/strscan.rb and not strscan.so should be pretty clear that the used code is from the truffleruby stdlib. I think this is by far the clearest result for everyone.
Not having a single file from the gem in $LOADED_FEATURES sounds like the gem was not used/activated at all, even though that's not true.
We can't see the gem in $LOAD_PATH (except by debugging in lib/strscan.rb), even though it was added to $LOAD_PATH, but we need to remove from $LOAD_PATH to actually load the stdlib. So at least we should preserve the trace that the gem was used/activated in $LOADED_FEATURES.

BTW, I feel that it's extremely confusing gem "strscan", "= X.Y.Z" is just ignored on TruffleRuby.

It's not ignored with the lib/strscan.rb, and we'll see strscan-X.Y.Z is in $", that's part of the point :)
The behavior might not be completely the same, that's anyway true for any version due to a different implementation of StringScanner (and it's also true for the Java extension in this gem, unless you also modify it accordingly for every change).
I think users can quickly figure out if e.g. they use a new method and it works on CRuby but not on TruffleRuby it must TruffleRuby's fault. And even in case they file an issue here I'll handle it.

How about moving lib/strscan.rb to lib/jruby/strscan.rb and using strscan.rb only with JRuby? a2b7f6f

I dislike that, it's more magic, more difference between local development and the packaged gem (this feels hacky to me), and it's brittle. For instance that would break if at some point TruffleRuby (or some other Ruby implementation) decides to implement some of StringScanner in a C extension in stdlib, because then the "wrong" strscan.so could get loaded by require "strscan.so" from truffleruby's strscan.rb (require_relative could help, but necessarily easy as .so are not always siblings of .rb).
Or if TruffleRuby (or some other Ruby implementation) implements all of StringScanner in some strscan.so or strscan.someextensionext then it would also break.
Or if there is at some point some Ruby file like lib/strscan/foo.rb in this gem, then it could be loaded from the gem instead of truffleruby stdlib, which is bad unless considered carefully whether that's safe.

And there is a performance advantage of having a lib/strscan.rb on CRuby too, it will avoid scanning the $LOAD_PATH for every require "strscan" after the first one: https://bugs.ruby-lang.org/issues/10902. That seems a significant advantage.

Ultimately if this is the approach you decide to take, then go ahead and I'll adapt the PR. However, if that approach is used I'm pretty sure I (and likely others too) will waste time to debug the magic of why there is no trace of the gem loaded in $", why local dev is so different than the real gem, etc. I think always having lib/strscan.rb is respecting the simplicity principle of programming.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not having a single file from the gem in $LOADED_FEATURES sounds like the gem was not used/activated at all, even though that's not true.

We should use Gem.loaded_specs or something, or see Gemfile.lock instead of using $LOADED_FEATURES to confirm whether strscan gem is activated or not. Because they are the master information.

We can't see the gem in $LOAD_PATH (except by debugging in lib/strscan.rb), even though it was added to $LOAD_PATH, but we need to remove from $LOAD_PATH to actually load the stdlib.

Generally, we should not modify $LOAD_PATH in a library to keep simple and easy to debug. (It's more magic. :-)

The behavior might not be completely the same, that's anyway true for any version due to a different implementation of StringScanner (and it's also true for the Java extension in this gem, unless you also modify it accordingly for every change).

We'll keep synchronizing the C implementation and Java implementation for new behavior. I think that it's one of reasons that Java implementation joins to this repository.

I dislike that, it's more magic, more difference between local development and the packaged gem (this feels hacky to me), and it's brittle.

Sorry. It's my fault. I should have write it as 390cebb#diff-188ab539bda7d5de390e7146eec51a5644a193d91e445103f86f4735eaaf0b9eR6 to align the configuration in .gemspec.

For instance that would break if at some point TruffleRuby (or some other Ruby implementation) decides to implement some of StringScanner in a C extension in stdlib, because then the "wrong" strscan.so could get loaded by require "strscan.so" from truffleruby's strscan.rb (require_relative could help, but necessarily easy as .so are not always siblings of .rb).

I can't understand this situation. Who does generate the "wrong" strscan.so? I think that this gem doesn't generate strscan.so with TruffleRuby with your extconf.rb change.

Or if TruffleRuby (or some other Ruby implementation) implements all of StringScanner in some strscan.so or strscan.someextensionext then it would also break.

ditto.

Or if there is at some point some Ruby file like lib/strscan/foo.rb in this gem, then it could be loaded from the gem instead of truffleruby stdlib, which is bad unless considered carefully whether that's safe.

CI will (should) detect it as a failure.

And there is a performance advantage of having a lib/strscan.rb on CRuby too, it will avoid scanning the $LOAD_PATH for every require "strscan" after the first one: https://bugs.ruby-lang.org/issues/10902. That seems a significant advantage.

It seems that it's fixed in all maintained CRuby. So it doesn't affect to this case.

Ultimately if this is the approach you decide to take, then go ahead and I'll adapt the PR. However, if that approach is used I'm pretty sure I (and likely others too) will waste time to debug the magic of why there is no trace of the gem loaded in $", why local dev is so different than the real gem, etc. I think always having lib/strscan.rb is respecting the simplicity principle of programming.

Using $" is a wrong approach to confirm loaded gems as I said above. And local dev isn't difference with the real gem. Both of them doesn't have lib/strscan.rb with lib/jruby/strscan.rb approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't understand this situation. Who does generate the "wrong" strscan.so? I think that this gem doesn't generate strscan.so with TruffleRuby with your extconf.rb change.

You're right, since this gem won't generate strscan.so on truffleruby (with this PR) that should no longer be an issue.

CI will (should) detect it as a failure.

That's a bit hopeful, because it might appear to work but some part might not, but it's probably unlikely to have a lib/strscan/foo.rb.

Both of them doesn't have lib/strscan.rb with lib/jruby/strscan.rb approach.

Alright, at least that's consistent between the gem and local dev. I'll cherry-pick your commit here and adapt the PR for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou All done now, could you re-review?

test/strscan/test_stringscanner.rb Outdated Show resolved Hide resolved
run-test.rb Outdated Show resolved Hide resolved
eregon added 3 commits April 22, 2022 12:07
* It is less noisy in the test output, and such skipped tests are not issues of this gem.
@kou kou merged commit 4cdb3ee into ruby:master May 1, 2022
@kou
Copy link
Member

kou commented May 1, 2022

Thanks!

@headius
Copy link
Contributor

headius commented May 10, 2022

Sorry folks, I should have caught this but the move of lib/strscan.rb to lib/jruby/strscan.rb breaks this gem on JRuby 9.4. It was hidden (appeared to pass) because we accidentally had included strscan.rb in the lib dir, so it still loads ok.

I will open an issue and we will need to get strscan.rb back into lib, because that is how we load the extension.

@headius
Copy link
Contributor

headius commented May 10, 2022

You know what, I made a mistake. See #38.

kou pushed a commit that referenced this pull request May 10, 2022
The strscan.rb file is only needed by JRuby, which does not have an
extension file format that loads itself. As noted in #35 having this
file present on TruffleRuby prevents their built-in strscan.rb from
loading, so the file was moved to lib/jruby/strscan.rb with lib/jruby
added as a require path. However this results in both lib/strscan.rb
and lib/jruby/strscan.rb getting copied into JRuby's stdlib when the
default gem is installed, leading to the confusing situation in #38.

This PR moves the jruby-specific strscan.rb under the jruby extension,
renaming that directory to be explicitly for JRuby, and adding that
new ext/jruby/lib path to the require paths so there's no nesting of
require paths to copy files multiple times.

* ext/java => ext/jruby
* lib/jruby/strscan.rb => ext/jruby/lib/strscan.rb
* Changes to Rakefile and gemspec for new locations

Fixes #38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

require "strscan.so" uses the strscan gem and does not work
3 participants