Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a timeout in CI and enable verbose tests for truffleruby #217

Merged

Conversation

eregon
Copy link
Contributor

@eregon eregon commented Mar 11, 2024

@eregon
Copy link
Contributor Author

eregon commented Mar 11, 2024

So https://github.com/Shopify/liquid-c/actions/runs/8235861511/job/22520963167?pr=217 failed after:

...
TokenizerTest#test_tokenize_for_liquid_tag = 0.00 s = .
TokenizerTest#test_non_utf8_compatible_source = 0.00 s = .
Error: The operation was canceled.

Unfortunately we don't see the started & hanging test due to I guess some flushing issue of GitHub Actions on cancellation (TruffleRuby does not buffer writes so that cannot be the issue).

@eregon
Copy link
Contributor Author

eregon commented Mar 11, 2024

Running locally on truffleruby-dev, the test that hangs seems to be various tests of VariableTest like:

VariableTest#test_write_fixnum =
VariableTest#test_filter_with_variable_arg_after_const_arg =
VariableTest#test_filter_with_const_arg =

@eregon eregon force-pushed the ci-tweaks-investigate-truffleruby-hang branch from 4cb5322 to 1defef0 Compare March 11, 2024 16:22
@eregon
Copy link
Contributor Author

eregon commented Mar 11, 2024

I have tried TEST_OPTS="-v -s0" bundle exec ruby -Ilib -Itest test/unit/variable_test.rb and that reproduces, to get a stacktrace.
But unfortunately that shows a bunch of threads because apparently minitest/parallel.rb gets used. Found MT_CPU=0/MT_CPU=1 to disable that.
Got a more useful stacktrace now, will take a look later.

@eregon
Copy link
Contributor Author

eregon commented Mar 11, 2024

Attaching with gdb --pid $PID I see:

86	        } while (new_size > capacity);
(gdb) bt
#0  write_fixnum (output=841328705388478728, fixnum=5) at ../../../../ext/liquid_c/liquid_vm.c:86
#1  0x00007f061cc2396e in write_obj (output=841328705388478728, obj=5) at ../../../../ext/liquid_c/liquid_vm.c:115
#2  0x00007f061cc239e5 in write_obj (output=841328705388478728, obj=841328705388478808) at ../../../../ext/liquid_c/liquid_vm.c:127
#3  0x00007f061cc2477a in vm_render_until_error (uncast_args=140734667024768) at ../../../../ext/liquid_c/liquid_vm.c:426
#4  0x00007f061f018f52 in rb_tr_setjmp_wrapper_pointer1_to_pointer (func=0x7f061cc23ed3 <vm_render_until_error>, arg1=140734667024768) at trampoline.c:225
#5  0x00007f062f777056 in ffi_call_unix64 () at ../libffi-3.4.4/src/x86/unix64.S:104
#6  0x00007f062f7758ec in ffi_call_int (cif=cif@entry=0x55e5558ef160, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>, closure=closure@entry=0x0) at ../libffi-3.4.4/src/x86/ffi64.c:673
#7  0x00007f062f776463 in ffi_call (cif=0x55e5558ef160, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>) at ../libffi-3.4.4/src/x86/ffi64.c:710
#8  0x00007f0629865235 in ?? () from /home/eregon/.rubies/truffleruby-dev/bin/../lib/librubyvm.so
#9  0x00000000558e1f00 in ?? ()
#10 0x000055e5558ef160 in ?? ()
#11 0x00007f061f018e96 in rb_tr_setjmp_wrapper_pointer2_int_pointer2_to_pointer (func=0x7b431aaf8, yielded_arg=139664266727062, callback_arg=139662636253920, argc=0, argv=0x7f05b431ab40, blockarg=139662343549288) at trampoline.c:215
#12 0x00007f05bdd28360 in ?? ()
#13 0x000055e55591d4c0 in ?? ()
#14 0x00007f0638590960 in ?? () from /lib64/libc.so.6
#15 0x000055e5558e1fd8 in ?? ()
#16 0x00007f0629865e3b in ?? () from /home/eregon/.rubies/truffleruby-dev/bin/../lib/librubyvm.so

So it sounds like an unbounded loop while in the C extension.

(gdb) p new_size 
$1 = 1
(gdb) p capacity
$2 = 0

@eregon
Copy link
Contributor Author

eregon commented Mar 11, 2024

My guess is oracle/truffleruby@b7cb26e is what broke this.
OTOH it should never have worked in the first place.
The question is whether rb_str_capacity(str) can return 0 or not.
If the String in question is e.g. "" allocated in Ruby then its capacity is truly 0 from truffleruby's POV, one will need to reallocate to append/write anything to it.

@peterzhu2118
Copy link
Member

The question is whether rb_str_capacity(str) can return 0 or not.

I agree, this is a bug in liquid-c. This should be pretty simple to fix, can you send a PR?

@eregon
Copy link
Contributor Author

eregon commented Mar 11, 2024

Confirmed that

  describe "rb_str_capacity" do
    it "returns 0 for empty String" do
      @s.rb_str_capacity("").should == 0
    end

    it "returns 1 for 'a'" do
      @s.rb_str_capacity("a").should == 1
    end
  end

passes on truffleruby but "fails" on CRuby which returns 15 for both cases (due to embedded String).

@eregon
Copy link
Contributor Author

eregon commented Mar 11, 2024

@peterzhu2118 I added a commit to this PR, could you review it?
I think it's good to include the timeout and verbose tests anyway.

@peterzhu2118
Copy link
Member

Thank you for debugging and the fix!

@peterzhu2118 peterzhu2118 merged commit 5a786af into Shopify:main Mar 11, 2024
9 checks passed
@eregon eregon deleted the ci-tweaks-investigate-truffleruby-hang branch March 11, 2024 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants