undocumented usage of perl for cc_test #4691
Comments
Categorizing as bug, as pulling in an undocumented run-time dependency can cause breakage for users not otherwise using perl. The main question is whether we really need perl (then we should document it as a run-time dependency of bazel) or whether an appropriate setup can be achieved with more standard tools. |
It's the actual test rules that are problematic. |
Note that this is the fallback code path, so if you make sure that your test framework generates a test.xml, that should work around the issue. Any suggestions on what to do here? We do need to escape the characters when generating the test.xml. |
Perhaps rewrite Relatedly, I would like the ability to customize the test setup executable. One can usually use |
We discussed that, but it means that we'd unconditionally need a C++ compiler in order to run tests. We could bundle a pre-compiled binary into Bazel, but what platform would it be compiled for? For cross-compilation, you need a binary for the target platform, not the host platform. We could bundle a binary and also ship the source, but then you still need a C++ compiler in order to do cross-compilation. We could use Python, but then we require that everyone has python installed, even if they 'only' do web or C++ development. We also discussed moving the fallback codepath into Bazel (i.e., have the test.xml generation in Bazel itself). However, that's also causing problems with remote execution, where we now would force test.xml generation onto the local machine. Except, of course, if we also require that remote execution provides X for some value of X. It's not impossible to solve these problems, but there's no free lunch. For now, I'd prefer we keep doing it in shell, but maybe there's a more standard tool than perl that we can use to do the escaping? awk? sed? |
FWIW, Bazel doesn't work at all without a C++ compiler on the system:
|
That's true, and we should fix that independently of this problem. I'd rather not add another dependency on having a C++ compiler available. |
+1
I'd give another +1 if I weren't out of them already. |
python is already required as per documentation: |
@ahippler : Like with C++, that's true and we should fix that independently of this problem. In fact it'd be best if Bazel would only require toolchains that it needs for the build, i.e. if you don't build Python rules then Bazel shouldn't require a Python installation. Only exception (if you could call it an exception) should be Java: the JDK is always available because we bundle one with Bazel, because Bazel itself needs one in order to run. |
Thanks to @aehlig we now know what this command is supposed to do:
|
(Note that any solution for test-setup needs to work with remote execution - this makes it very difficult to use pre-compiled binaries, because you don't know which platform the test action will actually run on.) |
@ulfjack : True. Bazel could include a precompiled embedded binary for the host platform, and the remote execution platform author would have to provide one. Bazel would select the active one with toolchain rules. WDYT? |
Let's not forget that the tests currently depend on Bash, so a remote execution worker would have to have Bash installed anyway. |
That would make it much more difficult to change test-setup since all such changes would have to be rolled out to all remote execution systems. |
...meaning that requiring a test-setup binary would not make things worse. And we could provide a reference implementation in a GitHub repo. |
I think one option would be to fork test-setup, and have different implementations for Linux/Mac and Windows. |
Forking test-setup would also introduce the synchronization difficulty you alluded to. |
The options I see:
The second option seems to be the best tradeoff. WDYT? |
I agree. Note that I've been working on splitting test-setup into two separate steps - right now it runs the test and then generates a test.xml file if there isn't one. The way it's done is triggering a code path on Linux and MacOS that has an inherent race condition. @agoulti proposed that we split up the two parts - run the test first and then run a separate action to generate the test.xml file if the test didn't generate one. That'll fix the race condition and potentially make the test-setup script a bit simpler. |
There's some background in #4608. |
There's also the option I've raised before of rewriting |
@nlopezgi , another question: do you know how Bazel cancels a remotely running test action in case the user presses Ctrl+C? Does Bazel dispatch this to the remote service or does it just close the connection? |
re: compiler: I did not state I did not want to require it for tests, just that I was not sure it's the right choice and don't want to be the one to make it w/o at least conferring with some other folks (I'll get back to you once I've confirmed). imo, if some tool is to be required for all tests, I'd rather it be the C++ compiler (instead of perl or python), but I'm not sure what the trade-offs (effort/maintenance) between C++ and a clever sed program would be. re: canceling a test: not sure, you'd want to ask @ola-rozenfeld about API details |
I was too optimistic with the "clever sed program". The task is to UTF-8-decode an octet-stream, test decoded characters if they fall into any of some disjoint ranges and replace them with "?", then UTF-8-encode the result again. I'm not aware of an efficient way to do this without encoding and decoding. |
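For reference, the decode / test / re-encode approach described above can be sketched in a few lines of Python. This is only an illustration, not code from test-setup.sh: the function names are made up, and the legal ranges are assumed to be the XML 1.0 `Char` production (U+0009, U+000A, U+000D, U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF).

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def sanitize_by_decoding(data: bytes) -> bytes:
    """Decode UTF-8, replace every illegal character with '?', re-encode."""
    # surrogateescape turns each undecodable octet into a lone surrogate
    # (U+DC80..U+DCFF), which is not a legal XML character and therefore
    # ends up as '?' as well.
    text = data.decode("utf-8", errors="surrogateescape")
    cleaned = "".join(ch if is_xml10_char(ord(ch)) else "?" for ch in text)
    return cleaned.encode("utf-8")
```

The point of the comment above is that this needs a real decoder, which is exactly what plain sed does not offer.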
You don't need to decode and re-encode - you can simply test on the UTF-8 representation using sed's regexp support. |
How? Does |
sed supports matching binary, and you know the utf-8 encoding, so you can check for specific utf-8 ranges, like so:
Here, I'm replacing all two-byte utf-8 sequences with a single '?' character. See https://en.wikipedia.org/wiki/UTF-8 for the multi-byte ranges. |
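To illustrate the idea without relying on a particular sed implementation, here is a hedged Python equivalent that works on raw bytes. It is not the sed command from this comment; the lead-byte range C2..DF and continuation range 80..BF are taken from the UTF-8 definition, and the name `mask_two_byte` is made up for the example.

```python
import re

# A two-byte UTF-8 sequence: lead byte C2..DF followed by one continuation
# byte 80..BF (C0 and C1 are excluded because they only form overlong forms).
TWO_BYTE = re.compile(rb"[\xC2-\xDF][\x80-\xBF]")


def mask_two_byte(data: bytes) -> bytes:
    """Replace every two-byte UTF-8 sequence with a single '?'."""
    return TWO_BYTE.sub(b"?", data)
```

For example, `mask_two_byte("naïve".encode("utf-8"))` turns the two bytes of "ï" into one "?", while ASCII and three-byte sequences pass through unchanged.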
After a lot of trial and error, I've come up with a sed script that - I think - does what we want:
|
First, add a single white space character (' ') to the end of each line. Second, replace all (possibly empty) sequences of legal characters followed by a character with the sequence of legal characters followed by a question mark character ('?'). Third, remove the trailing question mark character ('?') from each line. |
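The three steps can be tried out in Python on raw bytes. This is a sketch under assumptions, not the sed script itself: the byte pattern below is one possible encoding of the XML 1.0 legal ranges, and `sanitize_line` / `sanitize` are hypothetical names.

```python
import re

# One legal XML 1.0 character, written as UTF-8 byte sequences.
# Line feed is left out because the input is processed line by line.
LEGAL = (
    rb"(?:[\x09\x0D\x20-\x7F]"            # U+0009, U+000D, U+0020..U+007F
    rb"|[\xC2-\xDF][\x80-\xBF]"           # U+0080..U+07FF
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"       # U+0800..U+0FFF
    rb"|[\xE1-\xEC][\x80-\xBF]{2}"        # U+1000..U+CFFF
    rb"|\xED[\x80-\x9F][\x80-\xBF]"       # U+D000..U+D7FF
    rb"|\xEE[\x80-\xBF]{2}"               # U+E000..U+EFFF
    rb"|\xEF[\x80-\xBE][\x80-\xBF]"       # U+F000..U+FFBF
    rb"|\xEF\xBF[\x80-\xBD]"              # U+FFC0..U+FFFD
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"    # U+10000..U+3FFFF
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"        # U+40000..U+FFFFF
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2})"   # U+100000..U+10FFFF
)
# A (possibly empty) run of legal characters followed by any single byte.
RUN_THEN_BYTE = re.compile(rb"(" + LEGAL + rb"*)(.)", re.DOTALL)


def sanitize_line(line: bytes) -> bytes:
    padded = line + b" "                             # step 1: sentinel space
    replaced = RUN_THEN_BYTE.sub(rb"\g<1>?", padded)  # step 2: run + '?'
    return replaced[:-1]                             # step 3: drop trailing '?'


def sanitize(data: bytes) -> bytes:
    return b"\n".join(sanitize_line(line) for line in data.split(b"\n"))
```

The sentinel space is what makes step 2 safe: the last match on every line gives its final byte (the space) to the "any byte" slot, so the line always ends in exactly one extra "?" that step 3 can remove. An illegal multi-byte sequence comes out as one "?" per byte.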
Hats off, that's quite impressive. I think you made a couple mistakes:
Could you compare your results with mine? |
Sorry, I got confused, gimme a minute to correct this. |
This is wrong. I meant to say: you need to cover the U+80..U+7FF (two UTF-8 octets) and U+800..U+D7FF (three UTF-8 octets) ranges. The 2-octet domain is covered correctly ( |
Ok, how about this:
|
Wait, there's still one range missing. Gah! |
Ah I see where I was wrong, you are right to match 0800-CFFF and match D000-D7FF separately. |
Another try:
|
Ok, I wrote a small Java program to double-check the pattern, and it returned the expected ranges:
|
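A similar cross-check can be written in Python for anyone who wants to repeat it. This is not the Java program mentioned above; it assumes one possible byte-level pattern for the XML 1.0 legal characters, enumerates every code point (including surrogates, encoded with `surrogatepass`), and reports which ranges the pattern accepts. All names are illustrative.

```python
import re

# Assumed byte-level pattern for exactly one legal XML 1.0 character.
LEGAL_CHAR = re.compile(
    rb"[\x09\x0A\x0D\x20-\x7F]"
    rb"|[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xEE[\x80-\xBF]{2}"
    rb"|\xEF[\x80-\xBE][\x80-\xBF]"
    rb"|\xEF\xBF[\x80-\xBD]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)

# Ranges of the XML 1.0 Char production (U+0009 and U+000A are adjacent,
# so they show up as one range).
XML10_RANGES = [
    (0x09, 0x0A), (0x0D, 0x0D), (0x20, 0xD7FF),
    (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def matched_ranges(pattern):
    """Return the inclusive code point ranges whose UTF-8 form matches."""
    ranges, start = [], None
    for cp in range(0x110000):
        encoded = chr(cp).encode("utf-8", "surrogatepass")
        ok = pattern.fullmatch(encoded) is not None
        if ok and start is None:
            start = cp
        elif not ok and start is not None:
            ranges.append((start, cp - 1))
            start = None
    if start is not None:
        ranges.append((start, 0x10FFFF))
    return ranges
```

Comparing `matched_ranges(LEGAL_CHAR)` against `XML10_RANGES` catches a missing or overly wide sub-range of the kind discussed in the comments above.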
I'm glad we are free now. |
Ok, I have a patch which conflicts with my other changes to test-setup.sh. Both are a bit risky, and we need to pick one to be merged first. |
If you have changes lined up, merge those first. |
Patch is here: https://bazel-review.googlesource.com/c/bazel/+/68711 There are still a couple possible issues with this that we need to look into. My primary concern is what should happen if the default charset of the current machine is NOT UTF-8. The previous Perl solution was broken as well: it did a charset conversion from the default charset to UTF-8 on input, but also from UTF-8 to the default charset on output, which can actually re-introduce illegal characters (we might want to file a bug for that, or note it on the existing bug), breaking the resulting XML. The new code intentionally does not perform any charset conversion. Ideally, we'd do a charset conversion from the default charset to UTF-8 before running the sed script. Note that I override the LOCALE before running sed, which probably sets the default charset to ISO-8859-1 (this needs to be double-checked!).
|
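As a rough illustration of the "convert to UTF-8 first" idea, a one-way transcoding step could look like the Python sketch below. The function name and the choice to fall back to the locale's preferred encoding are assumptions for the example; nothing here reflects what the patch actually does.

```python
import locale


def to_utf8(data: bytes, source_charset: str = "") -> bytes:
    """Transcode bytes from source_charset (default: locale charset) to UTF-8."""
    charset = source_charset or locale.getpreferredencoding(False)
    # Undecodable input becomes U+FFFD rather than raising an error.
    return data.decode(charset, errors="replace").encode("utf-8")
```

The conversion goes in one direction only; the output is never converted back to the default charset, which is the step that could re-introduce illegal characters.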
(I'm on vacation for the rest of August, and won't be able to work on this until I'm back.) |
Thanks!
Please elaborate. How and where exactly are the conversions done?
Do you intend to finish this task yourself or to appoint it to someone else (and if so, whom)? |
If it's still open when I'm back, I'll do it. If someone comes in and finishes my changes, I'm happy, too. Perl does implicit charset conversion on every read and write from a file or stdin/stdout. I believe it converts from the platform default charset to UTF-8 internally. That's my reading of the docs, anyway. |
I have a pending patch. |
Fixes bazelbuild#4691. PiperOrigin-RevId: 230308181
Description of the problem / feature request:
cc_test uses an inline perl script for failed tests.
bazel/tools/test/test-setup.sh
Line 153 in eb067ea
Feature requests: what underlying problem are you trying to solve with this feature?
The usage of perl is not documented.
Windows does not have Perl installed by default.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
What operating system are you running Bazel on?
Windows 10
What's the output of bazel info release?
0.10.1
The perl script replaces invalid XML characters and invalid sequences in CDATA.
To get rid of perl, bash or python could be used.
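As an example of the CDATA part in Python: the only sequence that cannot appear inside a CDATA section is "]]>", and a common way to handle it is to split it across two adjacent sections. The sketch below shows that technique only; the helper names are made up, and it is not a port of the perl one-liner.

```python
import xml.etree.ElementTree as ET


def escape_cdata(text: str) -> str:
    """Split every ']]>' so that it never appears inside one CDATA section."""
    return text.replace("]]>", "]]]]><![CDATA[>")


def wrap_cdata(text: str) -> str:
    return "<![CDATA[" + escape_cdata(text) + "]]>"


def roundtrip(text: str) -> str:
    """Parse the wrapped text back to check that nothing was lost."""
    element = ET.fromstring("<out>" + wrap_cdata(text) + "</out>")
    return element.text or ""
```

`roundtrip("a ]]> b <&>")` returns the original string, which shows that the split is invisible to an XML parser.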