`correctness_saturating_casts` seem to be failing on i386 #8532

LebedevRI · 2024-12-20T16:18:30Z

Is i386 just no longer supported globally?

      Test #301: correctness_saturating_casts ...............................Subprocess aborted***Exception:  28.13 sec
correctness_saturating_casts: /build/reproducible-path/halide-19.0.0/test/correctness/saturating_casts.cpp:130: void test_saturating() [source_t = int, target_t = float]: Assertion `result(i) == correct_result' failed.

        Start 301: correctness_saturating_casts
608/623 Test #748: tutorial_lesson_16_rgb_run .................................   Passed   18.31 sec
609/623 Test #742: tutorial_lesson_11_cross_compilation .......................   Passed   21.04 sec
610/623 Test #726: python_tutorial_lesson_11_cross_compilation ................   Passed   27.34 sec
611/623 Test #309: correctness_simd_op_check_arm ..............................   Passed   79.06 sec
612/623 Test #315: correctness_simd_op_check_x86 ..............................   Passed   82.19 sec
613/623 Test #204: correctness_image_io .......................................   Passed   95.13 sec
      Test #301: correctness_saturating_casts ...............................Subprocess aborted***Exception:  20.16 sec
correctness_saturating_casts: /build/reproducible-path/halide-19.0.0/test/correctness/saturating_casts.cpp:130: void test_saturating() [source_t = int, target_t = float]: Assertion `result(i) == correct_result' failed.

        Start 301: correctness_saturating_casts
614/623 Test #250: correctness_mul_div_mod ....................................   Passed   97.15 sec
615/623 Test  #50: correctness_boundary_conditions ............................   Passed  111.10 sec
616/623 Test #238: correctness_lossless_cast ..................................   Passed  103.11 sec
      Test #301: correctness_saturating_casts ...............................Subprocess aborted***Exception:   9.46 sec
correctness_saturating_casts: /build/reproducible-path/halide-19.0.0/test/correctness/saturating_casts.cpp:130: void test_saturating() [source_t = int, target_t = float]: Assertion `result(i) == correct_result' failed.

        Start 301: correctness_saturating_casts
617/623 Test #310: correctness_simd_op_check_hvx ..............................   Passed  102.14 sec
      Test #301: correctness_saturating_casts ...............................Subprocess aborted***Exception:   6.00 sec
correctness_saturating_casts: /build/reproducible-path/halide-19.0.0/test/correctness/saturating_casts.cpp:130: void test_saturating() [source_t = int, target_t = float]: Assertion `result(i) == correct_result' failed.

619/623 Test #754: tutorial_lesson_22_jit_performance .........................   Passed   57.43 sec
620/623 Test #696: python_correctness_boundary_conditions .....................   Passed   76.35 sec
621/623 Test #746: tutorial_lesson_15_build_gens ..............................   Passed  134.53 sec
        Start 747: tutorial_lesson_15_check_files
622/623 Test #747: tutorial_lesson_15_check_files .............................   Passed    0.01 sec
623/623 Test  #94: correctness_cross_compilation ..............................   Passed  141.90 sec

99% tests passed, 1 tests failed out of 623

Label Time Summary:
correctness              = 2066.76 sec*proc (351 tests)
correctness_multi_gpu    =   0.04 sec*proc (1 test)
error                    = 111.40 sec*proc (130 tests)
generator                =   9.90 sec*proc (67 tests)
internal                 =   1.34 sec*proc (1 test)
python                   = 326.17 sec*proc (42 tests)
runtime_internal         =   0.35 sec*proc (6 tests)
tutorial                 = 274.96 sec*proc (18 tests)
warning                  =  13.10 sec*proc (5 tests)

Total Test time (real) = 143.18 sec

The following tests did not run:
         36 - correctness_multi_gpu_gpu_multi_device (Skipped)
         39 - correctness_async_device_copy (Skipped)
         97 - correctness_cuda_8_bit_dot_product (Skipped)
        100 - correctness_custom_cuda_context (Skipped)
        110 - correctness_device_buffer_copies_with_profile (Skipped)
        111 - correctness_device_buffer_copy (Skipped)
        112 - correctness_device_copy_at_inner_loop (Skipped)
        113 - correctness_device_crop (Skipped)
        114 - correctness_device_slice (Skipped)
        119 - correctness_dynamic_allocation_in_gpu_kernel (Skipped)
        134 - correctness_extern_stage_on_device (Skipped)
        144 - correctness_float16_t_neon_op_check (Skipped)
        150 - correctness_fuse_gpu_threads (Skipped)
        157 - correctness_gpu_allocation_cache (Skipped)
        158 - correctness_gpu_alloc_group_profiling (Skipped)
        159 - correctness_gpu_arg_types (Skipped)
        160 - correctness_gpu_assertion_in_kernel (Skipped)
        161 - correctness_gpu_bounds_inference_failure (Skipped)
        162 - correctness_gpu_condition_lifting (Skipped)
        163 - correctness_gpu_cpu_simultaneous_read (Skipped)
        165 - correctness_gpu_different_blocks_threads_dimensions (Skipped)
        166 - correctness_gpu_dynamic_shared (Skipped)
        167 - correctness_gpu_f16_intrinsics (Skipped)
        169 - correctness_gpu_give_input_buffers_device_allocations (Skipped)
        170 - correctness_gpu_jit_explicit_copy_to_device (Skipped)
        173 - correctness_gpu_metal_completion_handler_error_check (Skipped)
        174 - correctness_gpu_mixed_dimensionality (Skipped)
        175 - correctness_gpu_mixed_shared_mem_types (Skipped)
        178 - correctness_gpu_non_monotonic_shared_mem_size (Skipped)
        182 - correctness_gpu_param_allocation (Skipped)
        183 - correctness_gpu_reuse_shared_memory (Skipped)
        184 - correctness_gpu_specialize (Skipped)
        185 - correctness_gpu_store_in_register_with_no_lanes_loop (Skipped)
        186 - correctness_gpu_sum_scan (Skipped)
        187 - correctness_gpu_texture (Skipped)
        188 - correctness_gpu_thread_barrier (Skipped)
        189 - correctness_gpu_transpose (Skipped)
        191 - correctness_gpu_vectorized_shared_memory (Skipped)
        198 - correctness_hexagon_scatter (Skipped)
        221 - correctness_invalid_gpu_loop_nests (Skipped)
        233 - correctness_load_library (Skipped)
        267 - correctness_parallel_gpu_nested (Skipped)
        294 - correctness_register_shuffle (Skipped)
        327 - correctness_specialize_to_gpu (Skipped)
        344 - correctness_target_query (Skipped)
        345 - correctness_tiled_matmul (Skipped)
        381 - correctness_vectorized_gpu_allocation (Skipped)
        478 - error_five_d_gpu_buffer (Skipped)
        624 - generator_aot_define_extern_opencl (Skipped)
        625 - generator_aotcpp_define_extern_opencl (Skipped)
        638 - generator_aot_gpu_object_lifetime (Skipped)
        639 - generator_aotcpp_gpu_object_lifetime (Skipped)
        640 - generator_aot_gpu_only (Skipped)
        641 - generator_aotcpp_gpu_only (Skipped)
        642 - generator_aot_gpu_texture (Skipped)
        643 - generator_aotcpp_gpu_texture (Skipped)
        651 - generator_aot_metal_completion_handler_override (Skipped)
        652 - generator_aotcpp_metal_completion_handler_override (Skipped)
        659 - generator_aot_opencl_runtime (Skipped)
        660 - generator_aotcpp_opencl_runtime (Skipped)

The following tests FAILED:
        301 - correctness_saturating_casts (Subprocess aborted) correctness
Errors while running CTest


abadams · 2024-12-20T16:37:27Z

x86-32 in general is supposed to work, and we have buildbots for it, though there might be testing holes. I can't think off the top of my head what might be meaningfully different between our build bot config and vanilla i386. I'll see if I can repro.

LebedevRI · 2024-12-20T16:42:20Z

Package is being built with/against LLVM19+Clang19 with ThinLTO + -O3
in i386 chroot on a normal amd64 machine, if that helps.

abadams · 2024-12-20T17:37:19Z

Looks like it's a saturating cast from int to float, which doesn't actually saturate - it's just an int to float conversion. We expect the int to float conversion inside Halide-generated code via saturating cast to be bit-exact with an int to float cast in the calling binary.
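For reference, "bit-exact" here can be read as "the two floats have identical 32-bit patterns". A minimal sketch of such a comparison, assuming a 32-bit IEEE-754 `float`; `float_bits` and `bit_exact` are hypothetical helper names, not something taken from the Halide test:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical helper: return the raw bit pattern of a float.
// memcpy reads the value as stored in a 32-bit float object.
std::uint32_t float_bits(float f) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical helper: true only if both floats have identical bit patterns.
// Unlike operator==, this distinguishes +0.0f from -0.0f.
bool bit_exact(float a, float b) {
    return float_bits(a) == float_bits(b);
}
```

The test itself compares with `==` (see the assertion in the log above), so this is only meant to spell out what the stricter notion would look like.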

Maybe this is an 80-bit floating point issue. In my build it looks like the test binary uses x87 instructions to generate the reference, but the generated code uses an sse2 instruction. That's the mismatch I was wondering about that could potentially cause failure, but it isn't causing failure for me. But if the rounding modes between the two are out of sync, that could get the wrong answer for large ints.
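To make the rounding-mode concern concrete, here is a minimal sketch, assuming an IEEE-754 `float` and a platform where the `<cfenv>` rounding mode applies to int-to-float conversion. `convert_with_rounding` is a hypothetical helper, not part of the test. It only shows that the same large int can map to two different floats depending on the rounding mode; it does not reproduce the x87 vs sse2 split itself.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

// Hypothetical helper: convert an int32 to float under a given rounding mode.
// The volatile load and store are a workaround to keep the conversion at run
// time and between the two fesetround calls, since compilers do not all honor
// FENV_ACCESS.
float convert_with_rounding(std::int32_t value, int mode) {
    volatile std::int32_t src = value;
    volatile float dst = 0.0f;

    const int saved = std::fegetround();
    const int rc = std::fesetround(mode);
    assert(rc == 0);  // fesetround returns 0 on success
    (void)rc;

    dst = static_cast<float>(src);  // rounds if value needs more than 24 bits

    std::fesetround(saved);  // restore the caller's rounding mode
    return dst;
}
```

With `(1 << 24) + 1` (16777217, which needs 25 significant bits), `FE_TONEAREST` should give 16777216.0f and `FE_UPWARD` should give 16777218.0f. Ints up to 2^24 in magnitude convert exactly, so only large values can disagree.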

I think it's reasonable to ask for a bit-exact float comparison in this situation on other architectures, but we shouldn't be risking a mismatch between x87 and sse2. What do you think of just adding this to the top of main:


#ifdef __i386__
    printf("[SKIP] Skipping test because it requires bit-exact int to float casts,\n"
           "and on i386 it is hard to guarantee that the test binary won't use x87 instructions.\n");
    return 0;
#endif

LebedevRI · 2024-12-20T17:50:55Z

Perhaps CMake should be specifying -mfpmath=sse for that test?

abadams · 2024-12-20T17:56:10Z

Can you test if that fixes it for you? I still can't actually repro.

LebedevRI · 2024-12-20T19:23:25Z

Oh duh. -mfpmath=sse is not going to work because debian-i386 is supposed to work without SSE:
https://wiki.debian.org/ArchitectureSpecificsMemo#i386-1
So i suppose that test should be guarded with something like #if !__i386__ || (__i386__ && __SSE__).

LebedevRI · 2024-12-23T00:41:41Z

I've gone with

diff --git a/test/correctness/saturating_casts.cpp b/test/correctness/saturating_casts.cpp
index 0ce1cfda7..14da87e31 100644
--- a/test/correctness/saturating_casts.cpp
+++ b/test/correctness/saturating_casts.cpp
@@ -290,6 +290,12 @@ void test_one_source() {
 }
 
 int main(int argc, char **argv) {
+#if __i386__ && !__SSE__
+    printf("[SKIP] Skipping test because it requires bit-exact int to float casts,\n"
+           "and on i386 without SSE it is hard to guarantee that the test binary won't use x87 instructions.\n");
+    return 0;
+#endif
+
     test_one_source<int8_t>();
     test_one_source<uint8_t>();
     test_one_source<int16_t>();

for now, seems to be enough to disable that test.
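As a sanity check on the two guards discussed in this thread, here is a small sketch showing that the skip condition in the patch is the exact complement of the earlier `!__i386__ || (__i386__ && __SSE__)` suggestion. The function names are hypothetical, and `defined(...)` is used instead of the bare macro test, which behaves the same unless a macro is defined as 0. The `FLT_EVAL_METHOD` check at the end is an extra assumption on my part, not something the patch does.

```cpp
#include <cfloat>

// Skip condition from the patch: an i386 build without SSE enabled.
constexpr bool skip_for_x87() {
#if defined(__i386__) && !defined(__SSE__)
    return true;
#else
    return false;
#endif
}

// Earlier suggestion from the thread: run the test when this holds.
constexpr bool earlier_guard_runs_test() {
#if !defined(__i386__) || (defined(__i386__) && defined(__SSE__))
    return true;
#else
    return false;
#endif
}

// The two guards are complementary on every target.
static_assert(skip_for_x87() != earlier_guard_runs_test(),
              "exactly one of the two conditions must hold");

// Hypothetical alternative (not in the patch): FLT_EVAL_METHOD is 0 when
// float expressions are evaluated in float precision, and non-zero when the
// compiler may evaluate them in a wider or indeterminate format.
constexpr bool float_may_have_excess_precision() {
    return FLT_EVAL_METHOD != 0;
}
```

Note that `__SSE__` being defined only says SSE instructions are available to the compiler; whether float math actually avoids x87 can still depend on the compiler and on `-mfpmath`.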

abadams linked a pull request Dec 23, 2024 that will close this issue

Skip test when code could be using x87 #8537

Open
