correctness_saturating_casts seem to be failing on i386 #8532

Open
LebedevRI opened this issue Dec 20, 2024 · 7 comments · May be fixed by #8537

Comments

@LebedevRI
Contributor

Is i386 just no longer supported globally?

      Test #301: correctness_saturating_casts ...............................Subprocess aborted***Exception:  28.13 sec
correctness_saturating_casts: /build/reproducible-path/halide-19.0.0/test/correctness/saturating_casts.cpp:130: void test_saturating() [source_t = int, target_t = float]: Assertion `result(i) == correct_result' failed.

        Start 301: correctness_saturating_casts
608/623 Test #748: tutorial_lesson_16_rgb_run .................................   Passed   18.31 sec
609/623 Test #742: tutorial_lesson_11_cross_compilation .......................   Passed   21.04 sec
610/623 Test #726: python_tutorial_lesson_11_cross_compilation ................   Passed   27.34 sec
611/623 Test #309: correctness_simd_op_check_arm ..............................   Passed   79.06 sec
612/623 Test #315: correctness_simd_op_check_x86 ..............................   Passed   82.19 sec
613/623 Test #204: correctness_image_io .......................................   Passed   95.13 sec
      Test #301: correctness_saturating_casts ...............................Subprocess aborted***Exception:  20.16 sec
correctness_saturating_casts: /build/reproducible-path/halide-19.0.0/test/correctness/saturating_casts.cpp:130: void test_saturating() [source_t = int, target_t = float]: Assertion `result(i) == correct_result' failed.

        Start 301: correctness_saturating_casts
614/623 Test #250: correctness_mul_div_mod ....................................   Passed   97.15 sec
615/623 Test  #50: correctness_boundary_conditions ............................   Passed  111.10 sec
616/623 Test #238: correctness_lossless_cast ..................................   Passed  103.11 sec
      Test #301: correctness_saturating_casts ...............................Subprocess aborted***Exception:   9.46 sec
correctness_saturating_casts: /build/reproducible-path/halide-19.0.0/test/correctness/saturating_casts.cpp:130: void test_saturating() [source_t = int, target_t = float]: Assertion `result(i) == correct_result' failed.

        Start 301: correctness_saturating_casts
617/623 Test #310: correctness_simd_op_check_hvx ..............................   Passed  102.14 sec
      Test #301: correctness_saturating_casts ...............................Subprocess aborted***Exception:   6.00 sec
correctness_saturating_casts: /build/reproducible-path/halide-19.0.0/test/correctness/saturating_casts.cpp:130: void test_saturating() [source_t = int, target_t = float]: Assertion `result(i) == correct_result' failed.

619/623 Test #754: tutorial_lesson_22_jit_performance .........................   Passed   57.43 sec
620/623 Test #696: python_correctness_boundary_conditions .....................   Passed   76.35 sec
621/623 Test #746: tutorial_lesson_15_build_gens ..............................   Passed  134.53 sec
        Start 747: tutorial_lesson_15_check_files
622/623 Test #747: tutorial_lesson_15_check_files .............................   Passed    0.01 sec
623/623 Test  #94: correctness_cross_compilation ..............................   Passed  141.90 sec

99% tests passed, 1 tests failed out of 623

Label Time Summary:
correctness              = 2066.76 sec*proc (351 tests)
correctness_multi_gpu    =   0.04 sec*proc (1 test)
error                    = 111.40 sec*proc (130 tests)
generator                =   9.90 sec*proc (67 tests)
internal                 =   1.34 sec*proc (1 test)
python                   = 326.17 sec*proc (42 tests)
runtime_internal         =   0.35 sec*proc (6 tests)
tutorial                 = 274.96 sec*proc (18 tests)
warning                  =  13.10 sec*proc (5 tests)

Total Test time (real) = 143.18 sec

The following tests did not run:
         36 - correctness_multi_gpu_gpu_multi_device (Skipped)
         39 - correctness_async_device_copy (Skipped)
         97 - correctness_cuda_8_bit_dot_product (Skipped)
        100 - correctness_custom_cuda_context (Skipped)
        110 - correctness_device_buffer_copies_with_profile (Skipped)
        111 - correctness_device_buffer_copy (Skipped)
        112 - correctness_device_copy_at_inner_loop (Skipped)
        113 - correctness_device_crop (Skipped)
        114 - correctness_device_slice (Skipped)
        119 - correctness_dynamic_allocation_in_gpu_kernel (Skipped)
        134 - correctness_extern_stage_on_device (Skipped)
        144 - correctness_float16_t_neon_op_check (Skipped)
        150 - correctness_fuse_gpu_threads (Skipped)
        157 - correctness_gpu_allocation_cache (Skipped)
        158 - correctness_gpu_alloc_group_profiling (Skipped)
        159 - correctness_gpu_arg_types (Skipped)
        160 - correctness_gpu_assertion_in_kernel (Skipped)
        161 - correctness_gpu_bounds_inference_failure (Skipped)
        162 - correctness_gpu_condition_lifting (Skipped)
        163 - correctness_gpu_cpu_simultaneous_read (Skipped)
        165 - correctness_gpu_different_blocks_threads_dimensions (Skipped)
        166 - correctness_gpu_dynamic_shared (Skipped)
        167 - correctness_gpu_f16_intrinsics (Skipped)
        169 - correctness_gpu_give_input_buffers_device_allocations (Skipped)
        170 - correctness_gpu_jit_explicit_copy_to_device (Skipped)
        173 - correctness_gpu_metal_completion_handler_error_check (Skipped)
        174 - correctness_gpu_mixed_dimensionality (Skipped)
        175 - correctness_gpu_mixed_shared_mem_types (Skipped)
        178 - correctness_gpu_non_monotonic_shared_mem_size (Skipped)
        182 - correctness_gpu_param_allocation (Skipped)
        183 - correctness_gpu_reuse_shared_memory (Skipped)
        184 - correctness_gpu_specialize (Skipped)
        185 - correctness_gpu_store_in_register_with_no_lanes_loop (Skipped)
        186 - correctness_gpu_sum_scan (Skipped)
        187 - correctness_gpu_texture (Skipped)
        188 - correctness_gpu_thread_barrier (Skipped)
        189 - correctness_gpu_transpose (Skipped)
        191 - correctness_gpu_vectorized_shared_memory (Skipped)
        198 - correctness_hexagon_scatter (Skipped)
        221 - correctness_invalid_gpu_loop_nests (Skipped)
        233 - correctness_load_library (Skipped)
        267 - correctness_parallel_gpu_nested (Skipped)
        294 - correctness_register_shuffle (Skipped)
        327 - correctness_specialize_to_gpu (Skipped)
        344 - correctness_target_query (Skipped)
        345 - correctness_tiled_matmul (Skipped)
        381 - correctness_vectorized_gpu_allocation (Skipped)
        478 - error_five_d_gpu_buffer (Skipped)
        624 - generator_aot_define_extern_opencl (Skipped)
        625 - generator_aotcpp_define_extern_opencl (Skipped)
        638 - generator_aot_gpu_object_lifetime (Skipped)
        639 - generator_aotcpp_gpu_object_lifetime (Skipped)
        640 - generator_aot_gpu_only (Skipped)
        641 - generator_aotcpp_gpu_only (Skipped)
        642 - generator_aot_gpu_texture (Skipped)
        643 - generator_aotcpp_gpu_texture (Skipped)
        651 - generator_aot_metal_completion_handler_override (Skipped)
        652 - generator_aotcpp_metal_completion_handler_override (Skipped)
        659 - generator_aot_opencl_runtime (Skipped)
        660 - generator_aotcpp_opencl_runtime (Skipped)

The following tests FAILED:
        301 - correctness_saturating_casts (Subprocess aborted) correctness
Errors while running CTest
@abadams
Member

abadams commented Dec 20, 2024

x86-32 in general is supposed to work, and we have buildbots for it, though there might be testing holes. I can't think off the top of my head what might be meaningfully different between our buildbot config and vanilla i386. I'll see if I can repro.

@LebedevRI
Contributor Author

LebedevRI commented Dec 20, 2024

The package is being built against LLVM 19 with Clang 19, using ThinLTO and -O3,
in an i386 chroot on a normal amd64 machine, if that helps.

@abadams
Member

abadams commented Dec 20, 2024

Looks like it's a saturating cast from int to float, which doesn't actually saturate - it's just an int to float conversion. We expect the int to float conversion inside Halide-generated code via saturating cast to be bit-exact with an int to float cast in the calling binary.
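
To illustrate the point above, here is a minimal sketch of what "saturating" means for two target types. The function names are hypothetical and this is plain C++, not Halide's implementation. Narrowing to int8_t has to clamp, whereas every int32_t lies within float's range, so the float case reduces to an ordinary conversion (which may still round).

```cpp
#include <cstdint>

// Hypothetical reference helper: clamp into int8_t's range before narrowing.
constexpr int8_t saturating_cast_to_int8(int32_t v) {
    return static_cast<int8_t>(v < INT8_MIN ? INT8_MIN
                                            : (v > INT8_MAX ? INT8_MAX : v));
}

// Hypothetical reference helper: no clamping is needed because float's range
// covers all of int32_t. Large values can still be rounded by the conversion.
constexpr float saturating_cast_to_float(int32_t v) {
    return static_cast<float>(v);
}
```

For example, saturating_cast_to_int8(300) yields 127 and saturating_cast_to_int8(-300) yields -128, while saturating_cast_to_float(1000) is simply 1000.0f.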

Maybe this is an 80-bit floating point issue. In my build it looks like the test binary uses x87 instructions to generate the reference, but the generated code uses an SSE2 instruction. That's the mismatch I was wondering about that could potentially cause failure, but it isn't causing failure for me. But if the rounding modes between the two are out of sync, that could give the wrong answer for large ints.
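
The rounding concern can be made concrete with a small sketch. It is not taken from the Halide test: the helper names are hypothetical, and it assumes an IEEE-754 binary32 float and a toolchain whose int-to-float conversion honors std::fesetround (the standard does not strictly guarantee this without FENV_ACCESS, so treat it as an illustration). A float has a 24-bit significand, so ints above 2^24 may need rounding, and the rounding mode then decides the result.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

// Hypothetical helper: convert v to float under the given <cfenv> rounding
// mode, then restore the previous mode. The volatile variables keep the
// compiler from constant-folding the conversion or moving it across the
// fesetround calls. Assumption: the conversion honors the dynamic mode.
float int_to_float_with_mode(int32_t v, int mode) {
    const int saved = std::fegetround();
    const int rc = std::fesetround(mode);
    assert(rc == 0 && "rounding mode not supported");
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    volatile int32_t in = v;
    volatile float out = static_cast<float>(in);
    std::fesetround(saved);
    return out;
}

// Hypothetical helper: true if v survives the trip through float unchanged.
// The comparison is done in double, which represents every int32_t exactly.
bool exactly_representable(int32_t v) {
    volatile float f = static_cast<float>(v);
    return static_cast<double>(f) == static_cast<double>(v);
}
```

Under these assumptions, 16777219 (2^24 + 3) converts to 16777220.0f with FE_TONEAREST but to 16777218.0f with FE_TOWARDZERO, so two code paths with different rounding modes would disagree bit-for-bit on such inputs.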

I think it's reasonable to ask for a bit-exact float comparison in this situation on other architectures, but we shouldn't be risking a mismatch between x87 and sse2. What do you think of just adding this to the top of main:


#ifdef __i386__
    printf("[SKIP] Skipping test because it requires bit-exact int to float casts,\n"
           "and on i386 it is hard to guarantee that the test binary won't use x87 instructions.\n");
    return 0;
#endif

@LebedevRI
Contributor Author

Perhaps CMake should be specifying -mfpmath=sse for that test?

@abadams
Member

abadams commented Dec 20, 2024

Can you test if that fixes it for you? I still can't actually repro.

@LebedevRI
Contributor Author

Oh duh. -mfpmath=sse is not going to work because debian-i386 is supposed to work without SSE:
https://wiki.debian.org/ArchitectureSpecificsMemo#i386-1
So I suppose that test should be guarded with something like #if !__i386__ || (__i386__ && __SSE__).
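
As a sanity check on that guard, the following sketch (the constant names are hypothetical) evaluates the condition as written, a simplified form, and its negation, and asserts at compile time that they agree. __i386__ and __SSE__ are macros predefined by GCC and Clang; an undefined macro evaluates to 0 inside #if.

```cpp
// Condition as proposed: run the test unless targeting i386 without SSE.
#if !__i386__ || (__i386__ && __SSE__)
constexpr bool kRunTestAsWritten = true;
#else
constexpr bool kRunTestAsWritten = false;
#endif

// Simplified form: the inner __i386__ is redundant once !__i386__ has failed.
#if !__i386__ || __SSE__
constexpr bool kRunTestSimplified = true;
#else
constexpr bool kRunTestSimplified = false;
#endif

// Negation: the condition under which the test would be skipped.
#if __i386__ && !__SSE__
constexpr bool kSkipTest = true;
#else
constexpr bool kSkipTest = false;
#endif

static_assert(kRunTestAsWritten == kRunTestSimplified,
              "both run conditions are equivalent");
static_assert(kRunTestAsWritten == !kSkipTest,
              "the skip condition is the exact negation");
```

On any target, exactly one of "run" and "skip" holds, so writing the guard as a skip condition (__i386__ && !__SSE__) expresses the same intent.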

@LebedevRI
Contributor Author

I've gone with

diff --git a/test/correctness/saturating_casts.cpp b/test/correctness/saturating_casts.cpp
index 0ce1cfda7..14da87e31 100644
--- a/test/correctness/saturating_casts.cpp
+++ b/test/correctness/saturating_casts.cpp
@@ -290,6 +290,12 @@ void test_one_source() {
 }
 
 int main(int argc, char **argv) {
+#if __i386__ && !__SSE__
+    printf("[SKIP] Skipping test because it requires bit-exact int to float casts,\n"
+           "and on i386 without SSE it is hard to guarantee that the test binary won't use x87 instructions.\n");
+    return 0;
+#endif
+
     test_one_source<int8_t>();
     test_one_source<uint8_t>();
     test_one_source<int16_t>();

For now, that seems to be enough to disable the test.

abadams linked a pull request on Dec 23, 2024 that will close this issue