Python bindings: `add_python_test()`: do set `HL_JIT_TARGET` too #8156

LebedevRI · 2024-03-15T02:25:58Z

This one took quite a bit of digging.

I wanted to enable opencl tests on debian package, and boundary_conditions.py+division.py were failing when run with HL_TARGET=host OCL_ICD_VENDORS=no-opencl-please.missing env variables with clGetPlatformIDs failed, which made no sense to me.

Empty HL_JIT_TARGET results in opencl being detected, unsurprisingly.

This one took quite a bit of digging. I wanted to enable opencl tests on debian package, and `boundary_conditions.py`+`division.py` were failing when run with `HL_TARGET=host OCL_ICD_VENDORS=no-opencl-please.missing` env variables with `clGetPlatformIDs failed`, which made no sense to me. Empty `HL_JIT_TARGET` results in `opencl` being detected, unsurprisingly.

mcourteaux · 2024-03-15T15:02:37Z

Empty HL_JIT_TARGET results in opencl being detected, unsurprisingly.

That surprises me. Isn't a automatically constructed JIT target supposed to have no accelerators enabled? The source seems to confirm this.

Halide/src/Target.cpp

Lines 146 to 334 in f841a27

    
           Target calculate_host_target() { 
        
               Target::OS os = Target::OSUnknown; 
        
           #ifdef __linux__ 
        
               os = Target::Linux; 
        
           #endif 
        
           #ifdef _WIN32 
        
               os = Target::Windows; 
        
           #endif 
        
           #ifdef __APPLE__ 
        
               os = Target::OSX; 
        
           #endif 
        
               bool use_64_bits = (sizeof(size_t) == 8); 
        
               int bits = use_64_bits ? 64 : 32; 
        
               int vector_bits = 0; 
        
               Target::Processor processor = Target::Processor::ProcessorGeneric; 
        
               std::vector<Target::Feature> initial_features; 
        
           #if __riscv 
        
               Target::Arch arch = Target::RISCV; 
        
           #else 
        
           #if defined(__arm__) || defined(__aarch64__) 
        
               Target::Arch arch = Target::ARM; 
        
           #else 
        
           #if defined(__powerpc__) && (defined(__FreeBSD__) || defined(__linux__)) 
        
               Target::Arch arch = Target::POWERPC; 
        
           #if defined(__linux__) 
        
               unsigned long hwcap = getauxval(AT_HWCAP); 
        
               unsigned long hwcap2 = getauxval(AT_HWCAP2); 
        
           #elif defined(__FreeBSD__) 
        
               unsigned long hwcap, hwcap2; 
        
               elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)); 
        
               elf_aux_info(AT_HWCAP2, &hwcap2, sizeof(hwcap2)); 
        
           #endif 
        
               bool have_altivec = (hwcap & PPC_FEATURE_HAS_ALTIVEC) != 0; 
        
               bool have_vsx = (hwcap & PPC_FEATURE_HAS_VSX) != 0; 
        
               bool arch_2_07 = (hwcap2 & PPC_FEATURE2_ARCH_2_07) != 0; 
        
               user_assert(have_altivec) 
        
                   << "The POWERPC backend assumes at least AltiVec support. This machine does not appear to have AltiVec.\n"; 
        
               if (have_vsx) initial_features.push_back(Target::VSX); 
        
               if (arch_2_07) initial_features.push_back(Target::POWER_ARCH_2_07); 
        
           #else 
        
               Target::Arch arch = Target::X86; 
        
               VendorSignatures vendor_signature = get_vendor_signature(); 
        
               int info[4]; 
        
               cpuid(info, 1, 0); 
        
               unsigned family = 0, model = 0; 
        
               detect_family_and_model(info[0], family, model); 
        
               bool have_sse41 = (info[2] & (1 << 19)) != 0;   // ECX[19] 
        
               bool have_sse2 = (info[3] & (1 << 26)) != 0;    // EDX[26] 
        
               bool have_sse3 = (info[2] & (1 << 0)) != 0;     // ECX[0] 
        
               bool have_avx = (info[2] & (1 << 28)) != 0;     // ECX[28] 
        
               bool have_f16c = (info[2] & (1 << 29)) != 0;    // ECX[29] 
        
               bool have_rdrand = (info[2] & (1 << 30)) != 0;  // ECX[30] 
        
               bool have_fma = (info[2] & (1 << 12)) != 0;     // ECX[12] 
        
               user_assert(have_sse2) 
        
                   << "The x86 backend assumes at least sse2 support. This machine does not appear to have sse2.\n" 
        
                   << "cpuid returned: " 
        
                   << std::hex << info[0] 
        
                   << ", " << info[1] 
        
                   << ", " << info[2] 
        
                   << ", " << info[3] 
        
                   << std::dec << "\n"; 
        
               if (vendor_signature == VendorSignatures::AuthenticAMD) { 
        
                   processor = get_amd_processor(family, model, have_sse3); 
        
                   if (processor == Target::Processor::ZnVer4) { 
        
                       Target t{os, arch, bits, processor, initial_features, vector_bits}; 
        
                       t.set_features({Target::SSE41, Target::AVX, 
        
                                       Target::F16C, Target::FMA, 
        
                                       Target::AVX2, Target::AVX512, 
        
                                       Target::AVX512_Skylake, Target::AVX512_Cannonlake, 
        
                                       Target::AVX512_Zen4}); 
        
                       return t; 
        
                   } 
        
               } 
        
               // Processors not specifically detected by model number above use the cpuid 
        
               // feature bits to determine what flags are supported. For future models, 
        
               // detect them explicitly above rather than extending the code below. 
        
               if (have_sse41) { 
        
                   initial_features.push_back(Target::SSE41); 
        
               } 
        
               if (have_avx) { 
        
                   initial_features.push_back(Target::AVX); 
        
               } 
        
               if (have_f16c) { 
        
                   initial_features.push_back(Target::F16C); 
        
               } 
        
               if (have_fma) { 
        
                   initial_features.push_back(Target::FMA); 
        
               } 
        
               if (use_64_bits && have_avx && have_f16c && have_rdrand) { 
        
                   // So far, so good.  AVX2/512? 
        
                   // Call cpuid with eax=7, ecx=0 
        
                   int info2[4]; 
        
                   cpuid(info2, 7, 0); 
        
                   int info3[4]; 
        
                   cpuid(info3, 7, 1); 
        
                   const uint32_t avx2 = 1U << 5; 
        
                   const uint32_t avx512f = 1U << 16; 
        
                   const uint32_t avx512dq = 1U << 17; 
        
                   const uint32_t avx512pf = 1U << 26; 
        
                   const uint32_t avx512er = 1U << 27; 
        
                   const uint32_t avx512cd = 1U << 28; 
        
                   const uint32_t avx512bw = 1U << 30; 
        
                   const uint32_t avx512vl = 1U << 31; 
        
                   const uint32_t avx512ifma = 1U << 21; 
        
                   const uint32_t avx512 = avx512f | avx512cd; 
        
                   const uint32_t avx512_knl = avx512 | avx512pf | avx512er; 
        
                   const uint32_t avx512_skylake = avx512 | avx512vl | avx512bw | avx512dq; 
        
                   const uint32_t avx512_cannonlake = avx512_skylake | avx512ifma;  // Assume ifma => vbmi 
        
                   if ((info2[1] & avx2) == avx2) { 
        
                       initial_features.push_back(Target::AVX2); 
        
                   } 
        
                   if ((info2[1] & avx512) == avx512) { 
        
                       initial_features.push_back(Target::AVX512); 
        
                       // TODO: port to family/model -based detection. 
        
                       if ((info2[1] & avx512_knl) == avx512_knl) { 
        
                           initial_features.push_back(Target::AVX512_KNL); 
        
                       } 
        
                       // TODO: port to family/model -based detection. 
        
                       if ((info2[1] & avx512_skylake) == avx512_skylake) { 
        
                           initial_features.push_back(Target::AVX512_Skylake); 
        
                       } 
        
                       // TODO: port to family/model -based detection. 
        
                       if ((info2[1] & avx512_cannonlake) == avx512_cannonlake) { 
        
                           initial_features.push_back(Target::AVX512_Cannonlake); 
        
                           const uint32_t avxvnni = 1U << 4;     // avxvnni (note, not avx512vnni) result in eax 
        
                           const uint32_t avx512bf16 = 1U << 5;  // bf16 result in eax, with cpuid(eax=7, ecx=1) 
        
                           // TODO: port to family/model -based detection. 
        
                           if ((info3[0] & avxvnni) == avxvnni && 
        
                               (info3[0] & avx512bf16) == avx512bf16) { 
        
                               initial_features.push_back(Target::AVX512_SapphireRapids); 
        
                           } 
        
                       } 
        
                   } 
        
                   // AVX10 converged vector instructions. 
        
                   const uint32_t avx10 = 1U << 19; 
        
                   if (info2[3] & avx10) { 
        
                       int info_avx10[4]; 
        
                       cpuid(info_avx10, 0x24, 0x0); 
        
                       // This checks that the AVX10 version is greater than zero. 
        
                       // It isn't really needed as for now only one version exists, but 
        
                       // the docs indicate bits 0:7 of EBX should be >= 0 so... 
        
                       if ((info[1] & 0xff) >= 1) { 
        
                           initial_features.push_back(Target::AVX10_1); 
        
                           const uint32_t avx10_128 = 1U << 16; 
        
                           const uint32_t avx10_256 = 1U << 17; 
        
                           const uint32_t avx10_512 = 1U << 18; 
        
                           // Choose the maximum one that is available. 
        
                           if (info[1] & avx10_512) { 
        
                               vector_bits = 512; 
        
                           } else if (info[1] & avx10_256) { 
        
                               vector_bits = 256; 
        
                           } else if (info[1] & avx10_128) {  // Not clear it is worth turning on AVX10 for this case. 
        
                               vector_bits = 128; 
        
                           } 
        
                       } 
        
                   } 
        
                   // APX register extensions, etc. 
        
                   const uint32_t apx = 1U << 21; 
        
                   if (info3[3] & apx) { 
        
                       initial_features.push_back(Target::X86APX); 
        
                   } 
        
               } 
        
           #endif 
        
           #endif 
        
           #endif 
        
               return {os, arch, bits, processor, initial_features, vector_bits}; 
        
           }

Together with the definition of get_jit_target_from_environment():

Halide/src/Target.cpp

Lines 645 to 665 in f841a27

    
           Target get_jit_target_from_environment() { 
        
               Target host = get_host_target(); 
        
               host.set_feature(Target::JIT); 
        
               string target = Internal::get_env_variable("HL_JIT_TARGET"); 
        
               if (target.empty()) { 
        
                   set_sanitizer_bits(host); 
        
                   return host; 
        
               } else { 
        
                   Target t(target); 
        
                   t.set_feature(Target::JIT); 
        
                   user_assert((t.os == host.os && t.arch == host.arch && t.bits == host.bits) || Internal::WasmModule::can_jit_target(t)) 
        
                       << "HL_JIT_TARGET must match the host OS, architecture, and bit width.\n" 
        
                       << "HL_JIT_TARGET was " << target << ". " 
        
                       << "Host is " << host.to_string() << ".\n"; 
        
                   user_assert(!t.has_feature(Target::NoBoundsQuery)) 
        
                       << "The Halide JIT requires the use of bounds query, but HL_JIT_TARGET was specified with no_bounds_query: " << target; 
        
                   set_sanitizer_bits(t); 
        
                   return t; 
        
               } 
        
           }

It should just construct the default host (without accelerators).

This leads me to believe that the HL_JIT_TARGET is actually set somewhere else in the process of launching the tests in question.

mcourteaux · 2024-03-15T15:06:56Z

Hmm, the C++ version of the tests does the same as your proposed patch:

Halide/cmake/HalideTestHelpers.cmake

Lines 68 to 72 in f841a27

    
               set_tests_properties(${TARGET} PROPERTIES 
        
                                    LABELS "${args_GROUPS}" 
        
                                    ENVIRONMENT "HL_TARGET=${Halide_TARGET};HL_JIT_TARGET=${Halide_TARGET}" 
        
                                    SKIP_REGULAR_EXPRESSION "\\[SKIP\\]" 
        
                                    WILL_FAIL ${args_EXPECT_FAILURE})

LebedevRI · 2024-03-19T01:06:29Z

@steven-johnson @alexreinking thanks. Question: is there any interest in adding a tests_standalone directory,
which will find_package(halide) and add_subdirectory() all of the existing tests/apps?

alexreinking · 2024-03-19T11:47:54Z

@LebedevRI we already test the apps in CI using find_package against a newly installed version of Halide. That's also how we test package breakages.

Maybe I'm not understanding the key difference between what you're proposing and what we're doing?

LebedevRI · 2024-03-19T12:19:42Z

Yes, but not test/ and not python_bindings/test/.
Roughly, i'm wondering about building halide without tests (-DWITH_TESTS=OFF),
install it, and then build tests against installed halide version, indeed, like the apps are tested.

alexreinking · 2024-03-20T02:47:59Z

@LebedevRI - aha! I misunderstood you.

I would welcome that contribution. I don't think we're testing JIT mode from the installed package at the moment.

I just wonder how much practical benefit running all of them offers? What is the right balance and/or duplication between the build and package copies of the same tests in CI? Can we avoid affecting contributors' workflows at all?

There might be something I'm not seeing, too 🙂

LebedevRI · 2024-03-20T12:31:10Z

@LebedevRI - aha! I misunderstood you.

I would welcome that contribution. I don't think we're testing JIT mode from the installed package at the moment.

Aha.

I just wonder how much practical benefit running all of them offers?

I'm mainly just bothered that the whole halide needs to be reconfigured with different Halide_TARGET,
this just feels suboptimal, now that i'm looking to enable opencl tests on debian package,
plus, might additionally allow to enable to run those deb package halide tests as
https://wiki.debian.org/ContinuousIntegration/autopkgtest, without building the whole package, just tests.

What is the right balance and/or duplication between the build and package copies of the same tests in CI? Can we avoid affecting contributors' workflows at all?

Yeah, i would guess that only one of the ways should be used by the whatever wishes to run the tests.

There might be something I'm not seeing, too 🙂

alexreinking · 2024-03-20T13:37:29Z

I'm mainly just bothered that the whole halide needs to be reconfigured with different Halide_TARGET,
this just feels suboptimal, now that i'm looking to enable opencl tests on debian package,

This is a great point.

steven-johnson requested a review from alexreinking March 15, 2024 16:35

alexreinking approved these changes Mar 15, 2024

View reviewed changes

steven-johnson merged commit 8864e8a into halide:main Mar 18, 2024

LebedevRI deleted the python-jit-target branch March 18, 2024 23:10

abadams mentioned this pull request Apr 2, 2024

oss-fuzz is of dubious value for our use-cases #8171

Open

BrewTestBot mentioned this pull request Jul 17, 2024

halide 18.0.0 Homebrew/homebrew-core#177657

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python bindings: `add_python_test()`: do set `HL_JIT_TARGET` too #8156

Python bindings: `add_python_test()`: do set `HL_JIT_TARGET` too #8156

LebedevRI commented Mar 15, 2024

mcourteaux commented Mar 15, 2024 •

edited

Loading

mcourteaux commented Mar 15, 2024

LebedevRI commented Mar 19, 2024

alexreinking commented Mar 19, 2024

LebedevRI commented Mar 19, 2024

alexreinking commented Mar 20, 2024

LebedevRI commented Mar 20, 2024

alexreinking commented Mar 20, 2024

Python bindings: add_python_test(): do set HL_JIT_TARGET too #8156

Python bindings: add_python_test(): do set HL_JIT_TARGET too #8156

Conversation

LebedevRI commented Mar 15, 2024

mcourteaux commented Mar 15, 2024 • edited Loading

mcourteaux commented Mar 15, 2024

LebedevRI commented Mar 19, 2024

alexreinking commented Mar 19, 2024

LebedevRI commented Mar 19, 2024

alexreinking commented Mar 20, 2024

LebedevRI commented Mar 20, 2024

alexreinking commented Mar 20, 2024

Python bindings: `add_python_test()`: do set `HL_JIT_TARGET` too #8156

Python bindings: `add_python_test()`: do set `HL_JIT_TARGET` too #8156

mcourteaux commented Mar 15, 2024 •

edited

Loading