Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python bindings: add_python_test(): do set HL_JIT_TARGET too #8156

Merged
merged 1 commit into from
Mar 18, 2024

Conversation

LebedevRI
Copy link
Contributor

This one took quite a bit of digging.

I wanted to enable opencl tests on debian package, and boundary_conditions.py+division.py were failing when run with HL_TARGET=host OCL_ICD_VENDORS=no-opencl-please.missing env variables with clGetPlatformIDs failed, which made no sense to me.

Empty HL_JIT_TARGET results in opencl being detected, unsurprisingly.

This one took quite a bit of digging.

I wanted to enable opencl tests on debian package,
and `boundary_conditions.py`+`division.py` were failing
when run with `HL_TARGET=host OCL_ICD_VENDORS=no-opencl-please.missing`
env variables with `clGetPlatformIDs failed`, which made no sense to me.

Empty `HL_JIT_TARGET` results in `opencl` being detected,
unsurprisingly.
@mcourteaux
Copy link
Contributor

mcourteaux commented Mar 15, 2024

Empty HL_JIT_TARGET results in opencl being detected, unsurprisingly.

That surprises me. Isn't a automatically constructed JIT target supposed to have no accelerators enabled? The source seems to confirm this.

Halide/src/Target.cpp

Lines 146 to 334 in f841a27

Target calculate_host_target() {
Target::OS os = Target::OSUnknown;
#ifdef __linux__
os = Target::Linux;
#endif
#ifdef _WIN32
os = Target::Windows;
#endif
#ifdef __APPLE__
os = Target::OSX;
#endif
bool use_64_bits = (sizeof(size_t) == 8);
int bits = use_64_bits ? 64 : 32;
int vector_bits = 0;
Target::Processor processor = Target::Processor::ProcessorGeneric;
std::vector<Target::Feature> initial_features;
#if __riscv
Target::Arch arch = Target::RISCV;
#else
#if defined(__arm__) || defined(__aarch64__)
Target::Arch arch = Target::ARM;
#else
#if defined(__powerpc__) && (defined(__FreeBSD__) || defined(__linux__))
Target::Arch arch = Target::POWERPC;
#if defined(__linux__)
unsigned long hwcap = getauxval(AT_HWCAP);
unsigned long hwcap2 = getauxval(AT_HWCAP2);
#elif defined(__FreeBSD__)
unsigned long hwcap, hwcap2;
elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap));
elf_aux_info(AT_HWCAP2, &hwcap2, sizeof(hwcap2));
#endif
bool have_altivec = (hwcap & PPC_FEATURE_HAS_ALTIVEC) != 0;
bool have_vsx = (hwcap & PPC_FEATURE_HAS_VSX) != 0;
bool arch_2_07 = (hwcap2 & PPC_FEATURE2_ARCH_2_07) != 0;
user_assert(have_altivec)
<< "The POWERPC backend assumes at least AltiVec support. This machine does not appear to have AltiVec.\n";
if (have_vsx) initial_features.push_back(Target::VSX);
if (arch_2_07) initial_features.push_back(Target::POWER_ARCH_2_07);
#else
Target::Arch arch = Target::X86;
VendorSignatures vendor_signature = get_vendor_signature();
int info[4];
cpuid(info, 1, 0);
unsigned family = 0, model = 0;
detect_family_and_model(info[0], family, model);
bool have_sse41 = (info[2] & (1 << 19)) != 0; // ECX[19]
bool have_sse2 = (info[3] & (1 << 26)) != 0; // EDX[26]
bool have_sse3 = (info[2] & (1 << 0)) != 0; // ECX[0]
bool have_avx = (info[2] & (1 << 28)) != 0; // ECX[28]
bool have_f16c = (info[2] & (1 << 29)) != 0; // ECX[29]
bool have_rdrand = (info[2] & (1 << 30)) != 0; // ECX[30]
bool have_fma = (info[2] & (1 << 12)) != 0; // ECX[12]
user_assert(have_sse2)
<< "The x86 backend assumes at least sse2 support. This machine does not appear to have sse2.\n"
<< "cpuid returned: "
<< std::hex << info[0]
<< ", " << info[1]
<< ", " << info[2]
<< ", " << info[3]
<< std::dec << "\n";
if (vendor_signature == VendorSignatures::AuthenticAMD) {
processor = get_amd_processor(family, model, have_sse3);
if (processor == Target::Processor::ZnVer4) {
Target t{os, arch, bits, processor, initial_features, vector_bits};
t.set_features({Target::SSE41, Target::AVX,
Target::F16C, Target::FMA,
Target::AVX2, Target::AVX512,
Target::AVX512_Skylake, Target::AVX512_Cannonlake,
Target::AVX512_Zen4});
return t;
}
}
// Processors not specifically detected by model number above use the cpuid
// feature bits to determine what flags are supported. For future models,
// detect them explicitly above rather than extending the code below.
if (have_sse41) {
initial_features.push_back(Target::SSE41);
}
if (have_avx) {
initial_features.push_back(Target::AVX);
}
if (have_f16c) {
initial_features.push_back(Target::F16C);
}
if (have_fma) {
initial_features.push_back(Target::FMA);
}
if (use_64_bits && have_avx && have_f16c && have_rdrand) {
// So far, so good. AVX2/512?
// Call cpuid with eax=7, ecx=0
int info2[4];
cpuid(info2, 7, 0);
int info3[4];
cpuid(info3, 7, 1);
const uint32_t avx2 = 1U << 5;
const uint32_t avx512f = 1U << 16;
const uint32_t avx512dq = 1U << 17;
const uint32_t avx512pf = 1U << 26;
const uint32_t avx512er = 1U << 27;
const uint32_t avx512cd = 1U << 28;
const uint32_t avx512bw = 1U << 30;
const uint32_t avx512vl = 1U << 31;
const uint32_t avx512ifma = 1U << 21;
const uint32_t avx512 = avx512f | avx512cd;
const uint32_t avx512_knl = avx512 | avx512pf | avx512er;
const uint32_t avx512_skylake = avx512 | avx512vl | avx512bw | avx512dq;
const uint32_t avx512_cannonlake = avx512_skylake | avx512ifma; // Assume ifma => vbmi
if ((info2[1] & avx2) == avx2) {
initial_features.push_back(Target::AVX2);
}
if ((info2[1] & avx512) == avx512) {
initial_features.push_back(Target::AVX512);
// TODO: port to family/model -based detection.
if ((info2[1] & avx512_knl) == avx512_knl) {
initial_features.push_back(Target::AVX512_KNL);
}
// TODO: port to family/model -based detection.
if ((info2[1] & avx512_skylake) == avx512_skylake) {
initial_features.push_back(Target::AVX512_Skylake);
}
// TODO: port to family/model -based detection.
if ((info2[1] & avx512_cannonlake) == avx512_cannonlake) {
initial_features.push_back(Target::AVX512_Cannonlake);
const uint32_t avxvnni = 1U << 4; // avxvnni (note, not avx512vnni) result in eax
const uint32_t avx512bf16 = 1U << 5; // bf16 result in eax, with cpuid(eax=7, ecx=1)
// TODO: port to family/model -based detection.
if ((info3[0] & avxvnni) == avxvnni &&
(info3[0] & avx512bf16) == avx512bf16) {
initial_features.push_back(Target::AVX512_SapphireRapids);
}
}
}
// AVX10 converged vector instructions.
const uint32_t avx10 = 1U << 19;
if (info2[3] & avx10) {
int info_avx10[4];
cpuid(info_avx10, 0x24, 0x0);
// This checks that the AVX10 version is greater than zero.
// It isn't really needed as for now only one version exists, but
// the docs indicate bits 0:7 of EBX should be >= 0 so...
if ((info[1] & 0xff) >= 1) {
initial_features.push_back(Target::AVX10_1);
const uint32_t avx10_128 = 1U << 16;
const uint32_t avx10_256 = 1U << 17;
const uint32_t avx10_512 = 1U << 18;
// Choose the maximum one that is available.
if (info[1] & avx10_512) {
vector_bits = 512;
} else if (info[1] & avx10_256) {
vector_bits = 256;
} else if (info[1] & avx10_128) { // Not clear it is worth turning on AVX10 for this case.
vector_bits = 128;
}
}
}
// APX register extensions, etc.
const uint32_t apx = 1U << 21;
if (info3[3] & apx) {
initial_features.push_back(Target::X86APX);
}
}
#endif
#endif
#endif
return {os, arch, bits, processor, initial_features, vector_bits};
}

Together with the definition of get_jit_target_from_environment():

Halide/src/Target.cpp

Lines 645 to 665 in f841a27

Target get_jit_target_from_environment() {
Target host = get_host_target();
host.set_feature(Target::JIT);
string target = Internal::get_env_variable("HL_JIT_TARGET");
if (target.empty()) {
set_sanitizer_bits(host);
return host;
} else {
Target t(target);
t.set_feature(Target::JIT);
user_assert((t.os == host.os && t.arch == host.arch && t.bits == host.bits) || Internal::WasmModule::can_jit_target(t))
<< "HL_JIT_TARGET must match the host OS, architecture, and bit width.\n"
<< "HL_JIT_TARGET was " << target << ". "
<< "Host is " << host.to_string() << ".\n";
user_assert(!t.has_feature(Target::NoBoundsQuery))
<< "The Halide JIT requires the use of bounds query, but HL_JIT_TARGET was specified with no_bounds_query: " << target;
set_sanitizer_bits(t);
return t;
}
}

It should just construct the default host (without accelerators).

This leads me to believe that the HL_JIT_TARGET is actually set somewhere else in the process of launching the tests in question.

@mcourteaux
Copy link
Contributor

Hmm, the C++ version of the tests does the same as your proposed patch:

set_tests_properties(${TARGET} PROPERTIES
LABELS "${args_GROUPS}"
ENVIRONMENT "HL_TARGET=${Halide_TARGET};HL_JIT_TARGET=${Halide_TARGET}"
SKIP_REGULAR_EXPRESSION "\\[SKIP\\]"
WILL_FAIL ${args_EXPECT_FAILURE})

@steven-johnson steven-johnson merged commit 8864e8a into halide:main Mar 18, 2024
@LebedevRI LebedevRI deleted the python-jit-target branch March 18, 2024 23:10
@LebedevRI
Copy link
Contributor Author

@steven-johnson @alexreinking thanks. Question: is there any interest in adding a tests_standalone directory,
which will find_package(halide) and add_subdirectory() all of the existing tests/apps?

@alexreinking
Copy link
Member

@LebedevRI we already test the apps in CI using find_package against a newly installed version of Halide. That's also how we test package breakages.

Maybe I'm not understanding the key difference between what you're proposing and what we're doing?

@LebedevRI
Copy link
Contributor Author

Yes, but not test/ and not python_bindings/test/.
Roughly, i'm wondering about building halide without tests (-DWITH_TESTS=OFF),
install it, and then build tests against installed halide version, indeed, like the apps are tested.

@alexreinking
Copy link
Member

@LebedevRI - aha! I misunderstood you.

I would welcome that contribution. I don't think we're testing JIT mode from the installed package at the moment.

I just wonder how much practical benefit running all of them offers? What is the right balance and/or duplication between the build and package copies of the same tests in CI? Can we avoid affecting contributors' workflows at all?

There might be something I'm not seeing, too 🙂

@LebedevRI
Copy link
Contributor Author

@LebedevRI - aha! I misunderstood you.

I would welcome that contribution. I don't think we're testing JIT mode from the installed package at the moment.

Aha.

I just wonder how much practical benefit running all of them offers?

I'm mainly just bothered that the whole halide needs to be reconfigured with different Halide_TARGET,
this just feels suboptimal, now that i'm looking to enable opencl tests on debian package,
plus, might additionally allow to enable to run those deb package halide tests as
https://wiki.debian.org/ContinuousIntegration/autopkgtest, without building the whole package, just tests.

What is the right balance and/or duplication between the build and package copies of the same tests in CI? Can we avoid affecting contributors' workflows at all?

Yeah, i would guess that only one of the ways should be used by the whatever wishes to run the tests.

There might be something I'm not seeing, too 🙂

@alexreinking
Copy link
Member

I'm mainly just bothered that the whole halide needs to be reconfigured with different Halide_TARGET,
this just feels suboptimal, now that i'm looking to enable opencl tests on debian package,

This is a great point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants