Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: casadi_interpolant_linear.dll not found on Windows with IDAKLUSolver, fails when simulating a drive cycle using Interpolant #3783

Closed
chmabaur opened this issue Jan 30, 2024 · 32 comments · Fixed by #4077
Assignees
Labels
bug Something isn't working priority: high To be resolved as soon as possible release blocker Issues that need to be addressed before the creation of a release

Comments

@chmabaur
Copy link
Contributor

PyBaMM Version

24.1rc2

Python Version

3.10.11

Describe the bug

I want to simulate a drive cycle by passing an Interpolant function to the "Current function [A]" parameter. This however fails as casadi_interpolant_linear.dll is not found.

Steps to Reproduce

import pybamm
import numpy as np

model = pybamm.lithium_ion.DFN()
geometry = model.default_geometry
param = model.default_parameter_values

# Create interpolant
current_interpolant = pybamm.Interpolant(
    np.linspace(0, 5000, 50000),
    np.ones(50000)*0.7,
    pybamm.t
    )

# Set drive cycle
param.update({"Current function [A]": current_interpolant})

# Solve
solver_idaklu = pybamm.IDAKLUSolver(
    rtol=1e-8, atol=1e-8)

sim = pybamm.Simulation(
    model, parameter_values=param, solver=solver_idaklu)
solution_idaklu = sim.solve(calc_esoh=False)

Relevant log output

File C:\Software\Python\Environments\pybamm_venv\lib\site-packages\pybamm\solvers\idaklu_solver.py:439 in set_up
    rhs_algebraic = idaklu.generate_function(rhs_algebraic.serialize())

RuntimeError: C:\vcpkg\buildtrees\casadi\src\3.6.4-aa7d507b6b.clean\casadi\core\casadi_os.cpp:166: Assertion "handle!=nullptr" failed:
PluginInterface::load_plugin: Cannot load shared library 'casadi_interpolant_linear.dll': 
   (
    Searched directories: 1. casadipath from GlobalOptions
                          2. CASADIPATH env var
                          3. PATH env var (Windows)
                          4. LD_LIBRARY_PATH env var (Linux)
                          5. DYLD_LIBRARY_PATH env var (osx)
    A library may be 'not found' even if the file exists:
          * library is not compatible (different compiler/bitness)
          * the dependencies are not found
   )
  Tried '' :
    Error code (WIN32): 126
  Tried '.' :
    Error code (WIN32): 126
@chmabaur chmabaur added the bug Something isn't working label Jan 30, 2024
@chmabaur chmabaur changed the title [Bug]: IDAKLUSolver fails with drive cycle using Interpolant [Bug]: IDAKLUSolver fails when simulating a drive cycle using Interpolant Jan 30, 2024
@chmabaur
Copy link
Contributor Author

@agriyakhetarpal: Are you aware of this issue?

@agriyakhetarpal
Copy link
Member

Thanks @chmabaur – first time I am looking at this, will explore further.

From an initial look, it looks like it's using some version of casadi from vcpkg rather than from the Pythonic version? Also, could you try with casadi==3.6.4 and casadi==3.6.3 and post the output logs?

@chmabaur
Copy link
Contributor Author

@agriyakhetarpal Thanks for the very fast reply :-). Seems like the dll is linked wrongly, as the folder C:\vcpkg does not exist on my computer.

I have uninstalled casadi 3.6.4 and installed casadi 3.6.3, but still it is looking for C:\vcpkg\buildtrees\casadi\src\3.6.4-aa7d507b6b.clean\ .... (note version 3.6.4 ...).

@agriyakhetarpal
Copy link
Member

Fair enough. I will look into possible issues with the wheel for v24.1rc2. Could you check if v24.1rc1 and v24.1rc0 work with your script or do they raise the same error?

@agriyakhetarpal agriyakhetarpal changed the title [Bug]: IDAKLUSolver fails when simulating a drive cycle using Interpolant [Bug]: casadi_interpolant_linear.dll not found on Windows with IDAKLUSolver, fails when simulating a drive cycle using Interpolant Jan 30, 2024
@agriyakhetarpal
Copy link
Member

I have renamed the issue to reflect how this is a problem with the compilation of the solver rather than its use of it – it might work without issues on a different platform.

@chmabaur
Copy link
Contributor Author

Getting the same error for all 24.1 versions.

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Jan 30, 2024

@chmabaur that makes sense, though I think this is a CasADi problem. I ran delvewheel on the Python wheel and this is what it returned:

The following DLLs will be copied into the wheel.
    vcomp140.dll (Error: Not Found)
    msvcp140.dll (Error: Not Found)

The following DLLs are assumed to be present in the end user's environment and will not be copied into the wheel.
    api-ms-win-crt-environment-l1-1-0.dll
    api-ms-win-crt-filesystem-l1-1-0.dll
    api-ms-win-crt-heap-l1-1-0.dll
    api-ms-win-crt-math-l1-1-0.dll
    api-ms-win-crt-runtime-l1-1-0.dll
    api-ms-win-crt-stdio-l1-1-0.dll
    api-ms-win-crt-string-l1-1-0.dll
    api-ms-win-crt-time-l1-1-0.dll
    api-ms-win-crt-utility-l1-1-0.dll
    kernel32.dll
    python39.dll
    vcruntime140.dll
    vcruntime140_1.dll

Warning: At least one dependent DLL needs to be copied into the wheel but was not found.

which suggests that the PyBaMM wheel is built properly since the top two of the DLLs in the list are provided by Windows APIs (I ran it on macOS which is why I don't have them, and in other words the wheel is now tested, even!). However, I am happy to suggest an array of options:

  1. Try using Dependency Walker on libcasadi.dll to look at its external runtime dependencies – does it find casadi_interpolant_linear.dll?
  2. Try to locate casadi_interpolant_linear.dll manually on your Windows system in the Python site-packages directory where CasADi is installed, since you mentioned that it has not been built from source using vcpkg
  3. Possibly try adjusting the PYTHONPATH environment variable since it is what is used to look for DLLs associated with .pyd files.
  4. Setting the CASADIPATH and PATH environment variables to point to the location of either your Python or CasADi installation might work, I found this after a bit of digging: Solution to error code on Windows 10 using Python 3.11, CasADi 3.6.3 and VSCode: Cannot load shared library 'libcasadi_nlpsol_ipopt.dll' casadi/casadi#3443

This is a very tricky bug to solve and I can't deny the plausibility of something being wrong in building the wheels on our end – I hope these solutions work for you. I am assuming that the CasadiSolver works as usual and this DLL file is found, OTOH?

@agriyakhetarpal agriyakhetarpal added the priority: high To be resolved as soon as possible label Jan 30, 2024
@chmabaur
Copy link
Contributor Author

chmabaur commented Jan 31, 2024

@agriyakhetarpal I am a bit confused. I have found the file libcasadi_interpolant_linear.dll in the site-packages/casadi folder of the virtual environment. Note however the name, it is called libcasadi_ instead of casadi_ ..., is there a reference to the wrong name?

DependencyWalker on libcasadi.dll has not found any reference to a casadi_interpolant_linear.dll or libcasadi_interpolant_linear.dll.

And yes, CasadiSolver works with the drive cycle!

@agriyakhetarpal
Copy link
Member

Thanks for your insights. I would say we didn't link the wheel wrongly, because I am unable to reproduce the issue; however I can only assume so – did adjusting/prepending PYTHONPATH, CASADIPATH, or PATH to the CasADi directory give any insights?

Dependency Walker is a bit unreliable since it is a piece of legacy software hasn't been designed to work with newer Windows editions and does manage to bring up false positives too sometimes, not that it did so in this case. I am not aware of any popular alternatives. Does https://github.com/lucasg/Dependencies provide any insights – from a quick look it looks like a rewrite of the tool?

I don't know if renaming and removing the lib prefix would do something, but given the absurdity of the error it is worth trying :D as far as I know, the CMake generator does this as a convention.

It looks like an upstream bug in your case. I also found that Windows error 126 is also associated with insufficient permissions

@chmabaur
Copy link
Contributor Author

Yes I was using the Dependencies software actually as I got some errors with Dependency Walker :-).

I have added site-packages/casadi folder to the PATH and CASADIPATH which gave the same error. However, I have then removed the lib prefix, which actually then caused the python kernel to crash. So it seems the library was found then, but it seems like it is not the correct one?!

You cannot reproduce the issue? I actually get the same error on two different Windows systems, so I expect this to occur also for other users. What operating system are you using? Do you use self-compiled package or do you use virtual environment with all installed with pip?

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Jan 31, 2024

I am able to reproduce the issue on a Windows 11 amd64 machine, now. I downloaded the 24.1 PyBaMM wheel that hit PyPI a few hours ago and installed it with pip and did not face issues with instantiating pybamm.IDAKLUSolver(), but the moment I gave a shot at testing against your MWE – it indeed ends up failing with the same error as yours (I apologise for the oversight). I am using CasADi 3.6.4 installed from pip, and upon renaming libcasadi_interpolant_linear.dll to casadi_interpolant_linear.dll, it strangely claims to look for and subsequently not find the original libcasadi_interpolant_linear.dll.

I will try looking into this, it looks like we should be testing PyBaMM much more extensively after wheels for it are built, across platforms. The MWE looks like it works on my macOS system, however – it returns no errors there.

@agriyakhetarpal
Copy link
Member

cc: @jsbrittain and @martinjrobins, would appreciate your help debugging this especially because this has affected the pip-installation of PyBaMM on Windows for 24.1

@agriyakhetarpal
Copy link
Member

@chmabaur, I'm just trying to ascertain when the problem arose – could you re-test the wheels from our conversation in #3100 (comment) with your MWE and check if it works as intended?

@chmabaur
Copy link
Contributor Author

chmabaur commented Feb 2, 2024

@agriyakhetarpal I get the same error with https://github.com/agriyakhetarpal/PyBaMM/actions/runs/7137563916 these wheels.

@jsbrittain
Copy link
Contributor

@agriyakhetarpal Hi - I don't have a solution, but wanted to add the following observations. I've managed to recreate the issue described above, and in addition tried to run a MWE using only casadi. The result is interesting, and makes me suspect that there is a bug in the pybamm wheel build process. Briefly, this code creates a linear interpolant in casadi (no pybamm):

import numpy as np
from casadi import interpolant

u = np.linspace(1, 5, 10)
v = 2 * u
f = interpolant("interp", "linear", [u], v, {})

This runs fine on windows (github runner) and macos (my macbook). If you then remove or rename libcasadi_interpolant_linear.dll from site-packages then it produces the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\hostedtoolcache\windows\Python\3.8.10\x64\lib\site-packages\casadi\casadi.py", line 33500, in interpolant
    return _casadi.interpolant(*args)
RuntimeError: .../casadi/core/casadi_os.cpp:166: Assertion "handle!=nullptr" failed:
PluginInterface::load_plugin: Cannot load shared library 'libcasadi_interpolant_linear.dll':
   (
    Searched directories: 1. casadipath from GlobalOptions
                          2. CASADIPATH env var
                          3. PATH env var (Windows)
                          4. LD_LIBRARY_PATH env var (Linux)
                          5. DYLD_LIBRARY_PATH env var (osx)
    A library may be 'not found' even if the file exists:
          * library is not compatible (different compiler/bitness)
          * the dependencies are not found
   )
  Tried 'C:\hostedtoolcache\windows\Python\3.8.10\x64\lib\site-packages\casadi' :
    Error code (WIN32): 126
  Tried '' :
    Error code (WIN32): 126
  Tried '.' :
    Error code (WIN32): 126

This looks a lot like the error original reported, but with two important differences:

  1. the runtimeerror spawns from a process that uses a relative file path (not c:\vcpkg; which is reproducible if you install pybamm wheels on windows even without installing vcpkg), and
  2. it is trying to load libcasadi....dll, not casadi...dll.

Moving the dll into the current working directory resolves the issue (i.e. does not produce the segmentation fault mentioned above; which is also reproducible). Trying the original MWE in the same environment produces the error listed at the top of this issue (Cannot load shared library 'casadi_interpolant_linear.dll'), so casadi seems to work, but running through pybamm does not.

@jsbrittain
Copy link
Contributor

jsbrittain commented Feb 2, 2024

@agriyakhetarpal Installing PyBaMM 23.1 on Windows produces the (now resolved) DeSerialization of ProtoFunction failed. Object written in version 2 but can only read in version 1. bug, but by also downgrading to casadi 3.5.4 (or 3.5.1-3.5.3) we recreate the PluginInterface::load_plugin: Cannot load shared library 'casadi_interpolant_linear.dll' problem, so this appears to be a longstanding issue.

@agriyakhetarpal
Copy link
Member

Thank you for this investigation, @jsbrittain – from what I understand, it looks like Python-CasADi works well and as expected, but PyBaMM's IDAKLU-wrapped CasADi is looking at the wrong paths for its libraries. Perhaps, if we are able to use the Python-CasADi for our linkage on Windows (#3603)—trying which did not work during my experiments in #3100 as we had discussed and had similar issues with lib- prefixes on DLLs—we might be able to circumvent this problem, because macOS and Linux fail to reproduce the issue in the MWE and from my insight we are building CasADi from source through the means of the vcpkg registry on Windows.

On an unrelated note, we should consider using delvewheel to repair the Windows wheels too, it can help find missing libraries and patch them into the wheel – while it is not usually required due to how .pyd files work, it might help us debug some of these scenarios in the future.

From what you said, can we assume that we have a temporary workaround for this issue, i.e., "Moving the dll into the current working directory" which should make the drive cycle script work, or was this just for the CasADi code snippet?

@jsbrittain
Copy link
Contributor

From what you said, can we assume that we have a temporary workaround for this issue, i.e., "Moving the dll into the current working directory" which should make the drive cycle script work, or was this just for the CasADi code snippet?

That was just for the casadi code snippet I'm afraid. Renaming the files for PyBaMM produces a segmentation fault (which suggests that it is not just the name, but a version mismatch also?).

@agriyakhetarpal
Copy link
Member

That was just for the casadi code snippet I'm afraid. Renaming the files for PyBaMM produces a segmentation fault (which suggests that it is not just the name, but a version mismatch also?).

Fair enough, we can't realistically keep the CasADi versions in sync for the most part. We should be able to keep the build-system CasADi as such in order to build wheels with the same version as the one defined in the vcpkg registry, but pinning its dependency is not recommended practice, especially because PyBaMM is meant to be used as a library as well. I will take a look at how to use Python-CasADi for linkage, but you mentioned earlier that this lib prependage in the DLL name could be a problem to be resolved upstream by the CasADi developers and patching it on our end would not be sustainable in the long term

@jsbrittain
Copy link
Contributor

@agriyakhetarpal I'm not confident that I know where the fault is. There does seem to be an inconsistency in naming with casadi on Windows. When downloading and building using cmake the output files are casadi*.dll, whereas on Mac they are libcasadi*.dylib; however, when pip installing the library files are libcasadi*.dll. Even then, if I build casadi manually and redirect the CASADIPATH to the cmake build, I still receive a segmentation fault when running the MWE on Windows, so patch renaming the files wouldn't even fix the problem.

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Feb 5, 2024

Makes sense, @jsbrittain. Should we post this to the new #casadi channel on the PyBaMM Slack so that the CasADi developers can be made aware of these build issues and therefore suggest potential fixes? If this is is something that's broken on their end, then they could fix it and we could potentially have a simpler build process in the future.

P.S. My comment about removing the lib-prefixes was about exploring how to do so at build-time by possibly configuring CMake not to add those prefixes, not at post-build by renaming the files

@agriyakhetarpal
Copy link
Member

I wanted to post an update on this: based on the logs for pybop-team/PyBOP#176 where the IDAKLU solver is getting added to PyBOP, this issue is not just applicable for Windows wheels but for Linux and macOS wheels too, and I think I have some progress.

I still haven't been able to ascertain the cause of the issue, but it has something to do where the generate_function code (which is where the Python component of the IDAKLU solver starts to go through the generated CasADi code to pass it to the C++ component via bindings) looks for the CasADi symbols. I feel that the issue is coming from these lines, that I had added in #3569:

PyBaMM/CMakeLists.txt

Lines 89 to 93 in 4597143

set_target_properties(
idaklu PROPERTIES
INSTALL_RPATH "${CASADI_DIR}"
INSTALL_RPATH_USE_LINK_PATH TRUE
)

i.e., an incorrect configuration of the runtime search path for the compiled CasADi files (the RPATH could be retained, but INSTALL_RPATH_USE_LINK_PATH should have been set to FALSE instead).

I can try to revisit this issue later this week and rebuild the wheels without this configuration, but I have a strong hunch that reverting the suitable changes in #3569 (just this file, not the SuiteSparse and SUNDIALS fixes for their RPATHs) should fix the issue. This may not resolve the Windows error, however. (Does RPATH have a significance on Windows? I am unaware).

Another tidbit that might be unrelated, but, on a Windows machine I did try to copy all the .dll files from the CasADi site-packages directory folder into where the IDAKLU .pyd file exists inside the wheel (i.e., pybamm/solvers/idaklu/ and therefore the . path where CasADi searches for them). In this case, it successfully finds the casadi_interpolant_linear.dll file and the related files that it needs for its runtime dependencies, but the program exits with a segmentation fault similar to what @jsbrittain had received on a different attempt, though without no warnings or messages to stdout, just a blank line followed by termination.

Right now, a workaround is to build PyBaMM from source, where this issue does not seem to occur. This is appropriate for Linux and macOS, but for Windows it's a bit tricky and the reference is in the publish_pypi.yml workflow where certain environment variables are provided as flags that are passed to setup.py (vcpkg and VS2022 are required too). Building from source on Windows may not fix it either as noted above, since we do not yet have a mechanism to repair the current wheels for that platform.

I think the lesson from this is to advocate for better testing for the wheels after they have been built. We don't bundle the test-related configurations and files inside the wheel (see #3617 and the publicly-announced GSoC project for it this year), but to get around that we should download these wheels as artifacts and test all of them manually in a job that installs them and runs all of the files under tests/ and examples/ inside a virtual environment.

@arjxn-py arjxn-py self-assigned this Feb 21, 2024
@arjxn-py arjxn-py linked a pull request Feb 21, 2024 that will close this issue
8 tasks
@arjxn-py
Copy link
Member

I also tried setting valid environment variables but still receive the same error on linux, can this issue be related casadi#1932

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Feb 25, 2024

@arjxn-py and I have some progress on this; nothing concrete and I think we are nowhere close to resolution yet. What we think with a now-updated perspective is that the changes in #3569 were, in fact, not what ended up causing the mess, since we looked at all the relevant compiled files one-by-one for various wheels and inspected their runtime search paths, all of them look fine. I have compiled new wheels from a branch on my fork in this run, here: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/8032835499. This branch reverts certain changes from #3569, including those made to the CMakeLists.txt file and adds back the script that was earlier used to rewrite the Python-CasADi runtime search paths. It also adds delvewheel to patch the Windows wheels – something we weren't doing earlier. It should be ready for a PR soon if we can get to the bottom of what the problem is and how it can be resolved.

Note that there is a problem with the import pybamm statement currently because of the dependencies typing-extensions and sympy being treated as required dependencies while not aptly declared to be so (see #3836) which should be resolved soon. With the wheels from the workflow run above, the problem is partially fixed – the wheels for Linux and macOS are now able to run the script, but only if CASADIPATH is set to point to the CasADi path in site-packages. There is no respite for the Windows build yet, unfortunately...

Previously (with the current PyPI wheels, that is), the script does not work even if CASADIPATH, PATH, LD_LIBRARY_PATH, etc. are set to the CasADi installation, so at least we are able to restore the required functionality with the above wheels on Linux and macOS.

@chmabaur, you had mentioned previously in that you received the same error for all 24.1 pre-releases and the final release in #3783 (comment) – additionally, could you please test whether the PyPI wheels for PyBaMM v23.5 and v23.9 are able to run the above script and confirm? Also, does the above set of wheels from the link to the workflow run, with CASADIPATH set work?

I am unaware whether this is an issue coming from an incorrect repair for the wheels – neither does delocate seem to have any major functionality changes in its recent releases, nor does auditwheel.

@jsbrittain, what I think is happening here is that when we link CasADi to the IDAKLU target during the compilation of the extension module, it just tries to find libcasadi.so.3.7 (or libcasadi.3.7.dylib on macOS) and not any of the other shared libraries that the IDAKLU solver seems to depend on, such as ilbcasadi_interpolant_linear.so in this case. Is there a way to "convince" CMake to link the IDAKLU target against the rest of these files, i.e., those present in the site-packages/casadi folder? We shouldn't need things like libalpaqa.1.0.0.dylib, or the MUMPS or IPOPT files because we are not interfacing those solvers, but maybe libcasadi_rootfinder_newton.3.7.dylib, libcasadi_sundials_common.3.7.dylib, etc. (most of the files matching the libcasadi-* pattern, perhaps) should be declared as dependencies for IDAKLU in CMakeLists.txt and then copied into the PyBaMM wheel. Would greatly appreciate your help whenever you have time to look into this!

Also, cc @Saransh-cpp: since this issue had revealed that the PyBaMM wheel is broken for v24.1 for users of all platforms and dependents (see pybop-team/PyBOP#176), we had decided in the developer meeting on 19/02/2024 that once the issue gets fixed, a 24.1post1 patch release should be made; and published to PyPI as soon as possible – and in the longer term (maybe even before 24.5), we should start looking towards properly testing the wheel ahead of releasing it to users.

@agriyakhetarpal
Copy link
Member

Update: @arjxn-py and I proceeded to do some more debugging, and we wanted to post the results – which are a bit more concerning.

We tested the above reproducer script with multiple versions of PyBaMM and across architectures (Unix and Windows). We notice that through pybop-team/PyBOP#176 the installation of PyBaMM from source works on Linux and macOS, in the same way how it does for our CI pipelines as well – however, we have not compiled the IDAKLU solver on Windows on PR tests owing to the amount of time it takes (which was estimated to be around ~20-22 minutes because of the slow installation of SuiteSparse + METIS) – and this lack of testing might have been the reason we have never caught this bug before.

On fresh and separate conda environments, we proceeded to install PyBaMM from PyPI wheels, both on Linux (WSL) and Windows. We installed PyBaMM v23.3, v23.4.1, v23.5, v23.9, and v24.1 –

  1. On Windows, we had faced an issue with the CasADi serialisation/deserialisation ([Bug]: IDAKLU / casadi serialization error - Object written in version 2 but can only read in version 1 #3100 and Idaklu error #3193, fixed by Update to casadi 3.6.4 for Windows wheels #3637 for v24.1 and above) coming from a lack of updates to the CasADi vcpkg registry (see Revisit linkage with Python casadi for Windows wheels and move away from pybamm-team/casadi-vcpkg-registry #3603), also noted by @jsbrittain in [Bug]: casadi_interpolant_linear.dll not found on Windows with IDAKLUSolver, fails when simulating a drive cycle using Interpolant #3783 (comment). Therefore, we installed CasADi 3.5.5 temporarily after installing PyBaMM to resolve the error. The error was because the IDAKLU solver on Windows was earlier not linked with CasADi versions later than 3.6.0. However, doing that reveals the same error as above:
Error trace
PluginInterface::load_plugin: Cannot load shared library 'casadi_interpolant_linear.dll': 
   (
    Searched directories: 1. casadipath from GlobalOptions
                          2. CASADIPATH env var
                          3. PATH env var (Windows)
                          4. LD_LIBRARY_PATH env var (Linux)
                          5. DYLD_LIBRARY_PATH env var (osx)
    A library may be 'not found' even if the file exists:
          * library is not compatible (different compiler/bitness)
          * the dependencies are not found
   )
  Tried '' :
    Error code (WIN32): 126
  Tried '.' :
    Error code (WIN32): 126

We tested and this was the case with v23.3, v23.4.1, v23.5, and v23.9 – which suggests that this bug was occuring silently with pybamm.Interpolant all this while, and my thoughts on #3783 (comment) are no longer valid – this bug did not come from #3569. We did not test versions earlier than v23.3, but it is highly likely that the bug remains there as well.

  1. On Linux and macOS, the serialisation error does not occur because we have been linking to the Python CasADi as usual at build time, which remains of the same version as a run-time dependency. PyBaMM when installed from source works correctly, however, when the compiled components of it get packaged into a wheel and are subsequently repaired, the dependencies of the IDAKLU shared object / dynamic library, that are libcasadi.so.3.7/libcasadi.3.7.dylib no longer look at the CasADi library where it is installed inside a Python virtual environment or site-packages/casadi directory, but only inside a .dylibs hidden folder in the root of the directory, which is why the wheels fail to find libcasadi_interpolant_linear.dylib (or .so).

This suggests that the IDAKLU solver has remained to be broken for possibly several released PyBaMM versions over the years across all platforms. I wonder how much of the test suite passes if we do compile the IDAKLU solver on Windows outside of building the wheels... (we don't currently do it for the scheduled tests either where CI completion time is not a concern, see #3664).

I had posted some comments previously in about a workaround:

With the wheels from the workflow run above, the problem is partially fixed – the wheels for Linux and macOS are now able to run the script, but only if CASADIPATH is set to point to the CasADi path in site-packages. There is no respite for the Windows build yet, unfortunately...

Previously (with the current PyPI wheels, that is), the script does not work even if CASADIPATH, PATH, LD_LIBRARY_PATH, etc. are set to the CasADi installation, so at least we are able to restore the required functionality with the above wheels on Linux and macOS.

However, these are not exactly workarounds and Windows wheels have just a single DLL (.pyd) file, where symbols are not visible. @arjxn-py and I had a hypothesis about how something in our wheel building process broke between v23.9 and v24.1, but this experiment has revealed that this wasn't the case.

I'm not sure whether adding CASADIPATH as an environment variable inside idaklu_solver.py would be a good workaround – it is a bit of a hack as it is, and it doesn't work on Windows, it does just on Unix-like platforms. Another possibility would be to copy the missing files like casadi_interpolant_linear.dylib into the wheel and then fix their RPATHs, but even that would not work on Windows because of the lib- prefix that is currently being searched for.

would be great to receive help here in order to fix the issue, and I hope that these additional insights are useful – tagging @martinjrobins, @jsbrittain, and @kratman! Perhaps this should lead us to test the built wheels at a priority during the pytest migration project this year rather than as a stretch goal.

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Mar 26, 2024

I triggered this workflow yesterday to run both the integration and the unit tests (please see the PR on my fork above) and these were the logs:
https://github.com/agriyakhetarpal/PyBaMM/actions/runs/8414348869

and surprisingly all of the IDAKLU solver tests on all platforms pass when testing the built wheel, but those that use the IDAKLU solver such as our dependents, say, PyBOP: pybop-team/PyBOP#176 are facing quite a lot of failures. So, perhaps pybamm.Interpolant with the IDAKLU solver is something that we have not been testing in our unit or integration tests before and that it isn't something that a lot of users do? This code snippet in question is for a drive cycle experiment.

P.S. some of the failures in the tests are currently expected and can be ignored safely, because we have historically not added the testing data or the tests in general into the wheels, but this scenario can change during the the pytest migration project this year.

@martinjrobins
Copy link
Contributor

I've had a quick look at this. At first I thought it was a bit strange that the windows wheel was even looking for a dll, as far as I can tell we compile casadi statically on windows for the idaklu solver, so the wheel should be self-contained and no need to bring in any dlls. However, I see that casadi uses a plugin system and loads up certain dlls at runtime, including the interpolant one :( This is a pain because its entirely incompatible with our build process for the windows wheel.

At the moment I'm not even sure if casadi supports static compilation of their plugins..... but will keep looking

@martinjrobins
Copy link
Contributor

martinjrobins commented Apr 23, 2024

The solution here might be to avoid the use of the casadi plugin system in pybamm. At the moment we are using it since we use the casadi.Interpolant class (

)

Instead we could try and use the explicit function on the casadi::MX class: https://web.casadi.org/python-api/#casadi.casadi.MX.interpn_linear

This might complain at build time that we don't link in the interpolant lib, if so we might need to add this to our CMakeLists

@martinjrobins
Copy link
Contributor

I'm not sure how to use interpn_linear :) so I've asked on the casadi repo: casadi/casadi#3668

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Apr 23, 2024

Thanks, @martinjrobins! I would like to note that there are ways to repair and make the wheel self-contained for Windows too, with the delvewheel tool which performs DLL name mangling and fixes wheels just like auditwheel and delocate do for Unix-like platforms.

I did a very quick experiment by trying to repair the CasADi Windows wheel with its own shared libraries with repairwheel and delvewheel (the prior is a cross-platform alternative that works on all platforms):

pipx run repairwheel casadi-3.6.5-cp312-none-win_amd64.whl --output-dir repaired_wheel/ --lib-dir casadi-3.6.5-cp312-none-win_amd64/casadi/

to get

repairing /Users/agriyakhetarpal/Downloads/casadi-3.6.5-cp312-none-win_amd64.whl
finding DLL dependencies
copying DLLs into casadi.libs
mangling DLL names
patching casadi/__init__.py
repackaging wheel
fixed wheel written to /var/folders/b3/2bq1m1_50bs4c7305j8vxcqr0000gn/T/repairwheelsg4yi5qy/casadi-3.6.5-cp312-none-win_amd64.whl
Wrote /Users/agriyakhetarpal/Downloads/repaired_wheel/casadi-3.6.5-cp312-none-win_amd64.whl

which leads me to observe that because of this plugin system, CasADi wheels themselves are not properly repaired (on Windows – on Unix platforms it is) before being published to PyPI, and that reveals another source for the problem. One solution will be to run wheel unpack to extract the CasADi wheel, copy just the needed DLLs (those for interpolant, newton, etc.) so that the size of PyBaMM's wheels doesn't get too large, and then pack PyBaMM with those wheels (because if we do it ourselves, then CMake tends to look for casadi_*.dll, rather than libcasadi_*.dll – as highlighted in #3603). Either way, this isn't the best approach for the Windows wheel, but can serve as a temporary workaround for those who are still trying to use the IDAKLU solver on Windows.

@martinjrobins
Copy link
Contributor

martinjrobins commented Apr 23, 2024

This might work, but you'll have to be careful that the version of casadi that we compile for the idaklu solver is the same version as the wheel! Also casadi cross-compile their wheels using mingw, whereas we nativly compile ours with visual studio, so there might be incompatibilities there

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Apr 23, 2024

Yes, indeed, compiling the IDAKLU solver (and its prerequisites of course) with something like MinGW or a full-fledged toolchain like MSYS2 is also something I have been thinking about doing, but not sure how much work that would require for porting from MSVC. I know that both SUNDIALS and SuiteSparse (upstream one, rather than the METIS-CMake-Windows one) can use MinGW-w64 (their CI does it) and we can get a BLAS vendor (mostly OpenBLAS) from pkgconfiglite or via NuGet, so it could be mostly about making our CMakeLists.txt point to the correct locations of these libs, but other patching could also be required for it to work.

As for keeping CasADi used to compile IDAKLU with in sync the same version, yes, we'll have to be more careful there at the time of every release, given that problems like #3100 came directly from there that were previously discussed 😬

@agriyakhetarpal agriyakhetarpal added the release blocker Issues that need to be addressed before the creation of a release label Apr 25, 2024
@agriyakhetarpal agriyakhetarpal mentioned this issue May 4, 2024
8 tasks
martinjrobins added a commit that referenced this issue May 10, 2024
kratman pushed a commit that referenced this issue May 10, 2024
…4077)

* bug: use casadi MX.interpn_linear function instead of plugin #3783

* bug: fix for 2d and 3d linear interpolant #3783

* cover cubic interpolation in 2d #3783

* #3783 add to changelog

---------

Co-authored-by: Agriya Khetarpal <74401230+agriyakhetarpal@users.noreply.github.com>
js1tr3 pushed a commit to js1tr3/PyBaMM that referenced this issue Aug 12, 2024
…eam#3783 (pybamm-team#4077)

* bug: use casadi MX.interpn_linear function instead of plugin pybamm-team#3783

* bug: fix for 2d and 3d linear interpolant pybamm-team#3783

* cover cubic interpolation in 2d pybamm-team#3783

* pybamm-team#3783 add to changelog

---------

Co-authored-by: Agriya Khetarpal <74401230+agriyakhetarpal@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority: high To be resolved as soon as possible release blocker Issues that need to be addressed before the creation of a release
Projects
None yet
5 participants