Restrict PyTorch version not to be more advanced than that used in Elasticsearch #479
Conversation
It sounds like the
But it sounds like actually the requirement is that your Eland distribution uses the same PyTorch version as the ML C++ code, and to avoid having some really complicated compatibility matrix the easiest way to express that is to say Eland and Elasticsearch must be of the same minor version (any patch level within that minor).
The CI failure was due to a change in the Python client after elastic/elasticsearch-py@0e16cd6. Given that upgrading PyTorch has already enforced this requirement and may happen again on a later upgrade, I've updated the README to state that if using the PyTorch features the minor versions of Elasticsearch and Eland must be the same.
I pushed c97d641 which bumps the required Elasticsearch Python client to version 8.3. Eland does not make any claims for compatibility with older versions of the Python client and the latest is always preferred.
This can be fixed using the REST compatibility layer to call the old deprecated endpoint but it would still leave the problem where the current version of PyTorch is not compatible with the latest Elasticsearch release and cannot be upgraded without breaking BWC at some point. Returning to the comment above:
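For illustration, the REST API compatibility layer is requested through a media-type parameter on the `Accept` and `Content-Type` headers. The official Python client sets these itself; this is only a sketch of what is sent on the wire:

```python
# Sketch only: the elasticsearch-py client builds these headers internally.
# Shown here to illustrate how the REST compatibility layer is requested.
def compat_headers(major: int) -> dict:
    """Headers asking Elasticsearch to honour the previous major's API."""
    media_type = f"application/vnd.elasticsearch+json; compatible-with={major}"
    return {"Accept": media_type, "Content-Type": media_type}

print(compat_headers(8)["Accept"])
```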
We want to avoid the situation where some parts of Eland work with earlier versions of Elasticsearch and others do not. Tightly coupling the compatible versions of Eland, Elasticsearch and the Elasticsearch Python client at a minor level will guarantee compatibility. However, this is a change to the current policy and I'm happy to discuss it. My proposal is this:
I agree with 2 and 3. On reflection I think my suggestion to require the minor versions match was probably too strict. With Kibana and Elasticsearch the rule is that you have to upgrade Elasticsearch first and Kibana afterwards. This means that you can upgrade with no downtime, as the older version of Kibana continues to work with the newer version of Elasticsearch in the mixed-version state during the upgrade. I think we should specify a similar constraint for Eland: you have to upgrade Elasticsearch before Eland, and an older Eland should work with a newer Elasticsearch during the upgrade period. With trained models it sounds like we've got an additional hard break at the 8.3 boundary where we've changed the infer API. This is OK while we're in technical preview, but going forwards we should aim to stick to the rule that older Eland works with newer Elasticsearch within the same major version.
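The Kibana-style rule described above could be expressed as a simple check. This is a sketch of the proposed policy, not Eland's actual implementation; the function name is illustrative:

```python
def older_client_ok(eland_version: str, es_version: str) -> bool:
    """Sketch of the proposed rule: within one major version,
    Elasticsearch is upgraded first, so Eland's minor version
    must not be ahead of the server's."""
    e_major, e_minor = (int(p) for p in eland_version.split(".")[:2])
    s_major, s_minor = (int(p) for p in es_version.split(".")[:2])
    return e_major == s_major and e_minor <= s_minor
```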
On reflection, allowing Eland to work with different versions of the same major Elasticsearch release will be required. A developer who is using version 8.2 of Eland may be connecting to an earlier 8.x cluster. Therefore Eland 8.3 must be compatible with 8.0, 8.1, and 8.2, which can be achieved by using the deprecated endpoint. Given that, points 2 & 3 still stand:
Everyone using the
# Conflicts:
#   eland/ml/pytorch/_pytorch_model.py
Is it the case that if you trace a model using PyTorch 1.11 then PyTorch 1.9 cannot understand it? If that is the case then it's going to be very hard to make Eland compatible with older stack versions, because it will have to dynamically get the appropriate version of PyTorch after finding out what the server version is.
Anecdotally we know that models traced in PyTorch 1.11 can run in libtorch 1.9, as that is what many people have been doing. But it's worth verifying, so I spun up a cloud cluster of Elasticsearch version 8.0.1 and uploaded a NER model using the changes in this PR (with PyTorch pinned to 1.11). Evaluating the model did work 😁 so PyTorch 1.11 is backward compatible with 1.9; the breaking change is in 1.12.
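As a sketch of the workflow being tested here (using a toy module rather than the NER model from the experiment), tracing in Python PyTorch produces the TorchScript archive that libtorch later loads; the archive's serialization format follows the PyTorch version that wrote it:

```python
import io

import torch


class TinyModel(torch.nn.Module):
    """Toy stand-in for a real model; illustrative only."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)


model = TinyModel().eval()
example_input = torch.randn(1, 4)

# torch.jit.trace records the operations executed for example_input
# and produces a TorchScript module that libtorch can load without Python.
traced = torch.jit.trace(model, example_input)

# Serialize; the resulting zip archive's 'version' entry is written by the
# installed PyTorch, which is where the 1.12-vs-1.11 mismatch arises.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
```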
But then we're setting ourselves up for a problem the next time we upgrade PyTorch if we say Eland 8.3 and above is going to be compatible with all Elasticsearch 8.x versions. We could say Eland 8.3 is compatible with all Elasticsearch 8.x versions and then never release another version of Eland. Or we could never upgrade the version of PyTorch used within Eland again in 8.x, so that by 8.last Elasticsearch might be on PyTorch 1.17 but Eland would still be on PyTorch 1.11. But neither of those options sounds ideal. We could be forced into doing a quick upgrade by a major security bug, for example. This is why I think it would be better to define a compatibility policy for Eland and Elasticsearch that accounts for the possibility of breaking changes in PyTorch.
LGTM
PyTorch recently released version 1.12, which contains breaking changes to TorchScript models, making them incompatible with earlier versions.
The Elastic stack uses version 1.11 of the PyTorch C++ library (libtorch); a model traced to TorchScript format using a v1.12 Python PyTorch install cannot be evaluated in libtorch v1.11.
Anyone using Eland to import a model into Elastic today would pull the latest version of PyTorch and find the model cannot be evaluated in Elasticsearch. Opening the model fails with the message:
Attempted to read a PyTorch file with version 10, but the maximum supported version for reading is 9. Your PyTorch installation may be too old.
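The version number in that error comes from the model file itself: a TorchScript `.pt` file is a zip archive whose `version` entry holds the serialization format number. A stdlib-only sketch of checking a file before upload (the `MAX_SUPPORTED_VERSION` constant here is inferred from the error message above, not from any published API):

```python
import zipfile

# Inferred from the error message above; not an official constant.
MAX_SUPPORTED_VERSION = 9


def torchscript_format_version(path_or_file) -> int:
    """Read the serialization format number from a TorchScript archive.

    PyTorch writes it into an entry named '<archive_name>/version'.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        entry = next(n for n in zf.namelist() if n.endswith("/version"))
        return int(zf.read(entry).decode().strip())


def check_compatible(path_or_file) -> int:
    """Raise if the archive uses a newer format than the server can read."""
    version = torchscript_format_version(path_or_file)
    if version > MAX_SUPPORTED_VERSION:
        raise ValueError(
            f"Attempted to read a PyTorch file with version {version}, but "
            f"the maximum supported version for reading is "
            f"{MAX_SUPPORTED_VERSION}."
        )
    return version
```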
This change to the requirements prevents Eland from using v1.12 of PyTorch.
HuggingFace transformers has also pinned the torch requirement to <1.12 in huggingface/transformers@5a3d0cb. The sentence-transformers and transformers versions have also been tightened to a range known to be compatible with PyTorch 1.11.
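Expressed as pip requirement specifiers, the pins described above look roughly like this (illustrative; the authoritative bounds live in Eland's setup.py and the linked HuggingFace commit):

```
# Keep PyTorch below the breaking 1.12 release
torch>=1.11,<1.12
# transformers and sentence-transformers are similarly constrained to
# ranges known to work with PyTorch 1.11 (exact bounds in setup.py)
```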