Restrict PyTorch version not to be more advanced than that used in Elasticsearch #479

davidkyle · 2022-06-30T15:06:42Z

PyTorch recently released version 1.12 which contains breaking changes to torchscript models making them incompatible with earlier versions.

The Elastic stack uses version 1.11 of the PyTorch C++ library (libtorch), a model traced to torchscript format using a v1.12 Python PyTorch install cannot be evaluated in libtorch v1.11.

Anyone using Eland to import a model into Elastic today would pull the latest version of PyTorch and find the model cannot be evaluated in Elasticsearch. Opening the model errors with the message:

Attempted to read a PyTorch file with version 10, but the maximum supported version for reading is 9. Your PyTorch installation may be too old.

This change to the requirements prevents Eland using v1.12 of PyTorch.

HuggingFace transformers has also pinned the torch requirement to <1.12 in huggingface/transformers@5a3d0cb

The sentence-transformers and transformers versions have also been tighten to a range known to be compatible with PyTorch 1.11.

droberts195 · 2022-06-30T15:15:26Z

It sounds like the README needs updating too. Currently it says:

Supports Elasticsearch clusters that are 7.11+, recommended 7.14 or later for all features to work. Make sure your Eland major version matches the major version of your Elasticsearch cluster.

But it sounds like actually the requirement is that your Eland distribution uses the same PyTorch version as the ML C++ code, and to avoid having some really complicated compatibility matrix the easiest way to express that is to say Eland and Elasticsearch must be of the same minor version (any patch level within that minor).

davidkyle · 2022-07-04T08:52:18Z

The CI failure was due to a change in the Python client after _infer API was renamed in the Elasticsearch 8.3 release.

elastic/elasticsearch-py@0e16cd6

As the the _infer API is in tech preview and the old name is supported but deprecated it is ok to rename but it means a 8.3 version of Eland must be used with an 8.3 Python client and Elasticsearch cluster.

Given that upgrading PyTorch has already enforced this requirement and may happen again on a later upgrade I've updated the README to state that if using the PyTorch features the minor versions of Elasticsearch and Eland must be the same.

Update
Actually it is Python client that must be >=v8.3

davidkyle · 2022-07-04T09:11:31Z

I pushed c97d641 which bumps the required Elasticsearch Python client to version 8.3.

Eland does not make any claims for compatibility with older versions of the Python client and the latest is always preferred

davidkyle · 2022-07-04T14:56:45Z

clients-ci is now failing against pre 8.3 nodes (obviously) which do not know about the _infer API change. I'm not sure why the tests pass for Python 3.10 I'm assuming the NLP tests are skipped for 3.10

 'no handler found for uri [/_ml/trained_models/distilbert-base-uncased/_infer?timeout=60s] and method [POST]')

This can be fixed using the REST compatibility layer to call the old deprecated endpoint but it would still leave the problem where the current version of PyTorch is not compatible with the latest Elasticsearch release and cannot be upgraded without breaking BWC at some point.

Returning to the comment above:

avoid having some really complicated compatibility matrix the easiest way to express that is to say Eland and Elasticsearch must be of the same minor version

We want to avoid the situation where some parts of Eland work with earlier versions of Elasticsearch and others do not. Tightly coupling the compatible versions of Eland, Elasticsearch and the Elasticsearch Python client at a minor level will guarantee compatibility. However, this is a change to the current policy and I'm happy to discuss it.

My proposal is this:

Document that both the major and minor versions of Eland should match the Elasticsearch cluster it is running against
Pin the Python client to the same major and minor version of Eland and update with every release
Update extra requirements (torch, etc.) as and when required

droberts195 · 2022-07-05T12:03:56Z

My proposal is this:

Document that both the major and minor versions of Eland should match the Elasticsearch cluster it is running against

Pin the Python client to the same major and minor version of Eland and update with every release

Update extra requirements (torch, etc.) as and when required

I agree with 2 and 3. On reflection I think my suggestion to require the minor version matches was probably too strict.

With Kibana and Elasticsearch the rule is that you have to upgrade Elasticsearch first and Kibana afterwards. This means that you can upgrade with no downtime, as the older version of Kibana continues to work with the newer version of Elasticsearch in the mixed version state during the upgrade.

I think we should specify a similar constraint for Eland: you have to upgrade Elasticsearch before Eland and an older Eland should work with a newer Elasticsearch during the upgrade period.

With trained models it sounds like we've got an additional hard break at the 8.3 boundary where we've changed the infer API. This is OK while we're in technical preview but going forwards we should aim to stick to the rule that older Eland works with newer Elasticsearch within the same major version.

davidkyle · 2022-07-06T09:31:34Z

On reflection the allowing Eland to work with different versions of the same major Elasticsearch cluster will be required.

A developer who is using version 8.2 of Eland where the torch has not been pinned to <1.12 will automatically pull the latest torch package when the create a venv, install the requirements or build the docker image. That version is not compatible with the latest Elasticsearch release.

In 8.3 torch will be pinned to a compatible version so we want everyone to upgrade to 8.3 if they are using eland_import_hub_model. Even if they are running against an older version of Elasticsearch e.g. 8.1.0 we want the developer to upgrade to Eland 8.3.

Therefore Eland 8.3 must be compatible with 8.0, 8.1, and 8.2. which can be achieved by using the deprecated deployment/_infer which is supported throughout 8.

Given that, points 2 & 3 still stand:

Pin the Python client to the same major and minor version of Eland and update with every release

Update extra requirements (torch, etc.) as and when required

Everyone using the eland_import_hub_model script should upgrade to Eland 8.3 and we will make it backwards compatible with earlier 8.x versions and future compatible with later 8.x versions.

# Conflicts: # eland/ml/pytorch/_pytorch_model.py

droberts195 · 2022-07-06T09:46:07Z

we will make it backwards compatible with earlier 8.x versions and future compatible with later 8.x versions

Is it the case that if you trace a model using PyTorch 1.11 then PyTorch 1.9 cannot understand it? If that is the case then it's going to be very hard to make Eland compatible with older stack versions because it will have to dynamically get the appropriate version of PyTorch after finding out what the server version is.

davidkyle · 2022-07-06T10:31:39Z

Is it the case that if you trace a model using PyTorch 1.11 then PyTorch 1.9 cannot understand it?

Anecdotally we know that models traced in PyTorch 1.11 can run in libtorch 1.9 as that is what many people have been doing. But it's worth verifying so I spun up a cloud cluster of Elasticsearch version 8.0.1 and uploaded a NER model using the changes in this PR (with PyTorch pinned to 1.11). Evaluating the model did work 😁 so PyTorch 1.11 is backward compatible with 1.9, the breaking change is in 1.12.

droberts195 · 2022-07-06T11:09:48Z

the breaking change is in 1.12

But then we're setting ourselves up for a problem the next time we upgrade PyTorch if we say Eland 8.3 and above is going to be compatible with all Elasticsearch 8.x versions.

We could say Eland 8.3 is compatible with all Elasticsearch 8.x versions and then never release another version of Eland. Or we could never upgrade the version of PyTorch used within Eland again in 8.x, so that by 8.last Elasticsearch might be on PyTorch 1.17 but Eland would still be on PyTorch 1.11.

But neither of those options sounds ideal. We could be forced into doing a quick upgrade by a major security bug for example.

This is why I think it would be better to define a compatibility policy for Eland and Elasticsearch that accounts for the possibility of breaking changes in PyTorch.

droberts195

LGTM

Prevent using torch 1.12

cf21fec

davidkyle added the topic:NLP Issue or PR about NLP model support and eland_import_hub_model label Jun 30, 2022

lint

e233a96

benwtrent approved these changes Jun 30, 2022

View reviewed changes

davidkyle added 2 commits July 4, 2022 09:29

Update infer API

c7f32e5

Update readme

c5366cf

davidkyle added 2 commits July 4, 2022 09:54

reformat

79db8fe

v8.3 of Python client is required

c97d641

Use deprecated infer API for BWC

d3663aa

# Conflicts: # eland/ml/pytorch/_pytorch_model.py

davidkyle mentioned this pull request Jul 6, 2022

Detect incompatible PyTorch versions by the Elasticsearch #481

Closed

droberts195 approved these changes Jul 7, 2022

View reviewed changes

davidkyle merged commit 0eb36fa into elastic:main Jul 7, 2022

davidkyle deleted the no-future-versions branch July 7, 2022 13:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restrict PyTorch version not to be more advanced than that used in Elasticsearch #479

Restrict PyTorch version not to be more advanced than that used in Elasticsearch #479

davidkyle commented Jun 30, 2022

droberts195 commented Jun 30, 2022

davidkyle commented Jul 4, 2022 •

edited

Loading

davidkyle commented Jul 4, 2022

davidkyle commented Jul 4, 2022 •

edited

Loading

droberts195 commented Jul 5, 2022

davidkyle commented Jul 6, 2022

droberts195 commented Jul 6, 2022

davidkyle commented Jul 6, 2022

droberts195 commented Jul 6, 2022

droberts195 left a comment

Restrict PyTorch version not to be more advanced than that used in Elasticsearch #479

Restrict PyTorch version not to be more advanced than that used in Elasticsearch #479

Conversation

davidkyle commented Jun 30, 2022

droberts195 commented Jun 30, 2022

davidkyle commented Jul 4, 2022 • edited Loading

davidkyle commented Jul 4, 2022

davidkyle commented Jul 4, 2022 • edited Loading

droberts195 commented Jul 5, 2022

davidkyle commented Jul 6, 2022

droberts195 commented Jul 6, 2022

davidkyle commented Jul 6, 2022

droberts195 commented Jul 6, 2022

droberts195 left a comment

Choose a reason for hiding this comment

davidkyle commented Jul 4, 2022 •

edited

Loading

davidkyle commented Jul 4, 2022 •

edited

Loading