HuggingFace model multilingual-e5-small fails to open on Windows due to max path limitations #3048

Closed
david-sitsky opened this issue Mar 28, 2024 · 4 comments · Fixed by #3053
Labels
bug Something isn't working

Comments

@david-sitsky
Contributor

Description

While I work on Linux, I have to write software that also works on Windows. The multilingual-e5-small model has a large number of properties, one for each language it supports. You can see this in the log when deploying the model on Linux:

DEBUG DefaultModelZoo Scanning models in repo: class ai.djl.repository.RemoteRepository, djl://ai.djl.huggingface.pytorch/intfloat/multilingual-e5-small?translatorFactory=ai.djl.translate.NoopServingTranslatorFactory
INFO  ModelInfo Loading model multilingual_e5_small on cpu()
DEBUG ModelZoo Loading model with Criteria:
	Application: UNDEFINED
	Input: class ai.djl.modality.Input
	Output: class ai.djl.modality.Output
	Engine: PyTorch
	ModelZoo: ai.djl.localmodelzoo
	Arguments: {"padding":"true","engine":"PyTorch","translatorFactory":"ai.djl.huggingface.translator.TextEmbeddingTranslatorFactory"}
	Options: {"modelName":"multilingual-e5-small","mapLocation":"true"}
	No translator supplied

DEBUG ModelZoo Searching model in specified model zoo: ai.djl.localmodelzoo
DEBUG ModelZoo Checking ModelLoader: ai.djl.huggingface.pytorch:intfloat/multilingual-e5-small NLP.TEXT_EMBEDDING [
	ai.djl.huggingface.pytorch/intfloat/multilingual-e5-small/0.0.1/multilingual-e5-small {"multilingual":"true","af":"true","am":"true","ar":"true","as":"true","az":"true","be":"true","bg":"true","bn":"true","br":"true","bs":"true","ca":"true","cs":"true","cy":"true","da":"true","de":"true","el":"true","en":"true","eo":"true","es":"true","et":"true","eu":"true","fa":"true","fi":"true","fr":"true","fy":"true","ga":"true","gd":"true","gl":"true","gu":"true","ha":"true","he":"true","hi":"true","hr":"true","hu":"true","hy":"true","id":"true","is":"true","it":"true","ja":"true","jv":"true","ka":"true","kk":"true","km":"true","kn":"true","ko":"true","ku":"true","ky":"true","la":"true","lo":"true","lt":"true","lv":"true","mg":"true","mk":"true","ml":"true","mn":"true","mr":"true","ms":"true","my":"true","ne":"true","nl":"true","no":"true","om":"true","or":"true","pa":"true","pl":"true","ps":"true","pt":"true","ro":"true","ru":"true","sa":"true","sd":"true","si":"true","sk":"true","sl":"true","so":"true","sq":"true","sr":"true","su":"true","sv":"true","sw":"true","ta":"true","te":"true","th":"true","tl":"true","tr":"true","ug":"true","uk":"true","ur":"true","uz":"true","vi":"true","xh":"true","yi":"true","zh":"true"}
]
DEBUG MRL Preparing artifact: djl://ai.djl.huggingface.pytorch/intfloat/multilingual-e5-small?translatorFactory=ai.djl.translate.NoopServingTranslatorFactory, ai.djl.huggingface.pytorch/intfloat/multilingual-e5-small/0.0.1/multilingual-e5-small {"multilingual":"true","af":"true","am":"true","ar":"true","as":"true","az":"true","be":"true","bg":"true","bn":"true","br":"true","bs":"true","ca":"true","cs":"true","cy":"true","da":"true","de":"true","el":"true","en":"true","eo":"true","es":"true","et":"true","eu":"true","fa":"true","fi":"true","fr":"true","fy":"true","ga":"true","gd":"true","gl":"true","gu":"true","ha":"true","he":"true","hi":"true","hr":"true","hu":"true","hy":"true","id":"true","is":"true","it":"true","ja":"true","jv":"true","ka":"true","kk":"true","km":"true","kn":"true","ko":"true","ku":"true","ky":"true","la":"true","lo":"true","lt":"true","lv":"true","mg":"true","mk":"true","ml":"true","mn":"true","mr":"true","ms":"true","my":"true","ne":"true","nl":"true","no":"true","om":"true","or":"true","pa":"true","pl":"true","ps":"true","pt":"true","ro":"true","ru":"true","sa":"true","sd":"true","si":"true","sk":"true","sl":"true","so":"true","sq":"true","sr":"true","su":"true","sv":"true","sw":"true","ta":"true","te":"true","th":"true","tl":"true","tr":"true","ug":"true","uk":"true","ur":"true","uz":"true","vi":"true","xh":"true","yi":"true","zh":"true"}
DEBUG AbstractRepository Files have been downloaded already: /data/djl-serving/cache/cache/repo/model/nlp/text_embedding/ai/djl/huggingface/pytorch/intfloat/multilingual-e5-small/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/true/0.0.1

Expected Behavior

The model can be loaded on Windows.

Error Message

ai.djl.engine.EngineException: open file failed because of errno 2 on fopen: No such file or directory, file path: C:\Users\sits\.djl.ai\cache\repo\model\nlp\text_embedding\ai\djl\huggingface\pytorch\intfloat\multilingual-e5-small\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\true\0.0.1\multilingual-e5-small.pt
	at ai.djl.pytorch.jni.PyTorchLibrary.moduleLoad(Native Method) ~[pytorch-engine-0.26.0.jar:?]
	at ai.djl.pytorch.jni.JniUtils.loadModule(JniUtils.java:1742) ~[pytorch-engine-0.26.0.jar:?]
	at ai.djl.pytorch.engine.PtModel.load(PtModel.java:93) ~[pytorch-engine-0.26.0.jar:?]
	at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:166) ~[api-0.26.0.jar:?]
	at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:172) ~[api-0.26.0.jar:?]
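
For context, the classic Windows path limit (MAX_PATH) is 260 characters, and each property adds a `\true` segment to the cache path. The following is a rough sketch of that arithmetic, not DJL code: the class name, the `cachePath` helper, and the property count of 90 are illustrative assumptions.

```java
import java.util.Collections;

// Hypothetical sketch: estimates the cache path length when every
// property value becomes its own sub-directory.
class PathLengthSketch {
    // Classic Windows limit for a full path, in characters.
    static final int WINDOWS_MAX_PATH = 260;

    // Joins base, one "true" directory per property, version and file name.
    static String cachePath(String base, int propertyCount, String version, String file) {
        String props = String.join("\\", Collections.nCopies(propertyCount, "true"));
        return base + "\\" + props + "\\" + version + "\\" + file;
    }

    public static void main(String[] args) {
        String base = "C:\\Users\\sits\\.djl.ai\\cache\\repo\\model\\nlp\\text_embedding"
                + "\\ai\\djl\\huggingface\\pytorch\\intfloat\\multilingual-e5-small";
        // 90 is an assumed property count, chosen only for illustration.
        String path = cachePath(base, 90, "0.0.1", "multilingual-e5-small.pt");
        System.out.println("length=" + path.length()
                + ", exceeds MAX_PATH=" + (path.length() > WINDOWS_MAX_PATH));
    }
}
```

Since the `true` segments alone contribute several hundred characters, the path exceeds the limit regardless of where the cache directory lives.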

How to Reproduce?

Open the above model on a Windows machine.

Thoughts

Why are model properties represented as sub-directories? This seems expensive when a properties file would use fewer filesystem resources. Other filesystems also have path limits that this representation is likely to hit.

Can this be easily changed, or is this a more involved change? I'm happy to have a look if some guidance can be provided.

david-sitsky added the bug label Mar 28, 2024
@zachgk
Contributor

zachgk commented Mar 28, 2024

As a workaround, you can modify the metadata.json and remove the properties. You may also need to change file references from relative to absolute. Loading the model from the modified metadata.json would avoid the long path problem.

As for the broader use of properties inside model paths, the main reason we do so is to distinguish between artifacts within the same metadata. Properties give a fairly clean directory tree in most cases. This only applies to the local cache of downloaded models and datasets. The code for it is in Artifact.getResourceUri() if you are interested.
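
To illustrate how property values end up as directories, here is a minimal sketch modeled on the paths in the log above. It is not the actual body of Artifact.getResourceUri(); the class and the `resourcePath` helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a property-based cache layout.
class PropertyPathSketch {
    // Appends each property value as a directory, then the version.
    static String resourcePath(String baseUri, Map<String, String> properties, String version) {
        StringBuilder sb = new StringBuilder(baseUri);
        for (String value : properties.values()) {
            sb.append('/').append(value);
        }
        return sb.append('/').append(version).toString();
    }

    public static void main(String[] args) {
        // Three of the model's properties, in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("multilingual", "true");
        props.put("af", "true");
        props.put("am", "true");
        System.out.println(resourcePath(
                "model/nlp/text_embedding/ai/djl/huggingface/pytorch/intfloat/multilingual-e5-small",
                props, "0.0.1"));
    }
}
```

With three properties the result ends in `/true/true/true/0.0.1`. The path grows by one segment per property, which is what produces the long run of `true` directories for this model.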

For a complete solution, we would need to replace it with a new system. The obvious one is to use the artifact name instead of the properties. This requires that all artifacts have names and that they are unique within a metadata.json file. I am not sure off the top of my head that this is true, so we would have to verify. Assuming that is fine, it should be a fairly easy change. @frankfliu what do you think?
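
The name-based alternative could be sketched as follows. This is only an illustration of the idea, assuming artifact names are unique within a metadata.json; the class, the helper, and the order of the name and version segments are assumptions, not the layout DJL uses.

```java
// Hypothetical sketch of a name-based cache layout.
class ArtifactNamePathSketch {
    // Builds the path from the artifact name and version only,
    // so the number of properties no longer affects its length.
    static String resourcePath(String baseUri, String artifactName, String version) {
        return baseUri + "/" + artifactName + "/" + version;
    }

    public static void main(String[] args) {
        String path = resourcePath(
                "model/nlp/text_embedding/ai/djl/huggingface/pytorch/intfloat/multilingual-e5-small",
                "multilingual-e5-small", "0.0.1");
        System.out.println(path + " (" + path.length() + " chars)");
    }
}
```

The path length then depends only on the base URI, the artifact name, and the version.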

@frankfliu
Contributor

The properties are generated by the model importing tool; we need to update the tool to combine the languages into a single property.

We can manually change the metadata.json for now.
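
Combining the per-language flags into one property might look roughly like the sketch below. Everything here is an assumption for illustration: the `languages` key, the comma-separated encoding, and the rule that a two-letter key with value "true" is a language flag. How the import tool would actually encode the combined value is not stated in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: collapses per-language flags into one property.
class CombineLanguagesSketch {
    static Map<String, String> combine(Map<String, String> properties) {
        Map<String, String> result = new LinkedHashMap<>();
        StringJoiner languages = new StringJoiner(",");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            // Assumed rule: two-letter keys set to "true" are language flags.
            boolean isLanguageFlag = e.getKey().length() == 2 && "true".equals(e.getValue());
            if (isLanguageFlag) {
                languages.add(e.getKey());
            } else {
                result.put(e.getKey(), e.getValue());
            }
        }
        if (languages.length() > 0) {
            result.put("languages", languages.toString()); // assumed key name
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("multilingual", "true");
        props.put("af", "true");
        props.put("am", "true");
        System.out.println(combine(props));
    }
}
```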

@frankfliu
Contributor

I scanned the existing model zoo: for all models except MXNet YOLO, the artifact name + version is unique within metadata.json.

So we don't actually need to use properties in the file path to avoid path clashes.
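
A uniqueness check of this kind can be sketched as below. This is not the scan that was actually run; the class, the `findDuplicates` helper, and the representation of an artifact as a `{name, version}` pair are assumptions, and the sample data is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: reports name + version pairs that occur more than once.
class UniqueArtifactSketch {
    // Each artifact is a {name, version} pair.
    static List<String> findDuplicates(List<String[]> artifacts) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String[] artifact : artifacts) {
            String key = artifact[0] + ":" + artifact[1];
            // Set.add returns false when the key was already present.
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Made-up sample entries, not taken from a real metadata.json.
        List<String[]> artifacts = Arrays.asList(
                new String[] {"model-a", "0.0.1"},
                new String[] {"model-b", "0.0.1"},
                new String[] {"model-a", "0.0.1"});
        System.out.println(findDuplicates(artifacts));
    }
}
```

An empty result means name + version is enough to keep cache paths distinct within that metadata.json.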

@david-sitsky
Contributor Author

@frankfliu - many thanks for fixing this. I can confirm on Windows this now works as expected using 0.27.0-SNAPSHOT.
