How to install underthesea on Alpine Docker image #705

Closed
qhungbui7 opened this issue Aug 16, 2023 · 3 comments

qhungbui7 commented Aug 16, 2023

I want to install underthesea in an Alpine Docker image. However, pip installs only a very old version of underthesea (the tree below shows 1.2.3). The problem seems to be that pip cannot find the newer underthesea_core releases in this environment. I have already updated and upgraded apk and installed the newest version of pip.

underthesea==1.2.3
├── click [required: >=6.0, installed: 8.1.6]
├── joblib [required: Any, installed: 1.3.2]
├── nltk [required: Any, installed: 3.8.1]
│   ├── click [required: Any, installed: 8.1.6]
│   ├── joblib [required: Any, installed: 1.3.2]
│   ├── regex [required: >=2021.8.3, installed: 2023.8.8]
│   └── tqdm [required: Any, installed: 4.66.1]
├── python-crfsuite [required: >=0.9.6, installed: 0.9.9]
├── PyYAML [required: Any, installed: 6.0.1]
├── requests [required: Any, installed: 2.31.0]
│   ├── certifi [required: >=2017.4.17, installed: 2023.7.22]
│   ├── charset-normalizer [required: >=2,<4, installed: 3.2.0]
│   ├── idna [required: >=2.5,<4, installed: 3.4]
│   └── urllib3 [required: >=1.21.1,<3, installed: 2.0.4]
├── scikit-learn [required: >=0.20,<0.22, installed: 0.21.3]
│   ├── joblib [required: >=0.11, installed: 1.3.2]
│   ├── numpy [required: >=1.11.0, installed: 1.25.2]
│   └── scipy [required: >=0.17.0, installed: 1.11.1]
│       └── numpy [required: >=1.21.6,<1.28.0, installed: 1.25.2]
├── seqeval [required: Any, installed: 1.2.2]
│   ├── numpy [required: >=1.14.0, installed: 1.25.2]
│   └── scikit-learn [required: >=0.21.3, installed: 0.21.3]
│       ├── joblib [required: >=0.11, installed: 1.3.2]
│       ├── numpy [required: >=1.11.0, installed: 1.25.2]
│       └── scipy [required: >=0.17.0, installed: 1.11.1]
│           └── numpy [required: >=1.21.6,<1.28.0, installed: 1.25.2]
├── tabulate [required: Any, installed: 0.9.0]
├── tqdm [required: Any, installed: 4.66.1]
└── Unidecode [required: Any, installed: 1.3.6]

I tried to install underthesea_core==1.0.4 directly, and pip reported:

Step 17/30 : RUN pip install underthesea_core==1.0.4
 ---> Running in 67ea560d42db
ERROR: Could not find a version that satisfies the requirement underthesea_core==1.0.4 (from versions: 0.0.1, 0.0.2, 0.0.3, 0.0.4a0, 0.0.4a1, 0.0.4a2, 0.0.4a3, 0.0.4a4, 0.0.4a5, 0.0.4a6, 0.0.4a8)
ERROR: No matching distribution found for underthesea_core==1.0.4

I suspect that the difference between the Alpine and Debian environments makes the installation fail. It may also be related to the fact that underthesea_core is written in Rust. Can you give me details about the system requirements for installing underthesea/underthesea_core? Thank you!

rain1024 (Contributor) commented

@qhungbui7

> Can you provide details about the system requirements for installing underthesea/underthesea_core?

Generally speaking, underthesea runs on Windows, macOS, and Linux distributions such as Ubuntu and CentOS. I have personally tested it on my Ubuntu 20.04 workstation and my M2 Mac. However, it has not been verified on every platform and version, and I have not yet tested it on Alpine.

> I suspect that the difference between Alpine architecture and Debian architecture makes the installation unsuccessful - it can also be related to the fact that underthesea_core is written in Rust.

You've raised a valid point. I'll look into this issue further to understand it better.

Thanks for bringing this to our attention.

rain1024 commented Aug 18, 2023

Update 2023-08-18

Today I built an Alpine image from the following Dockerfile (Alpine 3.16, Python 3.10):

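FROM alpine:3.16
# Install Python 3, pip, and bash in one layer.
# --no-cache fetches the package index on the fly and keeps it out of the image.
RUN apk add --no-cache python3 py3-pip bash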

Following your earlier suggestions, I attempted a manual installation:

$ pip install underthesea-core==1.0.4
ERROR: Could not find a version that satisfies the requirement underthesea-core==1.0.4 (from versions: 0.0.1, 0.0.2, 0.0.3, 0.0.4a0, 0.0.4a1, 0.0.4a2, 0.0.4a3, 0.0.4a4, 0.0.4a5, 0.0.4a6, 0.0.4a8)
ERROR: No matching distribution found for underthesea-core==1.0.4

You were right: no build of underthesea-core 1.0.4 is available to pip in this environment, so pip sees only the old releases listed in the error.

After a quick search, I found a relevant issue: PyO3/pyo3#599. I use pyo3 to write the Python bindings for the Rust code, so this is certainly connected.

I'll dive deeper into this matter at a later time.
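
One likely explanation for the error above, stated here as an assumption rather than something confirmed in this thread: Alpine uses musl libc, while the common prebuilt Linux wheels are tagged manylinux and target glibc, so pip on Alpine skips them. The sketch below only illustrates that tag check. The function names and the example_pkg filenames are hypothetical, and the logic deliberately ignores the Python tag, ABI tag, CPU architecture, and musl version, all of which pip also checks.

```python
from typing import List


def wheel_platform_tags(filename: str) -> List[str]:
    """Return the platform tags encoded in a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    # Layout: name-version[-build]-python-abi-platform.whl,
    # so the platform field is always the last dash-separated part.
    platform_field = filename[: -len(".whl")].split("-")[-1]
    # A wheel may list several platform tags joined with ".".
    return platform_field.split(".")


def libc_family(platform_tag: str) -> str:
    """Classify a platform tag by the C library it targets."""
    if platform_tag == "any":
        return "any"          # pure-Python wheel, no libc dependency
    if platform_tag.startswith("musllinux"):
        return "musl"         # built for musl-based distros such as Alpine
    if platform_tag.startswith("manylinux"):
        return "glibc"        # built for glibc-based distros such as Debian
    return "other"            # e.g. macOS, Windows, or plain linux_* tags


def usable_on_alpine(filename: str) -> bool:
    """Simplified check: does the wheel carry a musl or pure-Python tag?"""
    families = {libc_family(tag) for tag in wheel_platform_tags(filename)}
    return bool(families & {"musl", "any"})


if __name__ == "__main__":
    # Hypothetical filenames, used only to exercise the functions.
    for name in (
        "example_pkg-1.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "example_pkg-1.0.4-cp310-cp310-musllinux_1_1_x86_64.whl",
        "example_pkg-0.0.3-py3-none-any.whl",
    ):
        print(name, usable_on_alpine(name))
```

If a release ships only glibc-tagged wheels and no source distribution, a check like this returns False for every file, which would match pip listing only older versions.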

rain1024 commented Aug 19, 2023

Hey @qhungbui7,

After some experimenting, I wrote a Dockerfile that builds an underthesea image on Alpine. The image is published to the GitHub container registry (ghcr.io).

Give it a spin:

$ docker run -it ghcr.io/undertheseanlp/underthesea/underthesea:0.1.0 bash

Inside the container:
$ python3
>>> from underthesea import word_tokenize
>>> word_tokenize("thế hệ trẻ là tương lai của đất nước")
['thế hệ', 'trẻ', 'là', 'tương lai', 'của', 'đất nước']

A brief overview of the Docker image creation:

  • I had to build underthesea_core from source. To do this, I set up a Rust build environment on an Alpine base.
  • underthesea depends on scikit-learn, which would otherwise have to be compiled during installation, so I installed the prebuilt Alpine package instead (apk add py3-scikit-learn).
  • Those were the only major obstacles. Once both are in place, underthesea itself installs with pip as usual.
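
The steps above imply that two prerequisites must already be importable before `pip install underthesea` can succeed on Alpine. A minimal sketch of a pre-install check follows. The helper name `check_importable` is hypothetical, and the import names `underthesea_core` and `sklearn` are assumptions about how the two packages are exposed to Python.

```python
import importlib.util
from typing import Dict, Iterable


def check_importable(module_names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# Assumed import names: underthesea_core (built from source with Rust)
# and sklearn (provided by the py3-scikit-learn Alpine package).
PREREQUISITES = ["underthesea_core", "sklearn"]

if __name__ == "__main__":
    for name, found in check_importable(PREREQUISITES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Running a script like this inside the image before the final pip step shows which of the two prerequisites is still missing, without importing or executing the packages themselves.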

I'm eager to hear how it works for you. Your feedback would be invaluable!
