fix(issue#6): Ensured wordnet and stopwords are loaded before used. #7

Merged
merged 1 commit into from
Aug 21, 2024
8 changes: 4 additions & 4 deletions Dockerfile
@@ -3,10 +3,10 @@

## STAGE 1 - Core package(s)

FROM ghcr.io/pyo3/maturin:main as maturin
FROM ghcr.io/pyo3/maturin:main AS maturin

RUN mkdir -p /app/build/bonn
WORKDIR /app/build/test_data
# WORKDIR /app/build/test_data
# RUN curl -L -O "...wiki/wiki.en.fifu"
WORKDIR /app/build

@@ -19,8 +19,8 @@ COPY README.md /app/build

RUN RUSTFLAGS="-L /usr/lib64/atlas -C link-args=-lsatlas -ltatlas -llapack" cargo install finalfusion-utils --features=opq

COPY pyproject.toml /app/build
COPY src /app/build/src
COPY bonn /app/build/bonn
COPY pyproject.toml /app/build
COPY python/bonn /app/build/bonn
Contributor

So presumably the build was broken until this change, because src and bonn were at different directory levels?

Contributor Author

@KamenDimitrov97 KamenDimitrov97 Aug 21, 2024

I'm not sure how the release went through fine or where this broke, as I was a bit pressed for time on mrn, but when I rebuilt it locally bonn wasn't available, only _bonn, which is why I made this change.
Even after unzipping the wheel, the only package available was _bonn.
Before, the directory structure was:

├── _bonn/ # Source code files
│ ├── _bonn.so file

Now it is:

├── bonn/ # Source code files
│ ├── _bonn.so file
│ ├── extract.py
│ ├── category_manager.py
│ ├── all bonn.py files
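
The check described above (unzipping the wheel and looking at which packages it contains) could be scripted roughly as follows. This is only a sketch using the standard library: the function names `top_level_packages` and `missing_modules` are made up for illustration and are not part of bonn or maturin.

```python
import zipfile
from pathlib import PurePosixPath


def top_level_packages(wheel_path):
    """Return the sorted top-level directory names inside a wheel archive."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = {
            PurePosixPath(name).parts[0]
            for name in wheel.namelist()
            if "/" in name  # skip any files stored at the archive root
        }
    # Drop metadata directories such as "<name>-<version>.dist-info".
    return sorted(n for n in names if not n.endswith((".dist-info", ".data")))


def missing_modules(wheel_path, package, expected):
    """Return the expected file names that are absent from `package` in the wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        present = set(wheel.namelist())
    return [f for f in expected if f"{package}/{f}" not in present]
```

On a wheel with the "before" layout, `top_level_packages` would be expected to report only `_bonn`; with the "now" layout it should report `bonn`, and `missing_modules(path, "bonn", ["extract.py", "category_manager.py"])` should come back empty.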

Contributor Author

@KamenDimitrov97 KamenDimitrov97 Aug 21, 2024

After the release I'm going to check on our onyx environment for any issues before writing to onyx.
The release shouldn't be an issue in their production environment, because the bonn version they're using is 0.1.5 and I've bumped this release to 0.1.6.
[Image: version used in ons]

Contributor Author

[Image: the previous error.]


WORKDIR /app/build
5 changes: 1 addition & 4 deletions Makefile
@@ -13,14 +13,11 @@ RESET := $(shell tput -Txterm sgr0)
all: build

.PHONY: build
build: Dockerfile
build:
@mkdir -p $(BUILD)/wheels
docker build -t bonn_py_build -f Dockerfile .
docker run --platform "linux/amd64" --entrypoint maturin -v $(shell pwd)/$(BUILD)/wheels:/app/build/target/wheels bonn_py_build build --find-interpreter

Dockerfile:
m4 Dockerfile.in > Dockerfile

test_data/wiki.en.fifu:
curl -o test_data/wiki.en.fifu http://www.sfs.uni-tuebingen.de/a3-public-data/finalfusion-fasttext/wiki/wiki.en.fifu

4 changes: 2 additions & 2 deletions pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "maturin"

[project]
name = "bonn"
version = "0.1.5"
version = "0.1.6"
description = "Created for ONS. Proof-of-concept mmap'd Rust word2vec implementation linked with category matching"
readme = "README.md"
license = { "file" = "LICENSE.md" }
@@ -29,5 +29,5 @@ classifiers = [
]

[tool.maturin]
python-source = "python"
python-source = ""
module-name = "bonn._bonn"
4 changes: 3 additions & 1 deletion python/bonn/category_manager.py
@@ -2,10 +2,12 @@
import math
import re
from sortedcontainers import SortedDict
from nltk.corpus import stopwords
from nltk.corpus import stopwords, wordnet
from nltk.stem.wordnet import WordNetLemmatizer

from .utils import cosine_similarities
stopwords.ensure_loaded()
wordnet.ensure_loaded()

re_ws = re.compile(r"\s+")
re_num = re.compile(r"[^\w\s\']", flags=re.UNICODE)
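
For readers unfamiliar with the pattern in the hunk above: calling `ensure_loaded()` at module import forces a lazily loaded corpus to load immediately, so the loading cost and any failure (for example, missing data) surface at startup rather than on first use. The sketch below illustrates the idea with a simplified, hypothetical stand-in class; `LazyResource` and `stopwords_like` are invented for this example and are not NLTK's implementation.

```python
import threading


class LazyResource:
    """Simplified stand-in for a lazily loaded corpus (hypothetical, not NLTK's code)."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None
        self._lock = threading.Lock()
        self.load_count = 0  # how many times the loader actually ran

    def ensure_loaded(self):
        """Load the data if it has not been loaded yet; safe to call repeatedly."""
        with self._lock:
            if self._data is None:
                self._data = self._loader()
                self.load_count += 1

    def words(self):
        """Return the loaded data, loading it first if necessary."""
        self.ensure_loaded()
        return self._data


# Module level: pay the loading cost once at import time, before any
# worker thread or request handler touches the resource.
stopwords_like = LazyResource(lambda: frozenset({"the", "a", "of"}))
stopwords_like.ensure_loaded()
```

Later calls such as `stopwords_like.words()` then reuse the already loaded data instead of triggering a load in the middle of request handling.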