fix merge conflicts with main #38

Merged
9 changes: 8 additions & 1 deletion .github/actions/setup-poetry/action.yml
@@ -10,10 +10,17 @@ runs:
- name: Install poetry
run: pipx install poetry==1.8.3
shell: bash
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
id: py
with:
python-version: ${{ inputs.python-version }}
update-environment: false
cache: 'poetry'
- name: Setup poetry env with correct python
run: |
poetry env use ${{ steps.py.outputs.python-path }}
poetry run python --version
shell: bash
- name: Install only dependencies and not the package itself
run: poetry install --all-extras --no-root
shell: bash
34 changes: 34 additions & 0 deletions .github/scripts/build_rhel.sh
@@ -0,0 +1,34 @@
#!/bin/bash

set -e # trigger failure on error - do not remove!
set -x # display command on output

# Build the sdist
poetry build -f sdist

# Compile the wheel from sdist in centos stream

docker build -f - . <<EOF
FROM quay.io/centos/centos:stream9
RUN dnf config-manager --set-enabled crb
# RUN dnf copr -y enable cheimes/deepsearch-glm rhel-9-x86_64
RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm \
&& dnf clean all
RUN dnf install -y --nodocs \
gcc gcc-c++ git make cmake pkgconfig glibc-devel \
python3.11 python3.11-pip python3.11-devel \
libjpeg-turbo-devel libpng-devel qpdf-devel json-devel utf8cpp-devel zlib-devel \
loguru-devel \
&& dnf clean all

RUN mkdir /src
COPY ./dist/*.tar.gz /src/

RUN USE_SYSTEM_DEPS=ON pip3.11 install /src/docling_parse*.tar.gz \
&& python3.11 -c 'from docling_parse.docling_parse import pdf_parser, pdf_parser_v2'

COPY ./tests /src/tests
RUN cd /src \
&& pip3.11 install pytest \
&& pytest
EOF
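Note on `build_rhel.sh`: the Dockerfile copies every `./dist/*.tar.gz` into the image and then runs `pip3.11 install /src/docling_parse*.tar.gz`. If the script is run locally against a `dist/` left over from an earlier `poetry build`, that glob can expand to more than one sdist. A minimal guard could run right after `poetry build -f sdist` and stop the script before the Docker build. The sketch below makes that assumption. The helper name `find_single_sdist` is hypothetical and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the single sdist that
# matches docling_parse*.tar.gz in the given directory, or fail if there are
# zero or several, so the Docker build never receives a stale archive.
find_single_sdist() {
  local dir="$1" count=0 found="" f
  for f in "$dir"/docling_parse*.tar.gz; do
    # An unmatched glob is left as a literal pattern, so skip non-existent paths
    [ -e "$f" ] || continue
    count=$((count + 1))
    found="$f"
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one sdist in $dir, found $count" >&2
    return 1
  fi
  printf '%s\n' "$found"
}
```

If called as `sdist="$(find_single_sdist ./dist)"`, a failing check aborts the script, because `build_rhel.sh` already runs under `set -e`.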
8 changes: 4 additions & 4 deletions .github/workflows/checks.yml
@@ -12,16 +12,16 @@ jobs:
python-version: ['3.9', '3.10', '3.11', '3.12']
steps:
- uses: actions/checkout@v4
# - name: Install dependencies [linux]
# run: sudo apt-get install -y libldap-common
# shell: bash
- uses: ./.github/actions/setup-poetry
with:
python-version: ${{ matrix.python-version }}
- name: Run styling check
run: poetry run pre-commit run --all-files
- name: Install with poetry
run: poetry install --all-extras
run: |
poetry install --all-extras
ls -l
ls -l docling_parse
- name: Testing
run: |
poetry run pytest -v tests
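Note on `checks.yml`: the new `ls -l docling_parse` line only prints the package directory for manual inspection. Suppose the intent is to confirm that `poetry install` compiled the extension into that directory. That is an assumption here, suggested by the `docling_parse.docling_parse` import used in `build_rhel.sh`. In that case the listing could become a hard check. The sketch below uses a hypothetical `has_compiled_extension` helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): succeed if the given package
# directory holds a compiled extension module (.so on Linux/macOS, .pyd on
# Windows); otherwise report the problem and fail.
has_compiled_extension() {
  local pkg_dir="$1" f
  for f in "$pkg_dir"/*.so "$pkg_dir"/*.pyd; do
    # Unmatched globs stay literal, so only count paths that exist
    if [ -e "$f" ]; then
      return 0
    fi
  done
  echo "no compiled extension found in $pkg_dir" >&2
  return 1
}

# Possible use after `poetry install --all-extras`:
#   has_compiled_extension docling_parse
```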
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -21,4 +21,6 @@ jobs:
uses: ./.github/workflows/checks.yml
build-wheels:
uses: ./.github/workflows/wheels.yml
rhel-build:
uses: ./.github/workflows/rhel.yml

12 changes: 12 additions & 0 deletions .github/workflows/rhel.yml
@@ -0,0 +1,12 @@
on:
workflow_call:

jobs:
run-checks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/setup-poetry
- name: Run build in docker
run: ./.github/scripts/build_rhel.sh
shell: bash
83 changes: 66 additions & 17 deletions .github/workflows/wheels.yml
@@ -14,7 +14,7 @@ jobs:
# list of github vm: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12"]

os:
- name: "ubuntu-latest"
@@ -51,11 +51,21 @@ jobs:

- name: Setup Python
uses: actions/setup-python@v5
id: py
with:
python-version: ${{ matrix.python-version }}
update-environment: false

- name: Install Poetry
run: python -m pip install poetry==1.8.3
run: |
which python
python --version
which python3
python3 --version
echo "pythonpath: ${{ steps.py.outputs.python-path }}"
${{ steps.py.outputs.python-path }} --version
pipx install poetry==1.8.3
poetry env use ${{ steps.py.outputs.python-path }}

- name: Set up custom PATH and set py version to cpXYZ [windows]
if: ${{matrix.os.platform_id == 'win_amd64'}}
@@ -97,9 +107,15 @@ jobs:
CIBW_ENVIRONMENT: "MACOSX_DEPLOYMENT_TARGET=${{ matrix.os.macos_version }}.0"
ARCHFLAGS: -arch x86_64
BUILD_THREADS: "4"
PYTORCH_MPS_HIGH_WATERMARK_RATIO: "0.0"
run: |
echo "Building wheel ${CIBW_BUILD}"
PY_CACHE_TAG=$(poetry run python -c 'import sys;print(sys.implementation.cache_tag)')
echo "Building wheel ${CIBW_BUILD} ${{ env.CIBW_BUILD }}"
echo "Building cp: ${{ env.python_cp_version }}"
echo "Building cache_tag: ${PY_CACHE_TAG}"
echo "Building platform_id: ${{ matrix.os.platform_id }}"
poetry run python --version
poetry run python --version | grep ${{ matrix.python-version }}
poetry install --no-root --only=build
cat ./pyproject.toml
poetry run python -m cibuildwheel --output-dir wheelhouse
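Note on the version checks in `wheels.yml`: the build steps verify the interpreter with `poetry run python --version | grep ${{ matrix.python-version }}`. `grep` matches substrings and treats `.` as any character, so the check is looser than it looks. For example, the version string `Python 3.13.10` contains `3.10` and would pass for the 3.10 matrix entry. A stricter comparison can anchor on the `Python X.Y.` prefix. The sketch below does this with a hypothetical `version_matches` helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the `python --version` output names
# exactly the expected major.minor release (e.g. 3.10 matches 3.10.15 but
# not 3.13.10).
version_matches() {
  local expected="$1" actual="$2"
  case "$actual" in
    # The quoted part is literal; only the trailing * is a wildcard
    "Python ${expected}."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call as it might appear in the workflow step (not run here):
#   version_matches "${{ matrix.python-version }}" "$(poetry run python --version)"
```

GitHub Actions expands the `${{ ... }}` expression before the shell runs, so the helper only sees plain strings.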
@@ -115,8 +131,12 @@ jobs:
for file in ./wheelhouse/*.whl; do
echo "Inspecting $file"
poetry run python -m zipfile --list "$file"
echo "Checking if .so is contained in the wheel"
poetry run python -m zipfile --list "$file" | grep \\.so
echo "Checking if the correct python version is contained in the wheel"
poetry run python -m zipfile --list "$file" | grep ${PY_CACHE_TAG}
done
mkdir ./dist
mkdir -p ./dist
cp wheelhouse/*.whl ./dist/

# there is an error with the tagging of wheels for macosx-arm64
@@ -137,9 +157,16 @@ jobs:
CIBW_ENVIRONMENT: "MACOSX_DEPLOYMENT_TARGET=${{ matrix.os.macos_version }}.0"
ARCHFLAGS: -arch arm64
BUILD_THREADS: "4"
PYTORCH_MPS_HIGH_WATERMARK_RATIO: "0.0"
CUDA_VISIBLE_DEVICES: "cpu"
run: |
echo "Building wheel ${CIBW_BUILD}"
PY_CACHE_TAG=$(poetry run python -c 'import sys;print(sys.implementation.cache_tag)')
echo "Building wheel ${CIBW_BUILD} ${{ env.CIBW_BUILD }}"
echo "Building cp: ${{ env.python_cp_version }}"
echo "Building cache_tag: ${PY_CACHE_TAG}"
echo "Building platform_id: ${{ matrix.os.platform_id }}"
poetry run python --version
poetry run python --version | grep ${{ matrix.python-version }}
poetry install --no-root --only=build
cat ./pyproject.toml
poetry run python -m cibuildwheel --output-dir wheelhouse
@@ -155,8 +182,12 @@ jobs:
for file in ./wheelhouse/*.whl; do
echo "Inspecting $file"
poetry run python -m zipfile --list "$file"
echo "Checking if .so is contained in the wheel"
poetry run python -m zipfile --list "$file" | grep \\.so
echo "Checking if the correct python version is contained in the wheel"
poetry run python -m zipfile --list "$file" | grep ${PY_CACHE_TAG}
done
mkdir ./dist
mkdir -p ./dist
cp wheelhouse/*.whl ./dist/

- name: Set up QEMU [linux]
@@ -165,6 +196,13 @@ jobs:
with:
platforms: all

- name: Build sdist
# build only on Linux to avoid too many duplicates of the sdist
if: matrix.os.name == 'ubuntu-latest'
run: |
echo "Building wheel ${CIBW_BUILD}"
poetry build -f sdist

- name: Build wheels [linux]
if: matrix.os.name == 'ubuntu-latest'
env:
@@ -176,17 +214,25 @@ jobs:
CIBW_BUILD_VERBOSITY: 3
BUILD_THREADS: "8"
run: |
echo "Building wheel ${CIBW_BUILD}"
PY_CACHE_TAG=$(poetry run python -c 'import sys;print(sys.implementation.cache_tag)')
echo "Building cp: ${{ env.python_cp_version }}"
echo "Building cache_tag: ${PY_CACHE_TAG}"
echo "Building platform_id: ${{ matrix.os.platform_id }}"
poetry run python --version
poetry run python --version | grep ${{ matrix.python-version }}
poetry install --no-root --only=build
cat ./pyproject.toml
poetry run python -m cibuildwheel --output-dir ./wheelhouse
ls -l ./wheelhouse
for file in ./wheelhouse/*.whl; do
echo "Inspecting $file"
poetry run python -m zipfile --list "$file"
echo "Checking if .so is contained in the wheel"
poetry run python -m zipfile --list "$file" | grep \\.so
echo "Checking if the correct python version is contained in the wheel"
poetry run python -m zipfile --list "$file" | grep ${PY_CACHE_TAG}
done
mkdir ./dist
mkdir -p ./dist
cp wheelhouse/*.whl ./dist/

- name: Set up MSYS2 [windows]
@@ -196,12 +242,15 @@ jobs:
update: true
install: >
mingw-w64-x86_64-toolchain
mingw-w64-i686-toolchain
mingw-w64-x86_64-gcc-libs

- name: Set up QPDF external-libs [windows]
- name: Set up external-libs [windows]
if: ${{matrix.os.platform_id == 'win_amd64'}}
shell: pwsh
run: |
Copy-Item -Path "C:/mingw64/bin/libgcc_s_seh-1.dll" -Destination ".\docling_parse"
Copy-Item -Path "C:/mingw64/bin/libstdc++-6.dll" -Destination ".\docling_parse"
Copy-Item -Path "C:/mingw64/bin/libwinpthread-1.dll" -Destination ".\docling_parse"
New-Item -Path 'C:\windows-libs' -ItemType Directory -Force
Invoke-WebRequest -Uri 'https://github.com/qpdf/external-libs/releases/download/release-2024-06-07/qpdf-external-libs-bin.zip' -OutFile 'C:\windows-libs\qpdf-external-libs-bin.zip'
Expand-Archive -Path 'C:\windows-libs\qpdf-external-libs-bin.zip' -DestinationPath 'C:\windows-libs' -Force
@@ -222,11 +271,11 @@ jobs:
CIBW_BUILD_VERBOSITY: 3
CIBW_ARCHS: AMD64
CIBW_PROJECT_REQUIRES_PYTHON: "~=${{ matrix.python-version }}.0"
PKG_CONFIG_PATH: "C:/msys64/mingw64/lib/pkgconfig"
PKG_CONFIG_EXECUTABLE: "C:/msys64/usr/bin/pkg-config.exe"
CMAKE_PREFIX_PATH: "C:/msys64/mingw64;C:/windows-libs/external-libs"
CMAKE_LIBRARY_PATH: "C:/msys64/mingw64/lib;C:/windows-libs/external-libs/lib-mingw64"
CMAKE_INCLUDE_PATH: "C:/msys64/mingw64/include;C:/windows-libs/external-libs/include"
PKG_CONFIG_PATH: "C:/msys64/usr/lib/pkgconfig"
PKG_CONFIG_EXECUTABLE: "C:/mingw64/bin/pkg-config.exe"
CMAKE_PREFIX_PATH: "C:/msys64/mingw64;C:/mingw64;C:/windows-libs/external-libs"
CMAKE_LIBRARY_PATH: "C:/msys64/mingw64/lib;C:/mingw64/lib;C:/windows-libs/external-libs/lib-mingw64"
CMAKE_INCLUDE_PATH: "C:/msys64/mingw64/include;C:/mingw64/include;C:/windows-libs/external-libs/include"
CMAKE_GENERATOR: "MSYS Makefiles"
BUILD_THREADS: 1
ASM_NASM: "C:/nasm/nasm.exe"
@@ -249,9 +298,9 @@ jobs:

- name: publish wheels (dry run)
run: |
poetry publish --dry-run --no-interaction -vvv
poetry publish --skip-existing --dry-run --no-interaction -vvv

- name: publish wheels (on publishing) [for releases only]
if: ${{ startsWith(github.ref, 'refs/tags/') }}
run: |
poetry publish --no-interaction -vvv
poetry publish --skip-existing --no-interaction -vvv
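Note on the wheel inspection loops: the macOS and Linux build steps run two independent `grep`s, one for `\.so` and one for `${PY_CACHE_TAG}` (for example `cpython-311`). A wheel would therefore pass if the tag appeared in any file name and some unrelated `.so` were present. A stricter variant requires a single entry that is both a shared object and tagged for the expected interpreter. The sketch below reads the `python -m zipfile --list` output on stdin. The name `wheel_has_tagged_extension` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a `python -m zipfile --list <wheel>` listing on
# stdin and succeed only if one entry carries the given cache tag AND ends in
# .so (followed by the listing's column padding or end of line).
wheel_has_tagged_extension() {
  local tag="$1"
  # -F: match the tag literally; the second grep keeps only .so entries
  grep -F -- "$tag" | grep -qE '\.so([[:space:]]|$)'
}

# Example use inside the existing loop (not run here):
#   poetry run python -m zipfile --list "$file" | wheel_has_tagged_extension "${PY_CACHE_TAG}"
```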
1 change: 0 additions & 1 deletion .gitignore
@@ -287,7 +287,6 @@ pyrightconfig.json
[Ll]ib
[Ll]ib64
[Ll]ocal
[Ss]cripts
pyvenv.cfg
pip-selfcheck.json

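Note on the `.gitignore` change: the one deletion is the `[Ss]cripts` line from the virtualenv section, next to `[Ll]ib` and `pyvenv.cfg`. A gitignore pattern without a slash matches a name at any depth. That line would therefore also have ignored the new `.github/scripts/` directory, and `build_rhel.sh` with it. The throwaway-repository demonstration below shows this with `git check-ignore`. It is illustrative only and assumes `git` is on the `PATH`.

```shell
#!/usr/bin/env bash
# Illustration: a bare `[Ss]cripts` pattern in .gitignore matches the
# `.github/scripts` directory, so files added under it would be ignored.
set -eu

# Create a throwaway repository containing the new script path; print its path.
make_demo_repo() {
  local dir
  dir="$(mktemp -d)"
  git -C "$dir" init -q >/dev/null
  mkdir -p "$dir/.github/scripts"
  touch "$dir/.github/scripts/build_rhel.sh"
  printf '%s\n' "$dir"
}

# Succeed if git would ignore path $2 inside repository $1.
is_ignored() {
  git -C "$1" check-ignore -q "$2"
}

repo="$(make_demo_repo)"
printf '%s\n' '[Ss]cripts' > "$repo/.gitignore"
# Prints the matching rule, e.g. ".gitignore:1:[Ss]cripts<TAB>.github/scripts/build_rhel.sh"
git -C "$repo" check-ignore -v .github/scripts/build_rhel.sh || true
```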
22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,25 @@
## [v1.4.1](https://github.com/DS4SD/docling-parse/releases/tag/v1.4.1) - 2024-10-02

### Fix

* Windows build properly linking to system libraries ([#36](https://github.com/DS4SD/docling-parse/issues/36)) ([`e26ed05`](https://github.com/DS4SD/docling-parse/commit/e26ed056c22400552918c3a97dfb13614c9a03f5))

## [v1.4.0](https://github.com/DS4SD/docling-parse/releases/tag/v1.4.0) - 2024-10-02

### Feature

* Build using system deps ([#33](https://github.com/DS4SD/docling-parse/issues/33)) ([`e1c8e49`](https://github.com/DS4SD/docling-parse/commit/e1c8e4980faab35bfdf6d1a78d8749745c560889))

### Fix

* Python version in wheels ([#31](https://github.com/DS4SD/docling-parse/issues/31)) ([`8d903ba`](https://github.com/DS4SD/docling-parse/commit/8d903baf61a7706066374c23265e115a9513c3ba))

## [v1.3.1](https://github.com/DS4SD/docling-parse/releases/tag/v1.3.1) - 2024-09-30

### Fix

* Sdist and wheels content ([#28](https://github.com/DS4SD/docling-parse/issues/28)) ([`f3febc5`](https://github.com/DS4SD/docling-parse/commit/f3febc53a2a6565b16847113633f92d1a2dab48a))

## [v1.3.0](https://github.com/DS4SD/docling-parse/releases/tag/v1.3.0) - 2024-09-20

### Feature