Merge branch 'master' of ssh://github.com/OCR-D/core into workflow-server
bertsky committed May 4, 2022
2 parents 83b10f5 + ecdb840 commit d98daa8
Showing 15 changed files with 240 additions and 69 deletions.
CHANGELOG.md: 40 changes (40 additions, 0 deletions)
@@ -5,9 +5,46 @@ Versioned according to [Semantic Versioning](http://semver.org/).

## Unreleased

## [2.33.0] - 2022-05-03

Fixed:

* `ocrd workspace remove-group`: Pass on `--recursive` to `remove_file_group`, #831, #832
* `ocrd workspace bulk-add`: handle unset file_id properly, #812, #846
* `io.BufferedReader` filename attribute should be `name` not `filename`, #838, #839

Changed:

* `OcrdWorkspace.image_from_*`: support passing explicit AlternativeImage filename, #845

Removed:

* `make asset-server` feature no longer used, #843
* `python3-pip` dependency is redundant, #813

## [2.32.0] - 2022-03-30

Fixed:

* `ocrd zip bag`: `-I` is *not* required, #828, #829

Changed:

* `OcrdExif`: fallback to PIL if ImageMagick's `identify` is not available, #796, #676
* `OcrdWorkspace.image_from_*`: Avoid false warning when recropping, #820, #687

## [2.31.0] - 2022-03-20

Changed:

* `make cuda-ubuntu` installs all CUDA versions, OCR-D/core#704, OCR-D/ocrd_all#270
* `ocrd resmgr`: updated models for ocrd-anybaseocr-{tiseg,layout-analysis}, #819, OCR-D/ocrd_anybaseocr#89

Fixed:

* Error message erroneously referenced `mets:file/@ID` instead of `mets:fileGrp/@USE`, #823
* Consistently use kwargs/args in `OcrdWorkspace.save_image_file`, #822
* Missing arg for log message in WorkspaceValidator, #811

## [2.30.0] - 2022-02-01

@@ -1428,6 +1465,9 @@ Fixed
Initial Release

<!-- link-labels -->
[2.33.0]: ../../compare/v2.33.0..v2.32.0
[2.32.0]: ../../compare/v2.32.0..v2.31.0
[2.31.0]: ../../compare/v2.31.0..v2.30.0
[2.30.0]: ../../compare/v2.30.0..v2.29.0
[2.29.0]: ../../compare/v2.29.0..v2.28.0
[2.28.0]: ../../compare/v2.28.0..v2.27.0
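The `io.BufferedReader` entry in 2.33.0 (#838, #839) rests on a standard-library detail: a file opened in binary mode reports its path as `.name` and has no `filename` attribute. A minimal check, using a hypothetical helper `binary_file_name`:

```python
import io


def binary_file_name(path):
    """Return (reported name, whether a `filename` attribute exists)."""
    with open(path, "rb") as fh:
        # Binary-mode open() returns an io.BufferedReader.
        assert isinstance(fh, io.BufferedReader)
        # `.name` is the path passed to open(); there is no `.filename`.
        return fh.name, hasattr(fh, "filename")
```

Code that reads `fh.filename` on such an object raises `AttributeError`, which is why the attribute name matters.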
Dockerfile: 5 changes (3 additions, 2 deletions)
@@ -17,7 +17,8 @@ COPY ocrd_validators/ ./ocrd_validators
COPY Makefile .
COPY README.md .
COPY LICENSE .
RUN apt-get update && apt-get -y install --no-install-recommends \
RUN echo 'APT::Install-Recommends "0"; APT::Install-Suggests "0";' >/etc/apt/apt.conf.d/ocr-d.conf
RUN apt-get update && apt-get -y install \
ca-certificates \
software-properties-common \
python3-dev \
@@ -28,12 +29,12 @@ RUN apt-get update && apt-get -y install --no-install-recommends \
curl \
sudo \
git \
&& make deps-ubuntu \
&& pip3 install --upgrade pip setuptools \
&& make install \
&& $FIXUP \
&& rm -rf /build-ocrd


WORKDIR /data

CMD ["/usr/local/bin/ocrd", "--help"]
Makefile: 11 changes (3 additions, 8 deletions)
@@ -27,7 +27,6 @@ help:
@echo " generate-page Regenerate python code from PAGE XSD"
@echo " spec Copy JSON Schema, OpenAPI from OCR-D/spec"
@echo " assets Setup test assets"
@echo " assets-server Start asset server at http://localhost:5001"
@echo " test Run all unit tests"
@echo " docs Build documentation"
@echo " docs-clean Clean docs"
@@ -60,16 +59,16 @@ PIP_INSTALL = pip install

# Dependencies for deployment in an ubuntu/debian linux
deps-ubuntu:
apt-get install -y python3 python3-pip python3-venv
apt-get install -y python3 python3-venv imagemagick

# Install test python deps via pip
deps-test:
$(PIP) install -U "pip>=19.0.0,!=20.3.2"
$(PIP) install -U pip
$(PIP) install -r requirements_test.txt

# (Re)install the tool
install:
$(PIP) install -U "pip>=19.0.0,!=20.3.2" wheel
$(PIP) install -U pip wheel
for mod in $(BUILD_ORDER);do (cd $$mod ; $(PIP_INSTALL) .);done

# Install with pip install -e
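`deps-ubuntu` now installs `imagemagick` instead of `python3-pip`, and the 2.32.0 changelog says `OcrdExif` falls back to PIL when ImageMagick's `identify` is not available. A minimal sketch of that kind of fallback follows. It is not the `OcrdExif` implementation: `pick_exif_backend` and `image_size` are hypothetical helpers, and the PIL branch assumes Pillow is installed.

```python
import shutil
import subprocess


def pick_exif_backend(which=shutil.which):
    """Return 'imagemagick' if `identify` is on PATH, otherwise 'pil'."""
    return "imagemagick" if which("identify") else "pil"


def image_size(path, which=shutil.which):
    """Return (width, height), preferring ImageMagick and falling back to PIL."""
    if pick_exif_backend(which) == "imagemagick":
        out = subprocess.run(
            # "[0]" restricts identify to the first frame of multi-page files.
            ["identify", "-format", "%w %h", str(path) + "[0]"],
            check=True, capture_output=True, text=True,
        ).stdout
        width, height = out.split()
        return int(width), int(height)
    # Imported lazily so the module still loads when Pillow is absent.
    from PIL import Image
    with Image.open(path) as img:
        return img.size
```

Injecting `which` keeps the backend choice testable without touching the real `PATH`.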
@@ -138,10 +137,6 @@ assets: repo/assets
mkdir -p $(TESTDIR)/assets
cp -r -t $(TESTDIR)/assets repo/assets/data/*

# Start asset server at http://localhost:5001
assets-server:
cd assets && make start


#
# Tests
README.md: 8 changes (4 additions, 4 deletions)
@@ -63,6 +63,10 @@ pip install ocrd_modelfactory

All python software released by [OCR-D](https://github.com/OCR-D) requires Python 3.6 or higher.

**NOTE** Some OCR-D-Tools (or even test cases) _might_ reveal an unintended behavior if you have specific environment modifications, like:
* using a custom build of [ImageMagick](https://github.com/ImageMagick/ImageMagick), whose format delegates are different from what OCR-D supposes
* custom Python logging configurations in your personal account

## Command line tools

**NOTE:** All OCR-D CLI tools support a `--help` flag which shows usage and
@@ -165,10 +169,6 @@ Download assets (`make assets`)

Test with local files: `make test`

- Test with local asset server:
- Start asset-server: `make asset-server`
- `make test OCRD_BASEURL='http://localhost:5001/'`

- Test with remote assets:
- `make test OCRD_BASEURL='https://github.com/OCR-D/assets/raw/master/data/'`

ocrd/ocrd/cli/workspace.py: 5 changes (2 additions, 3 deletions)
@@ -303,11 +303,10 @@ def workspace_cli_bulk_add(ctx, regex, mimetype, page_id, file_id, url, file_grp
group_dict = m.groupdict()

# derive --file-id from filename if --file-id is not explicitly set
if not file_id:
file_id = safe_filename(str(file_path))
file_id_ = file_id or safe_filename(str(file_path))

# set up file info
file_dict = {'url': url, 'mimetype': mimetype, 'ID': file_id, 'pageId': page_id, 'fileGrp': file_grp}
file_dict = {'url': url, 'mimetype': mimetype, 'ID': file_id_, 'pageId': page_id, 'fileGrp': file_grp}

# guess mime type
if not file_dict['mimetype']:
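The `bulk-add` fix (#812, #846) matters because the command handles many files in one loop. Assigning to `file_id` inside the loop made the first derived ID stick for every later file; deriving a per-iteration `file_id_` leaves the option untouched. A minimal reproduction of both behaviours, with `safe_filename_stub` as a hypothetical stand-in for the real `safe_filename` (its output format is an assumption):

```python
import re


def safe_filename_stub(path):
    """Hypothetical stand-in for safe_filename: replace unsafe characters."""
    return re.sub(r"[^\w.-]", "_", path)


def ids_buggy(paths, file_id=None):
    ids = []
    for path in paths:
        # Old behaviour: the first iteration overwrites file_id,
        # so every later file reuses that ID.
        if not file_id:
            file_id = safe_filename_stub(path)
        ids.append(file_id)
    return ids


def ids_fixed(paths, file_id=None):
    ids = []
    for path in paths:
        # New behaviour: a per-file value; file_id keeps the user's input.
        file_id_ = file_id or safe_filename_stub(path)
        ids.append(file_id_)
    return ids


paths = ["OCR-D-IMG/0001.tif", "OCR-D-IMG/0002.tif"]
print(ids_buggy(paths))  # ['OCR-D-IMG_0001.tif', 'OCR-D-IMG_0001.tif']
print(ids_fixed(paths))  # ['OCR-D-IMG_0001.tif', 'OCR-D-IMG_0002.tif']
```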
ocrd/ocrd/cli/zip.py: 2 changes (1 addition, 1 deletion)
@@ -39,7 +39,7 @@ def zip_cli():
help='Basename of the METS file.',
show_default=True)
@click.option('-i', '--identifier', '--id', help="Ocrd-Identifier", required=True)
@click.option('-I', '--in-place', help="Replace workspace with bag (like bagit.py does)", required=True, is_flag=True)
@click.option('-I', '--in-place', help="Replace workspace with bag (like bagit.py does)", is_flag=True)
@click.option('-D', '--manifestation-depth', help="Ocrd-Manifestation-Depth", type=click.Choice(['full', 'partial']), default='partial')
@click.option('-m', '--mets', help="location of mets.xml in the bag's data dir", default="mets.xml")
@click.option('-b', '--base-version-checksum', help="Ocrd-Base-Version-Checksum")
ocrd/ocrd/resource_list.yml: 16 changes (10 additions, 6 deletions)
@@ -133,19 +133,23 @@ ocrd-anybaseocr-block-segmentation:
description: block segmentation model for anybaseocr
size: 256139800
ocrd-anybaseocr-layout-analysis:
- url: https://ocr-d-repo.scc.kit.edu/models/dfki/layoutAnalysis/structure_analysis.h5
name: structure_analysis.h5
- url: https://ocr-d.kba.cloud/structure_analysis.tar.gz
name: structure_analysis
description: structure analysis model for anybaseocr
size: 31477056
type: tarball
path_in_archive: 'structure_analysis'
size: 29002514
- url: https://ocr-d-repo.scc.kit.edu/models/dfki/layoutAnalysis/mapping_densenet.pickle
name: mapping_densenet.pickle
description: mapping model for anybaseocr
size: 374
ocrd-anybaseocr-tiseg:
- url: https://ocr-d-repo.scc.kit.edu/models/dfki/tiseg/seg_model.hdf5
name: seg_model.hdf5
- url: https://ocr-d.kba.cloud/seg_model.tar.gz
name: seg_model
description: text image segmentation model for anybaseocr
size: 66080688
type: tarball
path_in_archive: 'seg_model'
size: 61388872
ocrd-kraken-segment:
- url: https://github.com/mittagessen/kraken/raw/master/kraken/blla.mlmodel
description: Pretrained baseline segmentation model
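The two anybaseocr entries switch from single files to `type: tarball` resources whose `path_in_archive` names a top-level directory inside the archive. A minimal sketch of packaging a local model directory so that it appears under that name, using only `tarfile`; `make_model_tarball` and the directory layout are hypothetical, and this is not necessarily how the published archives were produced.

```python
import tarfile
import tempfile
from pathlib import Path


def make_model_tarball(model_dir, tarball, path_in_archive):
    """Pack model_dir into a gzipped tarball whose top-level entry is path_in_archive."""
    with tarfile.open(str(tarball), "w:gz") as tar:
        # arcname renames the directory inside the archive, so the local
        # folder name does not have to match path_in_archive.
        tar.add(str(model_dir), arcname=path_in_archive)


def demo():
    """Build a throwaway tarball and return its sorted member names."""
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp, "my_local_checkpoint")
        model_dir.mkdir()
        (model_dir / "weights.bin").write_bytes(b"\0" * 16)
        tarball = Path(tmp, "seg_model.tar.gz")
        make_model_tarball(model_dir, tarball, "seg_model")
        with tarfile.open(str(tarball), "r:*") as tar:
            return sorted(tar.getnames())


print(demo())  # ['seg_model', 'seg_model/weights.bin']
```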
ocrd/ocrd/resource_manager.py: 6 changes (3 additions, 3 deletions)
@@ -249,17 +249,17 @@ def download(
else:
self._copy_impl(url, fpath, progress_cb)
elif resource_type == 'tarball':
with pushd_popd(tempdir=True):
with pushd_popd(tempdir=True) as tempdir:
if is_url:
self._download_impl(url, 'download.tar.xx', progress_cb, size)
else:
self._copy_impl(url, 'download.tar.xx', progress_cb)
Path('out').mkdir()
with pushd_popd('out'):
log.info("Extracting tarball")
log.info("Extracting tarball to %s/out" % tempdir)
with open_tarfile('../download.tar.xx', 'r:*') as tar:
tar.extractall()
log.info("Copying '%s' from tarball to %s" % (path_in_archive, fpath))
log.info("Copying '%s' from extracted tarball %s/out to %s" % (path_in_archive, tempdir, fpath))
copytree(path_in_archive, str(fpath))
# TODO
# elif resource_type == 'github-dir':
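The resource-manager hunk now binds the directory yielded by `pushd_popd(tempdir=True)` so the log messages can say where the tarball was extracted. The flow it implements (work in a temporary directory, extract into `out/`, copy `path_in_archive` to the destination) can be sketched with the standard library alone. `pushd_tempdir` and `install_from_tarball` are hypothetical names, not `ocrd_utils` functions, and the sketch assumes `path_in_archive` is a directory, as in both anybaseocr entries above.

```python
import logging
import os
import shutil
import tarfile
import tempfile
from contextlib import contextmanager
from pathlib import Path

log = logging.getLogger("tarball-sketch")


@contextmanager
def pushd_tempdir():
    """Hypothetical analogue of pushd_popd(tempdir=True): chdir into a fresh
    temporary directory, yield its path, then restore the cwd and clean up."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tempdir:
        os.chdir(tempdir)
        try:
            yield tempdir
        finally:
            # Leave the directory before TemporaryDirectory deletes it.
            os.chdir(old_cwd)


def install_from_tarball(tarball, path_in_archive, fpath):
    """Extract `tarball` in a temp dir and copy `path_in_archive` to `fpath`."""
    tarball = Path(tarball).resolve()  # resolve before changing directory
    fpath = Path(fpath).resolve()
    with pushd_tempdir() as tempdir:
        Path("out").mkdir()
        log.info("Extracting tarball to %s/out", tempdir)
        with tarfile.open(str(tarball), "r:*") as tar:
            # Use the safer 'data' extraction filter where this Python has it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall("out", **kwargs)
        log.info("Copying '%s' from extracted tarball %s/out to %s",
                 path_in_archive, tempdir, fpath)
        # copytree also creates missing parent directories of fpath.
        shutil.copytree(os.path.join("out", path_in_archive), str(fpath))
    return fpath
```

Yielding the temporary path from the context manager is what lets the log lines name the extraction directory without tracking it separately.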