For some schedulers, setting PIMS image reader's .class_priority is ineffective in controlling dask-image.imread() #262

Open
ParticularMiner opened this issue Apr 30, 2022 · 15 comments

@ParticularMiner

cc: @jmdelahanty

Hi dask-image developers!

Normally an end-user may control which reader pims.open() uses to load images by simply increasing the .class_priority attribute of their preferred pims reader prior to calling pims.open(). See this link.

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force pims.open() to use this reader
rgb_frames = pims.open('/path/to/video/file.mpg')  # uses ImageIOReader

Since dask-image.imread() uses pims.open(), it would be great if it could mirror such functionality too.

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]
rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')  # uses ImageIOReader

And indeed this functionality does work for dask_image.imread.imread() with single-machine schedulers such as "threading" and "sync". But I do not know of a way to make all the worker processes of a multi-process scheduler, for example, aware of the preferred reader's increased .class_priority. Any help here would be greatly appreciated.

Alternatively, dask_image.imread.imread() could be modified to accept a "reader" keyword argument indicating the end-user's preferred PIMS reader.
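
For concreteness, here is a purely hypothetical sketch of what such an API might look like (the reader keyword does not currently exist in dask_image.imread.imread(); the name is only a suggestion):

import dask_image.imread
import pims

# Hypothetical reader= keyword: pass the preferred PIMS reader class directly,
# bypassing pims.open()'s priority-based selection.
rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg', reader=pims.ImageIOReader)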

@GenevieveBuckley
Collaborator

Hi @ParticularMiner

I see how that would be useful. Have you tried using the dask.array.image.imread function (regular dask, not the one in dask-image)? It allows you to pass in your preferred reader function directly, which seems even easier than fiddling with the priority levels.

from dask.array.image import imread
import pims

data = imread('path/to/files/*.tif', imread=pims.ImageIOReader)

(Having two different imread functions in two different places kinda violates the python zen "There should be one-- and preferably only one --obvious way to do it", which I don't like. You can read some more discussion about that here if you like: #229)

@GenevieveBuckley
Collaborator

Let us know if that fixes your issue
(and also feel free to let us know if you have opinions about #229. Development has stalled now that I'm no longer working full time on dask stuff, but it's still good to hear from people)

@ParticularMiner
Author

Hi @GenevieveBuckley ,

Thank you! Until now, I had been unaware of dask.array.image.imread().

The API of dask.array.image.imread() is certainly attractive in that it allows the use of other readers. But it would be great if it also had some of the other keyword arguments of dask_image.imread.imread(). I agree that dask should have only one such function, and since dask-image presumably deals with all things image, it would make sense for dask.array.image.imread() to be merged into dask_image.imread.imread().

Unfortunately though, as it is now, dask.array.image.imread() raised an exception while reading a video file which dask_image.imread.imread() had no problem reading:

from dask.array.image import imread
import pims


video = imread('path/to/video.mp4', imread=pims.ImageIOReader)
Error messages:
dask\array\image.py:58: in imread
    keys = [(name, i) + (0,) * len(sample.shape) for i in range(len(filenames))]
        filename   = 'path/to/video.mp4'
        filenames  = ['path/to/video.mp4']
        imread     = <class 'pims.imageio_reader.ImageIOReader'>
        name       = 'imread-baa7a8184312ac7b15459beea41cbd90'
        preprocess = None
        sample     = <FramesSequenceND>
Axes: 3
Axis 'x' size: 1920
Axis 'y' size: 1080
Axis 't' size: 851
Pixel Datatype: uint8
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

.0 = <range_iterator object at 0x000002C0CE3F6730>

>   keys = [(name, i) + (0,) * len(sample.shape) for i in range(len(filenames))]
E   AttributeError: 'ImageIOReader' object has no attribute 'shape'

.0         = <range_iterator object at 0x000002C0CE3F6730>
i          = 0
name       = 'imread-baa7a8184312ac7b15459beea41cbd90'
sample     = <FramesSequenceND>
Axes: 3
Axis 'x' size: 1920
Axis 'y' size: 1080
Axis 't' size: 851
Pixel Datatype: uint8

dask\array\image.py:58: AttributeError

@GenevieveBuckley
Collaborator

  1. Are you able to share a small example video file? I've tried using some of the demo video files available here, but wasn't able to reproduce the error you show above.

  2. Can you share the output from conda list and/or pip list? Knowing which versions you have for the different python libraries would be helpful.

@ParticularMiner
Author

ParticularMiner commented May 11, 2022

Sure.

  • Download video file: test_vid.mp4. I was also unable to open the demo video files at the link you provided.

  • conda list

    # packages in environment at /path/to/conda/environment:
    #
    # Name Version Build Channel
    aiohttp 3.8.1 py39hb82d6ee_0 conda-forge
    aiosignal 1.2.0 pyhd8ed1ab_0 conda-forge
    anyio 3.5.0 py39hcbf5309_0 conda-forge
    aom 3.3.0 h0e60522_1 conda-forge
    apptools 5.1.0 pyh44b312d_0 conda-forge
    argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge
    argon2-cffi-bindings 21.2.0 py39hb82d6ee_1 conda-forge
    asciitree 0.3.3 py_2 conda-forge
    astroid 2.8.6 py39hcbf5309_1 conda-forge
    asttokens 2.0.5 pyhd8ed1ab_0 conda-forge
    async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge
    atomicwrites 1.4.0 pyh9f0ad1d_0 conda-forge
    attrs 21.4.0 pyhd8ed1ab_0 conda-forge
    babel 2.9.1 pyh44b312d_0 conda-forge
    backcall 0.2.0 pyh9f0ad1d_0 conda-forge
    backports 1.0 py_2 conda-forge
    backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
    beautifulsoup4 4.10.0 pyha770c72_0 conda-forge
    black 21.12b0 pyhd8ed1ab_0 conda-forge
    bleach 4.1.0 pyhd8ed1ab_0 conda-forge
    blosc 1.21.0 h0e60522_0 conda-forge
    bokeh 2.4.2 py39hcbf5309_0 conda-forge
    brotlipy 0.7.0 py39hb82d6ee_1003 conda-forge
    bzip2 1.0.8 h8ffe710_4 conda-forge
    c-blosc2 2.0.4 h09319c2_1 conda-forge
    ca-certificates 2021.10.8 h5b45459_0 conda-forge
    cairo 1.16.0 hb19e0ff_1008 conda-forge
    certifi 2021.10.8 py39hcbf5309_2 conda-forge
    cffi 1.15.0 py39h0878f49_0 conda-forge
    cfgv 3.3.1 pyhd8ed1ab_0 conda-forge
    cfitsio 4.1.0 h5a969a9_0 conda-forge
    chardet 4.0.0 py39hcbf5309_2 conda-forge
    charls 2.3.4 h39d44d4_0 conda-forge
    charset-normalizer 2.0.7 pyhd8ed1ab_0 conda-forge
    click 8.0.4 py39hcbf5309_0 conda-forge
    cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge
    colorama 0.4.4 pyh9f0ad1d_0 conda-forge
    conda 4.11.0 py39hcbf5309_0 conda-forge
    conda-build 3.21.7 py39hcbf5309_0 conda-forge
    conda-package-handling 1.8.0 py39hb3671d1_0 conda-forge
    configobj 5.0.6 py_0 conda-forge
    coverage 6.3.2 py39hb82d6ee_1 conda-forge
    cryptography 36.0.1 py39h7bc7c5c_0 conda-forge
    curl 7.82.0 h789b8ee_0 conda-forge
    cycler 0.11.0 pyhd8ed1ab_0 conda-forge
    cytoolz 0.11.2 py39hb82d6ee_1 conda-forge
    dask 2022.3.0+8.gad98d4ac.dirty dev_0
    dask-core 2022.2.1 pyhd3eb1b0_0
    dask-image 2021.12.0 pyhd8ed1ab_0 conda-forge
    dataclasses 0.8 pyhc8e2a94_3 conda-forge
    debugpy 1.5.1 py39h415ef7b_0 conda-forge
    decorator 5.1.1 pyhd8ed1ab_0 conda-forge
    defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
    distlib 0.3.4 pyhd8ed1ab_0 conda-forge
    distributed 2022.2.1 pyhd8ed1ab_0 conda-forge
    donfig 0.6.0 pyhd8ed1ab_0 conda-forge
    double-conversion 3.2.0 h0e60522_0 conda-forge
    eigen 3.4.0 h2d74725_0 conda-forge
    entrypoints 0.4 pyhd8ed1ab_0 conda-forge
    envisage 6.0.1 pyhd8ed1ab_0 conda-forge
    executing 0.8.3 pyhd8ed1ab_0 conda-forge
    expat 2.4.7 h39d44d4_0 conda-forge
    fasteners 0.17.3 pyhd8ed1ab_0 conda-forge
    ffmpeg 4.3.1 ha925a31_0 conda-forge
    filelock 3.6.0 pyhd8ed1ab_0 conda-forge
    flake8 3.9.2 pyhd8ed1ab_0 conda-forge
    flit-core 3.7.1 pyhd8ed1ab_0 conda-forge
    font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
    font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
    font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
    font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
    fontconfig 2.13.96 hce3cb01_2 conda-forge
    fonts-conda-ecosystem 1 0 conda-forge
    fonts-conda-forge 1 0 conda-forge
    freetype 2.10.4 h546665d_1 conda-forge
    fribidi 1.0.10 h8d14728_0 conda-forge
    frozenlist 1.3.0 py39hb82d6ee_0 conda-forge
    fsspec 2022.2.0 pyhd8ed1ab_0 conda-forge
    getopt-win32 0.1 h8ffe710_0 conda-forge
    gettext 0.19.8.1 ha2e2712_1008 conda-forge
    giflib 5.2.1 h8d14728_2 conda-forge
    gl2ps 1.4.2 h0597ee9_0 conda-forge
    glew 2.1.0 h39d44d4_2 conda-forge
    glob2 0.7 py_0 conda-forge
    graphblas 5.1.10 h0e60522_0 conda-forge
    graphite2 1.3.13 1000 conda-forge
    graphviz 2.50.0 hefbd956_1 conda-forge
    gts 0.7.6 h7c369d9_2 conda-forge
    harfbuzz 3.1.1 hc601d6f_0 conda-forge
    hdf4 4.2.15 h0e5069d_3 conda-forge
    hdf5 1.12.1 nompi_h2a0e4a3_104 conda-forge
    heapdict 1.0.1 py_0 conda-forge
    icu 68.2 h0e60522_0 conda-forge
    identify 2.4.12 pyhd8ed1ab_0 conda-forge
    idna 3.3 pyhd8ed1ab_0 conda-forge
    imagecodecs 2022.2.22 py39h279a0da_3 conda-forge
    imageio 2.16.2 pyhcf75d05_0 conda-forge
    imageio-ffmpeg 0.4.5 pyhd8ed1ab_0 conda-forge
    importlib-metadata 4.11.3 py39hcbf5309_0 conda-forge
    importlib_metadata 4.11.3 hd8ed1ab_1 conda-forge
    importlib_resources 5.4.0 pyhd8ed1ab_0 conda-forge
    iniconfig 1.1.1 pyh9f0ad1d_0 conda-forge
    intel-openmp 2022.0.0 h57928b3_3663 conda-forge
    ipykernel 6.9.2 py39h832f523_0 conda-forge
    ipympl 0.8.8 pyhd8ed1ab_0 conda-forge
    ipython 8.1.1 py39hcbf5309_0 conda-forge
    ipython_genutils 0.2.0 py_1 conda-forge
    ipywidgets 7.7.0 pyhd8ed1ab_0 conda-forge
    isort 5.10.1 pyhd8ed1ab_0 conda-forge
    jbig 2.1 h8d14728_2003 conda-forge
    jedi 0.18.1 py39hcbf5309_0 conda-forge
    jinja2 3.0.3 pyhd8ed1ab_0 conda-forge
    joblib 1.1.0 pyhd8ed1ab_0 conda-forge
    jpeg 9e h8ffe710_0 conda-forge
    json5 0.9.5 pyh9f0ad1d_0 conda-forge
    jsoncpp 1.9.5 h2d74725_1 conda-forge
    jsonschema 4.4.0 pyhd8ed1ab_0 conda-forge
    jupyter_client 7.1.2 pyhd8ed1ab_0 conda-forge
    jupyter_core 4.9.2 py39hcbf5309_0 conda-forge
    jupyter_server 1.13.5 pyhd8ed1ab_0 conda-forge
    jupyterlab 3.2.4 pyhd8ed1ab_0 conda-forge
    jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
    jupyterlab_server 2.11.0 pyhd8ed1ab_0 conda-forge
    jupyterlab_widgets 1.1.0 pyhd8ed1ab_0 conda-forge
    jxrlib 1.1 h8ffe710_2 conda-forge
    kaleido 0.2.1 pypi_0 pypi
    kiwisolver 1.4.0 py39h2e07f2f_0 conda-forge
    krb5 1.19.3 h1176d77_0 conda-forge
    lazy-object-proxy 1.7.1 py39hb82d6ee_0 conda-forge
    lcms2 2.12 h2a16943_0 conda-forge
    lerc 3.0 h0e60522_0 conda-forge
    libaec 1.0.6 h39d44d4_0 conda-forge
    libarchive 3.5.2 hb45042f_1 conda-forge
    libavif 0.10.0 h8ffe710_1 conda-forge
    libblas 3.9.0 13_win64_mkl conda-forge
    libbrotlicommon 1.0.9 h8ffe710_7 conda-forge
    libbrotlidec 1.0.9 h8ffe710_7 conda-forge
    libbrotlienc 1.0.9 h8ffe710_7 conda-forge
    libcblas 3.9.0 13_win64_mkl conda-forge
    libclang 11.1.0 default_h5c34c98_1 conda-forge
    libcurl 7.82.0 h789b8ee_0 conda-forge
    libdeflate 1.10 h8ffe710_0 conda-forge
    libffi 3.4.2 h8ffe710_5 conda-forge
    libflang 5.0.0 h6538335_20180525 conda-forge
    libgd 2.3.3 h8bb91b0_0 conda-forge
    libglib 2.70.2 h3be07f2_4 conda-forge
    libiconv 1.16 he774522_0 conda-forge
    liblapack 3.9.0 13_win64_mkl conda-forge
    liblief 0.11.5 h0e60522_1 conda-forge
    libnetcdf 4.8.1 nompi_h1cc8e9d_101 conda-forge
    libogg 1.3.4 h8ffe710_1 conda-forge
    libpng 1.6.37 h1d00b33_2 conda-forge
    libsodium 1.0.18 h8d14728_1 conda-forge
    libssh2 1.10.0 h680486a_2 conda-forge
    libtheora 1.1.1 h8d14728_1005 conda-forge
    libtiff 4.3.0 hc4061b1_3 conda-forge
    libwebp 1.2.2 h57928b3_0 conda-forge
    libwebp-base 1.2.2 h8ffe710_1 conda-forge
    libxcb 1.13 hcd874cb_1004 conda-forge
    libxml2 2.9.12 hf5bbc77_1 conda-forge
    libzip 1.8.0 hfed4ece_1 conda-forge
    libzlib 1.2.11 h8ffe710_1013 conda-forge
    libzopfli 1.0.3 h0e60522_0 conda-forge
    llvm-meta 5.0.0 0 conda-forge
    llvmlite 0.37.0 py39ha0cd8c8_0 conda-forge
    locket 0.2.0 py_2 conda-forge
    loguru 0.6.0 py39hcbf5309_1 conda-forge
    lz4-c 1.9.3 h8ffe710_1 conda-forge
    lzo 2.10 he774522_1000 conda-forge
    m2-msys2-runtime 2.5.0.17080.65c939c 3 conda-forge
    m2-patch 2.7.5 2 conda-forge
    m2w64-gcc-libgfortran 5.3.0 6 conda-forge
    m2w64-gcc-libs 5.3.0 7 conda-forge
    m2w64-gcc-libs-core 5.3.0 7 conda-forge
    m2w64-gmp 6.1.0 2 conda-forge
    m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
    markupsafe 2.1.1 py39hb82d6ee_0 conda-forge
    matplotlib 3.4.3 py39hcbf5309_1 conda-forge
    matplotlib-base 3.4.3 py39h581301d_2 conda-forge
    matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge
    mayavi 4.7.4 py39h4a0cae3_1 conda-forge
    mccabe 0.6.1 py_1 conda-forge
    menuinst 1.4.18 py39hcbf5309_1 conda-forge
    mistune 0.8.4 py39hb82d6ee_1005 conda-forge
    mkl 2022.0.0 h0e2418a_796 conda-forge
    more-itertools 8.12.0 pyhd8ed1ab_0 conda-forge
    msgpack-python 1.0.3 py39h2e07f2f_0 conda-forge
    msys2-conda-epoch 20160418 1 conda-forge
    multidict 6.0.2 py39hb82d6ee_0 conda-forge
    mypy_extensions 0.4.3 py39hcbf5309_4 conda-forge
    nbclassic 0.3.7 pyhd8ed1ab_0 conda-forge
    nbclient 0.5.13 pyhd8ed1ab_0 conda-forge
    nbconvert 6.4.4 py39hcbf5309_0 conda-forge
    nbformat 5.2.0 pyhd8ed1ab_0 conda-forge
    nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge
    networkx 2.8 pyhd8ed1ab_0 conda-forge
    nodeenv 1.6.0 pyhd8ed1ab_0 conda-forge
    notebook 6.4.10 pyha770c72_0 conda-forge
    notebook-shim 0.1.0 pyhd8ed1ab_0 conda-forge
    numba 0.54.1 py39hb8cd55e_0 conda-forge
    numcodecs 0.9.1 py39h415ef7b_2 conda-forge
    numpy 1.20.3 py39h6635163_1 conda-forge
    openjpeg 2.4.0 hb211442_1 conda-forge
    openmp 5.0.0 vc14_1 conda-forge
    openssl 1.1.1n h8ffe710_0 conda-forge
    packaging 21.3 pyhd8ed1ab_0 conda-forge
    pandas 1.3.4 py39h2e25243_0 conda-forge
    pandoc 2.17.1.1 h57928b3_0 conda-forge
    pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
    pango 1.48.10 h33e4779_2 conda-forge
    parso 0.8.3 pyhd8ed1ab_0 conda-forge
    partd 1.2.0 pyhd8ed1ab_0 conda-forge
    pathspec 0.9.0 pyhd8ed1ab_0 conda-forge
    pcre 8.45 h0e60522_0 conda-forge
    pickleshare 0.7.5 py_1003 conda-forge
    pillow 9.0.1 py39ha53f419_2 conda-forge
    pims 0.5 pyh9f0ad1d_1 conda-forge
    pip 22.0.4 pyhd8ed1ab_0 conda-forge
    pixman 0.40.0 h8ffe710_0 conda-forge
    pkginfo 1.8.2 pyhd8ed1ab_0 conda-forge
    platformdirs 2.5.1 pyhd8ed1ab_0 conda-forge
    plotly 5.4.0 pyhd8ed1ab_0 conda-forge
    pluggy 1.0.0 py39hcbf5309_2 conda-forge
    pre-commit 2.15.0 py39hcbf5309_1 conda-forge
    proj 9.0.0 h1cfcee9_1 conda-forge
    prometheus_client 0.13.1 pyhd8ed1ab_0 conda-forge
    prompt-toolkit 3.0.27 pyha770c72_0 conda-forge
    psutil 5.9.0 py39hb82d6ee_0 conda-forge
    pthread-stubs 0.4 hcd874cb_1001 conda-forge
    pugixml 1.11.4 h0e60522_0 conda-forge
    pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
    py 1.11.0 pyh6c4a22f_0 conda-forge
    py-lief 0.11.5 py39h415ef7b_1 conda-forge
    pycodestyle 2.7.0 pyhd3eb1b0_0
    pycosat 0.6.3 py39hb82d6ee_1009 conda-forge
    pycparser 2.21 pyhd8ed1ab_0 conda-forge
    pyface 7.4.1 pyhd8ed1ab_0 conda-forge
    pyflakes 2.3.1 pyhd8ed1ab_0 conda-forge
    pygments 2.11.2 pyhd8ed1ab_0 conda-forge
    pylint 2.11.1 pyhd8ed1ab_0 conda-forge
    pyopenssl 22.0.0 pyhd8ed1ab_0 conda-forge
    pyparsing 3.0.7 pyhd8ed1ab_0 conda-forge
    pyqt 5.12.3 py39hcbf5309_8 conda-forge
    pyqt-impl 5.12.3 py39h415ef7b_8 conda-forge
    pyqt5-sip 4.19.18 py39h415ef7b_8 conda-forge
    pyqtchart 5.12 py39h415ef7b_8 conda-forge
    pyqtwebengine 5.12.1 py39h415ef7b_8 conda-forge
    pyrsistent 0.18.1 py39hb82d6ee_0 conda-forge
    pysocks 1.7.1 py39hcbf5309_4 conda-forge
    pytest 6.2.5 py39hcbf5309_1 conda-forge
    pytest-cov 3.0.0 pyhd8ed1ab_0 conda-forge
    python 3.9.10 h9a09f29_2_cpython conda-forge
    python-blosc 1.10.2 py39h2e25243_2 conda-forge
    python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
    python-graphviz 0.18.2 pyhaef67bd_0 conda-forge
    python-libarchive-c 4.0 py39hcbf5309_0 conda-forge
    python-suitesparse-graphblas 5.1.10.1 py39h5d4886f_1 conda-forge
    python_abi 3.9 2_cp39 conda-forge
    pytz 2021.3 pyhd8ed1ab_0 conda-forge
    pywavelets 1.3.0 py39h5d4886f_1 conda-forge
    pywin32 303 py39hb82d6ee_0 conda-forge
    pywinpty 2.0.5 py39h99910a6_0 conda-forge
    pyyaml 6.0 py39hb82d6ee_3 conda-forge
    pyzmq 22.3.0 py39he46f08e_1 conda-forge
    qt 5.12.9 h5909a2a_4 conda-forge
    quaternion 2022.4.1 py39h5d4886f_2 conda-forge
    requests 2.27.1 pyhd8ed1ab_0 conda-forge
    ripgrep 13.0.0 h7f3b576_2 conda-forge
    ruamel_yaml 0.15.80 py39hb82d6ee_1006 conda-forge
    scikit-image 0.19.2 py39h2e25243_0 conda-forge
    scikit-learn 1.0.2 py39he931e04_0 conda-forge
    scipy 1.8.0 py39hc0c34ad_0 conda-forge
    send2trash 1.8.0 pyhd8ed1ab_0 conda-forge
    setuptools 59.8.0 py39hcbf5309_0 conda-forge
    six 1.16.0 pyh6c4a22f_0 conda-forge
    slicerator 1.1.0 pyhd8ed1ab_0 conda-forge
    snappy 1.1.8 ha925a31_3 conda-forge
    sniffio 1.2.0 py39hcbf5309_2 conda-forge
    sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
    soupsieve 2.3.1 pyhd8ed1ab_0 conda-forge
    sparse 0.13.0 pyhd8ed1ab_0 conda-forge
    sqlite 3.37.1 h8ffe710_0 conda-forge
    stack_data 0.2.0 pyhd8ed1ab_0 conda-forge
    tbb 2021.5.0 h2d74725_0 conda-forge
    tbb-devel 2021.5.0 h2d74725_0 conda-forge
    tblib 1.7.0 pyhd8ed1ab_0 conda-forge
    tenacity 8.0.1 pyhd8ed1ab_0 conda-forge
    terminado 0.13.3 py39hcbf5309_0 conda-forge
    testpath 0.6.0 pyhd8ed1ab_0 conda-forge
    threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
    tifffile 2022.4.8 pyhd8ed1ab_0 conda-forge
    tk 8.6.12 h8ffe710_0 conda-forge
    toml 0.10.2 pyhd8ed1ab_0 conda-forge
    tomli 1.2.2 pyhd8ed1ab_0 conda-forge
    toolz 0.11.2 pyhd8ed1ab_0 conda-forge
    tornado 6.1 py39hb82d6ee_2 conda-forge
    tqdm 4.63.0 pyhd8ed1ab_0 conda-forge
    traitlets 5.1.1 pyhd8ed1ab_0 conda-forge
    traits 6.3.2 py39hb82d6ee_1 conda-forge
    traitsui 7.3.1 pyhd8ed1ab_0 conda-forge
    typed-ast 1.5.2 py39hb82d6ee_0 conda-forge
    typing-extensions 4.1.1 hd8ed1ab_0 conda-forge
    typing_extensions 4.1.1 pyha770c72_0 conda-forge
    tzdata 2022a h191b570_0 conda-forge
    ucrt 10.0.20348.0 h57928b3_0 conda-forge
    ukkonen 1.0.1 py39h2e07f2f_1 conda-forge
    urllib3 1.26.9 pyhd8ed1ab_0 conda-forge
    utfcpp 3.2.1 h57928b3_0 conda-forge
    vc 14.2 hb210afc_6 conda-forge
    virtualenv 20.13.4 py39hcbf5309_0 conda-forge
    vs2015_runtime 14.29.30037 h902a5da_6 conda-forge
    vtk 9.1.0 qt_py39h1ab545e_207 conda-forge
    wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
    webencodings 0.5.1 py_1 conda-forge
    websocket-client 1.3.1 pyhd8ed1ab_0 conda-forge
    wheel 0.37.1 pyhd8ed1ab_0 conda-forge
    widgetsnbextension 3.6.0 py39hcbf5309_0 conda-forge
    win32_setctime 1.1.0 pyhd8ed1ab_0 conda-forge
    win_inet_pton 1.1.0 py39hcbf5309_3 conda-forge
    winpty 0.4.3 4 conda-forge
    wrapt 1.13.3 py39hb82d6ee_1 conda-forge
    xorg-kbproto 1.0.7 hcd874cb_1002 conda-forge
    xorg-libice 1.0.10 hcd874cb_0 conda-forge
    xorg-libsm 1.2.3 hcd874cb_1000 conda-forge
    xorg-libx11 1.7.2 hcd874cb_0 conda-forge
    xorg-libxau 1.0.9 hcd874cb_0 conda-forge
    xorg-libxdmcp 1.1.3 hcd874cb_0 conda-forge
    xorg-libxext 1.3.4 hcd874cb_1 conda-forge
    xorg-libxpm 3.5.13 hcd874cb_0 conda-forge
    xorg-libxt 1.2.1 hcd874cb_2 conda-forge
    xorg-xextproto 7.3.0 hcd874cb_1002 conda-forge
    xorg-xproto 7.0.31 hcd874cb_1007 conda-forge
    xz 5.2.5 h62dcd97_1 conda-forge
    yaml 0.2.5 h8ffe710_2 conda-forge
    yarl 1.7.2 py39hb82d6ee_1 conda-forge
    zarr 2.11.1 pyhd8ed1ab_0 conda-forge
    zeromq 4.3.4 h0e60522_1 conda-forge
    zfp 0.5.5 h0e60522_8 conda-forge
    zict 2.1.0 pyhd8ed1ab_0 conda-forge
    zipp 3.7.0 pyhd8ed1ab_1 conda-forge
    zlib 1.2.11 h8ffe710_1013 conda-forge
    zstd 1.5.2 h6255e5f_0 conda-forge

@ParticularMiner
Author

@GenevieveBuckley

After updating the python package pims and applying the fix you provided here, I was able to use dask.array.image.imread() to open the video file! Thanks!

However, I noticed that dask.array.image.imread() added a new dimension to the resulting array so that I got five dimensions instead of four. The first dimension had a length of 1; the second was the number of frames in the video; the third and fourth were the image dimensions; and the fifth was the color channel.

Though obviously I can "squeeze" the first dimension out, I did not expect it in the first place, since imread()'s docstring says that a "dask array of all images stacked along the first dimension" will be returned. Besides, the array had only one chunk, which risks exhausting RAM for large video files. In contrast, dask_image.imread.imread() does not have these issues.

@GenevieveBuckley
Collaborator

However, I noticed that dask.array.image.imread() added a new dimension to the resulting array so that I got five dimensions instead of four. The first dimension had a length of 1; the second was the number of frames in the video; the third and fourth were the image dimensions; and the fifth was the color channel.

Though obviously I can "squeeze" the first dimension out, I did not expect it in the first place, since imread()'s docstring says that a "dask array of all images stacked along the first dimension" will be returned. Besides, the array had only one chunk, which risks exhausting RAM for large video files. In contrast, dask_image.imread.imread() does not have these issues.

Hm, yes. It looks like dask.array.image.imread will have one chunk per file on disk, whereas dask_image.imread.imread will chunk along the pims frames. I don't think there's a solid reason for it being that way. Neither set of developers works a lot with movie files (at least, I don't often), so it's possible no-one has really considered it much.
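
To make the difference concrete, here is a minimal sketch comparing the two; the shapes and chunk layouts in the comments are assumptions based on the 851-frame, 1080x1920 RGB test video above:

import pims
import dask_image.imread
from dask.array.image import imread as da_imread

a = da_imread('test_vid.mp4', imread=pims.ImageIOReader)
print(a.shape, a.chunks)   # shape e.g. (1, 851, 1080, 1920, 3); a single chunk spanning the whole file

b = dask_image.imread.imread('test_vid.mp4')
print(b.shape, b.chunks)   # shape e.g. (851, 1080, 1920, 3); chunked along the frame axis

# One workaround for the single-chunk result: drop the file axis and rechunk per frame.
a_frames = a[0].rechunk({0: 1})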

@GenevieveBuckley
Collaborator

Since dask-image.imread() uses pims.open(), it would be great if it could mirror such functionality too.

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]
rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')  # uses ImageIOReader

This does work, from what I can see. It's just difficult to tell, because pims.open() hides away which handler is being used at any particular time.

This is how I checked:

  1. pip install an editable version of pims in my environment (using this bugfix branch, since you want to use the ImageIOReader)
  2. Then I added a print statement print(handler) right above line 201 here to see which handlers get tried.
  3. Make sure both PyAV and imageio-ffmpeg are installed. By default PyAV has a higher class priority.
  4. Start a new python session and open the test video file with dask_image. I can see the PyAV reader is used internally by pims to open the file (because it prints out the handler name).
  5. Adjust the class priority: import pims; pims.ImageIOReader.class_priority = 100
  6. Open the test video file again with dask_image, and note that pims is now using the ImageIOReader to open the file (a condensed sketch of this check follows below).
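
A condensed version of the same check that doesn't require patching pims (test_vid.mp4 is the sample video above; the printed reader classes are what I would expect, assuming both PyAV and imageio-ffmpeg are installed):

import pims
import dask_image.imread

print(type(pims.open('test_vid.mp4')))    # e.g. a PyAV-based reader, since PyAV wins by default

pims.ImageIOReader.class_priority = 100   # raise the ImageIOReader priority
print(type(pims.open('test_vid.mp4')))    # now <class 'pims.imageio_reader.ImageIOReader'>

# dask_image.imread.imread() calls pims.open() internally, so it now also uses ImageIOReader
rgb_frames = dask_image.imread.imread('test_vid.mp4')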

@GenevieveBuckley
Collaborator

To summarize, this thread brings up two points:

  1. It would be good if the pims class priority could be used together with dask-image. I believe this is already possible, discussed in this comment.
  2. The dask.array.image.imread docs are slightly misleading. I've opened dask#9082 ("Clarify chunking in imread docstring") to address that point.

Is there anything else I've missed, or you're still having trouble with?

@ParticularMiner
Author

ParticularMiner commented May 13, 2022

Thanks for your reply.

Sorry, it seems my original post was not clear. I was already aware that the following code snippet works for the single-threaded and multi-threaded schedulers, but not for the multi-process scheduler, and probably not for distributed-memory schedulers either.

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]
rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')  # uses ImageIOReader

rgb_frames.compute(scheduler='single-threaded')  # works
rgb_frames.compute(scheduler='threading')  # works
rgb_frames.compute(scheduler='processes')  # does not work

@jakirkham
Member

Yeah, the synchronous and threaded schedulers share the same process memory. So if the priority is set in that process, that is sufficient; all workers view that same memory.

With the process scheduler, different processes have their own memory space and it isn't shared. Setting information in one does not necessarily get communicated to another, so one would need to do this during process startup. This is handled here. The simplest solution is to just provide your own ProcessPoolExecutor, which would use whatever initializer one passed when creating the Executor. Would follow this for guidance on setting it up.

An alternative solution would be to add some kind of dask.config parameter (like these), which would allow one to change the initializer. This could be a totally different function that is run instead, or perhaps one that gets run as part of the initializer process just later on in that function. This is probably a reasonable PR to do if you wanted to go that way.

Distributed would likely have the same issue for the same reason. However, there are a lot more options there; for example, preload scripts would work. If you are planning on doing process-based execution, would suggest just using Distributed. It has a centralized scheduler, the ability to work with Futures, a rich diagnostic dashboard, etc. Generally this will be a better experience. It will also be easier to go from there to a cluster, the cloud, etc. as needed.
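
For illustration, a minimal sketch of the Distributed route, assuming a local cluster; the helper name set_reader_priority is made up here, and a worker preload script would achieve the same for workers that start later:

from dask.distributed import Client
import dask_image.imread

def set_reader_priority():
    import pims
    pims.ImageIOReader.class_priority = 100

client = Client()                 # local distributed cluster
client.run(set_reader_priority)   # run once on every currently running worker

rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')
rgb_frames.compute()              # workers now select ImageIOReader via pims.open()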

@ParticularMiner
Author

ParticularMiner commented May 13, 2022

Many thanks @jakirkham !

The simplest solution is to just provide your own ProcessPoolExecutor ...

I followed your first suggestion since that was the easiest one to understand (as you guessed 😄). And it works (see the following code-snippet)!

import dask_image.imread
import pims


def initialize_worker_process():
    """
    Initialize a worker process before running any tasks in it.
    """
    # If Numpy is already imported, presumably its random state was
    # inherited from the parent => re-seed it.
    import sys
    
    np = sys.modules.get("numpy")
    if np is not None:
        np.random.seed()
    
    # We increase the priority of ImageIOReader in order to force dask's 
    # imread() to use this reader [via pims.open()]
    pims.ImageIOReader.class_priority = 100


def get_pool_with_reader_priority_set(num_workers=None):
    import os
    from dask import config
    from dask.system import CPU_COUNT
    from dask.multiprocessing import get_context
    from concurrent.futures import ProcessPoolExecutor
    
    num_workers = num_workers or config.get("num_workers", None) or CPU_COUNT
    if os.environ.get("PYTHONHASHSEED") in (None, "0"):
        # This number is arbitrary; it was chosen to commemorate
        # https://github.com/dask/dask/issues/6640.
        os.environ["PYTHONHASHSEED"] = "6640"
    context = get_context()
    return ProcessPoolExecutor(
        num_workers, mp_context=context, initializer=initialize_worker_process
    )


rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg') 
rgb_frames.compute(scheduler='processes', pool=get_pool_with_reader_priority_set())   # uses ImageIOReader

I suppose a PR that helps the end-user avoid getting his/her hands dirty with the innards of multi-process scheduler technology would be a good idea.

But before that, perhaps I should try dask.distributed ...

@ParticularMiner
Author

@jakirkham

Would the following idea sit well with you?

The idea is to add a new keyword argument, say initializer=None (intended to be a callable) to dask.multiprocessing.get():

https://github.com/dask/dask/blob/137206dc04eb62424617a068405545b26db99a6f/dask/multiprocessing.py#L145-L156

so that we can later replace the following call currently within its body:

        pool = ProcessPoolExecutor(
            num_workers, mp_context=context, initializer=initialize_worker_process
        )

with:

        pool = ProcessPoolExecutor(
            num_workers, mp_context=context, initializer=initializer or initialize_worker_process
        )

This would enable the end-user to pass his/her own process initializer function to compute() or dask.config.set() (if using a context manager).
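
Under that proposal, end-user code might look something like this (purely hypothetical: the initializer keyword does not exist yet, and its final spelling would be settled in the PR):

import dask_image.imread

def set_reader_priority():
    import pims
    pims.ImageIOReader.class_priority = 100

rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')
# Hypothetical: compute() would forward `initializer` to dask.multiprocessing.get()
rgb_frames.compute(scheduler='processes', initializer=set_reader_priority)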

@jakirkham
Member

That seems like a reasonable starting point. There may be a few things to firm up, but it is probably easier to discuss these in a PR. Would suggest sending a draft PR to Dask and we can go from there 🙂

@jakirkham
Member

Initializer customization was added in PR dask/dask#9087, which should be in the next Dask release.
