Update dependencies v0.53 #499

juhoinkinen · 2021-06-16T06:11:25Z

Dependency updates for Annif v0.53.

All pinned packages are updated to the newest available versions, with two exceptions due to compatibility issues:

SciPy 1.5.4 is used, because SciPy 1.6 is not available for Python 3.6
NumPy 1.19.* is used, because TensorFlow 2.5 cannot use newer NumPy

A notable update is the upgrade of joblib from 0.17.0 to 1.0.1; this may help Annif models remain compatible for longer in the future.

~~Also installs optional python-Levenshtein package to get rid of warnings about missing it.~~
Filters the warnings that Gensim 4.0.1 emits about the python-Levenshtein package not being installed.

During installation there are messages like Using legacy 'setup.py install' for <somepkg>, since package 'wheel' is not installed. I suppose by installing the wheel package the messages would go away, but I wonder if it would do anything else good.

There are some negligible warnings (about PyYAML version during installation and many DeprecationWarnings from tests), but one from TensorFlow could be more serious:

/home/local/jmminkin/git/Annif/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py:497: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
    category=CustomMaskWarning)

For now I could not find out what exactly that is about, but it could be related to the error seen when trying to use old models in Portainer (for ai.dev.finto.fi); the container fails to start due to:

File "/usr/local/lib/python3.8/site-packages/tensorflow/python/keras/layers/core.py", line 1021, in from_config
code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)

However, on kj-kk, when I evaluate the old yso-fi model I see the usual UserWarnings from sklearn, and the models seem to work.

codecov · 2021-06-16T06:11:34Z

Codecov Report

Merging #499 (1b3b786) into master (0fe6557) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #499      +/-   ##
==========================================
+ Coverage   99.48%   99.50%   +0.01%     
==========================================
  Files          78       78              
  Lines        5669     5672       +3     
==========================================
+ Hits         5640     5644       +4     
+ Misses         29       28       -1

Impacted Files	Coverage Δ
annif/backend/tfidf.py	`98.85% <100.00%> (+0.04%)`	⬆️
annif/backend/stwfsa.py	`100.00% <0.00%> (+1.51%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

osma · 2021-06-16T07:41:38Z

> Also installs optional python-Levenshtein package to get rid of warnings about missing it.

That's a bit of a waste, because Annif isn't really using the gensim.similarities.levenshtein module. I guess another option would be to filter the warning. I wonder why Gensim decided to do it like this.

> During installation there are messages like Using legacy 'setup.py install' for <somepkg>, since package 'wheel' is not installed. I suppose by installing the wheel package the messages would go away, but I wonder if it would do anything else good.

That might speed up installing those packages with wheels available. Other than that, it shouldn't make a difference.

> There are some negligible warnings (about PyYAML version during installation and many DeprecationWarnings from tests), but one from TensorFlow could be more serious:

I'm guessing this could be related to the Lambda layer used in nn_ensemble. This warning is shown when custom layers are serialized, which is a little unfortunate. A note in the documentation for Lambda layers explains why their serialization is fragile, and it would perhaps be better to use a custom subclass.

> For now I could not find out what exactly that is about, but it could be related to the error seen when trying to use old models in Portainer (for ai.dev.finto.fi); the container fails to start due to:

This could be another symptom of serialization/deserialization issues with Lambda layers. Keras seems to use the marshal module for serializing functions; it is not compatible across Python versions. The Docker image just switched from 3.7 to 3.8 so perhaps that could have triggered this (and not the TF update in this PR)?
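
To illustrate the mechanism suspected here, the following is a minimal sketch of serializing a function's code object with marshal, which is roughly what appears to happen to a Lambda layer's function. The function `mean_of` is a hypothetical stand-in, not the actual code in nn_ensemble. The round trip works within one interpreter; the marshal format is not guaranteed to be stable between Python versions, so loading the same bytes under a different version may fail.

```python
import marshal
import types


def mean_of(values):
    """Hypothetical stand-in for the function wrapped by a Lambda layer."""
    return sum(values) / len(values)


# Serialize only the compiled code object; the result is raw bytes whose
# layout depends on the Python version that produced them.
raw_code = marshal.dumps(mean_of.__code__)

# Loading in the same interpreter works. Under a different Python version,
# marshal.loads() may raise ValueError for data it does not recognize.
code = marshal.loads(raw_code)
restored = types.FunctionType(code, globals(), "mean_of")

print(restored([1, 2, 3]))  # 2.0
```

A custom layer subclass avoids this because the saved model then refers to the class rather than to marshalled bytecode.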

osma · 2021-06-16T07:59:43Z

> That's a bit of a waste, because Annif isn't really using the gensim.similarities.levenshtein module. I guess another option would be to filter the warning. I wonder why Gensim decided to do it like this.

I see now that Gensim has recently removed the dependency on the Levenshtein library altogether in this PR, but this change is not yet included in a release.

I suggest that we avoid introducing an additional, useless dependency and instead filter the warning by importing like this (in the tfidf backend and possibly elsewhere):

import warnings
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    import gensim.similarities
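
A narrower variant is also possible, ignoring only the Levenshtein message instead of every warning raised during the import. The sketch below uses a hypothetical stand-in function, `fake_gensim_import`, in place of the real import, and the message text and the UserWarning category are assumptions rather than Gensim's exact output.

```python
import warnings


def fake_gensim_import():
    """Hypothetical stand-in for `import gensim.similarities`."""
    warnings.warn("The optional Levenshtein package is unavailable", UserWarning)
    warnings.warn("some unrelated deprecation", DeprecationWarning)


def warnings_seen_during_import():
    """Return the messages of warnings that survive the narrow filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default
        # Ignore only UserWarnings whose message mentions Levenshtein
        # (assumed wording; the pattern is matched from the message start).
        warnings.filterwarnings(
            "ignore", message=".*Levenshtein", category=UserWarning
        )
        fake_gensim_import()
    return [str(w.message) for w in caught]


print(warnings_seen_during_import())
```

The trade-off is that a narrow filter keeps other import-time warnings visible, but it silently stops working if the message wording changes.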

sonarcloud · 2021-06-16T11:03:58Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

juhoinkinen · 2021-06-16T11:08:57Z

Updated PR description for filtering Gensim warnings.

osma · 2021-06-16T12:42:10Z

I found some more information about the CustomMaskWarning. It appears to be a bug in TensorFlow 2.5.0, and it is related to the Add layer, not the Lambda layer that I first suspected.

I've created a draft PR #500 that replaces the Lambda layer with a custom MeanLayer, but it doesn't actually get rid of the CustomMaskWarning. It might improve compatibility of saved nn_ensemble models and potentially fix the ValueError: bad marshal data (unknown type code) seen on the Docker infrastructure, but that needs more testing and is out of scope for the 0.53 release.

osma

I think this is good enough; we can't get rid of the CustomMaskWarning unless we downgrade TensorFlow, but that would also lose support for Python 3.9.

juhoinkinen added 5 commits June 15, 2021 14:31

Upgrade dependencies for v0.53

af017d1

Include Python 3.9 for unit tests in GH Actions

0c236ea

Install SciPy 1.5.4 because 1.6.* would require Python 3.7+

60d7505

Upgrade VW to latest version

3852342

Install optional python-Levenshtein to get rid of warning

d5b39c0

juhoinkinen added the maintenance label Jun 16, 2021

juhoinkinen added this to the 0.53 milestone Jun 16, 2021

juhoinkinen requested a review from osma June 16, 2021 06:11

juhoinkinen added 2 commits June 16, 2021 12:10

Remove python-Levenshtein pkg; filter warnings about missing it

cba511a

Add comment for filtering warnings about missig python-Levenshtein pkg

1b3b786

juhoinkinen marked this pull request as ready for review June 16, 2021 11:08

osma mentioned this pull request Jun 16, 2021

Implement custom MeanLayer in nn_ensemble #500

Merged

osma approved these changes Jun 16, 2021

juhoinkinen merged commit 13cd15b into master Jun 21, 2021

juhoinkinen deleted the update-dependencies-v0.53 branch June 21, 2021 08:09

osma mentioned this pull request Aug 31, 2021

spaCy analyzer #374

Closed

osma mentioned this pull request Oct 11, 2021

Flask, Connexion, Click Dependency Mismatches #533

Closed
