Update dependency spacy to v3 #70

renovate · 2023-02-06T14:02:05Z

This PR contains the following updates:

Package	Change
spacy (source, changelog)	`==2.3.9` -> `==3.7.2`

Release Notes

explosion/spaCy (spacy)

`v3.7.2`: Fixes for APIs and requirements

✨ New features and improvements

Update __all__ fields (#13063).

🔴 Bug fixes

#13035: Remove Pathy requirement.
#13053: Restore spacy.cli.project API.
#13057: Support Any comparisons for Token and Span.

📖 Documentation and examples

Many updates for spacy-llm including Azure OpenAI, PaLM, and Mistral support.
Various documentation corrections.

👥 Contributors

@adrianeboyd, @honnibal, @ines, @rmitsch, @svlandeg

`v3.7.1`: Bug fix for spacy.cli module loading

🔴 Bug fixes

Revert lazy loading of CLI module for spacy.info to fix availability of spacy.cli following import spacy (#13040).

👥 Contributors

@adrianeboyd, @honnibal, @ines, @svlandeg

`v3.7.0`: Trained pipelines using Curated Transformers and support for Python 3.12

This release drops support for Python 3.6 and adds support for Python 3.12.

✨ New features and improvements

Add support for Python 3.12 (#12979).
Use the new library Weasel for spaCy projects functionality (#12769).
- All spacy project commands should run as before, just now they're using Weasel under the hood.
- ⚠️ Remote storage is not yet supported for Python 3.12. Use Python 3.11 or earlier for remote storage.
Extend to Thinc v8.2 (#12897).
Extend transformers extra to spacy-transformers v1.3 (#13025).
Support registered vectors (#12492).
Add --spans-key option for CLI evaluation with spacy benchmark accuracy (#12981).
Load the CLI module lazily for spacy.info (#12962).
Add type stubs for spacy.training.example (#12801).
Warn for unsupported pattern keys in dependency matcher (#12928).
Language.replace_listeners: Pass the replaced listener and the tok2vec pipe to the callback in order to support spacy-curated-transformers (#12785).
Always use tqdm with disable=None to disable output in non-interactive environments (#12979).
Language updates:
- Add left and right pointing angle brackets as punctuation to ancient Greek (#12829).
- Update example sentences for Turkish (#12895).
Package setup updates:
- Update NumPy build constraints for NumPy 1.25+ (#12839). For Python 3.9+, it is no longer necessary to set build constraints while building binary wheels.
- Refactor Cython profiling in order to disable profiling for Python 3.12 in the package setup, since Cython does not currently support profiling for Python 3.12 (#12979).

📦 Trained pipelines updates

The transformer-based trf pipelines have been updated to use our new Curated Transformers library through the Thinc model wrappers and pipeline component from spaCy Curated Transformers.

⚠️ Backwards incompatibilities

Drop support for Python 3.6.
Drop mypy checks for Python 3.7.
Remove ray extra.
spacy project has a few backwards incompatibilities due to the transition to the standalone library Weasel, which is not as tightly coupled to spaCy. Weasel produces warnings when it detects older spaCy-specific settings in your environment or project config.
- Support for the spacy_version configuration key has been dropped.
- Support for the check_requirements configuration key has been dropped due to the deprecation of pkg_resources.
- The SPACY_CONFIG_OVERRIDES environment variable is no longer checked. You can set configuration overrides using WEASEL_CONFIG_OVERRIDES.
- Support for SPACY_PROJECT_USE_GIT_VERSION environment variable has been dropped.
- Error codes are now Weasel-specific and do not follow spaCy error codes.
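For projects or CI scripts that still export SPACY_CONFIG_OVERRIDES, a small shim can carry the value over to WEASEL_CONFIG_OVERRIDES before running spacy project commands. This is a minimal sketch, assuming the override string (a series of `--section.key value` pairs) keeps the same format under Weasel; `migrate_config_overrides` is a hypothetical helper, not part of spaCy or Weasel.

```python
import os
from typing import MutableMapping, Optional


def migrate_config_overrides(
    env: Optional[MutableMapping[str, str]] = None,
) -> Optional[str]:
    """Copy a legacy SPACY_CONFIG_OVERRIDES value to WEASEL_CONFIG_OVERRIDES.

    Hypothetical helper. Assumes the override string format is unchanged
    between the two variables; an existing WEASEL_CONFIG_OVERRIDES value
    is never overwritten.
    """
    env = os.environ if env is None else env
    legacy = env.get("SPACY_CONFIG_OVERRIDES")
    if legacy and "WEASEL_CONFIG_OVERRIDES" not in env:
        env["WEASEL_CONFIG_OVERRIDES"] = legacy
    return env.get("WEASEL_CONFIG_OVERRIDES")
```

Calling `migrate_config_overrides()` with no argument updates `os.environ`, which affects only the current process and any subprocesses it starts afterwards.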

📖 Documentation and examples

New and updated documentation for large language models and spaCy Curated Transformers.
Various documentation corrections and updates.
New additions to the spaCy Universe:
- Hobbit spaCy: NLP for Middle Earth
- rolegal: a spaCy Package for Noisy Romanian Legal Document Processing

👥 Contributors

@adrianeboyd, @bdura, @connorbrinton, @danieldk, @davidberenstein1957, @denizcodeyaa, @eltociear, @evornov, @honnibal, @ines, @jmyerston, @koaning, @magdaaniol, @pdhall99, @ringohoffman, @rmitsch, @senisioi, @shadeMe, @svlandeg, @vinbo8, @wjbmattingly

`v3.6.1`: Support for Pydantic v2, find-function CLI and more

✨ New features and improvements

Allow Pydantic v2 using transitional v1 support (#12888).
Add find-function CLI for finding locations of registered functions (#12757).
Add extra spacy[cuda12x] for cupy-cuda12x (#12890).
Extend tests for init config and train CLI (#12173).
Switch from distutils to setuptools/sysconfig (#12853).

🔴 Bug fixes

#12817: Escape annotated HTML tags in displaCy span renderer.
#12857: Display model's full base version string in incompatibility warning.
#12882: Update <br> tags in displaCy.

📖 Documentation and examples

Various documentation corrections and updates.
New additions to spaCy Universe:
- OdyCy
- SaysWho

👥 Contributors

@adrianeboyd, @afriedman412, @arplusman, @bdura, @connorbrinton, @honnibal, @ines, @it176131, @pmbaumgartner, @rmitsch, @shadeMe, @svlandeg, @thomashacker, @victorialslocum, @x-tabdeveloping

`v3.6.0`: New span finder component and pipelines for Slovenian

✨ New features and improvements

NEW: span_finder pipeline component to identify overlapping, unlabeled spans (#12507).
Language updates:
- Add initial support for Malay (#12602).
- Update Latin defaults to support noun chunks, update lexical/tokenizer defaults and add example sentences (#12538).
Add option to return scores separately keyed by component name with spacy evaluate --per-component, Language.evaluate(per_component=True) and Scorer.score(per_component=True) (#12540).
Support custom token/lexeme attribute for vectors (#12625).
Support spancat_singlelabel in spacy debug data CLI (#12749).
Typing updates for PhraseMatcher and SpanGroup (#12642, #12714).

🔴 Bug fixes

#12569: Require that all SpanGroup spans come from the current doc.

📦 Trained pipelines updates

We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.

Package	UPOS	Parser LAS	NER F
`sl_core_news_sm`	96.9	82.1	62.9
`sl_core_news_md`	97.6	84.3	73.5
`sl_core_news_lg`	97.7	84.3	79.0
`sl_core_news_trf`	99.0	91.7	90.0

🙏 Special thanks to @orglce for help with the new pipelines!

The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize "get" as a passive auxiliary.

The Danish pipeline da_core_news_trf has been updated to use vesteinn/DanskBERT with performance improvements across the board.

⚠️ Backwards incompatibilities

SpanGroup spans are now required to be from the same doc. When initializing a SpanGroup, there is a new check to verify that all added spans refer to the current doc. Without this check, it was possible to run into string store or other errors.

📖 Documentation and examples

Various documentation corrections and updates.
New additions to spaCy Universe:

👥 Contributors

@adrianeboyd, @bdura, @danieldk, @davidberenstein1957, @diyclassics, @essenmitsosse, @honnibal, @ines, @isabelizimm, @jmyerston, @kadarakos, @KennethEnevoldsen, @khursani8, @ljvmiranda921, @rmitsch, @shadeMe, @svlandeg, @tomaarsen, @victorialslocum, @vin-ivar, @ZiadAmerr

`v3.5.4`: Bug fixes for overrides with registered functions and sourced components with listeners

✨ New features and improvements

Extend Typer support to v0.9 (#12631).

🔴 Bug fixes

#12701: Fix issues with component names and listeners for sourced components.
#12623: Support overrides for registered functions in configs.

👥 Contributors

@adrianeboyd, @bdura, @honnibal, @ines, @svlandeg

`v3.5.3`: Speed improvements, bug fixes and more

✨ New features and improvements

Huge speed improvements for spancat, in particular on GPU (~10x-30x faster) (#12577).
Improve speed for child operators (>+, >-, >++, >--) for the dependency matcher (#12528).
Improve loading speed for tokenizers with a large number of exceptions (#12553).
Support doc.spans for displaCy output in spacy benchmark accuracy / spacy evaluate (#12575).
Add MorphAnalysis.get(default=) argument for user-provided default values similar to dict (#12545).
Only perform vectors checks during initialization if there are sourced components (#12607).

🔴 Bug fixes

#12567: Remove #egg from download URLs due to future deprecation in pip.

📖 Documentation and examples

Various documentation corrections and updates.
New additions to spaCy Universe:
- LatinCy
- parsigs
- spaCysee
- spacy-wasm

👥 Contributors

@adrianeboyd, @andyjessen, @bdura, @davidberenstein1957, @diyclassics, @honnibal, @ines, @kadarakos, @KennethEnevoldsen, @ljvmiranda921, @moxley01, @royashcenazi, @svlandeg, @tanloong, @victorialslocum

`v3.5.2`: Pretraining improvements, bug fixes for spans and spancat and more

✨ New features and improvements

Add support for floret vectors in spacy pretrain (#12435).
Save final model as model-last.bin for spacy pretrain (#12459).
Support Span input for displacy.parse_deps (#12477).
Extend support to CuPy 12.0 for cupy install extras.

🔴 Bug fixes

#12398: Fix entity linker failure on sentence-crossing entities.
#12405: Fix sentence indexing bug in Span.sents.
#12469: Fix scores attribute for spancat_singlelabel.
#12484: Fix Span.sents when the final sentence is the last token in a Doc.
#12486: Fix pickle for the ngram suggester.
#12493: Include Span.kb_id and Span.id strings in Doc and DocBin serialization.

📖 Documentation and examples

Various documentation corrections and updates.
New addition to spaCy Universe:
- Sentimental Onix

👥 Contributors

@adrianeboyd, @BLKSerene, @honnibal, @ines, @kadarakos, @prajakta-1527, @rmitsch, @shadeMe, @sloev, @svlandeg, @thomashacker, @willfrey

`v3.5.1`: spancat for multi-class labeling, fixes for textcat+transformers and more

💥 We'd love to hear more about your experience with spaCy! Take our survey here.

✨ New features and improvements

NEW: spancat_singlelabel pipeline component for multi-class and non-overlapping span classification. The spancat_singlelabel component predicts at most one label for each suggested span and adds a new setting allow_overlap to restrict the output to non-overlapping spans (#11365).
Extend to mypy v1.0 (#12245).
Use transformer + CNN for efficient GPU textcat with spacy init config (#11900).
Support trainable lemmatizer in spacy debug data (#11419).
Add new operators to dependency matcher for left/right immediate child/parent nodes (>+, >-, <+, <-) (#12334).
Add spacy.PlainTextCorpusReader.v1 for plain text input (#12122).
Add alignment_mode and span_id to Span.char_span() (#12145, #12196).
Use string formatting types in logging calls (#12215).
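The new immediate child/parent operators for the dependency matcher (>+, >-, <+, <-) combine a head relation with adjacency. The sketch below restates them over a plain list of head indices, following our reading of the DependencyMatcher documentation (for example, A >+ B means A is the head of B and A.i == B.i - 1). `immediate_relation` is a hypothetical helper for illustration only, not how the matcher is implemented.

```python
from typing import List


def immediate_relation(op: str, a: int, b: int, heads: List[int]) -> bool:
    """Check whether tokens a and b stand in an immediate child/parent relation.

    heads[i] is the index of token i's head; the root points to itself.
    Hypothetical helper restating the operator definitions, not spaCy code.
    """
    if op == ">+":  # B is a right immediate child of A
        return heads[b] == a and a == b - 1
    if op == ">-":  # B is a left immediate child of A
        return heads[b] == a and a == b + 1
    if op == "<+":  # B is a right immediate parent of A
        return heads[a] == b and a == b - 1
    if op == "<-":  # B is a left immediate parent of A
        return heads[a] == b and a == b + 1
    raise ValueError(f"Unsupported operator: {op!r}")


# "big dogs bark": big -> dogs, dogs -> bark, bark is the root
heads = [1, 2, 2]
print(immediate_relation(">-", 1, 0, heads))  # True: "big" is the left immediate child of "dogs"
print(immediate_relation("<+", 1, 2, heads))  # True: "bark" is the right immediate parent of "dogs"
```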

🔴 Bug fixes

#12017: Improve speed for top_k>1 in trainable lemmatizer.
#12048: Make test_cli_find_threshold() test more robust.
#12227: Fix return type of registry.find().
#12272: Fix speed regression for Matcher patterns with extension attributes.
#12287: Add grc to languages with lexeme norms in spacy-lookups-data.
#12320: Make generation of empty KnowledgeBase instances configurable.
#12343: Fix error message for displacy auto_select_port.
#12347: Fix length check for knowledge base in entity linker, add InMemoryLookupKB.is_empty.
#12365: Fix types for Lexeme.orth and Lexeme.lower.
#12366: Raise error for non-default vectors with PretrainVectors.
#12368: Partially address pending deprecation of pkg_resources.
Various improvements and fixes for the test suite (#12148, #12157, #12210, #12303, #12372).

📖 Documentation and examples

Many website updates to improve accessibility.
Various documentation corrections and updates.
New projects:
- Span labeling datasets
- Comparing embedding layers in spaCy from the technical report Multi hash embeddings in spaCy

👥 Contributors

@adrianeboyd, @andyjessen, @danieldk, @essenmitsosse, @honnibal, @ines, @itssimon, @kadarakos, @kwhumphreys, @ljvmiranda921, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @shadeMe, @svlandeg, @tanloong, @thomashacker, @victorialslocum

`v3.5.0`: New CLI commands, language updates, bug fixes and much more

✨ New features and improvements

NEW: apply CLI command to annotate new documents with a trained pipeline (#11376).
NEW: benchmark CLI command to benchmark pipelines. The new benchmark speed subcommand measures the speed of a pipeline, and the benchmark accuracy subcommand is a new alias for evaluate (#11902).
NEW: find-threshold CLI command to identify an optimal threshold for classification models (#11280).
NEW: FUZZY Matcher operator for fuzzy matches based on Levenshtein edit distance. In addition, the FUZZY and REGEX operators are now supported in combination with IN/NOT_IN (#11359).
Language updates for Ancient Greek, Dutch, Russian, Slovenian and Ukrainian (#11345, #11162, #11426, #11753, #11811, #11997, more details below).
Allow up to typer v0.7.x (#11720), mypy 0.990 (#11801) and typing_extensions v4.4.x (#12036).
New spacy.ConsoleLogger.v3 with expanded progress tracking (#11972).
Improved scoring behavior for textcat with spacy.textcat_scorer.v2 (#11696 and #11971) and spacy.textcat_multilabel_scorer.v2 (#11820).
Improved customizability of the knowledge base used for entity linking, with the default implementation being the new InMemoryLookupKB (#11268).
Optional before_update callback that is invoked at the start of each training step (#11739).
Improve performance of SpanGroup (#11380).
Improve UX around displacy.serve when the default port is in use (#11948).
Patch a security vulnerability in extracting tar files (#11746).
Add equality definition for vectors (#11806).
Allow interpolation of variables in directory names in projects (#11235).
Update default component configs to use the latest tok2vec version (#11618).

🔴 Bug fixes

#11382: Fix lookup behavior for the French and Catalan lemmatizers.
#11385: Ensure that downstream components can train properly on a frozen tok2vec or transformer layer.
#11762: Support local file system remotes for projects.
#11763: Raise an error when unsupported values are used for textcat.
#11834: Ensure Vocab.to_disk respects the exclude setting for lookups and vectors.
#12009: Fix a few typing issues for SpanGroup and Span objects.
#12098: Correctly handle missing annotations in the edit tree lemmatizer.

⚠️ Backwards incompatibilities and model updates

The following changes may require you to update code that is using the relevant functionality:

An error is now raised when unsupported values are given as input to train a textcat or textcat_multilabel model - ensure that values are 0.0 or 1.0 as explained in the docs.
As KnowledgeBase is now an abstract class, you should call the constructor of the new InMemoryLookupKB instead when you want to use spaCy's default KB implementation. If you've written a custom KB that inherits from KnowledgeBase, you'll need to implement its abstract methods, or alternatively inherit from InMemoryLookupKB instead.
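Since unsupported textcat values now raise an error at training time, it can help to validate category annotations before building training data. A minimal sketch, assuming annotations are plain label-to-float dicts like Doc.cats; `check_cats` is a hypothetical helper, and it checks only the 0.0/1.0 rule stated above, not any other constraint spaCy may enforce.

```python
from typing import Dict


def check_cats(cats: Dict[str, float]) -> None:
    """Raise ValueError if any category value is not exactly 0.0 or 1.0."""
    bad = {label: value for label, value in cats.items() if value not in (0.0, 1.0)}
    if bad:
        raise ValueError(f"Unsupported textcat values (use 0.0 or 1.0): {bad}")


check_cats({"POSITIVE": 1.0, "NEGATIVE": 0.0})  # passes silently
```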

The following changes may influence the output of your language pipeline or trained models:

Updates to language defaults:
- Extended support for Slovenian (#11162).
- Switch Russian and Ukrainian lemmatizers to pymorphy3 (#11345, #11811).
- Support for editorial punctuation in Ancient Greek (#11426).
- Update to Russian tokenizer exceptions (#11753).
- Small fix in the list of Dutch stop words (#11997).
Updates to model defaults:
- Use the latest tok2vec defaults in all components (#11618).
- Improve the default attributes used for the textcat and textcat_multilabel components (#11698).
- Update the default scorer for textcat and textcat_multilabel to fix a bug related to threshold for textcat and to make it possible to score multiple textcat/textcat_multilabel components in a single pipeline with custom scorers. If no custom scorers are used, the cat_p/r/f scores will now only reflect the final component's labels and performance (#11696, #11820).
- Correct the token_acc score to report the intended measure (# correct tokens / # predicted tokens, the same as in spaCy v2). The token_acc scores for v3.5 will be lower for the same performance because they were incorrectly inflated in v3.0-v3.4. The token_p/r/f scores should remain unchanged (#12073).
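To make the corrected token_acc definition (# correct tokens / # predicted tokens) concrete, here is the ratio on a toy example. This is a simplified sketch: it counts a predicted token as correct only when its character offsets exactly match a gold token, whereas spaCy's scorer works through its own token alignment, so the numbers are illustrative rather than what Scorer would report.

```python
from typing import List, Tuple

Offsets = List[Tuple[int, int]]  # (start_char, end_char) per token


def token_acc(gold: Offsets, pred: Offsets) -> float:
    """# correct tokens / # predicted tokens (simplified exact-offset matching)."""
    if not pred:
        return 0.0
    gold_set = set(gold)
    correct = sum(1 for span in pred if span in gold_set)
    return correct / len(pred)


# Text: "don't go"
gold = [(0, 2), (2, 5), (6, 8)]  # "do", "n't", "go"
pred = [(0, 5), (6, 8)]          # "don't", "go"
print(token_acc(gold, pred))     # 0.5
```

Only "go" matches a gold token, so one of the two predicted tokens is correct.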

The following functionality will be changed in the near future - so it's best to start updating your scripts now to make them more generic:

From v4 onwards, we'll rename the master branch to main.

📦 Trained pipelines updates

The CNN pipelines add IS_SPACE as a tok2vec feature for tagger and morphologizer components to improve tagging of non-whitespace vs. whitespace tokens.
The transformer pipelines require spacy-transformers v1.2, which uses the exact alignment from tokenizers for fast tokenizers instead of the heuristic alignment from spacy-alignments. For all trained pipelines except ja_core_news_trf, the alignments between spaCy tokens and transformer tokens may be slightly different. More details about the spacy-transformers changes in the v1.2.0 release notes.

📖 Documentation and examples

We've ported our website from Gatsby to Next 🥳
Updated the documentation on supported languages.
Added a note about experimental M1 GPU support to the installation quickstart.
Included documentation for the biluo_to_iob and iob_to_biluo functions.
Fixed model links in the v3.4 usage documentation.
Removed "new" tags of functionality from spaCy v2.x.
Various small additions, spelling and typo fixes.
spaCy Universe additions:
- greCy: Providing Ancient Greek models
- spacy-pythainlp: Add Thai support for spaCy
New projects:
- Accelerate NER with Speedster (experimental)

👥 Contributors

@aaronzipp, @adrianeboyd, @albertvillanova, @ArchiDevil, @cfuerbachersparks, @damian-romero, @danieldk, @darigovresearch, @DSLituiev, @essenmitsosse, @gremur, @honnibal, @ines, @jmyerston, @JosPolfliet, @kadarakos, @koaning, @kwhumphreys, @ljvmiranda921, @MarcoGorelli, @orglce, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @ryndaniels, @shadeMe, @svlandeg, @thomashacker, @TrellixVulnTeam, @wannaphong, @zhiiw, @zrpxx

`v3.4.4`: Bug fixes and future NumPy compatibility

This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.

🔴 Bug fixes

#11845: Don't raise an error in displaCy for unset spans keys.
#11860: Fix spancat for docs with zero suggestions.
#11864: Add smart_open requirement and update deprecated options.
#11899: Fix spacy init config --gpu for environments without spacy-transformers.
#11933: Update for compatibility with NumPy v1.24+ integer conversions.
#11934: Add strings when initializing from labels in EditTreeLemmatizer.
#11935: Restore missing error messages for beam search.

👥 Contributors

@adrianeboyd, @danieldk, @honnibal, @ines, @polm, @svlandeg

`v3.4.3`: Extended Typer support and bug fixes

✨ New features and improvements

Extend Typer support to v0.7.x (#11720).

🔴 Bug fixes

#11640: Handle docs with no entities in EntityLinker.
#11688: Restore custom doc extension values in Doc.to_json() for attributes set by getters.
#11706: Remove incorrect warning for pipeline_package.load().
#11735: Improve spacy project requirements checks for unsupported specifiers and requirements lines.
#11745: Revert modifications to spacy.load(disable=) that could enable currently disabled components.

👥 Contributors

@aaronzipp, @adrianeboyd, @honnibal, @ines, @polm, @rmitsch, @ryndaniels, @svlandeg, @thomashacker

`v3.4.2`: Latin and Luganda support, Python 3.11 wheels and more

✨ New features and improvements

NEW: Luganda language support (#10847).
NEW: Latin language support (#11349).
NEW: spacy.ConsoleLogger.v2 optionally saves training logs to JSONL (#11214).
NEW: New operators for the DependencyMatcher to include matching parents or children to the left or the right of the node (#10371).
Prebuilt Python 3.11 wheels are now available for all spaCy dependencies distributed by @explosion.
Support pydantic v1.10 and mypy 0.980+, drop mypy support for Python 3.6 (#11546, #11635).
Support CuPy v11 and add extras for cuda11x and cuda-autodetect (using cupy-wheel) (#11279).
Support custom attributes for tokens and spans in Doc.to_json() and Doc.from_json() (#11125).
Make the enable and disable options for spacy.load() more consistent (#11459).
Allow a single string argument for disable/enable/exclude for spacy.load() (#11406).
New --url flag for spacy info to print the direct download URL for a pipeline (#11175).
Add a check for missing requirements in the spacy project CLI (#11226).
Add a Levenshtein distance function (#11418).
Improvements to the spacy debug data CLI for spancat data (#11504).
Allow overriding spacy_version in spacy package metadata (#11552).
Improve the error message when using the wrong command for spacy project assets (#11458).
Ensure parent directories are created when storing the results of the spacy pretrain command (#11210).
Extend support to newer versions of natto-py for the ko extra (#11222).
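The Levenshtein distance function listed above (and the FUZZY Matcher operator added later in v3.5.0) is based on edit distance. For reference, a pure-Python sketch of the textbook dynamic-programming algorithm; it is not spaCy's implementation and does not reproduce any thresholds the FUZZY operator applies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only the previous row makes memory proportional to the shorter string rather than the product of both lengths.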

📦 Trained pipelines updates

This release includes updated English pipelines for spaCy v3.4 with improved NER performance. The updates in en_core_web_* v3.4.1 address issues related to training from data with partial named entity annotation, which led to lower NER recall in English pipeline versions v3.0.0–v3.4.0. In particular, entities that appear in the sections of the OntoNotes training data without NER annotation were not predicted consistently by the earlier pipeline versions, such as names and places that are frequent in the Biblical sections, e.g., "David" and "Egypt" (see #7493).

Use spacy download to update your English pipelines to the newest version. If you'd prefer to keep using an earlier version, you can specify the version directly with e.g. spacy download -d en_core_web_sm-3.4.0. You can check that you are using the new version (v3.4.1) with spacy validate:

NAME                     SPACY            VERSION
en_core_web_md           >=3.4.0,<3.5.0   3.4.1     ✔
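In this output, the SPACY column is the range of spaCy versions the installed pipeline package declares compatibility with. A minimal sketch of that range check for plain numeric X.Y.Z versions; `in_range` is a hypothetical helper that understands only >= and < clauses, and real code should use `packaging.specifiers.SpecifierSet`, which implements full PEP 440 semantics.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.4.1' into (3, 4, 1); numeric components only."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, spec: str) -> bool:
    """Check a version against a spec such as '>=3.4.0,<3.5.0'."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if v < parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if v >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True


print(in_range("3.4.1", ">=3.4.0,<3.5.0"))  # True
```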

🔴 Bug fixes

#11275: Fix Dutch noun chunks to skip overlapping spans.
#11276: Fix regex invalid escape sequences.
#11312: Better handling of unexpected types in SetPredicate.
#11460: Fix config validation failures caused by NVTX pipeline wrappers.
#11506: Avoid unwanted side effects in Doc.__init__.
#11540: Preserve missing entity annotation in augmenters.
#11592: Fix issues with DVC commands.
#11631: Fix initialization for pymorphy2_lookup lemmatizer mode for Russian and Ukrainian.

⚠️ Backwards incompatibilities

If you're using a custom component that does not return a Doc type, an error will now be raised (#11424).
If you're using a dot in a factory name, an error is raised as this is not supported (#11336).

📖 Documentation and examples

Added documentation for the new experimental coref component.
Added Ukrainian trained pipelines to the website.
Added documentation for the spacy.models_and_pipes_with_nvtx_range.v1 callback.
Fix English pipeline names in v3.4 release notes.
Various fixes to the Example API documentation.
Extensions and improvements to the displacy docs.
Fix the example command for spacy project dvc.
Update example code for spacy-wordnet.
Improve API documentation around the initialize() function for pipeline components.
Fix various typos and inconsistencies.
spaCy universe additions:
- concepCy: A spaCy wrapper for ConceptNet.
- spaCy partial tagger: build a CRF tagger with a partially annotated dataset.
- Zshot: Zero and Few shot named entity & relationships recognition.

👥 Contributors

@adrianeboyd, @bdura, @danieldk, @diyclassics, @DSLituiev, @GabrielePicco, @honnibal, @ines, @JulesBelveze, @kadarakos, @ljvmiranda921, @ninjalu, @pmbaumgartner, @polm, @radandreicristian, @richardpaulhudson, @rmitsch, @shadeMe, @stefawolf, @svlandeg, @thomashacker, @tobiusaolo, @tzussman, @yasufumy

`v3.4.1`: Fix compatibility with CuPy v9.x

🔴 Bug fixes

Fix issue #11137: Fix compatibility with CuPy v9.x.

📖 Documentation and examples

spaCy universe additions:
- BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
- English Interpretation Sentence Pattern: English interpretation for accurate translation from English to Japanese.

👥 Contributors

@adrianeboyd, @danieldk, @honnibal, @ines, @lll-lll-lll-lll, @Lucaterre, @MaartenGr, @mr-bjerre, @polm, @radenkovic

`v3.4.0`: Updated types, speed improvements and pipelines for Croatian

✨ New features and improvements

Support for mypy 0.950+ and pydantic v1.9 (#10786).
Prebuilt Linux aarch64 wheels are now available for all spaCy dependencies distributed by @explosion.
Min/max {n,m} operator for Matcher patterns (#10981).
Language updates:
- Improve tokenization for Cyrillic combining diacritics (#10837).
- Improve English tokenizer exceptions for contractions with this/that/these/those (#10873).
Improved speed of vector lookups (#10992).
For the parser, use C saxpy/sgemm provided by the Ops implementation in order to use Accelerate through thinc-apple-ops (#10773).
Improved speed of Example.get_aligned_parse and Example.get_aligned (#10952).
Improved speed of StringStore lookups (#10938).
Updated spacy project clone to try both main and master branches by default (#10843).
Added confidence threshold for named entity linker (#11016).
Improved handling of Typer optional default values for init_config_cli (#10788).
Added cycle detection in parser projectivization methods (#10877).
Added counts for NER labels in debug data (#10960).
Support for adding NVTX ranges to TrainablePipe components (#10965).
Support env variable SPACY_NUM_BUILD_JOBS to specify the number of build jobs to run in parallel with pip (#11073).
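The new min/max quantifiers go in the OP field of a Matcher token pattern, e.g. {"LOWER": "very", "OP": "{1,3}"}. The sketch below shows how the quantifier forms map to (minimum, maximum) repetition counts, assuming the {n}, {n,m}, {n,} and {,m} forms described in the Matcher documentation; `parse_quantifier` is a hypothetical helper, not spaCy's own parser.

```python
import re
from typing import Optional, Tuple

# Matches "{n}", "{n,m}", "{n,}" and "{,m}".
_QUANTIFIER = re.compile(r"^\{(\d*)(,?)(\d*)\}$")


def parse_quantifier(op: str) -> Tuple[int, Optional[int]]:
    """Return (min, max) repetitions for a quantifier; max is None if unbounded."""
    match = _QUANTIFIER.match(op)
    if match is None:
        raise ValueError(f"Not a min/max quantifier: {op!r}")
    low, comma, high = match.groups()
    if not low and not high:
        raise ValueError(f"Quantifier needs at least one bound: {op!r}")
    if not comma:  # "{n}": exactly n times
        return int(low), int(low)
    min_count = int(low) if low else 0       # "{,m}" means 0..m
    max_count = int(high) if high else None  # "{n,}" means n..unbounded
    if max_count is not None and min_count > max_count:
        raise ValueError(f"Minimum exceeds maximum: {op!r}")
    return min_count, max_count


print(parse_quantifier("{1,3}"))  # (1, 3)
```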

📦 Trained pipelines updates

We have added new pipelines for Croatian that use the trainable lemmatizer and floret vectors.

Package	UPOS	Parser LAS	NER F
`hr_core_news_sm`	96.6	77.5	76.1
`hr_core_news_md`	97.3	80.1	81.8
`hr_core_news_lg`	97.5	80.4	83.0

🙏 Special thanks to @gtoffoli for help with the new pipelines!

The English pipelines have new word vectors:

Package	Model Version	TAG	Parser LAS	NER F
`en_core_web_md`	v3.3.0	97.3	90.1	84.6
`en_core_web_md`	v3.4.0	97.2	90.3	85.5
`en_core_web_lg`	v3.3.0	97.4	90.1	85.3
`en_core_web_lg`	v3.4.0	97.3	90.2	85.6

All CNN pipelines have been extended to add whitespace augmentation.

🔴 Bug fixes

Fix issue #10960: Support hyphens in NER labels.
Fix issue #10994: Fix horizontal spacing for spans in displaCy.
Fix issue #11013: Check for any token with a vector in Doc.has_vector, distinguish 0-vectors and missing vectors in similarity warnings.
Fix issue #11056: Don't use get_array_module in textcat.
Fix issue #11092: Fix vertical alignment for spans in displaCy.

🚀 Notes about upgrading from v3.3

Doc.has_vector now matches Token.has_vector and Span.has_vector: it returns True if at least one token in the doc has a vector rather than checking only whether the vocab contains vectors.

📖 Documentation and examples

spaCy universe additions:
- Aim-spacy: An Aim-based spaCy experiment tracker.
- Asent: Fast, flexible and transparent sentiment analysis.
- spaCy fishing: Named entity disambiguation and linking on Wikidata in spaCy with Entity-Fishing.
- spacy-report: Generates interactive reports for spaCy models.

👥 Contributors

@adrianeboyd, @danieldk, @ericholscher, @gorarakelyan, @honnibal, @ines, @jademlc, @kadarakos, @KennethEnevoldsen, @koaning, @Lucaterre, @maxTarlov, @philipvollet, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @sadovnychyi, @shadeMe, @shen-qin, @single-fingal, @svlandeg, @victorialslocum, @Zackere

`v3.3.3`: Bug fixes for Pydantic and pip

This bug fix release is primarily to address Pydantic incompatibility with typing_extensions>=4.6.0.

✨ New features and improvements

Huge speed improvements for spancat, in particular on GPU (~10x-30x faster) (#12577).

🔴 Bug fixes

Add typing_extensions requirement due to Pydantic incompatibility with typing_extensions>=4.6.0.
Remove #egg from download URLs due to future deprecation in pip.

👥 Contributors

@adrianeboyd, @honnibal, @ines, @kadarakos, @svlandeg

`v3.3.2`: Bug fixes and future NumPy compatibility

This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.

🔴 Bug fixes

#10911, #11194: Improve speed in precomputable_biaffine by avoiding concatenation.
#11276, #11331, #11701:

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever the PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

This PR has been generated by Mend Renovate. View repository job log here.

codecov-commenter · 2023-03-10T12:14:12Z

Codecov Report

Merging #70 (0a6c990) into master (16e694d) will not change coverage.
The diff coverage is n/a.

❗ Current head 0a6c990 differs from the pull request's most recent head 0e9cea1. Consider uploading reports for the commit 0e9cea1 to get more accurate results.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@          Coverage Diff           @@
##           master     #70   +/-   ##
======================================
  Coverage    0.38%   0.38%           
======================================
  Files          16      16           
  Lines        1035    1035           
======================================
  Hits            4       4           
  Misses       1031    1031
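The 0.38% figure follows from the hit and line counts above. Since 4/1035 is about 0.3865%, the report appears to round down to two decimals rather than to the nearest value, which is consistent with Codecov's default coverage rounding (as we understand it, `round: down` with `precision: 2`). A quick check of the arithmetic:

```python
import math

hits, lines = 4, 1035
misses = lines - hits                           # 1031, as in the diff above
percent = hits / lines * 100                    # about 0.3865
rounded_down = math.floor(percent * 100) / 100  # truncate to two decimals
print(f"{rounded_down:.2f}%")                   # 0.38%
print(f"{round(percent, 2):.2f}%")              # 0.39% if rounded to nearest
```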

renovate bot force-pushed the renovate/spacy-3.x branch from 2d96b5c to e140fdb on March 10, 2023 12:11

renovate bot force-pushed the renovate/spacy-3.x branch from e140fdb to 0a6c990 on April 12, 2023 08:56

renovate bot force-pushed the renovate/spacy-3.x branch from 0a6c990 to 5a805ac on May 28, 2023 10:14

renovate bot force-pushed the renovate/spacy-3.x branch from 5a805ac to 4ded11e on June 28, 2023 17:48

renovate bot force-pushed the renovate/spacy-3.x branch from 4ded11e to 8cda527 on July 7, 2023 09:47

renovate bot force-pushed the renovate/spacy-3.x branch from 8cda527 to 0e515c1 on August 8, 2023 15:16

renovate bot force-pushed the renovate/spacy-3.x branch 2 times, most recently from 0a7a468 to 0e9cea1 on October 5, 2023 06:59

renovate bot force-pushed the renovate/spacy-3.x branch from 0e9cea1 to 43245ca on October 16, 2023 16:36

Update dependency spacy to v3 (0452a8d)

renovate bot force-pushed the renovate/spacy-3.x branch from 43245ca to 0452a8d on November 6, 2023 11:38

woctezuma merged commit 3b46043 into master Nov 6, 2023
4 checks passed

woctezuma deleted the renovate/spacy-3.x branch November 6, 2023 11:40
