add spacy.Language as valid argument for 'spacy_pipeline' #19

Merged 2 commits on Dec 23, 2022


dominik-schwabe
Contributor

This commit allows an object returned by spacy.load to be reused across many different KeyphraseVectorizer objects. I noticed that the nlp object gets loaded every time fit is called, which makes extracting keyphrases from multiple documents very slow when a model like en_core_web_md is used.
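
The idea can be sketched without spaCy installed. The snippet below is an illustration only: `fake_load` stands in for `spacy.load`, `FakePipeline` for a loaded `spacy.Language` object, and `SketchVectorizer` is a hypothetical class, not the library's actual code. It assumes a `spacy_pipeline` argument that accepts either a model name (loaded on every `fit`) or an already loaded object (reused as-is), and counts how often the expensive load runs in each case.

```python
from dataclasses import dataclass
from typing import List, Union

LOAD_CALLS = 0  # counts how often the (expensive) loader runs


@dataclass
class FakePipeline:
    """Stand-in for a loaded spacy.Language object (assumption for this sketch)."""
    name: str

    def __call__(self, text: str) -> List[str]:
        # A real pipeline would tokenize and tag; this only splits on whitespace.
        return text.split()


def fake_load(name: str) -> FakePipeline:
    """Stand-in for spacy.load; a real model load can take seconds."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return FakePipeline(name)


class SketchVectorizer:
    """Hypothetical vectorizer; not the library's implementation."""

    def __init__(self, spacy_pipeline: Union[str, FakePipeline] = "en_core_web_sm"):
        self.spacy_pipeline = spacy_pipeline

    def fit(self, docs: List[str]) -> "SketchVectorizer":
        # Load only when given a name; reuse a passed-in object directly.
        if isinstance(self.spacy_pipeline, str):
            nlp = fake_load(self.spacy_pipeline)
        else:
            nlp = self.spacy_pipeline
        self.vocabulary_ = sorted({tok for doc in docs for tok in nlp(doc)})
        return self


# Passing a name: every fit triggers a load.
SketchVectorizer("en_core_web_md").fit(["a b"])
SketchVectorizer("en_core_web_md").fit(["c d"])
loads_with_names = LOAD_CALLS

# Passing a loaded object: one load, shared by many vectorizers.
nlp = fake_load("en_core_web_md")
for docs in (["a b"], ["c d"], ["e f"]):
    SketchVectorizer(spacy_pipeline=nlp).fit(docs)
loads_with_object = LOAD_CALLS - loads_with_names
```

In this sketch the name-based path loads once per `fit`, while the object-based path loads once in total, regardless of how many vectorizers are created.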

TimSchopf added the enhancement (New feature or request) label on Dec 17, 2022
Owner

TimSchopf commented Dec 17, 2022

Hi Dominik,

thanks for the contribution. Can you also please add a short code example and explanation on how to use the new argument in the README.md file?

Also, you can extract keyphrases from multiple documents with the same object and calling fit only once by using a list of documents as inputs. This probably solves the issue already.

Best,
Tim
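
The suggestion above (one fit over a list of documents instead of one fit per document) can be illustrated with a toy stand-in. Everything here is hypothetical: `load_model` stands in for an expensive load such as `spacy.load`, and `fit` is a simplified function that, like the behavior described in this thread, loads the model once per call.

```python
from typing import Callable, List

load_count = 0  # counts how often the (expensive) loader runs


def load_model(name: str) -> Callable[[str], List[str]]:
    """Stand-in for an expensive model load such as spacy.load."""
    global load_count
    load_count += 1
    return str.split  # toy "pipeline": a whitespace tokenizer


def fit(docs: List[str], model_name: str = "en_core_web_md") -> List[str]:
    """Hypothetical fit: loads the model once per call, then processes all docs."""
    nlp = load_model(model_name)
    return sorted({tok for doc in docs for tok in nlp(doc)})


docs = ["keyphrase extraction", "spacy pipeline", "keyphrase vectorizer"]

# One fit per document: the model is loaded once for each document.
per_doc = [fit([d]) for d in docs]
loads_per_doc = load_count

# One fit over the whole list: the model is loaded a single time.
load_count = 0
batch = fit(docs)
loads_batch = load_count
```

Batching helps when all documents are known up front; it does not help when documents arrive one at a time across separate runs.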

@dominik-schwabe
Contributor Author

I added a little example to the README.

> Also, you can extract keyphrases from multiple documents with the same object and calling fit only once by using a list of documents as inputs. This probably solves the issue already.

I usually run small experiments: I inspect the results on one document, then change some things and try the changed document or a different one. I also usually use en_core_web_md. In that setup, loading a new nlp object for every small experiment adds a delay of about 5s to something that would otherwise be instantaneous.

TimSchopf merged commit f5bee69 into TimSchopf:master on Dec 23, 2022