Updated featurizers #4935

Merged
tabergma merged 278 commits into master from updated-featurizers
Dec 17, 2019

tabergma
Contributor

@tabergma tabergma commented Dec 10, 2019

Proposed changes:

  • Add the option use_cls_token to all tokenizers. If it is set to True, the token __CLS__ is appended to the end of the list of tokens.
  • Add the option return_sequence to all featurizers. By default, every featurizer returns a matrix of size (1 x feature-dimension). If return_sequence is set to True, the featurizer returns a matrix of size (token-length x feature-dimension) instead.
  • Split featurizers into sparse and dense featurizers.
  • Remove NGramFeaturizer. Please use CountVectorsFeaturizer instead.
  • To use custom features in the CRFEntityExtractor, use text_dense_features instead of ner_features. If text_dense_features are present in the feature set, the CRFEntityExtractor uses them automatically. Make sure to place a dense featurizer before the CRFEntityExtractor in your pipeline and set return_sequence to True for that featurizer.
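
To illustrate the first two options, here is a minimal, self-contained sketch. It is not the Rasa implementation: the functions `tokenize` and `featurize` are hypothetical, the two toy features are made up, and mean pooling for the default (1 x feature-dimension) case is an assumption, since the description above does not say how the single row is computed.

```python
from typing import List

CLS_TOKEN = "__CLS__"
FEATURE_DIM = 2  # toy feature-dimension, chosen only for this sketch


def tokenize(text: str, use_cls_token: bool = False) -> List[str]:
    """Hypothetical whitespace tokenizer (not a Rasa class).

    When use_cls_token is True, __CLS__ is appended to the end of the tokens.
    """
    tokens = text.split()
    if use_cls_token:
        tokens.append(CLS_TOKEN)
    return tokens


def featurize(tokens: List[str], return_sequence: bool = False) -> List[List[float]]:
    """Hypothetical dense featurizer with two toy features per token:
    the token length and the number of vowels in the token."""
    rows = [
        [float(len(t)), float(sum(ch in "aeiou" for ch in t.lower()))]
        for t in tokens
    ]
    if return_sequence:
        # One row per token: (token-length x feature-dimension)
        return rows
    # Assumption: collapse to a single row by averaging over all tokens,
    # giving a (1 x feature-dimension) matrix.
    count = max(len(rows), 1)
    pooled = [sum(row[i] for row in rows) / count for i in range(FEATURE_DIM)]
    return [pooled]


tokens = tokenize("book a table", use_cls_token=True)
sequence_features = featurize(tokens, return_sequence=True)
sentence_features = featurize(tokens)
print(tokens)
print(len(sequence_features), "x", len(sequence_features[0]))
print(len(sentence_features), "x", len(sentence_features[0]))
```

With return_sequence set to True the result has one row per token (including __CLS__ when it is enabled), which is the shape a per-token consumer such as the CRFEntityExtractor would need.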

closes #4957
part of https://github.com/RasaHQ/research/issues/54

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@tabergma tabergma requested a review from tmbo December 12, 2019 12:34
@tmbo
Member

tmbo commented Dec 16, 2019

@tabergma you can change the model compatibility version in rasa/constants.py

Member

@tmbo tmbo left a comment

great work!

I've added some style suggestions but from my perspective this is ready to go

changelog/4935.feature.rst (outdated)
changelog/4935.removal.rst
changelog/4935.removal.rst (outdated)
changelog/4957.removal.rst
rasa/constants.py (outdated)
rasa/nlu/tokenizers/tokenizer.py
rasa/nlu/tokenizers/tokenizer.py
rasa/utils/train_utils.py (outdated)
rasa/utils/train_utils.py (outdated)
@Ghostvv Ghostvv removed their request for review December 16, 2019 13:50
@tabergma tabergma merged commit 6674b1f into master Dec 17, 2019
@tabergma tabergma deleted the updated-featurizers branch December 17, 2019 08:18
@tmbo
Member

tmbo commented Dec 17, 2019

woop woop 🎉

Development

Successfully merging this pull request may close these issues.

Replace ner_features by text_dense_features
3 participants