Updated featurizers #4935

Merged on Dec 17, 2019 (278 commits). The diff below shows changes from 1 commit.

Commits (278)
d496b43
Add changelog entry.
tabergma Oct 18, 2019
291a24e
move code from init to own file
tabergma Oct 18, 2019
5986a0d
update changelog entry.
tabergma Oct 18, 2019
54b5f3a
make use_cls_token a class variable of tokenizer
tabergma Oct 18, 2019
c939387
tokenizer inherits from component
tabergma Oct 18, 2019
944b716
remove not needed init methods
tabergma Oct 18, 2019
f1ed7d7
review comment
tabergma Oct 18, 2019
9112022
Add use_cls_token to default dict.
tabergma Oct 18, 2019
31dd425
throw key error if use_cls_token is not set as default value.
tabergma Oct 18, 2019
e652d84
Disable cls token use in default pipeline.
tabergma Oct 20, 2019
1d77554
correct type
tabergma Oct 20, 2019
3d8a2e4
fix tests
tabergma Oct 21, 2019
2985938
Merge branch 'combined-entity-intent-model' into adapt-featurizers
tabergma Oct 21, 2019
50f68b2
spacy featurizer returns sequence
tabergma Oct 21, 2019
603d065
fix tests for count vectors featurizer
tabergma Oct 21, 2019
d1a19dc
mitie featurizer returns sequence
tabergma Oct 21, 2019
bd2ceb3
regex featurizer returns sequence
tabergma Oct 21, 2019
7e46fe8
clean up
tabergma Oct 21, 2019
a4b8b0e
Add changelog entry
tabergma Oct 21, 2019
f02b9c2
helper method to convert seq features back
tabergma Oct 21, 2019
d3a5dd5
remove print statement
tabergma Oct 21, 2019
46ab485
fix imports
tabergma Oct 22, 2019
076f33d
remove ner_features from restaurantbot
tabergma Oct 22, 2019
905f2d6
change default value
tabergma Oct 22, 2019
1941a25
fix imports
tabergma Oct 22, 2019
6483379
handle cls token in featurizers
tabergma Oct 22, 2019
952e95a
Remove ngram featurizer from registry
tabergma Oct 22, 2019
6faa44b
review comments
tabergma Oct 23, 2019
20a92ca
count vectors featurizer requires tokens
tabergma Oct 23, 2019
810cae5
remove not needed vocab check
tabergma Oct 23, 2019
b6ad85c
Add cls token to whitespace tokenizer.
tabergma Oct 18, 2019
fb24e35
Add cls token to spacy tokenizer.
tabergma Oct 18, 2019
ad64e50
Add cls token to mitie tokenizer.
tabergma Oct 18, 2019
2ce36d9
Add cls token to jieba tokenizer.
tabergma Oct 18, 2019
3f85199
Add changelog entry.
tabergma Oct 18, 2019
88964a0
move code from init to own file
tabergma Oct 18, 2019
acb7503
update changelog entry.
tabergma Oct 18, 2019
3d89a66
make use_cls_token a class variable of tokenizer
tabergma Oct 18, 2019
7ed1f27
tokenizer inherits from component
tabergma Oct 18, 2019
b9e3188
remove not needed init methods
tabergma Oct 18, 2019
787e047
review comment
tabergma Oct 18, 2019
6fe28f0
Add use_cls_token to default dict.
tabergma Oct 18, 2019
172c0e5
throw key error if use_cls_token is not set as default value.
tabergma Oct 18, 2019
45a5868
Disable cls token use in default pipeline.
tabergma Oct 20, 2019
7c9c679
correct type
tabergma Oct 20, 2019
dfeca3e
fix tests
tabergma Oct 21, 2019
d031f14
Merge branch 'combined-entity-intent-model' into adapt-featurizers
tabergma Oct 23, 2019
ce91597
Update rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurize…
tabergma Oct 23, 2019
f69673a
Update rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurize…
tabergma Oct 23, 2019
78d4d51
review comments
tabergma Oct 23, 2019
411d328
test regex featurizer on response
tabergma Oct 23, 2019
884e2b3
review comments
tabergma Oct 23, 2019
2075338
Merge pull request #4648 from RasaHQ/adapt-featurizers
tabergma Oct 23, 2019
e857c35
switch from ner to sparse features
tabergma Oct 23, 2019
01b4de6
add seq to sentence embedding method
tabergma Oct 23, 2019
f21bbd7
update crf entity extractor
tabergma Oct 23, 2019
d08b2d7
use constants
tabergma Oct 23, 2019
a5e3382
fix imports
tabergma Oct 23, 2019
b469bd6
Fix crf entity extractor.
tabergma Oct 23, 2019
d97ce14
Remove empty file.
tabergma Oct 23, 2019
3275f5a
add changelog entry.
tabergma Oct 23, 2019
f85fbe2
Remove case_sensitive option from WhitespaceTokenizer
tabergma Oct 23, 2019
a4f5e8e
Update docstring.
tabergma Oct 23, 2019
a6d93fb
review comments
tabergma Oct 23, 2019
ae8faf6
undo removing case sensitive from whitespace tokenizer
tabergma Oct 24, 2019
8791d40
Adapt tests.
tabergma Oct 24, 2019
8d86968
rename word_embeddings to text_dense_features
tabergma Oct 24, 2019
a8a5abf
combine correct features in regex featurizer
tabergma Oct 24, 2019
b1d371b
keep sparse sparse
tabergma Oct 25, 2019
5ded8c9
fix changelog
tabergma Oct 25, 2019
bbf4d43
update sequence to sentence
tabergma Oct 25, 2019
ecbf157
update sequence to sentence
tabergma Oct 25, 2019
74ec4c4
Merge pull request #4663 from RasaHQ/adapt-extractors-classifiers
tabergma Oct 25, 2019
934ae5e
Merge branch 'master' into combined-entity-intent-model
tabergma Oct 25, 2019
a62445c
Merge branch 'master' into combined-entity-intent-model
tabergma Oct 29, 2019
3cb90b7
update session data
tabergma Oct 23, 2019
3faf171
use dict for session data.
tabergma Oct 28, 2019
834d265
adapt classifiers
tabergma Oct 28, 2019
3f0750b
fix classifier
tabergma Oct 28, 2019
4c8f811
add more tests
tabergma Oct 28, 2019
e05a9d6
use sparse in tests
tabergma Oct 28, 2019
b110720
fix shapes
tabergma Oct 29, 2019
219d9dd
fix tests.
tabergma Oct 29, 2019
bca0b85
review comments
tabergma Oct 29, 2019
adc84fe
use label_key
tabergma Oct 29, 2019
1dbaa5c
intent classifier makes use of sparse and dense features.
tabergma Oct 29, 2019
f2a8599
remove default value for label_key
tabergma Oct 29, 2019
9c25095
clean up
tabergma Oct 29, 2019
dafbaf9
review comments
tabergma Oct 30, 2019
3c78a86
add more tests
tabergma Oct 30, 2019
31196cf
add test for balance session data
tabergma Oct 30, 2019
9f4ed63
use given attribute in create session data
tabergma Oct 30, 2019
7545745
gen_batch can handle sequence
tabergma Oct 31, 2019
d3b48ea
session data is simple dict
tabergma Oct 31, 2019
a65f397
use sparse tensors
tabergma Nov 1, 2019
60bdec2
wrap tf.layers.dense with dense_layer function
tabergma Nov 4, 2019
c448fe7
get feature_dim from session data instead of sparse tensor
tabergma Nov 4, 2019
093024f
Update rasa/utils/train_utils.py
Ghostvv Nov 7, 2019
b600c26
pass last dim of sparse tensor into the SparseTensor directly, separa…
Ghostvv Nov 7, 2019
d53ffb9
rephrase todo
Ghostvv Nov 7, 2019
086ee13
rephrase todo
Ghostvv Nov 8, 2019
e3f8a63
keep _encoded_all_label_ids scipy.sparse.csr_matrix.
tabergma Nov 8, 2019
98829a9
session data values are list of np.ndarray
tabergma Nov 8, 2019
f97f6df
fix encoded all label ids
tabergma Nov 8, 2019
5d53eb1
fix train utils methods
tabergma Nov 8, 2019
f79ed36
convert encoded_all_labels into a list of sparse,dense
Ghostvv Nov 8, 2019
0a06ff6
Merge branch 'adapt-session-data' of https://github.com/RasaHQ/rasa i…
Ghostvv Nov 8, 2019
c2447f3
Merge branch 'master' into combined-entity-intent-model
tabergma Nov 8, 2019
2a3966b
Merge branch 'combined-entity-intent-model' into adapt-session-data
tabergma Nov 8, 2019
b208db7
create sparse matrices, if no intent features provided
Ghostvv Nov 8, 2019
ee37852
embedding intent classifier is training.
tabergma Nov 11, 2019
b9256bd
create session data during prediction.
tabergma Nov 11, 2019
112f065
prediction of embedding intent classifier works.
tabergma Nov 11, 2019
2635464
clean up code
tabergma Nov 11, 2019
4104574
convert encoded all labels the same way as session data
Ghostvv Nov 11, 2019
87bb01a
merge
Ghostvv Nov 11, 2019
9d66575
add mask
tabergma Nov 11, 2019
1c4591e
check if tokens are present
tabergma Nov 11, 2019
7889033
add TODO
Ghostvv Nov 11, 2019
6f20dbc
fix wrong embed layer
Ghostvv Nov 11, 2019
6cf4385
more consistent var naming
Ghostvv Nov 11, 2019
bcc52c1
fix balance session data
tabergma Nov 12, 2019
c161a43
add comments
tabergma Nov 12, 2019
c7e3251
extract dense_dim from dense features
tabergma Nov 12, 2019
b98bab6
Fix test_train test.
tabergma Nov 12, 2019
615bb62
_compute_default_label_features works as expected
tabergma Nov 12, 2019
2ef8744
fix len error
Ghostvv Nov 12, 2019
13a3550
Merge branch 'adapt-session-data' of https://github.com/RasaHQ/rasa i…
Ghostvv Nov 12, 2019
c558e0d
Merge branch 'master' into combined-entity-intent-model
tabergma Nov 12, 2019
42ba88e
Merge branch 'combined-entity-intent-model' into adapt-session-data
tabergma Nov 12, 2019
718aff0
use default label features if not present
tabergma Nov 12, 2019
220d6d0
correct use of session data in policy
tabergma Nov 12, 2019
18fe94f
Use coo_matrix.
tabergma Nov 12, 2019
c56db96
Update Changelog
tabergma Nov 12, 2019
d22055c
clean up
tabergma Nov 12, 2019
ad8695a
Fix imports.
tabergma Nov 12, 2019
1c835c5
add masks, update prediction batch creation
Ghostvv Nov 12, 2019
ff0c707
fix types
Ghostvv Nov 12, 2019
597265b
merge helper methods
Ghostvv Nov 12, 2019
29a9c7f
some refactoring
tabergma Nov 13, 2019
4c03841
add test for get number of features
tabergma Nov 13, 2019
fa7b50c
set initial tuple size to zero
Ghostvv Nov 13, 2019
fb7a6ed
Merge branch 'adapt-session-data' of https://github.com/RasaHQ/rasa i…
Ghostvv Nov 13, 2019
35e2bdb
rename the variable
Ghostvv Nov 13, 2019
9bbe1c1
formatting
tabergma Nov 13, 2019
f383cf0
Update cli startup test
tabergma Nov 13, 2019
7ab1f97
fix test.
tabergma Nov 13, 2019
2a54286
fix different sequence lengths in sparse and dense features
Ghostvv Nov 13, 2019
307e064
cosmetic changes
Ghostvv Nov 13, 2019
7c46caa
black
Ghostvv Nov 13, 2019
468ef3c
fix default Y features
Ghostvv Nov 13, 2019
b2391cf
use f strings
tabergma Nov 13, 2019
ed2b72f
formatting
tabergma Nov 13, 2019
38d83c3
fix types
tabergma Nov 14, 2019
5fdd251
store tuple sizes correctly
tabergma Nov 14, 2019
75b0e69
use float32 everywhere
tabergma Nov 14, 2019
38e8b81
fix docstrings
Ghostvv Nov 14, 2019
6e472a9
fix label_ids in core
Ghostvv Nov 14, 2019
5fdf957
use helper method
Ghostvv Nov 14, 2019
6aaf3ce
fix dynamic seq in label_id
Ghostvv Nov 14, 2019
4f9ecf7
raise if unsupported label_id dims
Ghostvv Nov 14, 2019
015c4d9
black
Ghostvv Nov 14, 2019
6004d65
fix import
Ghostvv Nov 14, 2019
5c050da
fix split session data tests
Ghostvv Nov 14, 2019
c08fe55
Merge pull request #4686 from RasaHQ/adapt-session-data
tabergma Nov 14, 2019
6fc18e5
Merge branch 'master' into combined-entity-intent-model
tabergma Nov 14, 2019
b40d6f4
slightly cleaner sparse to indices code
Ghostvv Nov 15, 2019
af54fbc
use extend
Ghostvv Nov 15, 2019
5d435a3
remove else
Ghostvv Nov 15, 2019
8d41e5e
use numpy stack
Ghostvv Nov 15, 2019
f29415c
Merge pull request #4777 from RasaHQ/sparse-batch
Ghostvv Nov 15, 2019
308b487
fix split train val
tabergma Nov 15, 2019
62d9e60
mask combined input before averaging
Ghostvv Nov 20, 2019
b26cfac
Merge branch 'master' into updated-featurizers
tabergma Nov 21, 2019
931d5eb
fix oov token warning
Ghostvv Nov 25, 2019
afeeaf1
Merge branch 'master' into updated-featurizers
tabergma Nov 25, 2019
df104f3
Merge branch 'master' into updated-featurizers
tabergma Nov 27, 2019
232176c
move convert featurizer to dense featurizers
tabergma Nov 27, 2019
7c87d60
add future warning to ngram featurizer
tabergma Nov 27, 2019
a81d0a8
set default value of use_cls_token to false
tabergma Nov 27, 2019
b2d1ad2
Merge branch 'master' into updated-featurizers
tabergma Nov 28, 2019
4471dee
fix import (add root)
tabergma Nov 28, 2019
e48d7a5
add return_sequence flag
tabergma Nov 28, 2019
bcfb0ad
convert featurizer returns seq of 1
tabergma Nov 28, 2019
f2b9e4f
fix return_sequence not found in config
tabergma Nov 29, 2019
a9360e3
convert featurizer return seq of 1
tabergma Nov 29, 2019
2850813
add more tests
tabergma Nov 29, 2019
32586f3
add test for convert featurizer
tabergma Nov 29, 2019
5121edc
fix default pipeline test
tabergma Nov 29, 2019
3c20e33
refactor mitie featurizer
tabergma Nov 29, 2019
4d42a22
Merge branch 'master' into updated-featurizers
tabergma Nov 29, 2019
961b912
Merge branch 'updated-featurizers' into add-sequence-flag
tabergma Nov 29, 2019
aac64a8
fix import
tabergma Nov 29, 2019
a4b454b
Add warning to convert featurizer.
tabergma Dec 2, 2019
303ef4c
update warning in crf entity extractor
tabergma Dec 2, 2019
b4e1e04
Add empty documentation page.
tabergma Dec 2, 2019
2e32d7e
update documentation
tabergma Dec 2, 2019
24e92b6
raise value error if seq dimension does not match
tabergma Dec 3, 2019
388fb6e
take mean vec for cls token in mitie
tabergma Dec 4, 2019
d5579a3
fix bug in count vector featurizer
tabergma Dec 4, 2019
0a37a61
review comments
tabergma Dec 4, 2019
afec4a9
add comment to count vectors about input to vectorizer
tabergma Dec 9, 2019
f6507ca
throw error if return seq is true for convert featurizer
tabergma Dec 9, 2019
de9a5ed
update warnings
tabergma Dec 9, 2019
c01673c
update warning
tabergma Dec 9, 2019
3ae7626
fix tests
tabergma Dec 9, 2019
8156a4e
Merge pull request #4880 from RasaHQ/add-sequence-flag
tabergma Dec 9, 2019
ea57e20
Merge branch 'master' into updated-featurizers
tabergma Dec 10, 2019
fcf0474
remove default values from example configs
tabergma Dec 10, 2019
b8b4c2c
Merge branch 'updated-featurizers' into nlu-featurizer-documentation
tabergma Dec 10, 2019
79e0ceb
fix import
tabergma Dec 10, 2019
e47176a
update documentation
tabergma Dec 10, 2019
d39c322
Merge branch 'master' into updated-featurizers
tabergma Dec 10, 2019
66bdd62
Merge branch 'updated-featurizers' into nlu-featurizer-documentation
tabergma Dec 10, 2019
e3ed14f
fix links
tabergma Dec 10, 2019
225f1e4
reduce complexity
tabergma Dec 10, 2019
d434f04
update featurization link
tabergma Dec 11, 2019
a422dbd
Merge branch 'master' into updated-featurizers
tabergma Dec 11, 2019
d270cba
Merge branch 'updated-featurizers' into nlu-featurizer-documentation
tabergma Dec 11, 2019
8fdb9cf
review comment
tabergma Dec 11, 2019
dc47c40
Merge pull request #4934 from RasaHQ/nlu-featurizer-documentation
tabergma Dec 11, 2019
4c631e6
remove MESSAGE_ from nlu constants
tabergma Dec 11, 2019
50d54e3
rename spacy_featurizable_attributes to dense_featurizable_attributes
tabergma Dec 11, 2019
80e483b
Merge pull request #4944 from RasaHQ/rename-nlu-constants
tabergma Dec 11, 2019
f9b4f82
update changelog entry
tabergma Dec 12, 2019
1c1d95e
Merge branch 'master' into updated-featurizers
tabergma Dec 12, 2019
832755e
update docs around convert featurizer
tabergma Dec 12, 2019
4e4cef6
add description to public methods in embedding intent classifier
tabergma Dec 12, 2019
bb231b1
update train utils
tabergma Dec 12, 2019
aa3bf9d
update changelog entry
tabergma Dec 12, 2019
1125e11
Update nlu component documentation.
tabergma Dec 12, 2019
9628eb2
fix spelling mistakes
tabergma Dec 12, 2019
56e7f86
Merge branch 'master' into updated-featurizers
tabergma Dec 12, 2019
e4529c0
refactoring count vectors featurizer
tabergma Dec 12, 2019
a366b77
compute default intent features as dense features
tabergma Dec 12, 2019
47095d1
use different dense dim default value for intents
tabergma Dec 12, 2019
b8b4bec
Merge branch 'master' into updated-featurizers
tabergma Dec 12, 2019
cd58a51
Merge branch 'master' into updated-featurizers
tabergma Dec 16, 2019
2df3b36
update model version
tabergma Dec 16, 2019
ec2cb58
update changelog
tabergma Dec 16, 2019
2f148f3
increase version to 1.6.0a2
tabergma Dec 16, 2019
8ba153a
update documentation
tabergma Dec 16, 2019
a79916c
review comments
tabergma Dec 16, 2019
5ef7b80
Update rasa/nlu/featurizers/sparse_featurizer/ngram_featurizer.py
tabergma Dec 16, 2019
bb44fd6
add missing types
tabergma Dec 16, 2019
3032fc4
Update rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurize…
tabergma Dec 16, 2019
e1eade1
Update rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurize…
tabergma Dec 16, 2019
ad30827
fix types
tabergma Dec 16, 2019
b83ee6f
Merge branch 'master' into updated-featurizers
tabergma Dec 16, 2019
2253200
Merge branch 'master' into updated-featurizers
tabergma Dec 17, 2019
rasa/nlu/tokenizers/tokenizer.py (8 changes: 4 additions & 4 deletions)
@@ -28,19 +28,19 @@ def set(self, prop: Text, info: Any) -> None:
     def get(self, prop: Text, default: Optional[Any] = None) -> Any:
         return self.data.get(prop, default)
 
-    def __eq__(self, other) -> bool:
+    def __eq__(self, other):
         if not isinstance(other, Token):
-            return NotImplemented
+            return NotImplementedError
         return (self.offset, self.end, self.text, self.lemma) == (
             other.offset,
             other.end,
             other.text,
             other.lemma,
         )
 
-    def __lt__(self, other) -> bool:
+    def __lt__(self, other):
         if not isinstance(other, Token):
-            return NotImplemented
+            return NotImplementedError
         return (self.offset, self.end, self.text, self.lemma) < (
             other.offset,
             other.end,
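The hunk above involves both `return NotImplemented` and `return NotImplementedError` inside `__eq__` and `__lt__`. These behave very differently in Python: `NotImplemented` is the sentinel that tells the interpreter to try the reflected operation or fall back to its default, while `NotImplementedError` is an exception class, so returning it hands the caller an ordinary truthy object. The following is a minimal sketch of that general Python behaviour; `Token` and `BuggyToken` here are hypothetical stand-ins written for illustration, not Rasa's actual classes.

```python
class Token:
    """Hypothetical stand-in for illustration; not Rasa's Token class."""

    def __init__(self, text, offset):
        self.text = text
        self.offset = offset

    def __eq__(self, other):
        if not isinstance(other, Token):
            # Sentinel: lets Python try the other operand, then fall back.
            return NotImplemented
        return (self.offset, self.text) == (other.offset, other.text)

    def __lt__(self, other):
        if not isinstance(other, Token):
            return NotImplemented
        return (self.offset, self.text) < (other.offset, other.text)


class BuggyToken(Token):
    """Same comparisons, but returning the exception class instead."""

    def __eq__(self, other):
        if not isinstance(other, Token):
            # Exception class returned as a value: truthy, never raised.
            return NotImplementedError
        return (self.offset, self.text) == (other.offset, other.text)

    def __lt__(self, other):
        if not isinstance(other, Token):
            return NotImplementedError
        return (self.offset, self.text) < (other.offset, other.text)


# With the sentinel, comparing against a str falls back to identity: False.
good_eq = Token("hi", 0) == "hi"

# With the exception class, the comparison result is the class itself.
buggy_eq = BuggyToken("hi", 0) == "hi"

# Ordering against an unsupported type should raise TypeError.
try:
    Token("hi", 0) < "hi"
    good_lt_raised = False
except TypeError:
    good_lt_raised = True

# The buggy variant silently returns a truthy object instead of raising.
buggy_lt = BuggyToken("hi", 0) < "hi"
```

In this sketch, `if BuggyToken("hi", 0) == "hi":` would take the true branch, which is the kind of silent misbehaviour the `NotImplemented` sentinel is meant to avoid.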