Dense dimension of a sparse input shouldn't be set to a dimension of a dense input #6555
@Ghostvv Just to be clear, we should remove this line https://github.com/RasaHQ/rasa/blob/master/rasa/nlu/classifiers/diet_classifier.py#L1349 and the output dimension of the …
yes, I think we should simply remove this line. I think we should set the default dense dimension (rasa/rasa/nlu/classifiers/diet_classifier.py, line 176 in 0cbd883) to 128 or 256, but 512 is definitely too much. Same with concat.
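A minimal sketch of the coupling being discussed, assuming hypothetical names: `configure_dense_dimension` and the config key below are illustrative only, not the actual diet_classifier.py code. The point is that the sparse features' projection width gets overridden to match the dense features' width, and removing that override lets the smaller configured default apply.

```python
# Hypothetical illustration of the override proposed for removal.

DEFAULTS = {"dense_dimension": 128}  # the smaller default being proposed

def configure_dense_dimension(config: dict, dense_feature_dim: int) -> dict:
    # The line proposed for removal does, in effect, this: override the
    # sparse input's projection width with the dense input's width (e.g. 512),
    # regardless of the configured default.
    config["dense_dimension"] = dense_feature_dim
    return config

# With the override removed, the sparse-to-dense projection keeps the
# configured default (e.g. 128 or 256) no matter how wide the dense features are.
```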
Half of sparse is huge. With all our sparse features, like char-level features, the sparse dimension can be 10000.
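Back-of-the-envelope numbers behind that concern (the 10000 figure is from the comment above; the rest is illustrative):

```python
# Parameter count of a single sparse-to-dense projection layer, ignoring biases.
sparse_dim = 10_000                 # e.g. with char-level count-vector features
half_of_sparse = sparse_dim // 2    # "half of sparse" -> 5000 units
reduced_default = 128               # the proposed smaller default

print(sparse_dim * half_of_sparse)    # 50,000,000 weights
print(sparse_dim * reduced_default)   # 1,280,000 weights, ~39x fewer
```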
@Ghostvv Just a note, the change to the default dense & concat dimensions from 512 to 128 ended up causing issues with a customer's NLU model when they upgraded to Rasa 2.0. Specifically, they were seeing problems where messages that should have triggered other intents were being misclassified with high confidence. For example, the message "feed the cat" was classified as "affirm" with confidence 0.8.
@b-quachtran what intent did it get in the previous model? Are you sure it doesn't fluctuate from training to training with 512?
It gets classified as the "talk_to_human" intent with low confidence under their 1.10.14 model. They're also setting a random seed for the classifier.
But since we changed the default architecture, the effect of the random seed also changed. How's the overall performance of the model, did it drop?
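A small illustration of why a fixed seed doesn't carry across architectures (numpy stands in for the actual TensorFlow initialization; the shapes are illustrative):

```python
import numpy as np

def init_weights(dense_dim: int, seed: int = 42) -> np.ndarray:
    # Same seed every time, but the draw order depends on the requested shape.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(10_000, dense_dim))

w_old = init_weights(512)  # old default width
w_new = init_weights(128)  # new default width

# Only the first row of draws lines up; everything after diverges,
# so identical seeds no longer produce comparable models.
print(np.allclose(w_old[:, :128], w_new))  # False
```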
They're running evals on the models at the moment. I'll get back to you once they have real performance metrics to share.
For historical reasons, we set the dense dimension of a sparse input to the dimension of a dense input. Since we concatenate instead of sum, there is no reason to do this: the line should be removed and the dense dimension reduced to something like 100, if that doesn't hinder performance, since it should reduce training time. Some experiments are required to test performance.
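A toy numpy sketch of the sum-vs-concatenate point above (the shapes are illustrative, not the model's real ones):

```python
import numpy as np

batch = 32
sparse_projected = np.zeros((batch, 128))  # sparse features after a small projection
dense_features = np.zeros((batch, 512))    # e.g. pretrained dense embeddings

# Summing requires matching widths -- the historical reason for the coupling:
# sparse_projected + dense_features        # ValueError: shape mismatch

# Concatenation imposes no such constraint, so the projection can stay small:
combined = np.concatenate([sparse_projected, dense_features], axis=-1)
print(combined.shape)  # (32, 640)
```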