Fix /train endpoint (and potentially other server endpoints) for new data format #6354
Comments
I'd add another key @akelad You mentioned
Is the additional |
@wochinge i would avoid using the term "core" - why can't we just pass it all into one |
passing in |
True. I assumed that when using the API (especially from Rasa X), we'd have everything separate anyway. Putting everything in one payload (core (stories + rules), NLU, domain) makes more sense when thinking from a disk perspective 👍 On the other hand: if you think of YAML as JSON, then it's already separated by keys now. As our training data is YAML now, it also makes sense to support its JSON representation, no? In addition, we shouldn't simply change the API endpoint without at least supporting the old separated way as a deprecated option. How about doing both:
We can add support for full JSON payloads later. |
sure, whichever works, as long as we make sure that if people run a rasa server without Rasa X, they can send a request to the |
What is the benefit of introducing the I am not sure I fully understand what you are proposing @wochinge, I think an example would help a lot. I think the goal should be:
if possible:
|
If I understood Akela correctly in this comment, the motivation is that, since all training data can live in one single YAML file, there is only one key for all training data. The request payload would look as follows: {
"training_data": "... yaml string containing nlu, core (stories, rules)",
"domain": "domain as string"
} In addition I'd support the old way and distinguish MD / YAML based on the HTTP header (YAML as default): {
"nlu": "nlu data string",
"stories": "stories as string",
"responses": "responses as string",
"rules": "rules as string",
"domain": "domain as string"
} Supporting the way you propose (see snippet below) makes a lot of sense, but I'd do it as a separate PR, as it also changes the current behavior of the API and we don't need it atm (to be fair, the same could be said about example 1). In that case I'd do it nevertheless, as it improves the UX and Akela stated "as long as we make sure that if people run a rasa server without Rasa X, they can send a request to the /train endpoint without a huge hassle :D". {
"nlu": [{"intent": "greet", "examples": [...]}]
}
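The first two payload shapes described above can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, and the key names are the ones proposed in this comment, not a confirmed API.

```python
import json


def build_single_key_payload(training_data_yaml: str, domain_yaml: str) -> str:
    """Proposal 1: one key holding all training data (NLU, stories, rules)
    as a YAML string, plus the domain as a second string."""
    return json.dumps({"training_data": training_data_yaml, "domain": domain_yaml})


def build_separated_payload(
    nlu: str, stories: str, responses: str, rules: str, domain: str
) -> str:
    """Proposal 2: the old separated layout, one string per kind of data.
    Whether the strings are MD or YAML would be signalled by an HTTP header."""
    return json.dumps(
        {
            "nlu": nlu,
            "stories": stories,
            "responses": responses,
            "rules": rules,
            "domain": domain,
        }
    )
```

In both cases the values stay opaque strings, so the server still has to parse each one as MD or YAML after decoding the JSON wrapper.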
|
I don't see a reason to split training data and domain there, I'd rather go for the payload being just yaml (without a json wrapper) since the information is already split in yaml using different toplevel keys.
This will break compatibility, so it needs a proper explanation in the migration guide (and probably a pointer to it in the exception that we throw when we try to parse MD as YAML because a user didn't send the header). |
I understand that part, but what do you mean by "since the information is already split in yaml using different toplevel keys."? It's either a big string for each key or we make everything JSON, no?
We can also keep MD as default. Imo this depends on how we move forward regarding keeping / dropping Markdown support. |
wait, this doesn't make sense: if we use json, e.g.
the HTTP content type header should be set to JSON (and not to markdown or YAML). I think it should work to keep MD the default here. To support YAML, we can use the content type header, set it to YAML and avoid the json wrapper, e.g. we'd post something like this to the endpoint (with content type set to yaml):
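A minimal sketch of such a request, built with Python's standard library. The base URL, the exact content type value (`application/x-yaml`), the helper name, and the sample training data are all assumptions made for illustration; only the `/model/train` path comes from this issue.

```python
import urllib.request

# Illustrative training data only; top-level keys separate NLU data from stories.
YAML_BODY = """\
version: "2.0"
nlu:
- intent: greet
  examples: |
    - hey
    - hello
stories:
- story: happy path
  steps:
  - intent: greet
  - action: utter_greet
"""


def build_train_request(
    yaml_body: str, base_url: str = "http://localhost:5005"
) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is raw YAML,
    without any JSON wrapper. The content type value is an assumption."""
    return urllib.request.Request(
        url=f"{base_url}/model/train",
        data=yaml_body.encode("utf-8"),
        headers={"Content-Type": "application/x-yaml"},
        method="POST",
    )


request = build_train_request(YAML_BODY)
# Sending it would be: urllib.request.urlopen(request)
```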
|
Got what you mean now 👍 🙈 |
sounds great 💯 let me know how things go so I can update the multimedia responses PR |
Other endpoints:
|
@tmbo The PR is now in review. |
@tmbo This is merged btw. |
Yes! |
With the switch to YAML for 2.0, we forgot to update the /model/train endpoint in the server to be able to handle that, and so the tests are failing/hanging in PR #6352. It still writes the data to Markdown files, which should be YAML now, and the concept of NLU vs. stories files doesn't exist anymore: https://github.com/RasaHQ/rasa/blob/master/rasa/server.py#L746
I haven't checked whether other endpoints handle that incorrectly, so we may have to address those too.
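As a rough sketch of the behaviour being asked for, the server could pick the file format from the request's content type instead of always writing Markdown. This is not the code in `rasa/server.py`; the function names, the set of accepted content types, and the file names are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional

# Assumed set of media types that mean "the body is raw YAML".
YAML_CONTENT_TYPES = {"application/x-yaml", "application/yaml", "text/yaml"}


def is_yaml_request(content_type: Optional[str]) -> bool:
    """Return True if the Content-Type header names a YAML media type."""
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in YAML_CONTENT_TYPES


def write_training_files(
    body: bytes, content_type: Optional[str], directory: Path
) -> List[Path]:
    """Write the request body to disk in the format the header indicates."""
    if is_yaml_request(content_type):
        # New format: a single YAML file, no NLU/stories split.
        target = directory / "data.yml"
        target.write_bytes(body)
        return [target]

    # Legacy format: JSON wrapper with one Markdown string per key.
    payload = json.loads(body)
    written = []
    for key in ("nlu", "stories"):
        if key in payload:
            target = directory / f"{key}.md"
            target.write_text(payload[key], encoding="utf-8")
            written.append(target)
    return written
```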