Fix `/train` endpoint (and potentially other server endpoints) for new data format #6354

akelad · 2020-08-06T13:03:59Z

With the switch to YAML for 2.0, we forgot to update the /model/train endpoint in the server to be able to handle that, and so the tests are failing/hanging in the PR #6352

It still writes the files to markdown files, that should be yaml now and also the concept of nlu vs stories files doesn't exist anymore: https://github.com/RasaHQ/rasa/blob/master/rasa/server.py#L746

I haven't checked if there's other endpoints that incorrectly handle that, so we may have to address that too

The text was updated successfully, but these errors were encountered:

wochinge · 2020-08-06T13:55:05Z

I'd add another key rules here and change everything to yaml. We can also check the the header and change to md if it says markdown. If we decide to drop md support it's easy too remove (and not a lot of engineering overhead from our side now).

@akelad You mentioned

and yes we can probably change to yaml but also the concept of stories vs nlu files doesn’t exist anymore

Is the additional rules key solution fine? I wouldn't merge the rules and stories keys into a core key because they are still somewhat separated in yaml (stories: vs rules: )

akelad · 2020-08-07T09:33:41Z

@wochinge i would avoid using the term "core" - why can't we just pass it all into one training_data_files key for example?

akelad · 2020-08-07T09:34:11Z

passing in rules: and stories doesn't make sense to me since you can have those in one file

wochinge · 2020-08-07T09:48:20Z

passing in rules: and stories doesn't make sense to me since you can have those in one file

True. I assumed that when using the API (especially from Rasa X), we'd have everything separate anyway. Putting all in one ( core (stories+rules), nlu, domain) makes more sense when thinking from a disk perspective 👍

On the other hand: If you think of YAML as JSON then it's already separated by keys now. As our training data is YAML now it also makes sense to support it's JSON representation, no?

In addition we shouldn't simply change the API endpoint without at least support the old separated way as deprecated option. How about doing both:

Having everything as giant string in a training_data key (I slightly prefer that to training_data_files)
Having everything split by rules, stories and so on (maybe print a deprecation warning then)

We can add support for full JSON payloads later.

akelad · 2020-08-07T10:21:37Z

sure, whichever works, as long as we make sure that if people run a rasa server without Rasa X, they can send a request to the /train endpoint without a huge hassle :D

tmbo · 2020-08-10T09:25:59Z

What is the benefit of introducing the training_data key?

I am not sure I fully understand what you are proposing @wochinge, I think an example would help a lot.

I think the goal should be:

support the existing format at least for the old training data format
add an endpoint that supports submitting yaml only training data + domain + config

if possible:

support the new training data format with the old api

wochinge · 2020-08-10T09:59:06Z

I am not sure I fully understand what you are proposing @wochinge, I think an example would help a lot.

If I understood Akela correctly in this comment then the motivation is that - as all training data can be in one single file from with YAML - there is only one key for all training data. The request payload would look like as follows:

{
   "training_data": "... yaml string containing nlu, core (stories, rules)",
   "domain": "domain as string"
}

In addition I'd support the old way and distinguish MD / YAML based on the HTTP header (YAML as default):

{
  "nlu": "nlu data string",
  "stories": "stories as string",
  "responses": "responses as string",
  "rules": "rules as string",
  "domain": "domain as string"
}

Supporting they way you propose (see snippet below) makes a lot of sense, but I'd do it as a separate PR as it also changes the current behavior of the API and we don't need it atm (to be fair the same could be said about example 1). In that case I'd do it nevertheless as it improves the UX and Akela stated "as long as we make sure that if people run a rasa server without Rasa X, they can send a request to the /train endpoint without a huge hassle :D")

{
  "nlu": [{"intent": greet", examples: [...]}]
}

tmbo · 2020-08-10T12:37:35Z

If I understood Akela correctly in this comment then the motivation is that - as all training data can be in one single file from with YAML - there is only one key for all training data. The request payload would look like as follows:

I don't see a reason to split training data and domain there, I'd rather go for the payload being just yaml (without a json wrapper) since the information is already split in yaml using different toplevel keys.

In addition I'd support the old way and distinguish MD / YAML based on the HTTP header (YAML as default):

This will break compatibility, so needs proper explanation in migration guide (and probably a pointer to it in the exception that we will throw when we are trying to parse md as yaml when a user didn't send the header).

wochinge · 2020-08-10T12:42:46Z

I don't see a reason to split training data and domain there, I'd rather go for the payload being just yaml (without a json wrapper)

I understand that part, but what do you mean by "since the information is already split in yaml using different toplevel keys."? It's either a big string for each key or we make everything JSON, no?

This will break compatibility, so needs proper explanation in migration guide (and probably a pointer to it in the exception that we will throw when we are trying to parse md as yaml when a user didn't send the header).

We can also keep MD as default. Imo this depends how we move forward regarding keeping / dropping markdown support.

tmbo · 2020-08-10T12:46:14Z

wait, this doesn't make sense:

if we use json, e.g.

{
  "nlu": "nlu data string",
  "stories": "stories as string",
  "responses": "responses as string",
  "rules": "rules as string",
  "domain": "domain as string"
}

the HTTP content type header should be set to JSON (and not to markdown or YAML). I think it should work to keep MD the default here.

To support YAML, we can use the content type header, set it to YAML and avoid the json wrapper, e.g. we'd post something like this to the endpoint (with content type set to yaml):

intents:
- greet

stories:
- ...

responses:
- ...

wochinge · 2020-08-10T13:07:54Z

Got what you mean know 👍 🙈
Will implement it later today or tomorrow 👍

tmbo · 2020-08-10T13:11:54Z

sounds great 💯 let me know how things go so I can update the multimedia responses PR

wochinge · 2020-08-11T09:08:05Z

Other endpoints:

retrieve_story should be changed when story writer is done (TODO: Create issue)
evaluate_stories works with new payloads

wochinge · 2020-08-11T09:26:40Z

@tmbo The PR is now in review.

wochinge · 2020-08-13T12:11:18Z

@tmbo This is merged btw.

b-quachtran · 2020-10-20T21:03:37Z

@wochinge It looks like the example payload we provide on our docs page doesn't match the correct YAML spec for the endpoint.

Is application/json only compatible with markdown at the moment?

wochinge · 2020-10-21T11:34:36Z

Is application/json only compatible with markdown at the moment?

Yes!
You need to use application/x-yaml for the yaml payload

akelad added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Aug 6, 2020

akelad added this to the 2.0a2 Rasa Open Source milestone Aug 6, 2020

akelad assigned wochinge and chkoss Aug 6, 2020

wochinge added the priority:high label Aug 7, 2020

wochinge unassigned chkoss Aug 10, 2020

wochinge mentioned this issue Aug 10, 2020

yaml support for train endpoint #6372

Merged

4 tasks

wochinge closed this as completed in #6372 Aug 12, 2020

wochinge mentioned this issue Mar 4, 2021

Check domain file does not include NLU keys #8104

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `/train` endpoint (and potentially other server endpoints) for new data format #6354

Fix `/train` endpoint (and potentially other server endpoints) for new data format #6354

akelad commented Aug 6, 2020

wochinge commented Aug 6, 2020

akelad commented Aug 7, 2020

akelad commented Aug 7, 2020

wochinge commented Aug 7, 2020

akelad commented Aug 7, 2020

tmbo commented Aug 10, 2020

wochinge commented Aug 10, 2020

tmbo commented Aug 10, 2020 •

edited

Loading

wochinge commented Aug 10, 2020

tmbo commented Aug 10, 2020 •

edited

Loading

wochinge commented Aug 10, 2020

tmbo commented Aug 10, 2020

wochinge commented Aug 11, 2020

wochinge commented Aug 11, 2020

wochinge commented Aug 13, 2020

b-quachtran commented Oct 20, 2020

wochinge commented Oct 21, 2020

Fix /train endpoint (and potentially other server endpoints) for new data format #6354

Fix /train endpoint (and potentially other server endpoints) for new data format #6354

Comments

akelad commented Aug 6, 2020

wochinge commented Aug 6, 2020

akelad commented Aug 7, 2020

akelad commented Aug 7, 2020

wochinge commented Aug 7, 2020

akelad commented Aug 7, 2020

tmbo commented Aug 10, 2020

wochinge commented Aug 10, 2020

tmbo commented Aug 10, 2020 • edited Loading

wochinge commented Aug 10, 2020

tmbo commented Aug 10, 2020 • edited Loading

wochinge commented Aug 10, 2020

tmbo commented Aug 10, 2020

wochinge commented Aug 11, 2020

wochinge commented Aug 11, 2020

wochinge commented Aug 13, 2020

b-quachtran commented Oct 20, 2020

wochinge commented Oct 21, 2020

Fix `/train` endpoint (and potentially other server endpoints) for new data format #6354

Fix `/train` endpoint (and potentially other server endpoints) for new data format #6354

tmbo commented Aug 10, 2020 •

edited

Loading

tmbo commented Aug 10, 2020 •

edited

Loading