Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix /train endpoint (and potentially other server endpoints) for new data format #6354

Closed
akelad opened this issue Aug 6, 2020 · 17 comments · Fixed by #6372
Closed

Fix /train endpoint (and potentially other server endpoints) for new data format #6354

akelad opened this issue Aug 6, 2020 · 17 comments · Fixed by #6372
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@akelad
Copy link
Contributor

akelad commented Aug 6, 2020

With the switch to YAML for 2.0, we forgot to update the /model/train endpoint in the server to be able to handle that, and so the tests are failing/hanging in the PR #6352

It still writes the files to markdown files, that should be yaml now and also the concept of nlu vs stories files doesn't exist anymore: https://github.com/RasaHQ/rasa/blob/master/rasa/server.py#L746

I haven't checked if there's other endpoints that incorrectly handle that, so we may have to address that too

@akelad akelad added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Aug 6, 2020
@akelad akelad added this to the 2.0a2 Rasa Open Source milestone Aug 6, 2020
@wochinge
Copy link
Contributor

wochinge commented Aug 6, 2020

I'd add another key rules here and change everything to yaml. We can also check the the header and change to md if it says markdown. If we decide to drop md support it's easy too remove (and not a lot of engineering overhead from our side now).

@akelad You mentioned

and yes we can probably change to yaml but also the concept of stories vs nlu files doesn’t exist anymore

Is the additional rules key solution fine? I wouldn't merge the rules and stories keys into a core key because they are still somewhat separated in yaml (stories: vs rules: )

@akelad
Copy link
Contributor Author

akelad commented Aug 7, 2020

@wochinge i would avoid using the term "core" - why can't we just pass it all into one training_data_files key for example?

@akelad
Copy link
Contributor Author

akelad commented Aug 7, 2020

passing in rules: and stories doesn't make sense to me since you can have those in one file

@wochinge
Copy link
Contributor

wochinge commented Aug 7, 2020

passing in rules: and stories doesn't make sense to me since you can have those in one file

True. I assumed that when using the API (especially from Rasa X), we'd have everything separate anyway. Putting all in one ( core (stories+rules), nlu, domain) makes more sense when thinking from a disk perspective 👍

On the other hand: If you think of YAML as JSON then it's already separated by keys now. As our training data is YAML now it also makes sense to support it's JSON representation, no?

In addition we shouldn't simply change the API endpoint without at least support the old separated way as deprecated option. How about doing both:

  • Having everything as giant string in a training_data key (I slightly prefer that to training_data_files)
  • Having everything split by rules, stories and so on (maybe print a deprecation warning then)

We can add support for full JSON payloads later.

@akelad
Copy link
Contributor Author

akelad commented Aug 7, 2020

sure, whichever works, as long as we make sure that if people run a rasa server without Rasa X, they can send a request to the /train endpoint without a huge hassle :D

@tmbo
Copy link
Member

tmbo commented Aug 10, 2020

What is the benefit of introducing the training_data key?

I am not sure I fully understand what you are proposing @wochinge, I think an example would help a lot.

I think the goal should be:

  • support the existing format at least for the old training data format
  • add an endpoint that supports submitting yaml only training data + domain + config

if possible:

  • support the new training data format with the old api

@wochinge
Copy link
Contributor

I am not sure I fully understand what you are proposing @wochinge, I think an example would help a lot.

If I understood Akela correctly in this comment then the motivation is that - as all training data can be in one single file from with YAML - there is only one key for all training data. The request payload would look like as follows:

{
   "training_data": "... yaml string containing nlu, core (stories, rules)",
   "domain": "domain as string"
}

In addition I'd support the old way and distinguish MD / YAML based on the HTTP header (YAML as default):

{
  "nlu": "nlu data string",
  "stories": "stories as string",
  "responses": "responses as string",
  "rules": "rules as string",
  "domain": "domain as string"
}

Supporting they way you propose (see snippet below) makes a lot of sense, but I'd do it as a separate PR as it also changes the current behavior of the API and we don't need it atm (to be fair the same could be said about example 1). In that case I'd do it nevertheless as it improves the UX and Akela stated "as long as we make sure that if people run a rasa server without Rasa X, they can send a request to the /train endpoint without a huge hassle :D")

{
  "nlu": [{"intent": greet", examples: [...]}]
}

@tmbo
Copy link
Member

tmbo commented Aug 10, 2020

If I understood Akela correctly in this comment then the motivation is that - as all training data can be in one single file from with YAML - there is only one key for all training data. The request payload would look like as follows:

I don't see a reason to split training data and domain there, I'd rather go for the payload being just yaml (without a json wrapper) since the information is already split in yaml using different toplevel keys.

In addition I'd support the old way and distinguish MD / YAML based on the HTTP header (YAML as default):

This will break compatibility, so needs proper explanation in migration guide (and probably a pointer to it in the exception that we will throw when we are trying to parse md as yaml when a user didn't send the header).

@wochinge
Copy link
Contributor

I don't see a reason to split training data and domain there, I'd rather go for the payload being just yaml (without a json wrapper)

I understand that part, but what do you mean by "since the information is already split in yaml using different toplevel keys."? It's either a big string for each key or we make everything JSON, no?

This will break compatibility, so needs proper explanation in migration guide (and probably a pointer to it in the exception that we will throw when we are trying to parse md as yaml when a user didn't send the header).

We can also keep MD as default. Imo this depends how we move forward regarding keeping / dropping markdown support.

@tmbo
Copy link
Member

tmbo commented Aug 10, 2020

wait, this doesn't make sense:

if we use json, e.g.

{
  "nlu": "nlu data string",
  "stories": "stories as string",
  "responses": "responses as string",
  "rules": "rules as string",
  "domain": "domain as string"
}

the HTTP content type header should be set to JSON (and not to markdown or YAML). I think it should work to keep MD the default here.

To support YAML, we can use the content type header, set it to YAML and avoid the json wrapper, e.g. we'd post something like this to the endpoint (with content type set to yaml):

intents:
- greet

stories:
- ...

responses:
- ...

@wochinge
Copy link
Contributor

Got what you mean know 👍 🙈
Will implement it later today or tomorrow 👍

@tmbo
Copy link
Member

tmbo commented Aug 10, 2020

sounds great 💯 let me know how things go so I can update the multimedia responses PR

@wochinge
Copy link
Contributor

Other endpoints:

  • retrieve_story should be changed when story writer is done (TODO: Create issue)
  • evaluate_stories works with new payloads

@wochinge
Copy link
Contributor

@tmbo The PR is now in review.

@wochinge
Copy link
Contributor

@tmbo This is merged btw.

@b-quachtran
Copy link
Contributor

@wochinge It looks like the example payload we provide on our docs page doesn't match the correct YAML spec for the endpoint.

Is application/json only compatible with markdown at the moment?

@wochinge
Copy link
Contributor

Is application/json only compatible with markdown at the moment?

Yes!
You need to use application/x-yaml for the yaml payload

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants