Allow for customization of pipeline templates #2701
@idanov Any thoughts? |
Hi @jasonmhite, sorry we thought this was still in progress because it's marked as draft. Is it ready for discussion? |
@astrojuanlu Yeah, I was hoping someone could take a look before I marked it as ready to review. Specifically I wanted to resolve three things first:
Just want to say, users have been asking for this for a long time and this is a neat solution @jasonmhite |
Thank you @jasonmhite for the great start!
I think it's best to update it in the CLI and pipelines docs. i.e. In addition, we need to document WHY one may do this and what it is useful for. We also need to make it clear that users cannot change the
@noklam So the goal for me is that we want to distribute the pipeline template as part of a starter among collaborators. So my team can share a common starter and then "automatically" they get the pipeline template we prefer. This includes a slightly different file layout for the pipeline code and also adding some standard imports of internal tools we use, but there are lots of other possible uses. I put it in
I'm kind of averse to making it an extra flag in the CLI because if I understand what you're suggesting, every time we made a new pipeline users would have to remember to
I don't really envision that this is a feature a normal user would use directly, in my mind it's just a place that someone writing a custom starter can use if they also want to override the way pipelines are generated. It seems like a mistake to hardcode the pipeline template when starter templates are possible, but I also doubt it's something users will often want to manually override.
Regarding docs/testing, good point on the parameters files being a bit sensitive. I do see this as a "developer" feature akin to writing plugins and not something a normal user would do routinely, which is why I was curious where you think it's best to document. But it's also a bit of a "you'd better know what you're doing if you want to write a custom pipeline template" situation, so there are likely to be some gotchas.
Thank you very much for opening a PR for discussion @jasonmhite and for all the careful thought you've obviously put into this! I didn't even think of the obvious problem about needing to use My initial reaction here is similar to @noklam, that I was expecting Let me propose something that would satisfy both our preference for a CLI flag and your requirement for not needing to manually specify |
@antonymilne Actually I'm perfectly fine with defaulting to checking for a static templates path within the project directory, it covers my needs. I assume the flow you're imagining is to first check for the existence of something like I'll update my implementation to do the path checks as described above. I'll also take a look at the CLI code to see how much trouble it is to implement; I'm not super familiar with the Kedro codebase yet so I may need help on that depending on how complicated it is. |
Annnnd... done. Added a check for a One concern is this doesn't do much validation to check if the target folder is a valid template, but I'm of the opinion that it's the user's responsibility. I/we can spell out the limitations in the docs.
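The lookup order discussed in the comments above (an explicit CLI argument first, then a templates folder inside the project, then the built-in default) could be sketched roughly as follows. This is an illustration only, not the code in the PR: the function name `resolve_pipeline_template`, its parameters, and the `templates/pipeline` folder name are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional

# Hypothetical folder name, assumed for illustration; the thread does not
# spell out the exact directory the implementation checks.
PROJECT_TEMPLATE_DIR = Path("templates") / "pipeline"


def resolve_pipeline_template(
    project_path: Path,
    builtin_template: Path,
    cli_template: Optional[Path] = None,
) -> Path:
    """Pick a template: CLI argument, then project folder, then built-in."""
    # 1. An explicitly supplied path always wins.
    if cli_template is not None:
        return cli_template
    # 2. Otherwise use the project-local template folder if it exists.
    project_template = project_path / PROJECT_TEMPLATE_DIR
    if project_template.is_dir():
        return project_template
    # 3. Fall back to the template shipped with the framework.
    return builtin_template
```

Note that, as in the discussion, the sketch only checks that the folder exists; it does not validate that the folder is a usable cookiecutter template.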
Separately to this awesome contribution @jasonmhite, it would be great to learn what you (and others) are customising in your pipeline templates :) |
@datajoely The example project starter+pipeline template I linked in the original comment shows the main stuff. Basically just address some minor nuisances.
This looks great, thank you very much @jasonmhite! Just what I was imagining. I'm fine with it not doing the validation check to see if it's a valid template, just like you have it now. There's just one thing I'm not sure about: should we be restricting Making it more general is an easy change:
Pros of making this change:
Cons:
So overall I don't really mind if we make it general or just keep it as you have it here. wdyt @noklam @astrojuanlu? |
I think making it general would be a nice-to-have. If we are not going to do this now, I would prefer that we can extend it later without any breaking change.
Sorry for the delay, juggling lots of different things. Regarding support for cookiecutter templates it doesn't matter to me but I can make that change. I will note that this kinda goes back to the putting a config option |
I am happy with the PR as it is. We can always add more support to it if it turns out to be of common interest.
I won't worry too much about the `Path` issue for now since this is expected to be a CLI arg; even if we later change the type, it shouldn't be considered a breaking change.
DCO & some linting checks are failing.
Thank you so much for this contribution @jasonmhite! ⭐ I've read the full conversation and I'm happy to proceed with the implementation like this. We can always make it more general if there's a need for it from users. I'm also happy with not adding validation, but that should be mentioned clearly in the docs so users know what a "valid" template requires.
In terms of tests, it would be good to add some unit tests into `tests/framework/cli/pipeline/test_pipeline.py` to verify the expected behaviour.
For docs, I would add a description about this behaviour to https://docs.kedro.org/en/stable/kedro_project_setup/starters.html#how-to-create-a-kedro-starter and perhaps also to https://docs.kedro.org/en/stable/nodes_and_pipelines/modular_pipelines.html @stichbury what are your thoughts on this?
So I've been looking at writing some unit tests. The test framework you guys have set up is fairly intricate, could I get someone more familiar with the harness to help me with writing them? |
Sure, maybe start with creating the tests without the implementation to see what tests you want to create?
Just an update: still working on this, haven't had much time the past couple weeks. |
Thanks for the update, let us know if you need help or get stuck!
@jasonmhite just checking, is this ready to be reviewed again?
@noklam Yes sorry, I have to find time for this on the side. I wrote what I consider sufficient unit tests, but the way I mocked up injecting the template files during the tests is pretty ugly. I was hoping to maybe clean that up a bit, but actually thinking about it maybe it'd be better if I got some help on that from you. Mocking files is not something I've done much of in pytest. |
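For the file-mocking question above, one common approach is to build a throwaway template directory with pytest's built-in `tmp_path` fixture instead of patching file access. The sketch below is a generic illustration, not taken from the Kedro test suite: the helper `make_fake_template`, the folder layout, and the file contents are all hypothetical.

```python
from pathlib import Path


def make_fake_template(root: Path) -> Path:
    """Create a tiny stand-in template directory under root and return it."""
    # Hypothetical layout, assumed for illustration only.
    template = root / "templates" / "pipeline"
    package_dir = template / "{{ cookiecutter.pipeline_name }}"
    package_dir.mkdir(parents=True)
    (template / "cookiecutter.json").write_text('{"pipeline_name": "default"}')
    (package_dir / "pipeline.py").write_text("# custom pipeline template\n")
    return template


def test_fake_template_is_created(tmp_path: Path) -> None:
    # pytest injects tmp_path as a fresh temporary directory for each test.
    template = make_fake_template(tmp_path)
    assert (template / "cookiecutter.json").is_file()
    assert any(template.rglob("pipeline.py"))
```

Because each test gets its own directory, nothing needs to be cleaned up or mocked; a real test would point the command under test at the directory returned by the helper.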
@ankatiyar Should be fixed now. |
Thank you for this contribution @jasonmhite! ⭐
Implement ability to override the cookiecutter template used to generate pipelines so that the user can customize them. Controlled via `settings.PIPELINE_TEMPLATE_PATH`.
Description
The current implementation provides a way to make custom starter templates; however, the template used to generate new pipelines is hardcoded. My team would like to be able to customize the way pipelines are generated to a format that is more amenable to our workflow. In particular, we would also like to be able to include this as part of the starter project template, so that our team members can have the new pipeline template automatically when they use our starter. Per the discussion in #2543 this is currently not possible any other way than modifying Kedro because the pipeline template path is hardcoded, hence this PR.
Development notes
This only requires a very small change to `_create_pipeline` in `kedro/framework/cli/pipeline.py`. When populating the `template_path` variable, my patch first checks `settings.py` to see if variable `PIPELINE_TEMPLATE_PATH` has been set. If it has been, it uses that value as the path for the pipeline cookiecutter template. If it is not set, it falls back to the old default of `Path(kedro.__file__).parent / "templates" / "pipeline"`, thus the change should be completely backwards-compatible.
I'm marking this PR as a work in progress because I'm not sure if there is a better way to do the check in `settings.py`. I'd also like some feedback on the preferred way to document and test this feature.
Actually implementing a working template is up to the user. This repo demonstrates what I want, and I have tested that this template works perfectly. I have also tested with overriding the pipeline template in an existing project. I have further tested that the patch doesn't break any of my existing projects that don't attempt to override the pipeline template.
It's a bit out of scope for this PR since it's more how you would implement the actual starter template, but as a note you need to make sure cookiecutter doesn't attempt to render the embedded pipeline template in your starter. You can see how I did this in `cookiecutter.json`, but the tl;dr is you add the template path to the `_copy_without_render` array.
Checklist
RELEASE.md
file
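As a rough illustration of the `_copy_without_render` note in the development notes, the snippet below builds and prints a minimal `cookiecutter.json` body. The `_copy_without_render` key is a standard cookiecutter setting; the `project_name` value and the `templates/pipeline/*` pattern are hypothetical and would need to match the actual location of the embedded pipeline template in a given starter.

```python
import json

# Hypothetical starter context; only the "_copy_without_render" key matters here.
cookiecutter_context = {
    "project_name": "New Kedro Project",
    # Paths matching these patterns are copied verbatim instead of being
    # rendered, so the embedded pipeline template keeps its own placeholders.
    "_copy_without_render": ["templates/pipeline/*"],
}

# The printed text is what would go into the starter's cookiecutter.json.
cookiecutter_json = json.dumps(cookiecutter_context, indent=2)
print(cookiecutter_json)
```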