Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plan to improve ci/cd + automated release setup of kedro-plugins #2333

Closed
merelcht opened this issue Feb 28, 2022 · 10 comments
Closed

Plan to improve ci/cd + automated release setup of kedro-plugins #2333

merelcht opened this issue Feb 28, 2022 · 10 comments

Comments

@merelcht
Copy link
Member

merelcht commented Feb 28, 2022

Description

Following up on kedro-org/kedro-plugins#4

Currently kedro-plugins has one top level Makefile, each plugin has it's own .pre-commit-config.yaml and pyproject.toml. Look at whether it would be better for these configuration files to be unified or for each plugin to have it's own files.

@AntonyMilneQB Raised a good question about whether the pre-commit hooks actually still work without the file being at the root of the repo.

For this issue come up with a plan to improve the setup:

  • Decide what parts exactly need to be improved
  • Come up with a proposal of breaking this down and how to implement
  • Discuss in Technical Design when needed
@deepyaman
Copy link
Member

@AntonyMilneQB Raised a good question about whether the pre-commit hooks actually still work without the file being at the root of the repo.

Having .pre-commit-config.yaml at the root of the repo is also recognizable by other tools, such as pre-commit.ci (which we could consider using).

There's no great answer for this, but I've seen pre-commit/pre-commit#466 (comment). Keep in mind that the pre-commit author also says that it's not really designed for monorepos (although it can be made to work).

@noklam raised a good point:

The main drawback is if you are working on the smaller repositories like kedro-airflow , you still have to run the linting for the full datasets repo which takes more time

Maybe it makes sense to just use something like tox to manage the way you run tests in something like this?

@jmholzer
Copy link
Contributor

jmholzer commented Jan 6, 2023

In order to install the Kedro dependency correctly, we need to add pip install -r requirements.txt under install-test-requirements in the case that $plugin is equal to kedro-datasets in the kedro-plugins Makefile.

@deepyaman
Copy link
Member

I would be open to putting together a PoC with tox, if you (others) are interested @jmholzer.

@merelcht merelcht changed the title [KED-3056] Improve configuration setup of kedro-plugins Improve ci/cd setup setup of kedro-plugins Jan 9, 2023
@merelcht merelcht changed the title Improve ci/cd setup setup of kedro-plugins Plan to improve ci/cd + automated release setup of kedro-plugins Jan 9, 2023
@deepyaman deepyaman self-assigned this Jan 9, 2023
@merelcht merelcht added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Feb 6, 2023
@deepyaman
Copy link
Member

deepyaman commented Feb 6, 2023

Preface

The below breakdown of challenges and solutions aims to:

  1. provide tactical solutions to existing challenges (primary motivation)
  2. use kedro-plugins as a lightweight testbed for tooling for better practices/other improvements, rather than on the more complex main kedro repo (secondary motivation)

Challenges

  • pre-commit configuration is maintained per-plugin
    • Not ideal because part of the idea of a monorepo is that configuration for consistent styling, etc. can be shared.
    • pre-commit doesn't have first-class monorepo support (e.g. support hierarchical config); see the author's guidance on monorepo support
  • Makefile doesn't support local testing of individual plugins in isolated environments
    • Need to create/manage an environment per plugin (not documented, on your own)
    • Using kedro_plugins conda environment for every plugin in CI is not reasonably transferrable to local environment
  • It's not clear which changes affect which plugins
    • Denoted inconsistently using tags like [kedro-docker] on PR/issues (nonstandard solution)

Proposal

@noklam
Copy link
Contributor

noklam commented Feb 15, 2023

Agree with the challenges but I would order them in reverse order.

  • It's not clear which changes affect which plugins
    • Denoted inconsistently using tags like [kedro-docker] on PR/issues (nonstandard solution)

This creates overhead for maintenance, particularly since the changes are quite sparse (not everyone is familiar with all the plugins, usually one person touch a small part of it). It's quite important to isolate the changes source of error.

  • Makefile doesn't support local testing of individual plugins in isolated environments
    • Need to create/manage an environment per plugin (not documented, on your own)
    • Using kedro_plugins conda environment for every plugin in CI is not reasonably transferrable to local environment

As mentioned, it's not documented and I have a separate conda environment for development. This probably affects us as developers more as most contributors will only touch one of the plugin? Nevertheless, it's great to have good practices

  • pre-commit configuration is maintained per-plugin
    • Not ideal because part of the idea of a monorepo is that configuration for consistent styling, etc. can be shared.
    • pre-commit doesn't have first-class monorepo support (e.g. support hierarchical config); see the author's guidance on monorepo support

I agree this could be an improvement, but I am not sure whether this is the top priority. Most of the configurations are only set once from the beginning and we rarely touch the configuration. It definitely violates the DRY principle, but I don't find the hierarchical config helps a lot.

I would add one more challenge:

  • The CI is not run regularly - for kedro we have frequent changes so any dependencies change will be found out pretty quickly. For plugins, there could be a dependency change but we only have new commits after a few months. In this case, it is unclear whether the new changes break the CI or something in between happens, often it takes a long time to isolate the error.

@merelcht
Copy link
Member Author

merelcht commented Feb 17, 2023

As discussed in technical design on 15/2:

  1. This issue becomes a milestone: https://github.com/kedro-org/kedro/milestone/30
  2. All P0 points are converted into issues
  3. When all P0 issues have been completed, we'll revisited what more needs to be done. Most likely in technical design sessions.

@deepyaman
Copy link
Member

One additional note to keep in mind that I just remembered; we need to address the behavior when multiple plugins are modified (or just make sure this doesn't happen in the future approach):

image

@merelcht merelcht removed the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Apr 14, 2023
@astrojuanlu
Copy link
Member

Potentially relevant for the automated release https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/

@astrojuanlu
Copy link
Member

I think most of #2333 (comment) has already been done, except for kedro-org/kedro-plugins#512

@merelcht
Copy link
Member Author

merelcht commented Jan 9, 2024

Closing this because the CI/CD of kedro-plugins has been hugely improved already. I created a new milestone for testing the plugins individually: https://github.com/kedro-org/kedro/milestone/60

@merelcht merelcht closed this as completed Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants