Add fallback to starters pull on kedro new #3900
Changes from 24 commits
@@ -5,6 +5,7 @@
 """
 from __future__ import annotations

+import logging
 import os
 import re
 import shutil
@@ -17,6 +18,7 @@
 from typing import Any, Callable

 import click
+import requests
 import yaml
 from attrs import define, field
 from importlib_metadata import EntryPoints
@@ -95,7 +97,44 @@ class KedroStarterSpec:
 KEDRO_PATH = Path(kedro.__file__).parent
 TEMPLATE_PATH = KEDRO_PATH / "templates" / "project"

-_STARTERS_REPO = "git+https://github.com/kedro-org/kedro-starters.git"
+
+def _get_latest_starters_version() -> str:
+    if "KEDRO_STARTERS_VERSION" not in os.environ:
+        GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
+        headers = {}
+        if GITHUB_TOKEN:
+            headers["Authorization"] = f"token {GITHUB_TOKEN}"
+
+        try:
+            response = requests.get(
+                "https://api.github.com/repos/kedro-org/kedro-starters/releases/latest",
+                headers=headers,
+                timeout=10,
+            )
+            response.raise_for_status()  # Raise an HTTPError for bad status codes
+            latest_release = response.json()
+        except requests.exceptions.RequestException as e:
+            logging.error(f"Error fetching kedro-starters latest release version: {e}")
+            return ""
+
+        os.environ["KEDRO_STARTERS_VERSION"] = latest_release["tag_name"]
+        return str(latest_release["tag_name"])
+    else:
+        return str(os.getenv("KEDRO_STARTERS_VERSION"))
+
+
+def _kedro_and_starters_version_identical() -> bool:
+    starters_version = _get_latest_starters_version()
+    return True if version == starters_version else False
+
+
+_STARTERS_REPO = (
+    "git+https://github.com/kedro-org/kedro-starters.git"
+    if _kedro_and_starters_version_identical()
+    else "https://github.com/kedro-org/kedro-starters.git@main"

What happens if we point to the main branch (…)? Will it then work as we expect? I mean creating a project and passing the branch and the checkout version to cookiecutter: kedro/kedro/framework/cli/starters.py, line 932 in 15ba4da

This will lead us to cookiecutter shenanigans, which is where these parameters are ultimately used. You have to dig down four or five functions to see it, but cookiecutter uses the template parameter to determine the repo address and the checkout parameter to determine the branch, tag or commit ID. What it does is the following:

    if clone:
        try:
            subprocess.check_output(  # nosec
                [repo_type, 'clone', repo_url],
                cwd=clone_to_dir,
                stderr=subprocess.STDOUT,
            )
            if checkout is not None:
                checkout_params = [checkout]
                # Avoid Mercurial "--config" and "--debugger" injection vulnerability
                if repo_type == "hg":
                    checkout_params.insert(0, "--")
                subprocess.check_output(  # nosec
                    [repo_type, 'checkout', *checkout_params],
                    cwd=repo_dir,
                    stderr=subprocess.STDOUT,
                )
        # (except clause omitted from this excerpt)

So if you passed repo_url as the address of the main branch and, for example, passed a different branch as the checkout value, cookiecutter would clone the main branch and then check out the other branch.

Thank you for explaining; that's what I thought as well. Doesn't it mean that in this case, instead of … kedro/kedro/framework/cli/starters.py, line 344 in 15ba4da

That's a good point, let me test that.

Yeah, I think you're right. What we pass to cookiecutter has to be formatted in a different way.

I was thinking of refactoring the logic in … Now, we can just identify the case when the repo versions don't match (as you do for setting the …

Or we can just do something like: kedro/kedro/framework/cli/starters.py, line 344 in 15ba4da
Edit: updated link

Could you please comment on why we need these two lines here now:

+)
+
+
 _OFFICIAL_STARTER_SPECS = [
     KedroStarterSpec("astro-airflow-iris", _STARTERS_REPO, "astro-airflow-iris"),
     KedroStarterSpec("spaceflights-pandas", _STARTERS_REPO, "spaceflights-pandas"),
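Stripped of the HTTP details, _get_latest_starters_version and _kedro_and_starters_version_identical fetch the latest release tag once, remember it, and compare it with the installed Kedro version. Below is a minimal sketch of that flow under stated assumptions: a hypothetical fetch_latest_tag callable stands in for the requests.get call to the GitHub releases API (so the sketch runs offline), and the cache lives in a closure rather than in os.environ. The names make_version_lookup and versions_identical are illustrative only, not part of Kedro.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stand-in for the GitHub "releases/latest" request. It returns a
# tag name such as "0.19.6" and raises OSError on failure (requests'
# RequestException is a subclass of OSError).
FetchTag = Callable[[], str]


def make_version_lookup(fetch_latest_tag: FetchTag) -> Callable[[], str]:
    """Return a lookup that fetches the latest starters tag at most once."""
    cache: dict[str, str] = {}

    def latest_starters_version() -> str:
        if "tag" not in cache:
            try:
                cache["tag"] = fetch_latest_tag()
            except OSError:
                # As in the PR: report "" and do not cache it, so a later call retries.
                return ""
        return cache["tag"]

    return latest_starters_version


def versions_identical(kedro_version: str, lookup: Callable[[], str]) -> bool:
    # Return the comparison directly instead of `True if ... else False`.
    # An unknown ("") starters version will not equal a real Kedro version.
    return kedro_version == lookup()
```

A plain dict is used instead of functools.lru_cache because lru_cache would also memoise the empty string returned on failure, whereas the PR's code only stores the tag after a successful request.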
@@ -766,35 +805,39 @@ def _make_cookiecutter_args_and_fetch_template(
         "extra_context": config,
     }

-    if checkout:
-        cookiecutter_args["checkout"] = checkout
+    kedro_version_match_starters = _kedro_and_starters_version_identical()

     if directory:
         cookiecutter_args["directory"] = directory

     tools = config["tools"]
     example_pipeline = config["example_pipeline"]
     starter_path = "git+https://github.com/kedro-org/kedro-starters.git"
+    checkout_version = version if kedro_version_match_starters else "main"

Are we sure that the …
Now we're mixing …


     if "PySpark" in tools and "Kedro Viz" in tools:
         # Use the spaceflights-pyspark-viz starter if both PySpark and Kedro Viz are chosen.
         cookiecutter_args["directory"] = "spaceflights-pyspark-viz"
         # Ensures we use the same tag version of kedro for kedro-starters
-        cookiecutter_args["checkout"] = version
+        cookiecutter_args["checkout"] = checkout_version
     elif "PySpark" in tools:
         # Use the spaceflights-pyspark starter if only PySpark is chosen.
         cookiecutter_args["directory"] = "spaceflights-pyspark"
-        cookiecutter_args["checkout"] = version
+        cookiecutter_args["checkout"] = checkout_version
     elif "Kedro Viz" in tools:
         # Use the spaceflights-pandas-viz starter if only Kedro Viz is chosen.
         cookiecutter_args["directory"] = "spaceflights-pandas-viz"
merelcht marked this conversation as resolved.
+        cookiecutter_args["checkout"] = checkout_version

I agree with Merel, it's better to put …

Could you please explain the checkout logic for that part? What should we do here in terms of checkout? As I understand it, here we are taking the standard kedro template from the kedro repo?

     elif example_pipeline == "True":
         # Use spaceflights-pandas starter if example was selected, but PySpark or Viz wasn't
         cookiecutter_args["directory"] = "spaceflights-pandas"
-        cookiecutter_args["checkout"] = version
+        cookiecutter_args["checkout"] = checkout_version
     else:
         # Use the default template path for non PySpark, Viz or example options:
         starter_path = template_path

+        cookiecutter_args["checkout"] = (

Related to the previous comment. If the … In this case, there's also a discrepancy with …

Shouldn't this just be …

+            checkout if checkout and kedro_version_match_starters else "main"
+        )
     return cookiecutter_args, starter_path

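The thread on _STARTERS_REPO concludes that the branch cannot simply be appended to the repository URL: cookiecutter clones the template repository and then runs a separate checkout of the checkout value. The sketch below picks both values together. The helper name choose_starter_source and its precedence (an explicit --checkout from the user first, then the Kedro version tag when it matches the latest starters release, then main) are assumptions for illustration, not the logic this PR implements.

```python
from __future__ import annotations

STARTERS_REPO = "git+https://github.com/kedro-org/kedro-starters.git"


def choose_starter_source(
    kedro_version: str,
    starters_version: str,
    user_checkout: str | None = None,
) -> tuple[str, str]:
    """Return a (template, checkout) pair for cookiecutter.

    Hypothetical precedence: an explicit user checkout wins; otherwise use the
    Kedro version tag when it matches the latest starters release; otherwise main.
    """
    if user_checkout:
        return STARTERS_REPO, user_checkout
    if starters_version and kedro_version == starters_version:
        return STARTERS_REPO, kedro_version
    # Fallback: the branch goes into checkout; the repository URL never changes.
    return STARTERS_REPO, "main"
```

The returned pair maps onto cookiecutter's template and checkout arguments, so the URL never carries an "@main" suffix and the fallback is expressed in one place instead of in every starter branch.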
We started discussing the use of environment variables here to optimise time for different runs, such as in CI/CD pipelines. However, I read the following about os.environ:
The environment variable set using os.environ within a Python script is only set for the duration of the current process (i.e., the script execution) and does not persist beyond that. It will not be available to other processes or sessions, and once the script finishes executing, the environment variable will be lost.
If this is correct, it doesn't make sense to set this type of environment variable. Have you tested that it works?
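Part of the quoted claim can be checked directly. Assigning to os.environ changes the environment of the running Python process (Python calls putenv for you), and child processes started afterwards through subprocess with the default env=None inherit it; nothing is written back to the parent shell, so the value does not carry over to a later, separate kedro new invocation. The sketch below demonstrates the child-process case. The variable name comes from the PR; the value "0.19.6" and the helper child_sees are arbitrary examples.

```python
import os
import subprocess
import sys


def child_sees(name: str) -> str:
    """Start a fresh Python interpreter and return what it reads for `name`."""
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({name!r}, ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


os.environ["KEDRO_STARTERS_VERSION"] = "0.19.6"  # example value only
print(child_sees("KEDRO_STARTERS_VERSION"))  # the child inherits it: prints 0.19.6
```

In the PR's code the variable therefore acts as a per-process cache, and it also gives CI a way to skip the GitHub API call: if KEDRO_STARTERS_VERSION is already exported by the shell or the CI job, _get_latest_starters_version returns it without making the request.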