Skip to content
This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

added DataLoadingConfig to K8sPod message #368

Merged
merged 2 commits into from
Mar 13, 2023

Conversation

hamersaw
Copy link
Contributor

TL;DR

This PR adds the DataLoadingConfig information to the K8sPod proto message so that Flyte can inject copilot as a sidecar in Pod tasks. This allows the use of PodTemplate configuration with ContainerTasks.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

^^^

Tracking Issue

flyteorg/flyte#3123

Follow-up issue

NA

Signed-off-by: Daniel Rammer <daniel@union.ai>
@flixr
Copy link

flixr commented Mar 2, 2023

@hamersaw anything still needed here until it can be merged?

@hamersaw
Copy link
Contributor Author

hamersaw commented Mar 2, 2023

@flixr we typically don't merge until we have e2e working.

@flixr
Copy link

flixr commented Mar 10, 2023

I tested this successfully in the sandbox that @wild-endeavor so kindly created and with flyteorg/flytekit#1515
IMHO this is good to merge.

@hamersaw hamersaw marked this pull request as ready for review March 10, 2023 20:02
Signed-off-by: Daniel Rammer <daniel@union.ai>
@codecov
Copy link

codecov bot commented Mar 10, 2023

Codecov Report

Merging #368 (8c504f5) into master (3dfcaf6) will increase coverage by 2.33%.
The diff coverage is n/a.

❗ Current head 8c504f5 differs from pull request most recent head f971ff9. Consider uploading reports for the commit f971ff9 to get more accurate results

@@            Coverage Diff             @@
##           master     #368      +/-   ##
==========================================
+ Coverage   76.71%   79.05%   +2.33%     
==========================================
  Files          18       18              
  Lines        1426     1227     -199     
==========================================
- Hits         1094      970     -124     
+ Misses        280      205      -75     
  Partials       52       52              
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

see 18 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@hamersaw hamersaw merged commit e1a667e into master Mar 13, 2023
@hamersaw hamersaw deleted the feature/pod-data-loading-config branch March 13, 2023 15:32
eapolinario pushed a commit that referenced this pull request May 16, 2023
Signed-off-by: Daniel Rammer <daniel@union.ai>
eapolinario added a commit that referenced this pull request May 16, 2023
* added dynamic_job_spec_uri to dynamic workflow metadata and node execution closure (#360)

Signed-off-by: Daniel Rammer <daniel@union.ai>

* Use TokenCache in ClientCredentialsTokenSourceProvider (#377)

* Init customTokenSource.refreshTime (#381)

Signed-off-by: Andrew Dye <andrewwdye@gmail.com>

* added DataLoadingConfig to K8sPod message (#368)

Signed-off-by: Daniel Rammer <daniel@union.ai>

* Add Reasons field to TaskExecutionClosure to track time-series of reasons (#382)

* added a time-series of reasons to the TaskExecution closure

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added docs

Signed-off-by: Daniel Rammer <daniel@union.ai>

* actually finishing docs too

Signed-off-by: Daniel Rammer <daniel@union.ai>

---------

Signed-off-by: Daniel Rammer <daniel@union.ai>

* Create service for runtime metrics (#367)

* added span messages

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added endpoints to service

Signed-off-by: Daniel Rammer <daniel@union.ai>

* generated mocks

Signed-off-by: Daniel Rammer <daniel@union.ai>

* removed get task execution metrics rpc

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added EXECUTION_IDLE category

Signed-off-by: Daniel Rammer <daniel@union.ai>

* updated PLUGIN_EXECUTION to PLUGIN_RUNTIME

Signed-off-by: Daniel Rammer <daniel@union.ai>

* removed recorded_at on workflow and node level events

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added docs for task event reported_at field

Signed-off-by: Daniel Rammer <daniel@union.ai>

* removed GetNodeExecutionMetrics endpoint - will implement later if necessary

Signed-off-by: Daniel Rammer <daniel@union.ai>

* updated docs

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added reported_at for node execution events

Signed-off-by: Daniel Rammer <daniel@union.ai>

* fixed typo

Signed-off-by: Daniel Rammer <daniel@union.ai>

* fixed typos and removed dead code

Signed-off-by: Daniel Rammer <daniel@union.ai>

* updated categories

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added workflow setup and teardown categories

Signed-off-by: Daniel Rammer <daniel@union.ai>

* simplified span message and moved to flyteidl.core

Signed-off-by: Daniel Rammer <daniel@union.ai>

---------

Signed-off-by: Daniel Rammer <daniel@union.ai>

* Remove misleading token refresh logic from client credentials token source provider (#383)

* Out of core plugin (#378)

* Add backend plugin system service

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Add backend plugin system service

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update state

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update state

Signed-off-by: Kevin Su <pingsutw@apache.org>

* dics

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Remove output prefix from get request

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update

Signed-off-by: Kevin Su <pingsutw@apache.org>

* remove prev state

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update proto

Signed-off-by: Kevin Su <pingsutw@apache.org>

* remove error message

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update comment

Signed-off-by: Kevin Su <pingsutw@apache.org>

* make generate

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Rename the service

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

---------

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Feat: Add `ElasticConfig` message type for torch elastic training (#394)

* Add elastic config args to pytorch proto

Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com>

* Add elastic config message type for torchrun training

Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com>

---------

Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com>
Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com>
Co-authored-by: Ketan Umare <ketan.umare@gmail.com>

* Retract 1.4.x (#397)

Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com>

* Data addresses #minor (#391)

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* Refactor kf-operator plugins configs and support setting different specs for different replica groups (#386)

* refactor kubeflow operators proto

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add back the original proto for backward compatible

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* clean up comments

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add kubeflow.rs

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add elastic config

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add command to MPI

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add slots and command to mpi spec

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

---------

Signed-off-by: Yubo Wang <yubwang@linkedin.com>
Co-authored-by: Yubo Wang <yubwang@linkedin.com>

* add user_identifier (#388)

Signed-off-by: byhsu <byhsu@linkedin.com>
Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com>

* Add envs to execution spec (#400)

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Support union and none type in flyteidl (#401)

* add support for Union Scalar

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* support union type and literals

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* change union type extraction

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

---------

Signed-off-by: Yubo Wang <yubwang@linkedin.com>
Co-authored-by: Yubo Wang <yubwang@linkedin.com>
Co-authored-by: Kevin Su <pingsutw@apache.org>

* Rename user_identity to execution_identity (#402)

Signed-off-by: byhsu <byhsu@linkedin.com>
Co-authored-by: byhsu <byhsu@linkedin.com>

* make generate

Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>

* Revert "Support union and none type in flyteidl (#401)"

This reverts commit 3284f61.

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* We should not update flyteidl version in backend components in the case of this branch

Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Andrew Dye <andrewwdye@gmail.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com>
Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Yubo Wang <yubwang@linkedin.com>
Signed-off-by: byhsu <byhsu@linkedin.com>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Dan Rammer <daniel@union.ai>
Co-authored-by: Andrew Dye <andrewwdye@gmail.com>
Co-authored-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: Fabio M. Graetz, Ph.D <fabiograetz@googlemail.com>
Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com>
Co-authored-by: Ketan Umare <ketan.umare@gmail.com>
Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Co-authored-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: Yubo Wang <yubwang@linkedin.com>
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
eapolinario pushed a commit that referenced this pull request Jun 27, 2023
Signed-off-by: Daniel Rammer <daniel@union.ai>
eapolinario added a commit that referenced this pull request Jun 28, 2023
* Adding support for structured dataset (#369)

Signed-off-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com>

* added dynamic_job_spec_uri to dynamic workflow metadata and node execution closure (#360)

Signed-off-by: Daniel Rammer <daniel@union.ai>

* Use TokenCache in ClientCredentialsTokenSourceProvider (#377)

* Init customTokenSource.refreshTime (#381)

Signed-off-by: Andrew Dye <andrewwdye@gmail.com>

* added DataLoadingConfig to K8sPod message (#368)

Signed-off-by: Daniel Rammer <daniel@union.ai>

* Add Reasons field to TaskExecutionClosure to track time-series of reasons (#382)

* added a time-series of reasons to the TaskExecution closure

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added docs

Signed-off-by: Daniel Rammer <daniel@union.ai>

* actually finishing docs too

Signed-off-by: Daniel Rammer <daniel@union.ai>

---------

Signed-off-by: Daniel Rammer <daniel@union.ai>

* Create service for runtime metrics (#367)

* added span messages

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added endpoints to service

Signed-off-by: Daniel Rammer <daniel@union.ai>

* generated mocks

Signed-off-by: Daniel Rammer <daniel@union.ai>

* removed get task execution metrics rpc

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added EXECUTION_IDLE category

Signed-off-by: Daniel Rammer <daniel@union.ai>

* updated PLUGIN_EXECUTION to PLUGIN_RUNTIME

Signed-off-by: Daniel Rammer <daniel@union.ai>

* removed recorded_at on workflow and node level events

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added docs for task event reported_at field

Signed-off-by: Daniel Rammer <daniel@union.ai>

* removed GetNodeExecutionMetrics endpoint - will implement later if necessary

Signed-off-by: Daniel Rammer <daniel@union.ai>

* updated docs

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added reported_at for node execution events

Signed-off-by: Daniel Rammer <daniel@union.ai>

* fixed typo

Signed-off-by: Daniel Rammer <daniel@union.ai>

* fixed typos and removed dead code

Signed-off-by: Daniel Rammer <daniel@union.ai>

* updated categories

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added workflow setup and teardown categories

Signed-off-by: Daniel Rammer <daniel@union.ai>

* simplified span message and moved to flyteidl.core

Signed-off-by: Daniel Rammer <daniel@union.ai>

---------

Signed-off-by: Daniel Rammer <daniel@union.ai>

* Remove misleading token refresh logic from client credentials token source provider (#383)

* Out of core plugin (#378)

* Add backend plugin system service

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Add backend plugin system service

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update state

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update state

Signed-off-by: Kevin Su <pingsutw@apache.org>

* dics

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Remove output prefix from get request

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update

Signed-off-by: Kevin Su <pingsutw@apache.org>

* remove prev state

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update proto

Signed-off-by: Kevin Su <pingsutw@apache.org>

* remove error message

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update comment

Signed-off-by: Kevin Su <pingsutw@apache.org>

* make generate

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Rename the service

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

---------

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Feat: Add `ElasticConfig` message type for torch elastic training (#394)

* Add elastic config args to pytorch proto

Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com>

* Add elastic config message type for torchrun training

Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com>

---------

Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com>
Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com>
Co-authored-by: Ketan Umare <ketan.umare@gmail.com>

* Retract 1.4.x (#397)

Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com>

* Data addresses #minor (#391)

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* Refactor kf-operator plugins configs and support setting different specs for different replica groups (#386)

* refactor kubeflow operators proto

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add back the original proto for backward compatible

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* clean up comments

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add kubeflow.rs

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add elastic config

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add command to MPI

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* add slots and command to mpi spec

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

---------

Signed-off-by: Yubo Wang <yubwang@linkedin.com>
Co-authored-by: Yubo Wang <yubwang@linkedin.com>

* add user_identifier (#388)

Signed-off-by: byhsu <byhsu@linkedin.com>
Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com>

* Add envs to execution spec (#400)

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Support union and none type in flyteidl (#401)

* add support for Union Scalar

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* support union type and literals

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

* change union type extraction

Signed-off-by: Yubo Wang <yubwang@linkedin.com>

---------

Signed-off-by: Yubo Wang <yubwang@linkedin.com>
Co-authored-by: Yubo Wang <yubwang@linkedin.com>
Co-authored-by: Kevin Su <pingsutw@apache.org>

* Rename user_identity to execution_identity (#402)

Signed-off-by: byhsu <byhsu@linkedin.com>
Co-authored-by: byhsu <byhsu@linkedin.com>

* Single literal in GetDataResponse (#404)

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>

* Add namespace to execution system metadata (#406)

Signed-off-by: Katrina Rogan <katroganGH@gmail.com>

* Add oauth2 http proxy client (#405)

Signed-off-by: byhsu <byhsu@linkedin.com>

* Rename externalPluginService to AgentService (#410)

* Rename externalPluginService to AgentService

Signed-off-by: Kevin Su <pingsutw@apache.org>

* nit

Signed-off-by: Kevin Su <pingsutw@apache.org>

---------

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Add external_plugin_service proto back to the idl (#413)

* Add external-plugin-service proto back to the idl

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update idl

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update idll

Signed-off-by: Kevin Su <pingsutw@apache.org>

* update idll

Signed-off-by: Kevin Su <pingsutw@apache.org>

* AsyncAgentService

Signed-off-by: Kevin Su <pingsutw@apache.org>

---------

Signed-off-by: Kevin Su <pingsutw@apache.org>

* Rerun make generate

Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com>
Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Andrew Dye <andrewwdye@gmail.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com>
Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Yubo Wang <yubwang@linkedin.com>
Signed-off-by: byhsu <byhsu@linkedin.com>
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Co-authored-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com>
Co-authored-by: Dan Rammer <daniel@union.ai>
Co-authored-by: Andrew Dye <andrewwdye@gmail.com>
Co-authored-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: Fabio M. Graetz, Ph.D <fabiograetz@googlemail.com>
Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com>
Co-authored-by: Ketan Umare <ketan.umare@gmail.com>
Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Co-authored-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: Yubo Wang <yubwang@linkedin.com>
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
Co-authored-by: Katrina Rogan <katroganGH@gmail.com>
eapolinario pushed a commit that referenced this pull request Sep 8, 2023
Signed-off-by: Daniel Rammer <daniel@union.ai>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants