-
Notifications
You must be signed in to change notification settings - Fork 61
added DataLoadingConfig to K8sPod message #368
Conversation
Signed-off-by: Daniel Rammer <daniel@union.ai>
@hamersaw anything still needed here until it can be merged? |
@flixr we typically don't merge until we have e2e working. |
I tested this successfully in the sandbox that @wild-endeavor so kindly created and with flyteorg/flytekit#1515 |
Signed-off-by: Daniel Rammer <daniel@union.ai>
Codecov Report
@@ Coverage Diff @@
## master #368 +/- ##
==========================================
+ Coverage 76.71% 79.05% +2.33%
==========================================
Files 18 18
Lines 1426 1227 -199
==========================================
- Hits 1094 970 -124
+ Misses 280 205 -75
Partials 52 52
Flags with carried forward coverage won't be shown. Click here to find out more. see 18 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Signed-off-by: Daniel Rammer <daniel@union.ai>
* added dynamic_job_spec_uri to dynamic workflow metadata and node execution closure (#360) Signed-off-by: Daniel Rammer <daniel@union.ai> * Use TokenCache in ClientCredentialsTokenSourceProvider (#377) * Init customTokenSource.refreshTime (#381) Signed-off-by: Andrew Dye <andrewwdye@gmail.com> * added DataLoadingConfig to K8sPod message (#368) Signed-off-by: Daniel Rammer <daniel@union.ai> * Add Reasons field to TaskExecutionClosure to track time-series of reasons (#382) * added a time-series of reasons to the TaskExecution closure Signed-off-by: Daniel Rammer <daniel@union.ai> * added docs Signed-off-by: Daniel Rammer <daniel@union.ai> * actually finishing docs too Signed-off-by: Daniel Rammer <daniel@union.ai> --------- Signed-off-by: Daniel Rammer <daniel@union.ai> * Create service for runtime metrics (#367) * added span messages Signed-off-by: Daniel Rammer <daniel@union.ai> * added endpoints to service Signed-off-by: Daniel Rammer <daniel@union.ai> * generated mocks Signed-off-by: Daniel Rammer <daniel@union.ai> * removed get task execution metrics rpc Signed-off-by: Daniel Rammer <daniel@union.ai> * added EXECUTION_IDLE category Signed-off-by: Daniel Rammer <daniel@union.ai> * updated PLUGIN_EXECUTION to PLUGIN_RUNTIME Signed-off-by: Daniel Rammer <daniel@union.ai> * removed recorded_at on workflow and node level events Signed-off-by: Daniel Rammer <daniel@union.ai> * added docs for task event reported_at field Signed-off-by: Daniel Rammer <daniel@union.ai> * removed GetNodeExecutionMetrics endpoint - will implement later if necessary Signed-off-by: Daniel Rammer <daniel@union.ai> * updated docs Signed-off-by: Daniel Rammer <daniel@union.ai> * added reported_at for node execution events Signed-off-by: Daniel Rammer <daniel@union.ai> * fixed typo Signed-off-by: Daniel Rammer <daniel@union.ai> * fixed typos and removed dead code Signed-off-by: Daniel Rammer <daniel@union.ai> * updated categories Signed-off-by: Daniel Rammer <daniel@union.ai> * added workflow setup and teardown categories Signed-off-by: Daniel Rammer <daniel@union.ai> * simplified span message and moved to flyteidl.core Signed-off-by: Daniel Rammer <daniel@union.ai> --------- Signed-off-by: Daniel Rammer <daniel@union.ai> * Remove misleading token refresh logic from client credentials token source provider (#383) * Out of core plugin (#378) * Add backend plugin system service Signed-off-by: Kevin Su <pingsutw@apache.org> * Add backend plugin system service Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> * update state Signed-off-by: Kevin Su <pingsutw@apache.org> * update state Signed-off-by: Kevin Su <pingsutw@apache.org> * dics Signed-off-by: Kevin Su <pingsutw@apache.org> * Remove output prefix from get request Signed-off-by: Kevin Su <pingsutw@apache.org> * update Signed-off-by: Kevin Su <pingsutw@apache.org> * remove prev state Signed-off-by: Kevin Su <pingsutw@apache.org> * update proto Signed-off-by: Kevin Su <pingsutw@apache.org> * remove error message Signed-off-by: Kevin Su <pingsutw@apache.org> * update comment Signed-off-by: Kevin Su <pingsutw@apache.org> * make generate Signed-off-by: Kevin Su <pingsutw@apache.org> * Rename the service Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> --------- Signed-off-by: Kevin Su <pingsutw@apache.org> * Feat: Add `ElasticConfig` message type for torch elastic training (#394) * Add elastic config args to pytorch proto Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> * Add elastic config message type for torchrun training Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> --------- Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com> Co-authored-by: Ketan Umare <ketan.umare@gmail.com> * Retract 1.4.x (#397) Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com> * Data addresses #minor (#391) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Refactor kf-operator plugins configs and support setting different specs for different replica groups (#386) * refactor kubeflow operators proto Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add back the original proto for backward compatible Signed-off-by: Yubo Wang <yubwang@linkedin.com> * clean up comments Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add kubeflow.rs Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add elastic config Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add command to MPI Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add slots and command to mpi spec Signed-off-by: Yubo Wang <yubwang@linkedin.com> --------- Signed-off-by: Yubo Wang <yubwang@linkedin.com> Co-authored-by: Yubo Wang <yubwang@linkedin.com> * add user_identifier (#388) Signed-off-by: byhsu <byhsu@linkedin.com> Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Co-authored-by: byhsu <byhsu@linkedin.com> Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com> * Add envs to execution spec (#400) Signed-off-by: Kevin Su <pingsutw@apache.org> * Support union and none type in flyteidl (#401) * add support for Union Scalar Signed-off-by: Yubo Wang <yubwang@linkedin.com> * support union type and literals Signed-off-by: Yubo Wang <yubwang@linkedin.com> * change union type extraction Signed-off-by: Yubo Wang <yubwang@linkedin.com> --------- Signed-off-by: Yubo Wang <yubwang@linkedin.com> Co-authored-by: Yubo Wang <yubwang@linkedin.com> Co-authored-by: Kevin Su <pingsutw@apache.org> * Rename user_identity to execution_identity (#402) Signed-off-by: byhsu <byhsu@linkedin.com> Co-authored-by: byhsu <byhsu@linkedin.com> * make generate Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> * Revert "Support union and none type in flyteidl (#401)" This reverts commit 3284f61. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * We should not update flyteidl version in backend components in the case of this branch Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> --------- Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: Andrew Dye <andrewwdye@gmail.com> Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Yubo Wang <yubwang@linkedin.com> Signed-off-by: byhsu <byhsu@linkedin.com> Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Dan Rammer <daniel@union.ai> Co-authored-by: Andrew Dye <andrewwdye@gmail.com> Co-authored-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Fabio M. Graetz, Ph.D <fabiograetz@googlemail.com> Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com> Co-authored-by: Ketan Umare <ketan.umare@gmail.com> Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Yubo Wang <yubowang2019@gmail.com> Co-authored-by: Yubo Wang <yubwang@linkedin.com> Co-authored-by: ByronHsu <byronhsu1230@gmail.com> Co-authored-by: byhsu <byhsu@linkedin.com>
Signed-off-by: Daniel Rammer <daniel@union.ai>
* Adding support for structured dataset (#369) Signed-off-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com> * added dynamic_job_spec_uri to dynamic workflow metadata and node execution closure (#360) Signed-off-by: Daniel Rammer <daniel@union.ai> * Use TokenCache in ClientCredentialsTokenSourceProvider (#377) * Init customTokenSource.refreshTime (#381) Signed-off-by: Andrew Dye <andrewwdye@gmail.com> * added DataLoadingConfig to K8sPod message (#368) Signed-off-by: Daniel Rammer <daniel@union.ai> * Add Reasons field to TaskExecutionClosure to track time-series of reasons (#382) * added a time-series of reasons to the TaskExecution closure Signed-off-by: Daniel Rammer <daniel@union.ai> * added docs Signed-off-by: Daniel Rammer <daniel@union.ai> * actually finishing docs too Signed-off-by: Daniel Rammer <daniel@union.ai> --------- Signed-off-by: Daniel Rammer <daniel@union.ai> * Create service for runtime metrics (#367) * added span messages Signed-off-by: Daniel Rammer <daniel@union.ai> * added endpoints to service Signed-off-by: Daniel Rammer <daniel@union.ai> * generated mocks Signed-off-by: Daniel Rammer <daniel@union.ai> * removed get task execution metrics rpc Signed-off-by: Daniel Rammer <daniel@union.ai> * added EXECUTION_IDLE category Signed-off-by: Daniel Rammer <daniel@union.ai> * updated PLUGIN_EXECUTION to PLUGIN_RUNTIME Signed-off-by: Daniel Rammer <daniel@union.ai> * removed recorded_at on workflow and node level events Signed-off-by: Daniel Rammer <daniel@union.ai> * added docs for task event reported_at field Signed-off-by: Daniel Rammer <daniel@union.ai> * removed GetNodeExecutionMetrics endpoint - will implement later if necessary Signed-off-by: Daniel Rammer <daniel@union.ai> * updated docs Signed-off-by: Daniel Rammer <daniel@union.ai> * added reported_at for node execution events Signed-off-by: Daniel Rammer <daniel@union.ai> * fixed typo Signed-off-by: Daniel Rammer <daniel@union.ai> * fixed typos and removed dead code Signed-off-by: Daniel Rammer <daniel@union.ai> * updated categories Signed-off-by: Daniel Rammer <daniel@union.ai> * added workflow setup and teardown categories Signed-off-by: Daniel Rammer <daniel@union.ai> * simplified span message and moved to flyteidl.core Signed-off-by: Daniel Rammer <daniel@union.ai> --------- Signed-off-by: Daniel Rammer <daniel@union.ai> * Remove misleading token refresh logic from client credentials token source provider (#383) * Out of core plugin (#378) * Add backend plugin system service Signed-off-by: Kevin Su <pingsutw@apache.org> * Add backend plugin system service Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> * update state Signed-off-by: Kevin Su <pingsutw@apache.org> * update state Signed-off-by: Kevin Su <pingsutw@apache.org> * dics Signed-off-by: Kevin Su <pingsutw@apache.org> * Remove output prefix from get request Signed-off-by: Kevin Su <pingsutw@apache.org> * update Signed-off-by: Kevin Su <pingsutw@apache.org> * remove prev state Signed-off-by: Kevin Su <pingsutw@apache.org> * update proto Signed-off-by: Kevin Su <pingsutw@apache.org> * remove error message Signed-off-by: Kevin Su <pingsutw@apache.org> * update comment Signed-off-by: Kevin Su <pingsutw@apache.org> * make generate Signed-off-by: Kevin Su <pingsutw@apache.org> * Rename the service Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> --------- Signed-off-by: Kevin Su <pingsutw@apache.org> * Feat: Add `ElasticConfig` message type for torch elastic training (#394) * Add elastic config args to pytorch proto Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> * Add elastic config message type for torchrun training Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> --------- Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com> Co-authored-by: Ketan Umare <ketan.umare@gmail.com> * Retract 1.4.x (#397) Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com> * Data addresses #minor (#391) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Refactor kf-operator plugins configs and support setting different specs for different replica groups (#386) * refactor kubeflow operators proto Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add back the original proto for backward compatible Signed-off-by: Yubo Wang <yubwang@linkedin.com> * clean up comments Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add kubeflow.rs Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add elastic config Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add command to MPI Signed-off-by: Yubo Wang <yubwang@linkedin.com> * add slots and command to mpi spec Signed-off-by: Yubo Wang <yubwang@linkedin.com> --------- Signed-off-by: Yubo Wang <yubwang@linkedin.com> Co-authored-by: Yubo Wang <yubwang@linkedin.com> * add user_identifier (#388) Signed-off-by: byhsu <byhsu@linkedin.com> Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Co-authored-by: byhsu <byhsu@linkedin.com> Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com> * Add envs to execution spec (#400) Signed-off-by: Kevin Su <pingsutw@apache.org> * Support union and none type in flyteidl (#401) * add support for Union Scalar Signed-off-by: Yubo Wang <yubwang@linkedin.com> * support union type and literals Signed-off-by: Yubo Wang <yubwang@linkedin.com> * change union type extraction Signed-off-by: Yubo Wang <yubwang@linkedin.com> --------- Signed-off-by: Yubo Wang <yubwang@linkedin.com> Co-authored-by: Yubo Wang <yubwang@linkedin.com> Co-authored-by: Kevin Su <pingsutw@apache.org> * Rename user_identity to execution_identity (#402) Signed-off-by: byhsu <byhsu@linkedin.com> Co-authored-by: byhsu <byhsu@linkedin.com> * Single literal in GetDataResponse (#404) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Add namespace to execution system metadata (#406) Signed-off-by: Katrina Rogan <katroganGH@gmail.com> * Add oauth2 http proxy client (#405) Signed-off-by: byhsu <byhsu@linkedin.com> * Rename externalPluginService to AgentService (#410) * Rename externalPluginService to AgentService Signed-off-by: Kevin Su <pingsutw@apache.org> * nit Signed-off-by: Kevin Su <pingsutw@apache.org> --------- Signed-off-by: Kevin Su <pingsutw@apache.org> * Add external_plugin_service proto back to the idl (#413) * Add external-plugin-service proto back to the idl Signed-off-by: Kevin Su <pingsutw@apache.org> * update idl Signed-off-by: Kevin Su <pingsutw@apache.org> * update idll Signed-off-by: Kevin Su <pingsutw@apache.org> * update idll Signed-off-by: Kevin Su <pingsutw@apache.org> * AsyncAgentService Signed-off-by: Kevin Su <pingsutw@apache.org> --------- Signed-off-by: Kevin Su <pingsutw@apache.org> * Rerun make generate Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> --------- Signed-off-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com> Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: Andrew Dye <andrewwdye@gmail.com> Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Fabio Graetz <fabiograetz@googlemail.com> Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Yubo Wang <yubwang@linkedin.com> Signed-off-by: byhsu <byhsu@linkedin.com> Signed-off-by: Katrina Rogan <katroganGH@gmail.com> Co-authored-by: pmahindrakar-oss <prafulla.mahindrakar@gmail.com> Co-authored-by: Dan Rammer <daniel@union.ai> Co-authored-by: Andrew Dye <andrewwdye@gmail.com> Co-authored-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Fabio M. Graetz, Ph.D <fabiograetz@googlemail.com> Co-authored-by: Fabio Grätz <fabiogratz@googlemail.com> Co-authored-by: Ketan Umare <ketan.umare@gmail.com> Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Yubo Wang <yubowang2019@gmail.com> Co-authored-by: Yubo Wang <yubwang@linkedin.com> Co-authored-by: ByronHsu <byronhsu1230@gmail.com> Co-authored-by: byhsu <byhsu@linkedin.com> Co-authored-by: Katrina Rogan <katroganGH@gmail.com>
Signed-off-by: Daniel Rammer <daniel@union.ai>
TL;DR
This PR adds the
DataLoadingConfig
information to theK8sPod
proto message so that Flyte can inject copilot as a sidecar in Pod tasks. This allows the use ofPodTemplate
configuration withContainerTasks
.Type
Are all requirements met?
Complete description
^^^
Tracking Issue
flyteorg/flyte#3123
Follow-up issue
NA