Skip to content
This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

GPU Type #419

Merged
merged 15 commits into from
Sep 20, 2023
Merged

GPU Type #419

merged 15 commits into from
Sep 20, 2023

Conversation

wild-endeavor
Copy link
Contributor

@wild-endeavor wild-endeavor commented Jun 27, 2023

TL;DR

Adds support for passing along extra metadata related to resources (e.g. accelerator type) to the backend

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

Remove the 'fixes' keyword if there will be multiple PRs to fix the linked issue

fixes https://github.com/flyteorg/flyte/issues/

Follow-up issue

NA
OR
https://github.com/flyteorg/flyte/issues/

@codecov
Copy link

codecov bot commented Jun 27, 2023

Codecov Report

Patch coverage has no change and project coverage change: +2.55% 🎉

Comparison is base (7f62a15) 75.92% compared to head (fe70c3d) 78.48%.

❗ Current head fe70c3d differs from pull request most recent head 152a861. Consider uploading reports for the commit 152a861 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #419      +/-   ##
==========================================
+ Coverage   75.92%   78.48%   +2.55%     
==========================================
  Files          18       18              
  Lines        1458     1250     -208     
==========================================
- Hits         1107      981     -126     
+ Misses        294      212      -82     
  Partials       57       57              
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

see 18 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

wild-endeavor and others added 13 commits September 18, 2023 08:51
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
* Pass task execution metadata from agent

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Add doc

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Update protos/flyteidl/admin/agent.proto

Co-authored-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Honnix <honnix@users.noreply.github.com>

* Regenerate

---------

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
Signed-off-by: Honnix <honnix@users.noreply.github.com>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
* add tags to execution spec

Signed-off-by: Kevin Su <pingsutw@apache.org>

* add tags to execution spec

Signed-off-by: Kevin Su <pingsutw@apache.org>

* add comment

Signed-off-by: Kevin Su <pingsutw@apache.org>

---------

Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
@jeevb jeevb marked this pull request as ready for review September 19, 2023 19:41
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
@jeevb jeevb enabled auto-merge (squash) September 20, 2023 20:05
@jeevb jeevb merged commit ef37788 into master Sep 20, 2023
13 checks passed
@jeevb jeevb deleted the gpu-type branch September 20, 2023 20:11

// Additional metadata associated with resources to allocate to a task
message ResourceMetadata {
oneof accelerator_value {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why oneof? if it's not set, it'll just be nil, do you want to allow nullifying a field? if so maybe that should go into the override message?

Copy link
Contributor

@jeevb jeevb Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought was to support other accelerator types like TPUs in the future. One option would have been to just add all possible accelerator types as separate fields without the guarantee that oneof provides - that only one type is set at any given point in time.

I actually did not consider nullifying an accelerator specified in the task metadata, in the override. That does sound nice. If we were to do that, would it be a bad idea to add a accelerator_not_set to the oneof?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since oneofs do not change wire format, I would keep it out and kick the can down the road when we add a different accelerator to decide then.

There isn't a great way to nullify in Protos unfortunately...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jeevb added a commit that referenced this pull request Sep 22, 2023
This reverts commit ef37788.
jeevb added a commit that referenced this pull request Sep 22, 2023
This reverts commit ef37788.

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
jeevb added a commit that referenced this pull request Sep 25, 2023
* Revert "GPU Type (#419)"

This reverts commit ef37788.

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* Restore .readthedocs.yml

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

---------

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
eapolinario pushed a commit that referenced this pull request Sep 28, 2023
* add field

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* Pass task execution metadata from agent (#422)

* Pass task execution metadata from agent

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Add doc

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>

* Update protos/flyteidl/admin/agent.proto

Co-authored-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Honnix <honnix@users.noreply.github.com>

* Regenerate

---------

Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
Signed-off-by: Honnix <honnix@users.noreply.github.com>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* Add tags to execution spec (#414)

* add tags to execution spec

Signed-off-by: Kevin Su <pingsutw@apache.org>

* add tags to execution spec

Signed-off-by: Kevin Su <pingsutw@apache.org>

* add comment

Signed-off-by: Kevin Su <pingsutw@apache.org>

---------

Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* Correct comment for array job max parallelism (#431)

Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* Add the scalar to the operand (#427)

Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* add selector

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* move selectors from container to task metadata

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* drop only_preferred

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* Updating boilerplate to lock golangci-lint version (#435)

Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* add unpartitioned selector

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* refactor

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* refactor

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* fix oneof names

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* add build.os for read the docs

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

---------

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Signed-off-by: Hongxin Liang <honnix@users.noreply.github.com>
Signed-off-by: Honnix <honnix@users.noreply.github.com>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Signed-off-by: Daniel Rammer <daniel@union.ai>
Co-authored-by: Honnix <honnix@users.noreply.github.com>
Co-authored-by: Kevin Su <pingsutw@gmail.com>
Co-authored-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: Katrina Rogan <katroganGH@gmail.com>
Co-authored-by: Jeev B <jeevb@users.noreply.github.com>
Co-authored-by: Dan Rammer <daniel@union.ai>
eapolinario pushed a commit that referenced this pull request Sep 28, 2023
* Revert "GPU Type (#419)"

This reverts commit 7bd98a9.

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

* Restore .readthedocs.yml

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>

---------

Signed-off-by: Jeev B <jeevb@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants