This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

All trial jobs stay in 'waiting' status for a long time on the PAI platform #592

Closed
chicm-ms opened this issue Jan 10, 2019 · 1 comment

On the PAI training service, the status of every trial job stays at 'waiting' until all trial jobs have been submitted. When trialConcurrency is large (for example, 50), submitting all trial jobs can take a few minutes, and during that time the status of the trial jobs does not change.

The root causes are:

  1. Submitting a trial job through the PAI training service is somewhat slow.
  2. The trial job status query logic and the job submission logic run in the same sequential loop.
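
One possible way to avoid the second cause is to run status queries as their own task, so they are not blocked behind the submission loop. The sketch below is only an illustration in Python with `asyncio`, not the actual NNI code or the fix that was applied. All names in it (`submit_job`, `poll_statuses`, `run_decoupled`, the `"WAITING"` and `"SUBMITTED"` strings) are hypothetical, and the slow PAI submission call is simulated by yielding to the event loop.

```python
import asyncio


async def submit_job(job_id, statuses):
    # Hypothetical stand-in for a slow PAI submission request.
    # sleep(0) yields control so other tasks can run in the meantime.
    await asyncio.sleep(0)
    statuses[job_id] = "SUBMITTED"


async def submit_all(job_ids, statuses):
    # Submission loop: jobs are still submitted one after another.
    for job_id in job_ids:
        await submit_job(job_id, statuses)


async def poll_statuses(statuses, snapshots, stop):
    # Status loop: runs as a separate task, so it can observe progress
    # while submissions are still in flight.
    while not stop.is_set():
        updated = sum(1 for s in statuses.values() if s != "WAITING")
        snapshots.append(updated)
        await asyncio.sleep(0)


async def run_decoupled(n_jobs):
    statuses = {i: "WAITING" for i in range(n_jobs)}
    snapshots = []
    stop = asyncio.Event()
    poller = asyncio.create_task(poll_statuses(statuses, snapshots, stop))
    await submit_all(range(n_jobs), statuses)
    stop.set()      # tell the poller to finish
    await poller
    return statuses, snapshots


if __name__ == "__main__":
    final_statuses, seen = asyncio.run(run_decoupled(5))
    print(final_statuses)
    print(seen)
```

In this sketch the poller records how many jobs have left the 'waiting' state each time it runs, so intermediate progress becomes visible before the last job is submitted. A real training service would replace the simulated calls with its own submission and status requests.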

chicm-ms commented May 5, 2019

Fixed.
