Merge master
add internal prefix for internal storage methods for clear usage. fix pylint errors minor fixes
rename methods of storageService, move trial to a separate file, fix some bugs.
fix openPAI breaking changes
fix minor bugs
to router training service for better understanding.
merge master
trialService is used to support different submission types like AML.
merge master
TrialDispatcher is easier to understand its purpose.
merge master
Some small fix and merge logic
}

export class AMLEnvironmentInformation extends EnvironmentInformation {
public amlClient?: AMLClient;
bad indentation
fixed.
trial:
  command: python3 mnist.py
  codeDir: .
  computeTarget: ussc40rscl
Replace it with a placeholder
fixed.
}
const amlEnvironment: AMLEnvironmentInformation = environment as AMLEnvironmentInformation;
const environmentLocalTempFolder = path.join(this.experimentRootDir, this.experimentId, "environment-temp");
environment.command = `import os\nos.system('${amlEnvironment.command}')`;
Special characters such as ' need to be escaped here.
This command doesn't need to be processed: it is the environment's command, not the trial's command.
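For reference, if escaping were ever wanted at this point, a minimal sketch might look like the following. Both `escapeForPythonSingleQuoted` and `buildPythonWrapper` are hypothetical names that are not part of this PR; the sketch only assumes that the command is embedded in a single-quoted Python string literal, as in the line under review.

```typescript
// Hypothetical helper, not part of the PR: escapes a string so that it can be
// embedded inside a single-quoted Python string literal.
function escapeForPythonSingleQuoted(value: string): string {
    return value
        .replace(/\\/g, "\\\\") // backslashes first, so later escapes are not doubled
        .replace(/'/g, "\\'")   // a bare single quote would end the literal early
        .replace(/\r/g, "\\r")  // raw line breaks are not allowed in a one-line literal
        .replace(/\n/g, "\\n");
}

// Hypothetical wrapper: builds the same two-line Python snippet as the diff,
// but with the embedded command escaped.
function buildPythonWrapper(command: string): string {
    return `import os\nos.system('${escapeForPythonSingleQuoted(command)}')`;
}
```

Backslashes are escaped before quotes so that the backslash added in front of each quote is not itself doubled.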
command: python3 mnist.py
codeDir: .
computeTarget: ussc40rscl
nodeCount: 1
Each trial will use one node, i.e., all 8 GPUs?
removed.
Compared with [LocalMode](LocalMode.md), trial configuration in aml mode has these additional keys:
* computeTarget
    * required key. The computer cluster name you want to use in your AML workspace.
* nodeCount
I think nodeCount can default to 1 because multi-machine runs are seldom used.
Yes, hiding this variable is probably better. Removed.
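If the key were kept as an optional setting instead, the default suggested above could be sketched roughly as follows. `AMLTrialConfigInput`, `DEFAULT_NODE_COUNT`, and `resolveNodeCount` are hypothetical names used for illustration only; the field names simply mirror the YAML keys shown in this PR.

```typescript
// Hypothetical shape of the AML trial section; not the PR's actual type.
interface AMLTrialConfigInput {
    command: string;
    codeDir: string;
    computeTarget: string;
    nodeCount?: number; // optional, because multi-machine runs are rare
}

// Assumed default, following the review suggestion.
const DEFAULT_NODE_COUNT = 1;

// Returns the configured node count, falling back to the default when the
// key is omitted, and rejects values that are not positive integers.
function resolveNodeCount(config: AMLTrialConfigInput): number {
    const nodeCount = config.nodeCount ?? DEFAULT_NODE_COUNT;
    if (!Number.isInteger(nodeCount) || nodeCount < 1) {
        throw new Error(`nodeCount must be a positive integer, got ${nodeCount}`);
    }
    return nodeCount;
}
```

With this approach, existing configs that omit `nodeCount` keep working, and only users who need multiple nodes have to set it.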
command: python3 mnist.py
codeDir: .
computeTarget: ussc40rscl
nodeCount: 1
Where is the Docker image?
Fixed; this variable was missing from the doc.
@@ -58,6 +59,8 @@ class TrialDispatcher implements TrainingService {
this.environments = new Map<string, EnvironmentInformation>();
this.metricsEmitter = new EventEmitter();
this.experimentId = getExperimentId();
this.experimentRootDir = getExperimentRootDir();
It can be changed to a local variable, as it's used only once in run.
computeTarget: ussc40rscl
nodeCount: 1
computeTarget: ${replace_to_your_computeTarget}
image: msranni/nni
Is aml installed in this image?
No description provided.