Please rework the pipeline interactions with azureml.data.OutputFileDatasetConfig #23565
Comments
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Jingshu923, @zhangyd2015, @Frey-Wang.
@weishengtoh Apologies for the late reply. Thanks for reaching out to us and sharing the feedback. Tagging the Service team to look into this ask. @bandsina @Jingshu923 @zhangyd2015 @Frey-Wang Could you please look into this issue and provide an update once you get a chance? Awaiting your reply.
@navba-MSFT, this issue is about azureml, not Data Factory. Thanks!
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @shbijlan.
@shbijlan Could you please look into this issue and provide an update once you get a chance? Awaiting your reply.
@weishengtoh, Thanks for your suggestions and sorry for the inconvenience caused. We are also happy to set up a call to learn about your use case and see if components can better suit your scenario. Thanks!
@likebupt Thanks! The ML component does appear to be a cleaner approach, though I was initially hesitant to use it as it is still in "Preview". Will try to refactor the code to work with ML components and see how it goes. Cheers :)
Problem Description

The Azure ML Python SDK documentation provides numerous options for passing data between training pipelines, but currently the recommended option appears to be `azureml.data.OutputFileDatasetConfig`.

However, `azureml.data.OutputFileDatasetConfig` has a limitation: it is not accepted as a valid value for the `inputs` parameter of any of the classes in `azureml.pipeline.steps`, e.g. `PythonScriptStep` and `HyperDriveStep`.

To define the `OutputFileDatasetConfig` as an input of a pipeline step, the function `as_input()` has to be called on the object, whereas no such call is needed when the `OutputFileDatasetConfig` is used as an output of a pipeline step.

This is extremely convoluted, as it clearly suggests that `OutputFileDatasetConfig` was originally designed only as an output of a pipeline step.
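A minimal sketch of the asymmetry being described, using the v1 Azure ML SDK (`azureml-core`). The script names, compute target name, and input/output names here are illustrative assumptions, not from the original report; running it also requires an Azure ML workspace config, so treat it as a sketch rather than a runnable sample:

```python
# Sketch: the same OutputFileDatasetConfig object is passed bare when it is
# a step's output, but must be wrapped with .as_input() when consumed.
from azureml.core import Workspace
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()  # assumes a local workspace config file

prepared = OutputFileDatasetConfig(name="prepared_data")

# As an *output*, the object is passed as-is -- no extra call needed.
prep_step = PythonScriptStep(
    script_name="prep.py",               # hypothetical script
    arguments=["--output", prepared],
    compute_target="cpu-cluster",        # hypothetical compute target
)

# As an *input*, as_input() must be called; passing `prepared` itself
# to the step's `inputs` parameter is rejected, which is the complaint.
train_step = PythonScriptStep(
    script_name="train.py",              # hypothetical script
    arguments=["--input", prepared.as_input(name="prepared_data")],
    compute_target="cpu-cluster",
)

pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
```

The dependency between the two steps is inferred by the SDK from the shared `OutputFileDatasetConfig` object, which is why the `as_input()` requirement on only one side of the exchange feels inconsistent.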
Proposed solution

1. Rename the class. The name `OutputFileDatasetConfig` suggests that it is meant only as an output, and that it is some kind of config file to be used by internal classes (which it clearly is not). If the intention is to use it also as the input to downstream pipeline steps, then the name should reflect that.
2. Accept the object directly in the `inputs` parameter for all classes in `azureml.pipeline.steps`. The `azureml.pipeline.core.PipelineData` class already allows the user to specify it as both the input and output of a pipeline step; however, it is not the recommended approach. `PipelineData` is also a much better name for a class that transfers data between pipeline steps.
3. Alternatively, accept the object in both the `inputs` and `outputs` parameters for all classes in `azureml.pipeline.steps`, and enforce that inputs be declared with `as_input()` and outputs with `as_output()`.