Merge pull request #1253 from likebupt/master

update aml component spec
recommenders-team · Dec 1, 2020 · 1c34b90
2 parents 26c0cb4 + e19e16c

Showing 8 changed files with 541 additions and 585 deletions.
diff --git a/examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb b/examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb
@@ -10,9 +10,9 @@
     "\n",
     "[AzureML Designer](https://docs.microsoft.com/en-us/azure/machine-learning/concept-designer) lets you visually connect datasets and modules on an interactive canvas to create machine learning models. \n",
     "\n",
-    "![img](https://recodatasets.blob.core.windows.net/images/designer-drag-and-drop.gif)\n",
+    "One of the features of AzureML Designer is that developers can integrate any Python library and make it available as a module/component. In this notebook we are going to show how to integrate [SAR](sar_movielens.ipynb) and several other modules in Designer.\n",
     "\n",
-    "One of the features of AzureML Designer is that it is possible for developers to integrate any python library to make it available as a module. In this notebook are are going to show how to integrate [SAR](sar_movielens.ipynb) and several other modules in Designer\n",
+    "Note that custom modules have been renamed to components.\n",
     "\n",
     "\n",
     "## Installation\n",
@@ -24,10 +24,11 @@
     "# Uninstall azure-cli-ml (the `az ml` commands)\n",
     "az extension remove -n azure-cli-ml\n",
     "# Install local version of azure-cli-ml (which includes `az ml module` commands)\n",
-    "az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13082891/azure_cli_ml-0.1.0.13082891-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13082891 --yes\n",
+    "CLI_SDK_VERSION=26005222\n",
+    "az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/$CLI_SDK_VERSION/azure_cli_ml-0.1.0.$CLI_SDK_VERSION-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/$CLI_SDK_VERSION --yes --verbose\n",
     "```\n",
     "\n",
-    "## Module implementation\n",
+    "## Component implementation\n",
     "\n",
     "The scenario that we are going to reproduce in Designer, as a reference example, is the content of the [SAR quickstart notebook](sar_movielens.ipynb). In it, we load a dataset, split it into train and test sets, train SAR algorithm, predict using the test set and compute several ranking metrics (precision at k, recall at k, MAP and nDCG).\n",
     "\n",
@@ -91,82 +92,76 @@
     "Once we have the python entry, we need to create the yaml file that will interact with Designer, [precision_at_k.yaml](../../reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml).\n",
     "\n",
     "```yaml\n",
-    "moduleIdentifier: \n",
-    "  namespace: microsoft.com/cat\n",
-    "  moduleName: Precision at K\n",
-    "  moduleVersion: 1.1.0\n",
-    "description: \"Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.\"\n",
-    "metadata:\n",
-    "  annotations:\n",
-    "    tags: [\"Recommenders\", \"Metrics\"]\n",
+    "$schema: http://azureml/sdk-2-0/CommandComponent.json\n",
+    "name: microsoft.com.cat.precision_at_k\n",
+    "version: 1.1.1\n",
+    "display_name: Precision at K\n",
+    "type: CommandComponent\n",
+    "description: 'Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.'\n",
+    "tags:\n",
+    "  Recommenders:\n",
+    "  Metrics:\n",
     "inputs:\n",
-    "- name: Rating true\n",
-    "  type: DataFrameDirectory\n",
-    "  description: True DataFrame.\n",
-    "- name: Rating pred\n",
-    "  type: DataFrameDirectory\n",
-    "  description: Predicted DataFrame.\n",
-    "- name: User column\n",
-    "  type: String\n",
-    "  default: UserId\n",
-    "  description: Column name of user IDs.\n",
-    "- name: Item column\n",
-    "  type: String\n",
-    "  default: MovieId\n",
-    "  description: Column name of item IDs.\n",
-    "- name: Rating column\n",
-    "  type: String\n",
-    "  default: Rating\n",
-    "  description: Column name of ratings.\n",
-    "- name: Prediction column\n",
-    "  type: String\n",
-    "  default: prediction\n",
-    "  description: Column name of predictions.\n",
-    "- name: Relevancy method\n",
-    "  type: String\n",
-    "  default: top_k\n",
-    "  description: method for determining relevancy ['top_k', 'by_threshold'].\n",
-    "- name: Top k\n",
-    "  type: Integer\n",
-    "  default: 10\n",
-    "  description: Number of top k items per user.\n",
-    "- name: Threshold\n",
-    "  type: Float\n",
-    "  default: 10.0\n",
-    "  description: Threshold of top items per user.\n",
+    "  rating_true:\n",
+    "    type: AnyDirectory\n",
+    "    description: True DataFrame.\n",
+    "    optional: false\n",
+    "  rating_pred:\n",
+    "    type: AnyDirectory\n",
+    "    description: Predicted DataFrame.\n",
+    "    optional: false\n",
+    "  user_column:\n",
+    "    type: String\n",
+    "    description: Column name of user IDs.\n",
+    "    default: UserId\n",
+    "    optional: false\n",
+    "  item_column:\n",
+    "    type: String\n",
+    "    description: Column name of item IDs.\n",
+    "    default: MovieId\n",
+    "    optional: false\n",
+    "  rating_column:\n",
+    "    type: String\n",
+    "    description: Column name of ratings.\n",
+    "    default: Rating\n",
+    "    optional: false\n",
+    "  prediction_column:\n",
+    "    type: String\n",
+    "    description: Column name of predictions.\n",
+    "    default: prediction\n",
+    "    optional: false\n",
+    "  relevancy_method:\n",
+    "    type: String\n",
+    "    description: method for determining relevancy ['top_k', 'by_threshold'].\n",
+    "    default: top_k\n",
+    "    optional: false\n",
+    "  top_k:\n",
+    "    type: Integer\n",
+    "    description: Number of top k items per user.\n",
+    "    default: 10\n",
+    "    optional: false\n",
+    "  threshold:\n",
+    "    type: Float\n",
+    "    description: Threshold of top items per user.\n",
+    "    default: 10.0\n",
+    "    optional: false\n",
     "outputs:\n",
-    "- name: Score\n",
-    "  type: DataFrameDirectory\n",
-    "  description: Precision at k (min=0, max=1).\n",
-    "implementation:\n",
-    "  container:\n",
-    "    amlEnvironment:\n",
-    "      python:\n",
-    "        condaDependenciesFile: sar_conda.yaml\n",
-    "    additionalIncludes:\n",
-    "      - ../../../\n",
-    "    command: [python, reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py]\n",
-    "    args:\n",
-    "    - --rating-true\n",
-    "    - inputPath: Rating true\n",
-    "    - --rating-pred\n",
-    "    - inputPath: Rating pred\n",
-    "    - --col-user\n",
-    "    - inputValue: User column\n",
-    "    - --col-item\n",
-    "    - inputValue: Item column\n",
-    "    - --col-rating\n",
-    "    - inputValue: Rating column\n",
-    "    - --col-prediction\n",
-    "    - inputValue: Prediction column\n",
-    "    - --relevancy-method\n",
-    "    - inputValue: Relevancy method\n",
-    "    - --k\n",
-    "    - inputValue: Top k\n",
-    "    - --threshold\n",
-    "    - inputValue: Threshold\n",
-    "    - --score-result\n",
-    "    - outputPath: Score\n",
+    "  score:\n",
+    "    type: AnyDirectory\n",
+    "    description: Precision at k (min=0, max=1).\n",
+    "code:\n",
+    "  ../../../../\n",
+    "command: >-\n",
+    "  python reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py\n",
+    "  --rating-true {inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user\n",
+    "  {inputs.user_column} --col-item {inputs.item_column} --col-rating {inputs.rating_column}\n",
+    "  --col-prediction {inputs.prediction_column} --relevancy-method {inputs.relevancy_method}\n",
+    "  --k {inputs.top_k} --threshold {inputs.threshold} --score-result {outputs.score}\n",
+    "environment:\n",
+    "  conda:\n",
+    "    conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml\n",
+    "  os: Linux\n",
+    "\n",
     "```\n",
     "\n",
     "In the yaml file we can see a number of sections. The heading defines attributes like name, version or description. In the section inputs, all inputs are defined. The two main dataframes have ports, which can be connected to other modules. The inputs without port appear in a canvas menu. The output is defined as a DataFrame as well. The last section, implementation, defines the conda environment, the associated python entry and the arguments to the python file.\n",
@@ -237,15 +232,15 @@
     }
    ],
    "source": [
-    "# Regsiter modules with spec via Azure CLI\n",
+    "# Register components with spec via Azure CLI\n",
     "root_path = os.path.abspath(os.path.join(os.getcwd(), \"../../\"))\n",
     "specs_folder = os.path.join(root_path, \"reco_utils/azureml/azureml_designer_modules/module_specs\")\n",
     "github_prefix = 'https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/'\n",
     "specs = os.listdir(specs_folder)\n",
     "for spec in specs:\n",
     "    spec_path = github_prefix + spec\n",
-    "    print(f\"Start to register module spec: {spec} ...\")\n",
-    "    subprocess.run(f\"az ml module register --spec-file {spec_path}\", shell=True)\n",
+    "    print(f\"Start to register component spec: {spec} ...\")\n",
+    "    subprocess.run(f\"az ml component create --file {spec_path}\", shell=True)\n",
     "    print(f\"Done.\")"
    ]
   },
@@ -257,7 +252,7 @@
     "\n",
     "Once the modules are registered, they will appear in the canvas as the module `Recommenders`. There you will be able to create a pipeline like this:\n",
     "\n",
-    "![img](https://recodatasets.blob.core.windows.net/images/azureml_designer_sar_precisionatk.png)\n",
+    "![img](https://raw.githubusercontent.com/Azure/AzureMachineLearningGallery/main/pipelines/sar-pipeline/sar-pipeline.png)\n",
     "\n",
     "Now, thanks to AzureML Designer, users can compute the latest state of the art algorithms in recommendation systems without writing a line of python code.\n",
     "\n",
@@ -272,9 +267,13 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
+   "name": "python3",
+   "display_name": "Python 3.6.8 64-bit ('test': conda)",
+   "metadata": {
+    "interpreter": {
+     "hash": "ad1389e27ccf93b6cb9b27912fdce5bd72b7d47f7c4b29627ffa9bc4b1e3e5d1"
+    }
+   }
   },
   "language_info": {
    "codemirror_mode": {
@@ -286,7 +285,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.10"
+   "version": "3.6.8-final"
   }
  },
  "nbformat": 4,

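Review note on the registration cell in the notebook diff above: it passes every entry returned by `os.listdir(specs_folder)` to `az ml component create --file`. Judging by the `conda_dependencies_file` path in the spec, that folder also appears to hold `sar_conda.yaml`, which is an environment file rather than a component spec. The sketch below is one possible way to filter the list and build the CLI arguments without `shell=True`. The helper name `build_register_commands` and the `NON_SPEC_FILES` set are hypothetical and not part of this commit; the assumption that the conda file should be skipped is mine.

```python
# Hypothetical helper, not part of the PR: build one CLI invocation per spec file.
# Assumption: sar_conda.yaml is an environment file, not a component spec.
NON_SPEC_FILES = {"sar_conda.yaml"}


def build_register_commands(spec_names, prefix):
    """Return argument lists for `az ml component create --file <spec>`."""
    commands = []
    for name in sorted(spec_names):
        if not name.endswith(".yaml") or name in NON_SPEC_FILES:
            continue  # skip the conda file and anything that is not YAML
        commands.append(["az", "ml", "component", "create", "--file", prefix + name])
    return commands


github_prefix = (
    "https://github.com/microsoft/recommenders/blob/master/"
    "reco_utils/azureml/azureml_designer_modules/module_specs/"
)
commands = build_register_commands(
    ["map.yaml", "precision_at_k.yaml", "sar_conda.yaml", "README.md"],
    github_prefix,
)
for cmd in commands:
    # Dry run: print the command instead of executing it.
    print(" ".join(cmd))
```

Each list could then be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues and raises if the CLI returns a non-zero exit code.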
diff --git a/reco_utils/azureml/azureml_designer_modules/module_specs/map.yaml b/reco_utils/azureml/azureml_designer_modules/module_specs/map.yaml
@@ -1,76 +1,69 @@
-amlModuleIdentifier: 
-  namespace: microsoft.com/cat
-  moduleName: MAP
-  moduleVersion: 1.1.1
-description: "Mean Average Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders."
-metadata:
-  annotations:
-    tags: ["Recommenders", "Metrics"]
+$schema: http://azureml/sdk-2-0/CommandComponent.json
+name: microsoft.com.cat.map
+version: 1.1.1
+display_name: MAP
+type: CommandComponent
+description: 'Mean Average Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.'
+tags:
+  Recommenders:
+  Metrics:
 inputs:
-- name: Rating true
-  type: AnyDirectory
-  description: True DataFrame.
-- name: Rating pred
-  type: AnyDirectory
-  description: Predicted DataFrame.
-- name: User column
-  type: String
-  default: UserId
-  description: Column name of user IDs.
-- name: Item column
-  type: String
-  default: MovieId
-  description: Column name of item IDs.
-- name: Rating column
-  type: String
-  default: Rating
-  description: Column name of ratings.
-- name: Prediction column
-  type: String
-  default: prediction
-  description: Column name of predictions.
-- name: Relevancy method
-  type: String
-  default: top_k
-  description: method for determining relevancy ['top_k', 'by_threshold'].
-- name: Top k
-  type: Integer
-  default: 10
-  description: Number of top k items per user.
-- name: Threshold
-  type: Float
-  default: 10.0
-  description: Threshold of top items per user.
+  rating_true:
+    type: AnyDirectory
+    description: True DataFrame.
+    optional: false
+  rating_pred:
+    type: AnyDirectory
+    description: Predicted DataFrame.
+    optional: false
+  user_column:
+    type: String
+    description: Column name of user IDs.
+    default: UserId
+    optional: false
+  item_column:
+    type: String
+    description: Column name of item IDs.
+    default: MovieId
+    optional: false
+  rating_column:
+    type: String
+    description: Column name of ratings.
+    default: Rating
+    optional: false
+  prediction_column:
+    type: String
+    description: Column name of predictions.
+    default: prediction
+    optional: false
+  relevancy_method:
+    type: String
+    description: method for determining relevancy ['top_k', 'by_threshold'].
+    default: top_k
+    optional: false
+  top_k:
+    type: Integer
+    description: Number of top k items per user.
+    default: 10
+    optional: false
+  threshold:
+    type: Float
+    description: Threshold of top items per user.
+    default: 10.0
+    optional: false
 outputs:
-- name: Score
-  type: AnyDirectory
-  description: MAP at k (min=0, max=1).
-implementation:
-  container:
-    amlEnvironment:
-      python:
-        condaDependenciesFile: sar_conda.yaml
-    additionalIncludes:
-      - ../../../
-    command: [python, reco_utils/azureml/azureml_designer_modules/entries/map_entry.py]
-    args:
-    - --rating-true
-    - inputPath: Rating true
-    - --rating-pred
-    - inputPath: Rating pred
-    - --col-user
-    - inputValue: User column
-    - --col-item
-    - inputValue: Item column
-    - --col-rating
-    - inputValue: Rating column
-    - --col-prediction
-    - inputValue: Prediction column
-    - --relevancy-method
-    - inputValue: Relevancy method
-    - --k
-    - inputValue: Top k
-    - --threshold
-    - inputValue: Threshold
-    - --score-result
-    - outputPath: Score
+  score:
+    type: AnyDirectory
+    description: MAP at k (min=0, max=1).
+code:
+  ../../../../
+command: >-
+  python reco_utils/azureml/azureml_designer_modules/entries/map_entry.py --rating-true
+  {inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user {inputs.user_column}
+  --col-item {inputs.item_column} --col-rating {inputs.rating_column} --col-prediction
+  {inputs.prediction_column} --relevancy-method {inputs.relevancy_method} --k {inputs.top_k}
+  --threshold {inputs.threshold} --score-result {outputs.score}
+environment:
+  conda:
+    conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml
+  os: Linux
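Review note: the spec change above is largely mechanical. Each list entry such as `- name: Rating true` becomes a snake_case key (`rating_true`) with `optional: false` added, and the `args` list becomes `{inputs.<key>}` placeholders in `command`. A minimal sketch of that mapping follows, working on plain Python dicts rather than parsed YAML. The function names `to_component_inputs` and `undeclared_placeholders` are hypothetical illustrations, not code from this repository or from the AzureML tooling.

```python
import re


def to_component_inputs(module_inputs):
    """Convert old-style input entries (a list with 'name' fields) to a keyed mapping."""
    converted = {}
    for entry in module_inputs:
        # "Rating true" -> "rating_true", matching the keys used in the new spec.
        key = entry["name"].strip().lower().replace(" ", "_")
        spec = {"type": entry["type"], "description": entry["description"]}
        if "default" in entry:
            spec["default"] = entry["default"]
        # The new specs in this diff mark every input as required.
        spec["optional"] = False
        converted[key] = spec
    return converted


def undeclared_placeholders(command, inputs):
    """Return {inputs.x} placeholders in the command that have no matching input key."""
    used = re.findall(r"\{inputs\.(\w+)\}", command)
    return [name for name in used if name not in inputs]


old_inputs = [
    {"name": "Rating true", "type": "AnyDirectory", "description": "True DataFrame."},
    {"name": "Top k", "type": "Integer", "default": 10,
     "description": "Number of top k items per user."},
]
new_inputs = to_component_inputs(old_inputs)
command = "--rating-true {inputs.rating_true} --k {inputs.top_k}"
missing = undeclared_placeholders(command, new_inputs)
print(new_inputs)
print(missing)
```

A check like `undeclared_placeholders` could catch a renamed input whose placeholder in the folded `command` string was not updated, which is easy to miss when the command is wrapped across several lines.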