Merge pull request #1253 from likebupt/master
update aml component spec
miguelgfierro authored Dec 1, 2020
2 parents 26c0cb4 + e19e16c commit 1c34b90
Showing 8 changed files with 541 additions and 585 deletions.
171 changes: 85 additions & 86 deletions examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb
@@ -10,9 +10,9 @@
"\n",
"[AzureML Designer](https://docs.microsoft.com/en-us/azure/machine-learning/concept-designer) lets you visually connect datasets and modules on an interactive canvas to create machine learning models. \n",
"\n",
"![img](https://recodatasets.blob.core.windows.net/images/designer-drag-and-drop.gif)\n",
"One of the features of AzureML Designer is that it is possible for developers to integrate any python library to make it available as a module/component. In this notebook are are going to show how to integrate [SAR](sar_movielens.ipynb) and several other modules in Designer.\n",
"\n",
"One of the features of AzureML Designer is that it is possible for developers to integrate any python library to make it available as a module. In this notebook are are going to show how to integrate [SAR](sar_movielens.ipynb) and several other modules in Designer\n",
"Note that custom module is renamed to component.\n",
"\n",
"\n",
"## Installation\n",
@@ -24,10 +24,11 @@
"# Uninstall azure-cli-ml (the `az ml` commands)\n",
"az extension remove -n azure-cli-ml\n",
"# Install local version of azure-cli-ml (which includes `az ml module` commands)\n",
"az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13082891/azure_cli_ml-0.1.0.13082891-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13082891 --yes\n",
"CLI_SDK_VERSION=26005222\n",
"az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/$CLI_SDK_VERSION/azure_cli_ml-0.1.0.$CLI_SDK_VERSION-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/$CLI_SDK_VERSION --yes --verbose\n",
"```\n",
"\n",
"## Module implementation\n",
"## Component implementation\n",
"\n",
"The scenario that we are going to reproduce in Designer, as a reference example, is the content of the [SAR quickstart notebook](sar_movielens.ipynb). In it, we load a dataset, split it into train and test sets, train SAR algorithm, predict using the test set and compute several ranking metrics (precision at k, recall at k, MAP and nDCG).\n",
"\n",
@@ -91,82 +92,76 @@
"Once we have the python entry, we need to create the yaml file that will interact with Designer, [precision_at_k.yaml](../../reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml).\n",
"\n",
"```yaml\n",
"moduleIdentifier: \n",
" namespace: microsoft.com/cat\n",
" moduleName: Precision at K\n",
" moduleVersion: 1.1.0\n",
"description: \"Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.\"\n",
"metadata:\n",
" annotations:\n",
" tags: [\"Recommenders\", \"Metrics\"]\n",
"$schema: http://azureml/sdk-2-0/CommandComponent.json\n",
"name: microsoft.com.cat.precision_at_k\n",
"version: 1.1.1\n",
"display_name: Precision at K\n",
"type: CommandComponent\n",
"description: 'Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.'\n",
"tags:\n",
" Recommenders:\n",
" Metrics:\n",
"inputs:\n",
"- name: Rating true\n",
" type: DataFrameDirectory\n",
" description: True DataFrame.\n",
"- name: Rating pred\n",
" type: DataFrameDirectory\n",
" description: Predicted DataFrame.\n",
"- name: User column\n",
" type: String\n",
" default: UserId\n",
" description: Column name of user IDs.\n",
"- name: Item column\n",
" type: String\n",
" default: MovieId\n",
" description: Column name of item IDs.\n",
"- name: Rating column\n",
" type: String\n",
" default: Rating\n",
" description: Column name of ratings.\n",
"- name: Prediction column\n",
" type: String\n",
" default: prediction\n",
" description: Column name of predictions.\n",
"- name: Relevancy method\n",
" type: String\n",
" default: top_k\n",
" description: method for determining relevancy ['top_k', 'by_threshold'].\n",
"- name: Top k\n",
" type: Integer\n",
" default: 10\n",
" description: Number of top k items per user.\n",
"- name: Threshold\n",
" type: Float\n",
" default: 10.0\n",
" description: Threshold of top items per user.\n",
" rating_true:\n",
" type: AnyDirectory\n",
" description: True DataFrame.\n",
" optional: false\n",
" rating_pred:\n",
" type: AnyDirectory\n",
" description: Predicted DataFrame.\n",
" optional: false\n",
" user_column:\n",
" type: String\n",
" description: Column name of user IDs.\n",
" default: UserId\n",
" optional: false\n",
" item_column:\n",
" type: String\n",
" description: Column name of item IDs.\n",
" default: MovieId\n",
" optional: false\n",
" rating_column:\n",
" type: String\n",
" description: Column name of ratings.\n",
" default: Rating\n",
" optional: false\n",
" prediction_column:\n",
" type: String\n",
" description: Column name of predictions.\n",
" default: prediction\n",
" optional: false\n",
" relevancy_method:\n",
" type: String\n",
" description: method for determining relevancy ['top_k', 'by_threshold'].\n",
" default: top_k\n",
" optional: false\n",
" top_k:\n",
" type: Integer\n",
" description: Number of top k items per user.\n",
" default: 10\n",
" optional: false\n",
" threshold:\n",
" type: Float\n",
" description: Threshold of top items per user.\n",
" default: 10.0\n",
" optional: false\n",
"outputs:\n",
"- name: Score\n",
" type: DataFrameDirectory\n",
" description: Precision at k (min=0, max=1).\n",
"implementation:\n",
" container:\n",
" amlEnvironment:\n",
" python:\n",
" condaDependenciesFile: sar_conda.yaml\n",
" additionalIncludes:\n",
" - ../../../\n",
" command: [python, reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py]\n",
" args:\n",
" - --rating-true\n",
" - inputPath: Rating true\n",
" - --rating-pred\n",
" - inputPath: Rating pred\n",
" - --col-user\n",
" - inputValue: User column\n",
" - --col-item\n",
" - inputValue: Item column\n",
" - --col-rating\n",
" - inputValue: Rating column\n",
" - --col-prediction\n",
" - inputValue: Prediction column\n",
" - --relevancy-method\n",
" - inputValue: Relevancy method\n",
" - --k\n",
" - inputValue: Top k\n",
" - --threshold\n",
" - inputValue: Threshold\n",
" - --score-result\n",
" - outputPath: Score\n",
" score:\n",
" type: AnyDirectory\n",
" description: Precision at k (min=0, max=1).\n",
"code:\n",
" ../../../../\n",
"command: >-\n",
" python reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py\n",
" --rating-true {inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user\n",
" {inputs.user_column} --col-item {inputs.item_column} --col-rating {inputs.rating_column}\n",
" --col-prediction {inputs.prediction_column} --relevancy-method {inputs.relevancy_method}\n",
" --k {inputs.top_k} --threshold {inputs.threshold} --score-result {outputs.score}\n",
"environment:\n",
" conda:\n",
" conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml\n",
" os: Linux\n",
"\n",
"```\n",
"\n",
"In the yaml file we can see a number of sections. The heading defines attributes like name, version or description. In the section inputs, all inputs are defined. The two main dataframes have ports, which can be connected to other modules. The inputs without port appear in a canvas menu. The output is defined as a DataFrame as well. The last section, implementation, defines the conda environment, the associated python entry and the arguments to the python file.\n",
@@ -237,15 +232,15 @@
}
],
"source": [
"# Regsiter modules with spec via Azure CLI\n",
"# Regsiter components with spec via Azure CLI\n",
"root_path = os.path.abspath(os.path.join(os.getcwd(), \"../../\"))\n",
"specs_folder = os.path.join(root_path, \"reco_utils/azureml/azureml_designer_modules/module_specs\")\n",
"github_prefix = 'https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/'\n",
"specs = os.listdir(specs_folder)\n",
"for spec in specs:\n",
" spec_path = github_prefix + spec\n",
" print(f\"Start to register module spec: {spec} ...\")\n",
" subprocess.run(f\"az ml module register --spec-file {spec_path}\", shell=True)\n",
" print(f\"Start to register component spec: {spec} ...\")\n",
" subprocess.run(f\"az ml component create --file {spec_path}\", shell=True)\n",
" print(f\"Done.\")"
]
},
@@ -257,7 +252,7 @@
"\n",
"Once the modules are registered, they will appear in the canvas as the module `Recommenders`. There you will be able to create a pipeline like this:\n",
"\n",
"![img](https://recodatasets.blob.core.windows.net/images/azureml_designer_sar_precisionatk.png)\n",
"![img](https://raw.githubusercontent.com/Azure/AzureMachineLearningGallery/main/pipelines/sar-pipeline/sar-pipeline.png)\n",
"\n",
"Now, thanks to AzureML Designer, users can compute the latest state of the art algorithms in recommendation systems without writing a line of python code.\n",
"\n",
@@ -272,9 +267,13 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
"name": "python3",
"display_name": "Python 3.6.8 64-bit ('test': conda)",
"metadata": {
"interpreter": {
"hash": "ad1389e27ccf93b6cb9b27912fdce5bd72b7d47f7c4b29627ffa9bc4b1e3e5d1"
}
}
},
"language_info": {
"codemirror_mode": {
@@ -286,7 +285,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
"version": "3.6.8-final"
}
},
"nbformat": 4,
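
Review note on the registration cell in the notebook diff above: the loop iterates over every file in `module_specs` and passes an f-string to the shell. Since the new specs reference `sar_conda.yaml` in that same folder, the loop would presumably also try to register the conda file as a component. The following is a minimal sketch of one way to build the commands, assuming the `az ml component create --file` invocation shown in the diff. The helper name `build_register_commands` and its `skip` parameter are hypothetical and not part of the commit.

```python
from typing import Iterable, List

# URL prefix copied from the notebook cell in the diff.
GITHUB_PREFIX = (
    "https://github.com/microsoft/recommenders/blob/master/"
    "reco_utils/azureml/azureml_designer_modules/module_specs/"
)


def build_register_commands(
    spec_names: Iterable[str],
    prefix: str = GITHUB_PREFIX,
    skip: Iterable[str] = ("sar_conda.yaml",),  # assumption: not a component spec
) -> List[List[str]]:
    """Return one argument list per component spec (hypothetical helper)."""
    skipped = set(skip)
    commands = []
    for name in sorted(spec_names):
        # Keep only YAML files that are not explicitly skipped.
        if not name.endswith((".yaml", ".yml")) or name in skipped:
            continue
        commands.append(["az", "ml", "component", "create", "--file", prefix + name])
    return commands


if __name__ == "__main__":
    for cmd in build_register_commands(["map.yaml", "precision_at_k.yaml", "sar_conda.yaml"]):
        print(" ".join(cmd))
        # To actually register, pass the list to subprocess.run(cmd, check=True).
```

Building an argument list instead of a shell string avoids quoting issues in file names, and `check=True` would surface a failed registration instead of printing "Done." regardless.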
141 changes: 67 additions & 74 deletions reco_utils/azureml/azureml_designer_modules/module_specs/map.yaml
@@ -1,76 +1,69 @@
amlModuleIdentifier:
namespace: microsoft.com/cat
moduleName: MAP
moduleVersion: 1.1.1
description: "Mean Average Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders."
metadata:
annotations:
tags: ["Recommenders", "Metrics"]
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: microsoft.com.cat.map
version: 1.1.1
display_name: MAP
type: CommandComponent
description: 'Mean Average Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.'
tags:
Recommenders:
Metrics:
inputs:
- name: Rating true
type: AnyDirectory
description: True DataFrame.
- name: Rating pred
type: AnyDirectory
description: Predicted DataFrame.
- name: User column
type: String
default: UserId
description: Column name of user IDs.
- name: Item column
type: String
default: MovieId
description: Column name of item IDs.
- name: Rating column
type: String
default: Rating
description: Column name of ratings.
- name: Prediction column
type: String
default: prediction
description: Column name of predictions.
- name: Relevancy method
type: String
default: top_k
description: method for determining relevancy ['top_k', 'by_threshold'].
- name: Top k
type: Integer
default: 10
description: Number of top k items per user.
- name: Threshold
type: Float
default: 10.0
description: Threshold of top items per user.
rating_true:
type: AnyDirectory
description: True DataFrame.
optional: false
rating_pred:
type: AnyDirectory
description: Predicted DataFrame.
optional: false
user_column:
type: String
description: Column name of user IDs.
default: UserId
optional: false
item_column:
type: String
description: Column name of item IDs.
default: MovieId
optional: false
rating_column:
type: String
description: Column name of ratings.
default: Rating
optional: false
prediction_column:
type: String
description: Column name of predictions.
default: prediction
optional: false
relevancy_method:
type: String
description: method for determining relevancy ['top_k', 'by_threshold'].
default: top_k
optional: false
top_k:
type: Integer
description: Number of top k items per user.
default: 10
optional: false
threshold:
type: Float
description: Threshold of top items per user.
default: 10.0
optional: false
outputs:
- name: Score
type: AnyDirectory
description: MAP at k (min=0, max=1).
implementation:
container:
amlEnvironment:
python:
condaDependenciesFile: sar_conda.yaml
additionalIncludes:
- ../../../
command: [python, reco_utils/azureml/azureml_designer_modules/entries/map_entry.py]
args:
- --rating-true
- inputPath: Rating true
- --rating-pred
- inputPath: Rating pred
- --col-user
- inputValue: User column
- --col-item
- inputValue: Item column
- --col-rating
- inputValue: Rating column
- --col-prediction
- inputValue: Prediction column
- --relevancy-method
- inputValue: Relevancy method
- --k
- inputValue: Top k
- --threshold
- inputValue: Threshold
- --score-result
- outputPath: Score
score:
type: AnyDirectory
description: MAP at k (min=0, max=1).
code:
../../../../
command: >-
python reco_utils/azureml/azureml_designer_modules/entries/map_entry.py --rating-true
{inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user {inputs.user_column}
--col-item {inputs.item_column} --col-rating {inputs.rating_column} --col-prediction
{inputs.prediction_column} --relevancy-method {inputs.relevancy_method} --k {inputs.top_k}
--threshold {inputs.threshold} --score-result {outputs.score}
environment:
conda:
conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml
os: Linux
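
Review note on the spec migration above: the change from the module format to the component format appears to follow a mechanical pattern. Display names such as `Rating true` become snake_case keys, the input list becomes a mapping with `optional: false`, and the `command` plus `args` lists collapse into one command string with `{inputs.*}` and `{outputs.*}` placeholders. The following is a minimal sketch of that mapping in plain Python, assuming the specs have already been loaded into dicts and lists. The function names `to_key`, `migrate_inputs` and `migrate_command` are hypothetical and not part of the commit.

```python
from typing import Any, Dict, List, Union


def to_key(display_name: str) -> str:
    """Turn an old display name like 'Rating true' into 'rating_true'."""
    return display_name.strip().lower().replace(" ", "_")


def migrate_inputs(old_inputs: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Convert the old list of inputs into the new keyed mapping."""
    new_inputs = {}
    for item in old_inputs:
        entry = {"type": item["type"], "description": item["description"]}
        if "default" in item:
            entry["default"] = item["default"]
        # Every input in the migrated specs is marked as required.
        entry["optional"] = False
        new_inputs[to_key(item["name"])] = entry
    return new_inputs


def migrate_command(command: List[str], args: List[Union[str, Dict[str, str]]]) -> str:
    """Join the old command and args into a single placeholder string."""
    parts = list(command)
    for arg in args:
        if isinstance(arg, str):
            parts.append(arg)  # plain flag such as --k
        elif "outputPath" in arg:
            parts.append("{outputs.%s}" % to_key(arg["outputPath"]))
        else:
            # inputPath and inputValue both map to an inputs placeholder
            (name,) = arg.values()
            parts.append("{inputs.%s}" % to_key(name))
    return " ".join(parts)


if __name__ == "__main__":
    old = [{"name": "Top k", "type": "Integer", "default": 10,
            "description": "Number of top k items per user."}]
    print(migrate_inputs(old))
    print(migrate_command(
        ["python", "map_entry.py"],
        ["--k", {"inputValue": "Top k"}, "--score-result", {"outputPath": "Score"}],
    ))
```

The sketch leaves the header fields (`name`, `version`, `display_name`, `tags`) and the `environment` block out, since those were rewritten by hand in the diff.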