page_type | languages | products | name | description | azureDeploy
---|---|---|---|---|---
sample | | | Distinct sample skill for cognitive search | This custom skill removes duplicates from a list of terms. |
This custom skill removes duplicates from a list of terms.
Terms are considered the same if they differ only in casing, in separators such as spaces, or in punctuation, or if they belong to the same synonym list in the thesaurus.
This skill has no requirements beyond those described in the root README.md file.
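The matching rule above can be illustrated with a normalization key: two terms count as the same when their keys are equal. This is a minimal Python sketch, not the skill's own code; the helper name `normalize`, the use of `str.casefold()`, and treating Unicode categories P* (punctuation) and Z* (separators) as the characters to ignore are assumptions.

```python
import unicodedata


def normalize(term: str) -> str:
    """Return a comparison key (hypothetical helper): case-folded, with
    Unicode punctuation (P*) and separator (Z*) characters removed."""
    return "".join(
        ch
        for ch in term.casefold()
        if not unicodedata.category(ch).startswith(("P", "Z"))
    )


# These spellings all collapse to the same key, so they count as one term.
print(normalize("U.S.A"), normalize("u s a"), normalize("USA"))
```

Comparing keys rather than raw strings is what lets "WOrD" and "word" count as one term; the thesaurus then merges terms whose keys still differ, such as "MSFT" and "Microsoft Corp.".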
This function uses a JSON file called thesaurus.json, located at the root of this project and deployed with the function. The file contains a list of synonym lists; in each list, the first entry is treated as the canonical form. Replace this file with your own data.
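One way to use a list-of-lists thesaurus is to map the normalized form of every synonym to the first entry of its list. The sketch below assumes that lookup strategy and uses hypothetical thesaurus content; it is an illustration, not the function's actual loading code. If a term appears in more than one list, this sketch keeps the first list's canonical form (also an assumption).

```python
import json
import unicodedata


def normalize(term: str) -> str:
    # Case-fold and drop Unicode punctuation (P*) and separator (Z*) characters.
    return "".join(
        ch
        for ch in term.casefold()
        if not unicodedata.category(ch).startswith(("P", "Z"))
    )


def build_lookup(thesaurus_json: str) -> dict[str, str]:
    """Map each synonym's normalized form to its list's canonical (first) entry."""
    lookup: dict[str, str] = {}
    for synonyms in json.loads(thesaurus_json):
        if not synonyms:
            continue  # skip empty lists
        canonical = synonyms[0]
        for synonym in synonyms:
            # setdefault: the first list that mentions a term wins.
            lookup.setdefault(normalize(synonym), canonical)
    return lookup


# Hypothetical thesaurus.json content in the list-of-lists shape described above.
example = '[["Microsoft", "MSFT", "Microsoft Corp."], ["USA", "U.S.A", "United states"]]'
lookup = build_lookup(example)
print(lookup[normalize("msft")])
```

To load the real file instead, pass `pathlib.Path("thesaurus.json").read_text(encoding="utf-8")` to `build_lookup`. Note that "USA" and "U.S.A" share the key `usa`, so this example thesaurus yields five distinct keys.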
Sample input:

```json
{
    "values": [
        {
            "recordId": "foobar2",
            "data": {
                "words": [
                    "MSFT",
                    "U.S.A",
                    "word",
                    "United states",
                    "WOrD",
                    "Microsoft Corp."
                ]
            }
        }
    ]
}
```
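The sample input is the JSON body the function receives in an HTTP POST. To try a deployed function directly, outside an indexer, you could build that request with Python's standard library as sketched below. The endpoint, the route segment `distinct`, and the key are hypothetical placeholders; the route must match the function name in your deployment.

```python
import json
import urllib.parse
import urllib.request


def build_skill_request(endpoint: str, route: str, host_key: str,
                        words: list[str],
                        record_id: str = "foobar2") -> urllib.request.Request:
    """Build (but do not send) a POST carrying one record in the sample input shape."""
    query = urllib.parse.urlencode({"code": host_key})
    url = f"{endpoint.rstrip('/')}/api/{route}?{query}"
    body = {"values": [{"recordId": record_id, "data": {"words": words}}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint, route, and key: replace them with your deployment's values.
req = build_skill_request(
    "https://my-function-app.azurewebsites.net", "distinct", "<host-key>",
    ["MSFT", "U.S.A", "word", "United states", "WOrD", "Microsoft Corp."],
)
# To send it: with urllib.request.urlopen(req) as resp: print(json.load(resp))
```

The request is only constructed here, so the sketch runs without network access; sending it requires a deployed function and a valid key.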
Sample output:

```json
{
    "values": [
        {
            "recordId": "foobar2",
            "data": {
                "distinct": {
                    "value": [
                        "Microsoft",
                        "USA",
                        "word"
                    ]
                }
            },
            "errors": [],
            "warnings": []
        }
    ]
}
```
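The step from sample input to sample output can be reproduced end to end with a small sketch. It assumes a hypothetical thesaurus in which "Microsoft" and "USA" are canonical forms, keeps terms in order of first appearance, and keeps the first spelling of terms that have no thesaurus entry; these are assumptions about behavior, not the function's verified implementation. The error message text is also hypothetical.

```python
import json
import unicodedata


def normalize(term: str) -> str:
    # Case-fold and drop Unicode punctuation (P*) and separator (Z*) characters.
    return "".join(
        ch for ch in term.casefold()
        if not unicodedata.category(ch).startswith(("P", "Z"))
    )


def distinct(words: list[str], lookup: dict[str, str]) -> list[str]:
    """Return one term per group, in order of first appearance."""
    kept: dict[str, str] = {}
    for word in words:
        canonical = lookup.get(normalize(word))
        representative = canonical if canonical is not None else word
        # Synonyms share their canonical form's key, so they collapse together.
        kept.setdefault(normalize(representative), representative)
    return list(kept.values())


def handle_request(body: dict, lookup: dict[str, str]) -> dict:
    """Process every record of a custom-skill request body."""
    values = []
    for record in body.get("values", []):
        words = record.get("data", {}).get("words")
        if not isinstance(words, list):
            values.append({
                "recordId": record.get("recordId"),
                "data": {},
                "errors": [{"message": "Input 'words' must be a list of strings."}],
                "warnings": [],
            })
            continue
        values.append({
            "recordId": record.get("recordId"),
            "data": {"distinct": {"value": distinct(words, lookup)}},
            "errors": [],
            "warnings": [],
        })
    return {"values": values}


# Hypothetical thesaurus content; the real thesaurus.json may differ.
THESAURUS = [["Microsoft", "MSFT", "Microsoft Corp."], ["USA", "U.S.A", "United states"]]
lookup = {normalize(synonym): group[0] for group in THESAURUS for synonym in group}

request = {"values": [{"recordId": "foobar2", "data": {"words": [
    "MSFT", "U.S.A", "word", "United states", "WOrD", "Microsoft Corp."]}}]}
response = handle_request(request, lookup)
print(json.dumps(response, indent=2))
```

With this thesaurus, `response` has the same content as the sample output: "MSFT" and "Microsoft Corp." map to "Microsoft", "U.S.A" and "United states" map to "USA", and "WOrD" matches the earlier "word" by normalization alone.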
To use this skill in a cognitive search pipeline, add a skill definition to your skillset. Here is a sample skill definition for this example; update the inputs and outputs to reflect your scenario and skillset environment:
```json
{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Distinct entities",
    "uri": "[AzureFunctionEndpointUrl]/api/distinct?code=[AzureFunctionDefaultHostKey]",
    "batchSize": 1,
    "context": "/document/merged_content",
    "inputs": [
        {
            "name": "words",
            "source": "/document/merged_content/organizations"
        }
    ],
    "outputs": [
        {
            "name": "distinct",
            "targetName": "distinct_organizations"
        }
    ]
}
```
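The `context`, `inputs`, and `outputs` fields decide where the skill reads and writes in the enrichment tree. The sketch below is a simplified model, not the search service's implementation: it flattens the tree to a path-to-value dictionary, reads each input from its `source` path, and writes each output as a child of the context node named by `targetName`. The upstream `organizations` node and the stand-in `fake_call` function (which returns the sample output's `data` object instead of making an HTTP call) are hypothetical.

```python
import json

# The context, inputs, and outputs of the skill definition above (other fields omitted).
SKILL = json.loads("""
{
  "context": "/document/merged_content",
  "inputs": [{"name": "words", "source": "/document/merged_content/organizations"}],
  "outputs": [{"name": "distinct", "targetName": "distinct_organizations"}]
}
""")


def apply_skill(tree: dict, skill: dict, call_skill) -> dict:
    """Read inputs by source path, call the skill, and write each output
    under the context node as <context>/<targetName>."""
    context = skill["context"]
    data = {i["name"]: tree[i["source"]] for i in skill["inputs"] if i["source"] in tree}
    result = call_skill(data)
    for output in skill["outputs"]:
        if output["name"] in result:
            tree[f"{context}/{output['targetName']}"] = result[output["name"]]
    return tree


def fake_call(data: dict) -> dict:
    # Stand-in for the HTTP call: returns the "data" object of the sample output.
    return {"distinct": {"value": ["Microsoft", "USA", "word"]}}


# Flattened enrichment tree; "organizations" is assumed to come from an upstream skill.
tree = {
    "/document/merged_content": "merged document text",
    "/document/merged_content/organizations": [
        "MSFT", "U.S.A", "word", "United states", "WOrD", "Microsoft Corp."],
}
apply_skill(tree, SKILL, fake_call)
print(tree["/document/merged_content/distinct_organizations"])
# {'value': ['Microsoft', 'USA', 'word']}
```

Because the output is written under the context node, changing `context` or the input `source` in your own skillset changes where `distinct_organizations` appears, which matters when you later map it to an index field.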