page_type | languages | products | name | description | azureDeploy | |||
---|---|---|---|---|---|---|---|---|
sample | | | Tokenizer sample skill for AI search | This custom skill extracts normalized non-stop words from a text using the ML.NET library. | |
This custom skill extracts normalized non-stop words from a text using the ML.NET library.
The language used for stop word removal can optionally be specified with the languageCode
parameter, given as an ISO 639-1 code. Supported languages are:
- Arabic (ar)
- Czech (cs)
- Danish (da)
- Dutch (nl)
- English (en), the default if no language is specified
- French (fr)
- German (de)
- Italian (it)
- Japanese (ja)
- Norwegian Bokmål (nb)
- Polish (pl)
- Portuguese (pt)
- Spanish (es)
- Swedish (sv)
- Russian (ru)
This skill has no requirements beyond those described in the root README.md
file.
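The language handling described above can be sketched in Python. This is only an illustration and not the skill's own code: the function name `resolve_language_code` is hypothetical, and raising an error for an unsupported code is an assumption, since this document does not say how the skill reacts in that case.

```python
# ISO 639-1 codes taken from the supported-language list above.
SUPPORTED_LANGUAGE_CODES = {
    "ar", "cs", "da", "nl", "en", "fr", "de", "it",
    "ja", "nb", "pl", "pt", "es", "sv", "ru",
}

DEFAULT_LANGUAGE_CODE = "en"  # English is used when no language is given


def resolve_language_code(language_code=None):
    """Return the code to use for stop word removal (hypothetical helper)."""
    if not language_code:
        # languageCode is optional, so fall back to the documented default.
        return DEFAULT_LANGUAGE_CODE
    if language_code not in SUPPORTED_LANGUAGE_CODES:
        # Assumption: rejecting unknown codes; the real skill may differ.
        raise ValueError(f"Unsupported language code: {language_code!r}")
    return language_code
```

Checking the code on the client side before sending a request may help surface configuration mistakes early.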
Sample input:
{
    "values": [
        {
            "recordId": "record1",
            "data": {
                "text": "ML.NET's RemoveDefaultStopWords API removes stop words from tHe text/string. It requires the text/string to be tokenized beforehand.",
                "languageCode": "en"
            }
        }
    ]
}
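A request body with this shape could be assembled as in the following Python sketch. The helper name `build_request` and the `record1`, `record2`, ... numbering scheme are illustrative assumptions; only the `values`, `recordId`, `data`, `text`, and `languageCode` field names come from the sample.

```python
import json


def build_request(texts, language_code=None):
    """Build a request payload for the tokenizer skill (hypothetical helper)."""
    values = []
    for index, text in enumerate(texts, start=1):
        data = {"text": text}
        if language_code is not None:
            # languageCode is optional; leave it out to use the default.
            data["languageCode"] = language_code
        values.append({"recordId": f"record{index}", "data": data})
    return {"values": values}


def to_json(payload):
    """Serialize the payload as the JSON text to send in the request body."""
    return json.dumps(payload, indent=4)
```

For example, `to_json(build_request(["some text"], "en"))` yields a single-record body similar to the sample.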
Sample output:
{
    "values": [
        {
            "recordId": "record1",
            "data": {
                "words": [
                    "mlnets",
                    "removedefaultstopwords",
                    "api",
                    "removes",
                    "stop",
                    "words",
                    "textstring",
                    "requires",
                    "textstring",
                    "tokenized"
                ]
            },
            "errors": [],
            "warnings": []
        }
    ]
}
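To make the relationship between the sample text and the returned words concrete, here is a rough Python approximation of the normalization. It is not the ML.NET implementation: the steps (lowercase, drop punctuation, split on whitespace, filter stop words) are inferred from the sample, and the stop word set is a tiny illustrative subset chosen to cover this one sentence, not ML.NET's actual English list.

```python
import string

# Assumption: a small stand-in for the real English stop word list.
ILLUSTRATIVE_STOP_WORDS = {"from", "the", "it", "to", "be", "beforehand"}


def extract_words(text, stop_words=ILLUSTRATIVE_STOP_WORDS):
    """Approximate the skill's output for a text (illustrative only)."""
    lowered = text.lower()
    # Remove ASCII punctuation so "ML.NET's" becomes "mlnets".
    no_punctuation = lowered.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and keep only the non-stop words, in order.
    return [word for word in no_punctuation.split() if word not in stop_words]
```

Applied to the sample text, this sketch produces the same ten words as the sample response, including the repeated `textstring`.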
To use this skill in an AI search pipeline, add a skill definition to your skillset. Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):
{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Tokenizer",
    "uri": "[AzureFunctionEndpointUrl]/api/tokenizer?code=[AzureFunctionDefaultHostKey]",
    "batchSize": 1,
    "context": "/document/content",
    "inputs": [
        {
            "name": "text",
            "source": "/document/content"
        },
        {
            "name": "languageCode",
            "source": "/document/language"
        }
    ],
    "outputs": [
        {
            "name": "words",
            "targetName": "words"
        }
    ]
}
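If the skillset is generated by a script, the two placeholders in the uri can be filled in programmatically. The Python sketch below shows one way to do that; the function name `build_skill_definition` and its parameters are hypothetical, and URL-encoding the host key is a precaution assumed here rather than something this document requires.

```python
from urllib.parse import quote


def build_skill_definition(function_endpoint_url, host_key):
    """Return the tokenizer skill definition as a dict (hypothetical helper)."""
    base_url = function_endpoint_url.rstrip("/")
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "description": "Tokenizer",
        # Encode the key so characters such as "=" stay valid in the query.
        "uri": f"{base_url}/api/tokenizer?code={quote(host_key, safe='')}",
        "batchSize": 1,
        "context": "/document/content",
        "inputs": [
            {"name": "text", "source": "/document/content"},
            # Enrichment paths begin with a leading slash.
            {"name": "languageCode", "source": "/document/language"},
        ],
        "outputs": [{"name": "words", "targetName": "words"}],
    }
```

The returned dict can be serialized with `json.dumps` and placed in the skills array of a skillset definition.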