MicrosoftLanguageStemmingTokenizer Class
Divides text using language-specific rules and reduces words to their base forms.
Constructor
MicrosoftLanguageStemmingTokenizer(*args: Any, **kwargs: Any)
Variables
| Name | Description |
|---|---|
|
name
|
The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. Required. |
|
max_token_length
|
The maximum token length. Tokens longer than the maximum length are split. Maximum token length that can be used is 300 characters. Tokens longer than 300 characters are first split into tokens of length 300 and then each of those tokens is split based on the max token length set. Default is 255. |
|
is_search_tokenizer
|
A value indicating how the tokenizer is used. Set to true if used as the search tokenizer, set to false if used as the indexing tokenizer. Default is false. |
|
language
|
The language to use. The default is English. Known values are: "arabic", "bangla", "bulgarian", "catalan", "croatian", "czech", "danish", "dutch", "english", "estonian", "finnish", "french", "german", "greek", "gujarati", "hebrew", "hindi", "hungarian", "icelandic", "indonesian", "italian", "kannada", "latvian", "lithuanian", "malay", "malayalam", "marathi", "norwegianBokmaal", "polish", "portuguese", "portugueseBrazilian", "punjabi", "romanian", "russian", "serbianCyrillic", "serbianLatin", "slovak", "slovenian", "spanish", "swedish", "tamil", "telugu", "turkish", "ukrainian", and "urdu". |
|
odata_type
|
A URI fragment specifying the type of tokenizer. Required. Default value is "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer". |
Methods
| as_dict |
Return a dict that can be turned into json using json.dump. |
| clear |
Remove all items from D. |
| copy | |
| get |
Get the value for key if key is in the dictionary, else default. :param str key: The key to look up. :param any default: The value to return if key is not in the dictionary. Defaults to None :returns: D[k] if k in D, else d. :rtype: any |
| items | |
| keys | |
| pop |
Removes specified key and return the corresponding value. :param str key: The key to pop. :param any default: The value to return if key is not in the dictionary :returns: The value corresponding to the key. :rtype: any :raises KeyError: If key is not found and default is not given. |
| popitem |
Removes and returns some (key, value) pair :returns: The (key, value) pair. :rtype: tuple :raises KeyError: if D is empty. |
| setdefault |
Same as calling D.get(k, d), and setting D[k]=d if k not found :param str key: The key to look up. :param any default: The value to set if key is not in the dictionary :returns: D[k] if k in D, else d. :rtype: any |
| update |
Updates D from mapping/iterable E and F. :param any args: Either a mapping object or an iterable of key-value pairs. |
| values |
as_dict
Return a dict that can be turned into json using json.dump.
as_dict(*, exclude_readonly: bool = False) -> dict[str, Any]
Keyword-Only Parameters
| Name | Description |
|---|---|
|
exclude_readonly
|
Whether to remove the readonly properties. Default value: False
|
Returns
| Type | Description |
|---|---|
|
A dict JSON compatible object |
clear
Remove all items from D.
clear() -> None
copy
copy() -> Model
get
Get the value for key if key is in the dictionary, else default. :param str key: The key to look up. :param any default: The value to return if key is not in the dictionary. Defaults to None :returns: D[k] if k in D, else d. :rtype: any
get(key: str, default: Any = None) -> Any
Parameters
| Name | Description |
|---|---|
|
key
Required
|
|
|
default
|
Default value: None
|
items
items() -> ItemsView[str, Any]
Returns
| Type | Description |
|---|---|
|
set-like object providing a view on D's items |
keys
keys() -> KeysView[str]
Returns
| Type | Description |
|---|---|
|
a set-like object providing a view on D's keys |
pop
Removes specified key and return the corresponding value. :param str key: The key to pop. :param any default: The value to return if key is not in the dictionary :returns: The value corresponding to the key. :rtype: any :raises KeyError: If key is not found and default is not given.
pop(key: str, default: ~typing.Any = <object object>) -> Any
Parameters
| Name | Description |
|---|---|
|
key
Required
|
|
|
default
|
|
popitem
Removes and returns some (key, value) pair :returns: The (key, value) pair. :rtype: tuple :raises KeyError: if D is empty.
popitem() -> tuple[str, Any]
setdefault
Same as calling D.get(k, d), and setting D[k]=d if k not found :param str key: The key to look up. :param any default: The value to set if key is not in the dictionary :returns: D[k] if k in D, else d. :rtype: any
setdefault(key: str, default: ~typing.Any = <object object>) -> Any
Parameters
| Name | Description |
|---|---|
|
key
Required
|
|
|
default
|
|
update
Updates D from mapping/iterable E and F. :param any args: Either a mapping object or an iterable of key-value pairs.
update(*args: Any, **kwargs: Any) -> None
values
values() -> ValuesView[Any]
Returns
| Type | Description |
|---|---|
|
an object providing a view on D's values |
Attributes
is_search_tokenizer
A value indicating how the tokenizer is used. Set to true if used as the search tokenizer, set to false if used as the indexing tokenizer. Default is false.
is_search_tokenizer: bool | None
language
"arabic", "bangla", "bulgarian", "catalan", "croatian", "czech", "danish", "dutch", "english", "estonian", "finnish", "french", "german", "greek", "gujarati", "hebrew", "hindi", "hungarian", "icelandic", "indonesian", "italian", "kannada", "latvian", "lithuanian", "malay", "malayalam", "marathi", "norwegianBokmaal", "polish", "portuguese", "portugueseBrazilian", "punjabi", "romanian", "russian", "serbianCyrillic", "serbianLatin", "slovak", "slovenian", "spanish", "swedish", "tamil", "telugu", "turkish", "ukrainian", and "urdu".
language: str | _models.MicrosoftStemmingTokenizerLanguage | None
max_token_length
The maximum token length. Tokens longer than the maximum length are split. Maximum token length that can be used is 300 characters. Tokens longer than 300 characters are first split into tokens of length 300 and then each of those tokens is split based on the max token length set. Default is 255.
max_token_length: int | None
name
The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. Required.
name: str
odata_type
A URI fragment specifying the type of tokenizer. Required. Default value is "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer".
odata_type: Literal['#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer']