MicrosoftLanguageStemmingTokenizer Class

Divides text using language-specific rules and reduces words to their base forms.

Constructor

MicrosoftLanguageStemmingTokenizer(*args: Any, **kwargs: Any)

Variables

Name	Description
name	str The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. Required.
max_token_length	int The maximum token length. Tokens longer than the maximum length are split. Maximum token length that can be used is 300 characters. Tokens longer than 300 characters are first split into tokens of length 300 and then each of those tokens is split based on the max token length set. Default is 255.
is_search_tokenizer	bool A value indicating how the tokenizer is used. Set to true if used as the search tokenizer, set to false if used as the indexing tokenizer. Default is false.
language	str or MicrosoftStemmingTokenizerLanguage The language to use. The default is English. Known values are: "arabic", "bangla", "bulgarian", "catalan", "croatian", "czech", "danish", "dutch", "english", "estonian", "finnish", "french", "german", "greek", "gujarati", "hebrew", "hindi", "hungarian", "icelandic", "indonesian", "italian", "kannada", "latvian", "lithuanian", "malay", "malayalam", "marathi", "norwegianBokmaal", "polish", "portuguese", "portugueseBrazilian", "punjabi", "romanian", "russian", "serbianCyrillic", "serbianLatin", "slovak", "slovenian", "spanish", "swedish", "tamil", "telugu", "turkish", "ukrainian", and "urdu".
odata_type	str A URI fragment specifying the type of tokenizer. Required. Default value is "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer".

Methods

as_dict	Return a dict that can be turned into json using json.dump.
clear	Remove all items from D.
copy
get	Get the value for key if key is in the dictionary, else default. :param str key: The key to look up. :param any default: The value to return if key is not in the dictionary. Defaults to None :returns: D[k] if k in D, else d. :rtype: any
items
keys
pop	Removes specified key and return the corresponding value. :param str key: The key to pop. :param any default: The value to return if key is not in the dictionary :returns: The value corresponding to the key. :rtype: any :raises KeyError: If key is not found and default is not given.
popitem	Removes and returns some (key, value) pair :returns: The (key, value) pair. :rtype: tuple :raises KeyError: if D is empty.
setdefault	Same as calling D.get(k, d), and setting D[k]=d if k not found :param str key: The key to look up. :param any default: The value to set if key is not in the dictionary :returns: D[k] if k in D, else d. :rtype: any
update	Updates D from mapping/iterable E and F. :param any args: Either a mapping object or an iterable of key-value pairs.
values

as_dict

Return a dict that can be turned into json using json.dump.

as_dict(*, exclude_readonly: bool = False) -> dict[str, Any]

Keyword-Only Parameters

Name	Description
exclude_readonly	bool Whether to remove the readonly properties. Default value: False

Returns

Type	Description
dict	A dict JSON compatible object

clear

Remove all items from D.

clear() -> None

copy

copy() -> Model

get

Get the value for key if key is in the dictionary, else default. :param str key: The key to look up. :param any default: The value to return if key is not in the dictionary. Defaults to None :returns: D[k] if k in D, else d. :rtype: any

get(key: str, default: Any = None) -> Any

Parameters

Name	Description
key Required
default	Default value: None

items

items() -> ItemsView[str, Any]

Returns

Type	Description
ItemsView	set-like object providing a view on D's items

keys

keys() -> KeysView[str]

Returns

Type	Description
KeysView	a set-like object providing a view on D's keys

pop

Removes specified key and return the corresponding value. :param str key: The key to pop. :param any default: The value to return if key is not in the dictionary :returns: The value corresponding to the key. :rtype: any :raises KeyError: If key is not found and default is not given.

pop(key: str, default: ~typing.Any = <object object>) -> Any

Parameters

Name	Description
key Required
default

popitem

Removes and returns some (key, value) pair :returns: The (key, value) pair. :rtype: tuple :raises KeyError: if D is empty.

popitem() -> tuple[str, Any]

setdefault

Same as calling D.get(k, d), and setting D[k]=d if k not found :param str key: The key to look up. :param any default: The value to set if key is not in the dictionary :returns: D[k] if k in D, else d. :rtype: any

setdefault(key: str, default: ~typing.Any = <object object>) -> Any

Parameters

Name	Description
key Required
default

update

Updates D from mapping/iterable E and F. :param any args: Either a mapping object or an iterable of key-value pairs.

update(*args: Any, **kwargs: Any) -> None

values

values() -> ValuesView[Any]

Returns

Type	Description
ValuesView	an object providing a view on D's values

Attributes

is_search_tokenizer

A value indicating how the tokenizer is used. Set to true if used as the search tokenizer, set to false if used as the indexing tokenizer. Default is false.

is_search_tokenizer: bool | None

language

"arabic", "bangla", "bulgarian", "catalan", "croatian", "czech", "danish", "dutch", "english", "estonian", "finnish", "french", "german", "greek", "gujarati", "hebrew", "hindi", "hungarian", "icelandic", "indonesian", "italian", "kannada", "latvian", "lithuanian", "malay", "malayalam", "marathi", "norwegianBokmaal", "polish", "portuguese", "portugueseBrazilian", "punjabi", "romanian", "russian", "serbianCyrillic", "serbianLatin", "slovak", "slovenian", "spanish", "swedish", "tamil", "telugu", "turkish", "ukrainian", and "urdu".

language: str | _models.MicrosoftStemmingTokenizerLanguage | None

max_token_length

The maximum token length. Tokens longer than the maximum length are split. Maximum token length that can be used is 300 characters. Tokens longer than 300 characters are first split into tokens of length 300 and then each of those tokens is split based on the max token length set. Default is 255.

max_token_length: int | None

name

The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. Required.

name: str

odata_type

A URI fragment specifying the type of tokenizer. Required. Default value is "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer".

odata_type: Literal['#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer']

Commenti e suggerimenti

Questa pagina è stata utile?