AnalyzeTextOptions Class

Specifies the text and the analysis components used to break that text into tokens.

Constructor

AnalyzeTextOptions(*args: Any, **kwargs: Any)

Variables

Name Description
text
str

The text to break into tokens. Required.

analyzer_name

The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "ar.microsoft", "ar.lucene", "hy.lucene", "bn.microsoft", "eu.lucene", "bg.microsoft", "bg.lucene", "ca.microsoft", "ca.lucene", "zh-Hans.microsoft", "zh-Hans.lucene", "zh-Hant.microsoft", "zh-Hant.lucene", "hr.microsoft", "cs.microsoft", "cs.lucene", "da.microsoft", "da.lucene", "nl.microsoft", "nl.lucene", "en.microsoft", "en.lucene", "et.microsoft", "fi.microsoft", "fi.lucene", "fr.microsoft", "fr.lucene", "gl.lucene", "de.microsoft", "de.lucene", "el.microsoft", "el.lucene", "gu.microsoft", "he.microsoft", "hi.microsoft", "hi.lucene", "hu.microsoft", "hu.lucene", "is.microsoft", "id.microsoft", "id.lucene", "ga.lucene", "it.microsoft", "it.lucene", "ja.microsoft", "ja.lucene", "kn.microsoft", "ko.microsoft", "ko.lucene", "lv.microsoft", "lv.lucene", "lt.microsoft", "ml.microsoft", "ms.microsoft", "mr.microsoft", "nb.microsoft", "no.lucene", "fa.lucene", "pl.microsoft", "pl.lucene", "pt-BR.microsoft", "pt-BR.lucene", "pt-PT.microsoft", "pt-PT.lucene", "pa.microsoft", "ro.microsoft", "ro.lucene", "ru.microsoft", "ru.lucene", "sr-cyrillic.microsoft", "sr-latin.microsoft", "sk.microsoft", "sl.microsoft", "es.microsoft", "es.lucene", "sv.microsoft", "sv.lucene", "ta.microsoft", "te.microsoft", "th.microsoft", "th.lucene", "tr.microsoft", "tr.lucene", "uk.microsoft", "ur.microsoft", "vi.microsoft", "standard.lucene", "standardasciifolding.lucene", "keyword", "pattern", "simple", "stop", and "whitespace".

tokenizer_name

The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "classic", "edgeNGram", "keyword_v2", "letter", "lowercase", "microsoft_language_tokenizer", "microsoft_language_stemming_tokenizer", "nGram", "path_hierarchy_v2", "pattern", "standard_v2", "uax_url_email", and "whitespace".

normalizer_name

The name of the normalizer to use to normalize the given text. Known values are: "asciifolding", "elision", "lowercase", "standard", and "uppercase".

token_filters

An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.

char_filters

An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter (see the sketch below).
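A minimal construction sketch for both paths. The endpoint, API key, and index name are placeholders, and passing the options to SearchIndexClient.analyze_text is the typical use in azure-search-documents:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import AnalyzeTextOptions

client = SearchIndexClient("https://<service>.search.windows.net",
                           AzureKeyCredential("<api-key>"))

# Path 1: name an analyzer (mutually exclusive with tokenizer_name).
options = AnalyzeTextOptions(text="The quick brown fox", analyzer_name="en.lucene")

# Path 2: name a tokenizer; token_filters and char_filters are only
# valid together with tokenizer_name.
options = AnalyzeTextOptions(
    text="The quick brown fox",
    tokenizer_name="whitespace",
    token_filters=["lowercase"],
)

result = client.analyze_text("<index-name>", options)
for token_info in result.tokens:
    print(token_info.token, token_info.start_offset, token_info.end_offset)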

Methods

as_dict

Return a dict that can be serialized to JSON using json.dump.

clear

Remove all items from D.

copy
get

Get the value for key if key is in the dictionary, else default.

items
keys
pop

Remove the specified key and return the corresponding value.

popitem

Remove and return an arbitrary (key, value) pair.

setdefault

Same as calling D.get(k, d) and then setting D[k] = d if k is not found.

update

Update D from a mapping object or an iterable of key-value pairs.

values

as_dict

Return a dict that can be serialized to JSON using json.dump.

as_dict(*, exclude_readonly: bool = False) -> dict[str, Any]

Keyword-Only Parameters

Name Description
exclude_readonly

Whether to exclude read-only properties from the returned dict.

Default value: False

Returns

Type Description

A JSON-compatible dict.
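Because the returned dict is JSON compatible, it can be passed straight to the json module. A small sketch (the field values are illustrative, and key casing in the output follows the model's serialization rules):

import json
from azure.search.documents.indexes.models import AnalyzeTextOptions

options = AnalyzeTextOptions(text="hello world", analyzer_name="standard.lucene")
payload = options.as_dict()   # plain dict
print(json.dumps(payload))    # serializes without a custom encoder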

clear

Remove all items from D.

clear() -> None

copy

copy() -> Model

get

Get the value for key if key is in the dictionary, else default. Returns D[k] if k is in D, else default.

get(key: str, default: Any = None) -> Any

Parameters

Name Description
key

The key to look up. Required.

default

The value to return if key is not in the dictionary. Default value: None
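Since the model supports the mapping protocol, get behaves as it does on a plain dict. A small sketch (the key names are assumed to match the model's fields):

options = AnalyzeTextOptions(text="hi", analyzer_name="keyword")
options.get("text")             # "hi"
options.get("missing")          # None (the default default)
options.get("missing", "n/a")   # "n/a"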

items

items() -> ItemsView[str, Any]

Returns

Type Description

A set-like object providing a view on D's items.

keys

keys() -> KeysView[str]

Returns

Type Description

A set-like object providing a view on D's keys.

pop

Remove the specified key and return the corresponding value. If key is not found, return default if given; otherwise raise KeyError.

pop(key: str, default: Any = <object object>) -> Any

Parameters

Name Description
key

The key to pop. Required.

default

The value to return if key is not in the dictionary.

popitem

Remove and return an arbitrary (key, value) pair as a tuple. Raise KeyError if D is empty.

popitem() -> tuple[str, Any]

setdefault

Same as calling D.get(k, d) and then setting D[k] = d if k is not found. Returns D[k] if k is in D, else d.

setdefault(key: str, default: Any = <object object>) -> Any

Parameters

Name Description
key

The key to look up. Required.

default

The value to set if key is not in the dictionary.

update

Update D from a mapping object or an iterable of key-value pairs, and from keyword arguments.

update(*args: Any, **kwargs: Any) -> None
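update accepts the same argument shapes as dict.update. A small sketch:

options = AnalyzeTextOptions(text="old text")
options.update({"text": "new text"})        # from a mapping
options.update([("text", "newer text")])    # from an iterable of pairs
options.update(text="final text")           # from keyword arguments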

values

values() -> ValuesView[Any]

Returns

Type Description

An object providing a view on D's values.

Attributes

analyzer_name

The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "ar.microsoft", "ar.lucene", "hy.lucene", "bn.microsoft", "eu.lucene", "bg.microsoft", "bg.lucene", "ca.microsoft", "ca.lucene", "zh-Hans.microsoft", "zh-Hans.lucene", "zh-Hant.microsoft", "zh-Hant.lucene", "hr.microsoft", "cs.microsoft", "cs.lucene", "da.microsoft", "da.lucene", "nl.microsoft", "nl.lucene", "en.microsoft", "en.lucene", "et.microsoft", "fi.microsoft", "fi.lucene", "fr.microsoft", "fr.lucene", "gl.lucene", "de.microsoft", "de.lucene", "el.microsoft", "el.lucene", "gu.microsoft", "he.microsoft", "hi.microsoft", "hi.lucene", "hu.microsoft", "hu.lucene", "is.microsoft", "id.microsoft", "id.lucene", "ga.lucene", "it.microsoft", "it.lucene", "ja.microsoft", "ja.lucene", "kn.microsoft", "ko.microsoft", "ko.lucene", "lv.microsoft", "lv.lucene", "lt.microsoft", "ml.microsoft", "ms.microsoft", "mr.microsoft", "nb.microsoft", "no.lucene", "fa.lucene", "pl.microsoft", "pl.lucene", "pt-BR.microsoft", "pt-BR.lucene", "pt-PT.microsoft", "pt-PT.lucene", "pa.microsoft", "ro.microsoft", "ro.lucene", "ru.microsoft", "ru.lucene", "sr-cyrillic.microsoft", "sr-latin.microsoft", "sk.microsoft", "sl.microsoft", "es.microsoft", "es.lucene", "sv.microsoft", "sv.lucene", "ta.microsoft", "te.microsoft", "th.microsoft", "th.lucene", "tr.microsoft", "tr.lucene", "uk.microsoft", "ur.microsoft", "vi.microsoft", "standard.lucene", "standardasciifolding.lucene", "keyword", "pattern", "simple", "stop", and "whitespace".

analyzer_name: str | _models.LexicalAnalyzerName | None

char_filters

An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.

char_filters: list[str | _models.CharFilterName] | None

normalizer_name

The name of the normalizer to use to normalize the given text. Known values are: "asciifolding", "elision", "lowercase", "standard", and "uppercase".

normalizer_name: str | _models.LexicalNormalizerName | None

text

The text to break into tokens. Required.

text: str

token_filters

An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.

token_filters: list[str | _models.TokenFilterName] | None

tokenizer_name

The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "classic", "edgeNGram", "keyword_v2", "letter", "lowercase", "microsoft_language_tokenizer", "microsoft_language_stemming_tokenizer", "nGram", "path_hierarchy_v2", "pattern", "standard_v2", "uax_url_email", and "whitespace".

tokenizer_name: str | _models.LexicalTokenizerName | None