AnalyzeTextOptions Class

Specifies some text and analysis components used to break that text into tokens.

Constructor

AnalyzeTextOptions(*args: Any, **kwargs: Any)

Variables

Name	Description
text	str The text to break into tokens. Required.
analyzer_name	str or LexicalAnalyzerName The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "ar.microsoft", "ar.lucene", "hy.lucene", "bn.microsoft", "eu.lucene", "bg.microsoft", "bg.lucene", "ca.microsoft", "ca.lucene", "zh-Hans.microsoft", "zh-Hans.lucene", "zh-Hant.microsoft", "zh-Hant.lucene", "hr.microsoft", "cs.microsoft", "cs.lucene", "da.microsoft", "da.lucene", "nl.microsoft", "nl.lucene", "en.microsoft", "en.lucene", "et.microsoft", "fi.microsoft", "fi.lucene", "fr.microsoft", "fr.lucene", "gl.lucene", "de.microsoft", "de.lucene", "el.microsoft", "el.lucene", "gu.microsoft", "he.microsoft", "hi.microsoft", "hi.lucene", "hu.microsoft", "hu.lucene", "is.microsoft", "id.microsoft", "id.lucene", "ga.lucene", "it.microsoft", "it.lucene", "ja.microsoft", "ja.lucene", "kn.microsoft", "ko.microsoft", "ko.lucene", "lv.microsoft", "lv.lucene", "lt.microsoft", "ml.microsoft", "ms.microsoft", "mr.microsoft", "nb.microsoft", "no.lucene", "fa.lucene", "pl.microsoft", "pl.lucene", "pt-BR.microsoft", "pt-BR.lucene", "pt-PT.microsoft", "pt-PT.lucene", "pa.microsoft", "ro.microsoft", "ro.lucene", "ru.microsoft", "ru.lucene", "sr-cyrillic.microsoft", "sr-latin.microsoft", "sk.microsoft", "sl.microsoft", "es.microsoft", "es.lucene", "sv.microsoft", "sv.lucene", "ta.microsoft", "te.microsoft", "th.microsoft", "th.lucene", "tr.microsoft", "tr.lucene", "uk.microsoft", "ur.microsoft", "vi.microsoft", "standard.lucene", "standardasciifolding.lucene", "keyword", "pattern", "simple", "stop", and "whitespace".
tokenizer_name	str or LexicalTokenizerName The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "classic", "edgeNGram", "keyword_v2", "letter", "lowercase", "microsoft_language_tokenizer", "microsoft_language_stemming_tokenizer", "nGram", "path_hierarchy_v2", "pattern", "standard_v2", "uax_url_email", and "whitespace".
normalizer_name	str or LexicalNormalizerName The name of the normalizer to use to normalize the given text. Known values are: "asciifolding", "elision", "lowercase", "standard", and "uppercase".
token_filters	list[str or TokenFilterName] An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.
char_filters	list[str or CharFilterName] An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.

Methods

as_dict	Return a dict that can be turned into json using json.dump.
clear	Remove all items from D.
copy
get	Get the value for key if key is in the dictionary, else default. :param str key: The key to look up. :param any default: The value to return if key is not in the dictionary. Defaults to None :returns: D[k] if k in D, else d. :rtype: any
items
keys
pop	Removes specified key and return the corresponding value. :param str key: The key to pop. :param any default: The value to return if key is not in the dictionary :returns: The value corresponding to the key. :rtype: any :raises KeyError: If key is not found and default is not given.
popitem	Removes and returns some (key, value) pair :returns: The (key, value) pair. :rtype: tuple :raises KeyError: if D is empty.
setdefault	Same as calling D.get(k, d), and setting D[k]=d if k not found :param str key: The key to look up. :param any default: The value to set if key is not in the dictionary :returns: D[k] if k in D, else d. :rtype: any
update	Updates D from mapping/iterable E and F. :param any args: Either a mapping object or an iterable of key-value pairs.
values

as_dict

Return a dict that can be turned into json using json.dump.

as_dict(*, exclude_readonly: bool = False) -> dict[str, Any]

Keyword-Only Parameters

Name	Description
exclude_readonly	bool Whether to remove the readonly properties. Default value: False

Returns

Type	Description
dict	A dict JSON compatible object

clear

Remove all items from D.

clear() -> None

copy

copy() -> Model

get

Get the value for key if key is in the dictionary, else default. :param str key: The key to look up. :param any default: The value to return if key is not in the dictionary. Defaults to None :returns: D[k] if k in D, else d. :rtype: any

get(key: str, default: Any = None) -> Any

Parameters

Name	Description
key Required
default	Default value: None

items

items() -> ItemsView[str, Any]

Returns

Type	Description
ItemsView	set-like object providing a view on D's items

keys

keys() -> KeysView[str]

Returns

Type	Description
KeysView	a set-like object providing a view on D's keys

pop

Removes specified key and return the corresponding value. :param str key: The key to pop. :param any default: The value to return if key is not in the dictionary :returns: The value corresponding to the key. :rtype: any :raises KeyError: If key is not found and default is not given.

pop(key: str, default: ~typing.Any = <object object>) -> Any

Parameters

Name	Description
key Required
default

popitem

Removes and returns some (key, value) pair :returns: The (key, value) pair. :rtype: tuple :raises KeyError: if D is empty.

popitem() -> tuple[str, Any]

setdefault

Same as calling D.get(k, d), and setting D[k]=d if k not found :param str key: The key to look up. :param any default: The value to set if key is not in the dictionary :returns: D[k] if k in D, else d. :rtype: any

setdefault(key: str, default: ~typing.Any = <object object>) -> Any

Parameters

Name	Description
key Required
default

update

Updates D from mapping/iterable E and F. :param any args: Either a mapping object or an iterable of key-value pairs.

update(*args: Any, **kwargs: Any) -> None

values

values() -> ValuesView[Any]

Returns

Type	Description
ValuesView	an object providing a view on D's values

Attributes

analyzer_name

The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "ar.microsoft", "ar.lucene", "hy.lucene", "bn.microsoft", "eu.lucene", "bg.microsoft", "bg.lucene", "ca.microsoft", "ca.lucene", "zh-Hans.microsoft", "zh-Hans.lucene", "zh-Hant.microsoft", "zh-Hant.lucene", "hr.microsoft", "cs.microsoft", "cs.lucene", "da.microsoft", "da.lucene", "nl.microsoft", "nl.lucene", "en.microsoft", "en.lucene", "et.microsoft", "fi.microsoft", "fi.lucene", "fr.microsoft", "fr.lucene", "gl.lucene", "de.microsoft", "de.lucene", "el.microsoft", "el.lucene", "gu.microsoft", "he.microsoft", "hi.microsoft", "hi.lucene", "hu.microsoft", "hu.lucene", "is.microsoft", "id.microsoft", "id.lucene", "ga.lucene", "it.microsoft", "it.lucene", "ja.microsoft", "ja.lucene", "kn.microsoft", "ko.microsoft", "ko.lucene", "lv.microsoft", "lv.lucene", "lt.microsoft", "ml.microsoft", "ms.microsoft", "mr.microsoft", "nb.microsoft", "no.lucene", "fa.lucene", "pl.microsoft", "pl.lucene", "pt-BR.microsoft", "pt-BR.lucene", "pt-PT.microsoft", "pt-PT.lucene", "pa.microsoft", "ro.microsoft", "ro.lucene", "ru.microsoft", "ru.lucene", "sr-cyrillic.microsoft", "sr-latin.microsoft", "sk.microsoft", "sl.microsoft", "es.microsoft", "es.lucene", "sv.microsoft", "sv.lucene", "ta.microsoft", "te.microsoft", "th.microsoft", "th.lucene", "tr.microsoft", "tr.lucene", "uk.microsoft", "ur.microsoft", "vi.microsoft", "standard.lucene", "standardasciifolding.lucene", "keyword", "pattern", "simple", "stop", and "whitespace".

analyzer_name: str | _models.LexicalAnalyzerName | None

char_filters

An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.

char_filters: list[typing.Union[str, ForwardRef('_models.CharFilterName')]] | None

normalizer_name

The name of the normalizer to use to normalize the given text. Known values are: "asciifolding", "elision", "lowercase", "standard", and "uppercase".

normalizer_name: str | _models.LexicalNormalizerName | None

text

The text to break into tokens. Required.

text: str

token_filters

An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.

token_filters: list[typing.Union[str, ForwardRef('_models.TokenFilterName')]] | None

tokenizer_name

The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "classic", "edgeNGram", "keyword_v2", "letter", "lowercase", "microsoft_language_tokenizer", "microsoft_language_stemming_tokenizer", "nGram", "path_hierarchy_v2", "pattern", "standard_v2", "uax_url_email", and "whitespace".

tokenizer_name: str | _models.LexicalTokenizerName | None

Comentarios

¿Le ha resultado útil esta página?