AnalyzeTextOptions Class
Specifies some text and analysis components used to break that text into tokens.
Constructor
AnalyzeTextOptions(*args: Any, **kwargs: Any)
Variables
| Name | Description |
|---|---|
|
text
|
The text to break into tokens. Required. |
|
analyzer_name
|
The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "ar.microsoft", "ar.lucene", "hy.lucene", "bn.microsoft", "eu.lucene", "bg.microsoft", "bg.lucene", "ca.microsoft", "ca.lucene", "zh-Hans.microsoft", "zh-Hans.lucene", "zh-Hant.microsoft", "zh-Hant.lucene", "hr.microsoft", "cs.microsoft", "cs.lucene", "da.microsoft", "da.lucene", "nl.microsoft", "nl.lucene", "en.microsoft", "en.lucene", "et.microsoft", "fi.microsoft", "fi.lucene", "fr.microsoft", "fr.lucene", "gl.lucene", "de.microsoft", "de.lucene", "el.microsoft", "el.lucene", "gu.microsoft", "he.microsoft", "hi.microsoft", "hi.lucene", "hu.microsoft", "hu.lucene", "is.microsoft", "id.microsoft", "id.lucene", "ga.lucene", "it.microsoft", "it.lucene", "ja.microsoft", "ja.lucene", "kn.microsoft", "ko.microsoft", "ko.lucene", "lv.microsoft", "lv.lucene", "lt.microsoft", "ml.microsoft", "ms.microsoft", "mr.microsoft", "nb.microsoft", "no.lucene", "fa.lucene", "pl.microsoft", "pl.lucene", "pt-BR.microsoft", "pt-BR.lucene", "pt-PT.microsoft", "pt-PT.lucene", "pa.microsoft", "ro.microsoft", "ro.lucene", "ru.microsoft", "ru.lucene", "sr-cyrillic.microsoft", "sr-latin.microsoft", "sk.microsoft", "sl.microsoft", "es.microsoft", "es.lucene", "sv.microsoft", "sv.lucene", "ta.microsoft", "te.microsoft", "th.microsoft", "th.lucene", "tr.microsoft", "tr.lucene", "uk.microsoft", "ur.microsoft", "vi.microsoft", "standard.lucene", "standardasciifolding.lucene", "keyword", "pattern", "simple", "stop", and "whitespace". |
|
tokenizer_name
|
The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "classic", "edgeNGram", "keyword_v2", "letter", "lowercase", "microsoft_language_tokenizer", "microsoft_language_stemming_tokenizer", "nGram", "path_hierarchy_v2", "pattern", "standard_v2", "uax_url_email", and "whitespace". |
|
normalizer_name
|
The name of the normalizer to use to normalize the given text. Known values are: "asciifolding", "elision", "lowercase", "standard", and "uppercase". |
|
token_filters
|
An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter. |
|
char_filters
|
An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter. |
Methods
| as_dict |
Return a dict that can be turned into json using json.dump. |
| clear |
Remove all items from D. |
| copy | |
| get |
Get the value for key if key is in the dictionary, else default. :param str key: The key to look up. :param any default: The value to return if key is not in the dictionary. Defaults to None :returns: D[k] if k in D, else d. :rtype: any |
| items | |
| keys | |
| pop |
Removes specified key and return the corresponding value. :param str key: The key to pop. :param any default: The value to return if key is not in the dictionary :returns: The value corresponding to the key. :rtype: any :raises KeyError: If key is not found and default is not given. |
| popitem |
Removes and returns some (key, value) pair :returns: The (key, value) pair. :rtype: tuple :raises KeyError: if D is empty. |
| setdefault |
Same as calling D.get(k, d), and setting D[k]=d if k not found :param str key: The key to look up. :param any default: The value to set if key is not in the dictionary :returns: D[k] if k in D, else d. :rtype: any |
| update |
Updates D from mapping/iterable E and F. :param any args: Either a mapping object or an iterable of key-value pairs. |
| values |
as_dict
Return a dict that can be turned into json using json.dump.
as_dict(*, exclude_readonly: bool = False) -> dict[str, Any]
Keyword-Only Parameters
| Name | Description |
|---|---|
|
exclude_readonly
|
Whether to remove the readonly properties. Default value: False
|
Returns
| Type | Description |
|---|---|
|
A dict JSON compatible object |
clear
Remove all items from D.
clear() -> None
copy
copy() -> Model
get
Get the value for key if key is in the dictionary, else default. :param str key: The key to look up. :param any default: The value to return if key is not in the dictionary. Defaults to None :returns: D[k] if k in D, else d. :rtype: any
get(key: str, default: Any = None) -> Any
Parameters
| Name | Description |
|---|---|
|
key
Required
|
|
|
default
|
Default value: None
|
items
items() -> ItemsView[str, Any]
Returns
| Type | Description |
|---|---|
|
set-like object providing a view on D's items |
keys
keys() -> KeysView[str]
Returns
| Type | Description |
|---|---|
|
a set-like object providing a view on D's keys |
pop
Removes specified key and return the corresponding value. :param str key: The key to pop. :param any default: The value to return if key is not in the dictionary :returns: The value corresponding to the key. :rtype: any :raises KeyError: If key is not found and default is not given.
pop(key: str, default: ~typing.Any = <object object>) -> Any
Parameters
| Name | Description |
|---|---|
|
key
Required
|
|
|
default
|
|
popitem
Removes and returns some (key, value) pair :returns: The (key, value) pair. :rtype: tuple :raises KeyError: if D is empty.
popitem() -> tuple[str, Any]
setdefault
Same as calling D.get(k, d), and setting D[k]=d if k not found :param str key: The key to look up. :param any default: The value to set if key is not in the dictionary :returns: D[k] if k in D, else d. :rtype: any
setdefault(key: str, default: ~typing.Any = <object object>) -> Any
Parameters
| Name | Description |
|---|---|
|
key
Required
|
|
|
default
|
|
update
Updates D from mapping/iterable E and F. :param any args: Either a mapping object or an iterable of key-value pairs.
update(*args: Any, **kwargs: Any) -> None
values
values() -> ValuesView[Any]
Returns
| Type | Description |
|---|---|
|
an object providing a view on D's values |
Attributes
analyzer_name
The name of the analyzer to use to break the given text. If this parameter is not specified, you must specify a tokenizer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "ar.microsoft", "ar.lucene", "hy.lucene", "bn.microsoft", "eu.lucene", "bg.microsoft", "bg.lucene", "ca.microsoft", "ca.lucene", "zh-Hans.microsoft", "zh-Hans.lucene", "zh-Hant.microsoft", "zh-Hant.lucene", "hr.microsoft", "cs.microsoft", "cs.lucene", "da.microsoft", "da.lucene", "nl.microsoft", "nl.lucene", "en.microsoft", "en.lucene", "et.microsoft", "fi.microsoft", "fi.lucene", "fr.microsoft", "fr.lucene", "gl.lucene", "de.microsoft", "de.lucene", "el.microsoft", "el.lucene", "gu.microsoft", "he.microsoft", "hi.microsoft", "hi.lucene", "hu.microsoft", "hu.lucene", "is.microsoft", "id.microsoft", "id.lucene", "ga.lucene", "it.microsoft", "it.lucene", "ja.microsoft", "ja.lucene", "kn.microsoft", "ko.microsoft", "ko.lucene", "lv.microsoft", "lv.lucene", "lt.microsoft", "ml.microsoft", "ms.microsoft", "mr.microsoft", "nb.microsoft", "no.lucene", "fa.lucene", "pl.microsoft", "pl.lucene", "pt-BR.microsoft", "pt-BR.lucene", "pt-PT.microsoft", "pt-PT.lucene", "pa.microsoft", "ro.microsoft", "ro.lucene", "ru.microsoft", "ru.lucene", "sr-cyrillic.microsoft", "sr-latin.microsoft", "sk.microsoft", "sl.microsoft", "es.microsoft", "es.lucene", "sv.microsoft", "sv.lucene", "ta.microsoft", "te.microsoft", "th.microsoft", "th.lucene", "tr.microsoft", "tr.lucene", "uk.microsoft", "ur.microsoft", "vi.microsoft", "standard.lucene", "standardasciifolding.lucene", "keyword", "pattern", "simple", "stop", and "whitespace".
analyzer_name: str | _models.LexicalAnalyzerName | None
char_filters
An optional list of character filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.
char_filters: list[typing.Union[str, ForwardRef('_models.CharFilterName')]] | None
normalizer_name
The name of the normalizer to use to normalize the given text. Known values are: "asciifolding", "elision", "lowercase", "standard", and "uppercase".
normalizer_name: str | _models.LexicalNormalizerName | None
text
The text to break into tokens. Required.
text: str
token_filters
An optional list of token filters to use when breaking the given text. This parameter can only be set when using the tokenizer parameter.
token_filters: list[typing.Union[str, ForwardRef('_models.TokenFilterName')]] | None
tokenizer_name
The name of the tokenizer to use to break the given text. If this parameter is not specified, you must specify an analyzer instead. The tokenizer and analyzer parameters are mutually exclusive. Known values are: "classic", "edgeNGram", "keyword_v2", "letter", "lowercase", "microsoft_language_tokenizer", "microsoft_language_stemming_tokenizer", "nGram", "path_hierarchy_v2", "pattern", "standard_v2", "uax_url_email", and "whitespace".
tokenizer_name: str | _models.LexicalTokenizerName | None