IndexingParametersConfiguration Class

A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type.

Constructor

IndexingParametersConfiguration(*args: Any, **kwargs: Any)

Variables

Name	Description
parsing_mode	str or BlobIndexerParsingMode Represents the parsing mode for indexing from an Azure blob data source. Known values are: "default", "text", "delimitedText", "json", "jsonArray", "jsonLines", and "markdown".
excluded_file_name_extensions	str Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing.
indexed_file_name_extensions	str Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.
fail_on_unsupported_content_type	bool For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.
fail_on_unprocessable_document	bool For Azure blobs, set to false if you want to continue indexing if a document fails indexing.
index_storage_metadata_only_for_oversized_documents	bool For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://learn.microsoft.com/azure/search/search-limits-quotas-capacity.
delimited_text_headers	str For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.
delimited_text_delimiter	str For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").
first_line_contains_headers	bool For CSV blobs, indicates that the first (non-blank) line of each blob contains headers.
markdown_parsing_submode	str or MarkdownParsingSubmode Specifies the submode that will determine whether a markdown file will be parsed into exactly one search document or multiple search documents. Default is `oneToMany`. Known values are: "oneToMany" and "oneToOne".
markdown_header_depth	str or MarkdownHeaderDepth Specifies the max header depth that will be considered while grouping markdown content. Default is `h6`. Known values are: "h1", "h2", "h3", "h4", "h5", and "h6".
document_root	str For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.
data_to_extract	str or BlobIndexerDataToExtract Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. Known values are: "storageMetadata", "allMetadata", and "contentAndMetadata".
image_action	str or BlobIndexerImageAction Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. Known values are: "none", "generateNormalizedImages", and "generateNormalizedImagePerPage".
allow_skillset_to_read_file_data	bool If true, will create a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill.
pdf_text_rotation_algorithm	str or BlobIndexerPDFTextRotationAlgorithm Determines algorithm for text extraction from PDF files in Azure blob storage. Known values are: "none" and "detectAngles".
execution_environment	str or IndexerExecutionEnvironment Specifies the environment in which the indexer should execute. Known values are: "standard" and "private".
query_timeout	str Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss".
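
As a rough illustration of how the variables above fit together, the following sketch assembles a blob-indexer configuration as a plain dict and checks parsing_mode against the known values listed in the table. The plain dict and the build_blob_config helper are stand-ins invented for this example and are not part of the SDK; with the real class, the same names would presumably be passed as keyword arguments to the constructor. Rejecting unlisted parsing modes is a choice made by this sketch, not documented behavior.

```python
# Known values copied from the parsing_mode row above.
KNOWN_PARSING_MODES = {
    "default", "text", "delimitedText", "json",
    "jsonArray", "jsonLines", "markdown",
}


def build_blob_config(parsing_mode="default", **options):
    """Hypothetical helper: return a plain dict of configuration properties."""
    if parsing_mode not in KNOWN_PARSING_MODES:
        raise ValueError(f"unknown parsing_mode: {parsing_mode!r}")
    for name, value in options.items():
        # The class description says each value must be of a primitive type.
        if not isinstance(value, (str, bool, int, float)):
            raise TypeError(f"{name} must be a primitive value")
    return {"parsing_mode": parsing_mode, **options}


config = build_blob_config(
    parsing_mode="json",
    excluded_file_name_extensions=".png, .mp4",
    fail_on_unsupported_content_type=False,
)
```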

Methods

as_dict	Return a dict that can be turned into json using json.dump.
clear	Remove all items from D.
copy
get	Returns the value for key if key is in the dictionary; otherwise returns default (None unless specified).
items
keys
pop	Removes the specified key and returns the corresponding value. If key is not found, returns default when given; otherwise raises KeyError.
popitem	Removes and returns some (key, value) pair as a tuple. Raises KeyError if D is empty.
setdefault	Returns the value for key if key is in the dictionary; otherwise sets D[key] = default and returns default.
update	Updates D from a mapping or an iterable of key-value pairs passed positionally (E), and from keyword arguments (F).
values

as_dict

Return a dict that can be turned into json using json.dump.

as_dict(*, exclude_readonly: bool = False) -> dict[str, Any]

Keyword-Only Parameters

Name	Description
exclude_readonly	bool Whether to remove the readonly properties. Default value: False

Returns

Type	Description
dict	A JSON-compatible dict
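
A minimal sketch of what "JSON-compatible" means in practice: the returned dict can be serialized with the standard json module and read back unchanged. The config_dict literal below stands in for the result of as_dict(); its key names are assumptions made for illustration, since this page does not show the exact output.

```python
import json

# Stand-in for the result of as_dict(); the key names here are assumptions.
config_dict = {
    "parsing_mode": "jsonLines",
    "fail_on_unprocessable_document": False,
}

# Serialize to a JSON string, then parse it back.
payload = json.dumps(config_dict, sort_keys=True)
restored = json.loads(payload)
```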

clear

Remove all items from D.

clear() -> None

copy

copy() -> Model

get

Returns the value for key if key is in the dictionary; otherwise returns default.

get(key: str, default: Any = None) -> Any

Parameters

Name	Description
key Required	str The key to look up.
default	any The value to return if key is not in the dictionary. Default value: None

Returns

Type	Description
any	D[key] if key is in D, else default.

items

items() -> ItemsView[str, Any]

Returns

Type	Description
ItemsView	a set-like object providing a view on D's items

keys

keys() -> KeysView[str]

Returns

Type	Description
KeysView	a set-like object providing a view on D's keys

pop

Removes the specified key and returns the corresponding value. Raises KeyError if key is not found and default is not given.

pop(key: str, default: Any = <object object>) -> Any

Parameters

Name	Description
key Required	str The key to pop.
default	any The value to return if key is not in the dictionary.

Returns

Type	Description
any	The value corresponding to the key.
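
The `<object object>` default in the signature suggests a sentinel object, which lets the method tell "no default given" apart from an explicit default of None. The sketch below shows that general pattern on a hypothetical dict-backed class, ConfigSketch; it is not the SDK implementation.

```python
_MISSING = object()  # sentinel: distinguishes "no default given" from default=None


class ConfigSketch:
    """Hypothetical dict-backed stand-in, not the SDK class."""

    def __init__(self, **kwargs):
        self._data = dict(kwargs)

    def pop(self, key, default=_MISSING):
        if default is _MISSING:
            # No default supplied: a missing key raises KeyError.
            return self._data.pop(key)
        # A default was supplied (possibly None): return it for a missing key.
        return self._data.pop(key, default)


cfg = ConfigSketch(query_timeout="00:10:00")
first = cfg.pop("query_timeout")          # key present: value is returned
second = cfg.pop("query_timeout", None)   # key gone: explicit default is returned
```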

popitem

Removes and returns some (key, value) pair as a tuple. Raises KeyError if D is empty.

popitem() -> tuple[str, Any]

setdefault

Returns the value for key if key is in the dictionary. Otherwise sets D[key] = default and returns default, which is the same as calling D.get(key, default) and then storing the default.

setdefault(key: str, default: Any = <object object>) -> Any

Parameters

Name	Description
key Required	str The key to look up.
default	any The value to set if key is not in the dictionary.

Returns

Type	Description
any	D[key] if key is in D, else default.

update

Updates D from a mapping or an iterable of key-value pairs passed as positional arguments (E), and from keyword arguments (F).

update(*args: Any, **kwargs: Any) -> None
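
The three accepted input shapes can be illustrated with a plain dict, whose update method follows the same convention. The plain dict is used here only as a stand-in; whether the SDK class matches built-in dict behavior in every detail is an assumption.

```python
config = {"parsing_mode": "default"}

# 1. A mapping: existing keys are overwritten.
config.update({"parsing_mode": "delimitedText"})

# 2. An iterable of key-value pairs.
config.update([("delimited_text_delimiter", "|")])

# 3. Keyword arguments.
config.update(first_line_contains_headers=True)
```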

values

values() -> ValuesView[Any]

Returns

Type	Description
ValuesView	an object providing a view on D's values

Attributes

allow_skillset_to_read_file_data

If true, will create a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill.

allow_skillset_to_read_file_data: bool | None

data_to_extract

Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. Known values are: "storageMetadata", "allMetadata", and "contentAndMetadata".

data_to_extract: str | _models.BlobIndexerDataToExtract | None

delimited_text_delimiter

For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").

delimited_text_delimiter: str | None

delimited_text_headers

For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.

delimited_text_headers: str | None
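
Because delimited_text_headers is a single comma-delimited string and delimited_text_delimiter must be a single character, a small helper can keep the two consistent. The sketch below uses a hypothetical function, delimited_text_options, that returns the CSV-related properties as a plain dict; the helper and the sample header names are illustrative only.

```python
def delimited_text_options(headers, delimiter):
    """Hypothetical helper returning CSV-related properties as a plain dict."""
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single character")
    return {
        "parsing_mode": "delimitedText",
        # The headers property is one comma-delimited string, not a list.
        "delimited_text_headers": ",".join(headers),
        "delimited_text_delimiter": delimiter,
    }


options = delimited_text_options(["id", "name", "price"], "|")
```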

document_root

For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.

document_root: str | None
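
To make the idea of "a path to the array" concrete, the sketch below walks a slash-separated path into a parsed JSON document and returns the array found there. The path syntax "/level1/level2", the sample document, and the resolve_document_root function are assumptions made for illustration; this is not how the service itself is implemented.

```python
import json


def resolve_document_root(document, root_path):
    """Hypothetical illustration: walk a slash-separated path into parsed JSON."""
    node = document
    for part in root_path.strip("/").split("/"):
        node = node[part]
    if not isinstance(node, list):
        raise TypeError("document_root should point at an array")
    return node


blob = json.loads('{"level1": {"level2": [{"id": 1}, {"id": 2}]}}')
documents = resolve_document_root(blob, "/level1/level2")
```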

excluded_file_name_extensions

Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing.

excluded_file_name_extensions: str | None

execution_environment

Specifies the environment in which the indexer should execute. Known values are: "standard" and "private".

execution_environment: str | _models.IndexerExecutionEnvironment | None

fail_on_unprocessable_document

For Azure blobs, set to false if you want to continue indexing if a document fails indexing.

fail_on_unprocessable_document: bool | None

fail_on_unsupported_content_type

For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.

fail_on_unsupported_content_type: bool | None

first_line_contains_headers

For CSV blobs, indicates that the first (non-blank) line of each blob contains headers.

first_line_contains_headers: bool | None

image_action

Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. Known values are: "none", "generateNormalizedImages", and "generateNormalizedImagePerPage".

image_action: str | _models.BlobIndexerImageAction | None

index_storage_metadata_only_for_oversized_documents

For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://learn.microsoft.com/azure/search/search-limits-quotas-capacity.

index_storage_metadata_only_for_oversized_documents: bool | None

indexed_file_name_extensions

Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.

indexed_file_name_extensions: str | None
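
Both extension properties take one comma-delimited string rather than a list. A minimal sketch of building that string from a Python list follows; join_extensions is a hypothetical helper, and adding a missing leading dot is a convenience chosen for this example.

```python
def join_extensions(extensions):
    """Hypothetical helper: tidy up extensions and join them with commas."""
    normalized = []
    for ext in extensions:
        ext = ext.strip()
        if not ext.startswith("."):
            ext = "." + ext  # add the leading dot when it is missing
        normalized.append(ext)
    return ", ".join(normalized)


indexed = join_extensions(["docx", ".pptx", " .msg "])
excluded = join_extensions([".png", "mp4"])
```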

markdown_header_depth

Specifies the max header depth that will be considered while grouping markdown content. Default is h6. Known values are: "h1", "h2", "h3", "h4", "h5", and "h6".

markdown_header_depth: str | _models.MarkdownHeaderDepth | None

markdown_parsing_submode

Specifies the submode that will determine whether a markdown file will be parsed into exactly one search document or multiple search documents. Default is oneToMany. Known values are: "oneToMany" and "oneToOne".

markdown_parsing_submode: str | _models.MarkdownParsingSubmode | None

parsing_mode

Represents the parsing mode for indexing from an Azure blob data source. Known values are: "default", "text", "delimitedText", "json", "jsonArray", "jsonLines", and "markdown".

parsing_mode: str | _models.BlobIndexerParsingMode | None

pdf_text_rotation_algorithm

Determines algorithm for text extraction from PDF files in Azure blob storage. Known values are: "none" and "detectAngles".

pdf_text_rotation_algorithm: str | _models.BlobIndexerPDFTextRotationAlgorithm | None

query_timeout

Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss".

query_timeout: str | None
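
Since the value is a string in "hh:mm:ss" format, it can be convenient to derive it from a timedelta. The format_query_timeout helper below is a hypothetical sketch of that conversion and is not part of the SDK.

```python
from datetime import timedelta


def format_query_timeout(timeout):
    """Hypothetical helper: render a timedelta as "hh:mm:ss"."""
    total_seconds = int(timeout.total_seconds())
    if total_seconds < 0:
        raise ValueError("timeout must not be negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Ten minutes, i.e. double the 5-minute default mentioned above.
query_timeout = format_query_timeout(timedelta(minutes=10))
```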
