IndexingParametersConfiguration Class
A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type.
Constructor
IndexingParametersConfiguration(*args: Any, **kwargs: Any)
Variables
| Name | Description |
|---|---|
| parsing_mode | Represents the parsing mode for indexing from an Azure blob data source. Known values are: "default", "text", "delimitedText", "json", "jsonArray", "jsonLines", and "markdown". |
| excluded_file_name_extensions | Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing. |
| indexed_file_name_extensions | Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types. |
| fail_on_unsupported_content_type | For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance. |
| fail_on_unprocessable_document | For Azure blobs, set to false if you want to continue indexing if a document fails indexing. |
| index_storage_metadata_only_for_oversized_documents | For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://learn.microsoft.com/azure/search/search-limits-quotas-capacity. |
| delimited_text_headers | For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index. |
| delimited_text_delimiter | For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "\|"). |
| first_line_contains_headers | For CSV blobs, indicates that the first (non-blank) line of each blob contains headers. |
| markdown_parsing_submode | Specifies the submode that will determine whether a markdown file will be parsed into exactly one search document or multiple search documents. Default is oneToMany. Known values are: "oneToMany" and "oneToOne". |
| markdown_header_depth | Specifies the max header depth that will be considered while grouping markdown content. Default is h6. Known values are: "h1", "h2", "h3", "h4", "h5", and "h6". |
| document_root | For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property. |
| data_to_extract | Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. Known values are: "storageMetadata", "allMetadata", and "contentAndMetadata". |
| image_action | Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. Known values are: "none", "generateNormalizedImages", and "generateNormalizedImagePerPage". |
| allow_skillset_to_read_file_data | If true, will create a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill. |
| pdf_text_rotation_algorithm | Determines algorithm for text extraction from PDF files in Azure blob storage. Known values are: "none" and "detectAngles". |
| execution_environment | Specifies the environment in which the indexer should execute. Known values are: "standard" and "private". |
| query_timeout | Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss". |
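The CSV-related variables above (parsing_mode, delimited_text_headers, delimited_text_delimiter) are typically set together. The following is a minimal sketch, not part of the SDK: `csv_blob_settings` is a hypothetical helper that collects those values in a plain dict keyed by the variable names listed above, under the assumption that such a dict could later be passed as keyword arguments to the constructor.

```python
from typing import Dict, Sequence


def csv_blob_settings(headers: Sequence[str], delimiter: str = ",") -> Dict[str, str]:
    """Hypothetical helper (not an SDK function): gather CSV blob settings.

    The keys mirror the variable names documented above; the values are
    plain strings, in line with the "primitive type" requirement.
    """
    # The reference describes the delimiter as a single character.
    if len(delimiter) != 1:
        raise ValueError("delimiter must be exactly one character")
    return {
        "parsing_mode": "delimitedText",
        # Column headers are documented as a comma-delimited list.
        "delimited_text_headers": ",".join(headers),
        "delimited_text_delimiter": delimiter,
    }


settings = csv_blob_settings(["id", "name", "price"], delimiter="|")
print(settings)
```

Keeping the values in a plain dict first makes them easy to inspect or validate before they reach the indexer definition.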
Methods
| Name | Description |
|---|---|
| as_dict | Return a dict that can be turned into json using json.dump. |
| clear | Remove all items from D. |
| copy | |
| get | Get the value for key if key is in the dictionary, else default. |
| items | Return a set-like object providing a view on D's items. |
| keys | Return a set-like object providing a view on D's keys. |
| pop | Remove the specified key and return the corresponding value. |
| popitem | Remove and return some (key, value) pair. |
| setdefault | Same as calling D.get(key, default), and also setting D[key] = default if key is not found. |
| update | Update D from mapping/iterable E and F. |
| values | Return an object providing a view on D's values. |
as_dict
Return a dict that can be turned into json using json.dump.
as_dict(*, exclude_readonly: bool = False) -> dict[str, Any]
Keyword-Only Parameters
| Name | Description |
|---|---|
| exclude_readonly | Whether to remove the readonly properties. Default value: False |
Returns
| Type | Description |
|---|---|
| dict[str, Any] | A JSON-compatible dict object |
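as_dict is described above as returning a dict that json.dump can serialize. The sketch below uses a plain dict as a stand-in for that return value; this is an assumption made for illustration, since the exact key names that as_dict emits are not shown in this reference.

```python
import json

# Stand-in for the result of as_dict() (assumed shape, illustrative values).
settings = {
    "parsing_mode": "json",
    "data_to_extract": "contentAndMetadata",
    "fail_on_unprocessable_document": False,
}

# All values are primitives, so the dict serializes without a custom encoder.
payload = json.dumps(settings, sort_keys=True)

# Round-trip to confirm nothing was lost in serialization.
restored = json.loads(payload)
print(payload)
```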
clear
Remove all items from D.
clear() -> None
copy
copy() -> Model
get
Get the value for key if key is in the dictionary, else default.
get(key: str, default: Any = None) -> Any
Parameters
| Name | Description |
|---|---|
| key Required | The key to look up. |
| default | The value to return if key is not in the dictionary. Default value: None |
Returns
| Type | Description |
|---|---|
| Any | The value for key if key is in the dictionary, else default. |
items
items() -> ItemsView[str, Any]
Returns
| Type | Description |
|---|---|
| ItemsView[str, Any] | A set-like object providing a view on D's items |
keys
keys() -> KeysView[str]
Returns
| Type | Description |
|---|---|
| KeysView[str] | A set-like object providing a view on D's keys |
pop
Remove the specified key and return the corresponding value. Raises KeyError if key is not found and default is not given.
pop(key: str, default: Any = <object object>) -> Any
Parameters
| Name | Description |
|---|---|
| key Required | The key to pop. |
| default | The value to return if key is not in the dictionary. |
Returns
| Type | Description |
|---|---|
| Any | The value corresponding to the key. |
popitem
Remove and return some (key, value) pair. Raises KeyError if D is empty.
popitem() -> tuple[str, Any]
Returns
| Type | Description |
|---|---|
| tuple[str, Any] | The (key, value) pair. |
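The pop and popitem descriptions above follow the semantics of Python's built-in dict. As a minimal sketch, the example below uses a plain dict as a stand-in for the configuration object (an assumption for illustration; the keys and values are illustrative) to show when a default is returned and when KeyError is raised.

```python
# Plain dict standing in for the configuration object (assumption).
config = {"parsing_mode": "json"}

# The key exists, so pop removes it and returns its value.
removed = config.pop("parsing_mode")

# The key is now missing, but a default is given, so no error is raised.
fallback = config.pop("parsing_mode", "default")

# Missing key and no default: KeyError.
try:
    config.pop("parsing_mode")
    missing_raised = False
except KeyError:
    missing_raised = True

# popitem on an empty mapping also raises KeyError.
try:
    config.popitem()
    empty_raised = False
except KeyError:
    empty_raised = True

print(removed, fallback, missing_raised, empty_raised)
```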
setdefault
Same as calling D.get(key, default), and also setting D[key] = default if key is not found.
setdefault(key: str, default: Any = <object object>) -> Any
Parameters
| Name | Description |
|---|---|
| key Required | The key to look up. |
| default | The value to set if key is not in the dictionary. |
Returns
| Type | Description |
|---|---|
| Any | D[key] if key is in D, else default. |
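The difference between get and setdefault is that only setdefault stores the default when the key is missing. The sketch below illustrates this with a plain dict standing in for the configuration object (an assumption for illustration; the keys and values are taken from the known values listed in this reference).

```python
# Plain dict standing in for the configuration object (assumption).
config = {"image_action": "none"}

# Key already present: setdefault returns the stored value and changes nothing.
existing = config.setdefault("image_action", "generateNormalizedImages")

# Key missing: setdefault stores the default and returns it.
added = config.setdefault("data_to_extract", "contentAndMetadata")

# get never inserts; a missing key simply yields the default (None here).
probe = config.get("document_root")

print(existing, added, probe, config)
```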
update
Update D from mapping/iterable E and F.
update(*args: Any, **kwargs: Any) -> None
Parameters
| Name | Description |
|---|---|
| args | Either a mapping object or an iterable of key-value pairs. |
values
values() -> ValuesView[Any]
Returns
| Type | Description |
|---|---|
| ValuesView[Any] | An object providing a view on D's values |
Attributes
allow_skillset_to_read_file_data
If true, will create a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill.
allow_skillset_to_read_file_data: bool | None
data_to_extract
Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. Known values are: "storageMetadata", "allMetadata", and "contentAndMetadata".
data_to_extract: str | _models.BlobIndexerDataToExtract | None
delimited_text_delimiter
For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").
delimited_text_delimiter: str | None
delimited_text_headers
For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.
delimited_text_headers: str | None
document_root
For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.
document_root: str | None
excluded_file_name_extensions
Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing.
excluded_file_name_extensions: str | None
execution_environment
Specifies the environment in which the indexer should execute. Known values are: "standard" and "private".
execution_environment: str | _models.IndexerExecutionEnvironment | None
fail_on_unprocessable_document
For Azure blobs, set to false if you want to continue indexing if a document fails indexing.
fail_on_unprocessable_document: bool | None
fail_on_unsupported_content_type
For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.
fail_on_unsupported_content_type: bool | None
first_line_contains_headers
For CSV blobs, indicates that the first (non-blank) line of each blob contains headers.
first_line_contains_headers: bool | None
image_action
Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. Known values are: "none", "generateNormalizedImages", and "generateNormalizedImagePerPage".
image_action: str | _models.BlobIndexerImageAction | None
index_storage_metadata_only_for_oversized_documents
For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://learn.microsoft.com/azure/search/search-limits-quotas-capacity.
index_storage_metadata_only_for_oversized_documents: bool | None
indexed_file_name_extensions
Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.
indexed_file_name_extensions: str | None
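Both excluded_file_name_extensions and indexed_file_name_extensions take a single comma-delimited string rather than a list. The following sketch shows one way to build such a string from a Python list; `join_extensions` is a hypothetical helper, and its normalization rules (trimming whitespace, adding a leading dot, dropping duplicates) are assumptions for illustration, not documented service behavior.

```python
from typing import Iterable, List


def join_extensions(extensions: Iterable[str]) -> str:
    """Hypothetical helper: build a comma-delimited extension list."""
    normalized: List[str] = []
    for ext in extensions:
        ext = ext.strip()
        if not ext:
            continue  # skip empty entries
        if not ext.startswith("."):
            ext = "." + ext  # assumed convention: leading dot, as in ".docx"
        if ext not in normalized:
            normalized.append(ext)  # keep first occurrence, preserve order
    # Separator matches the ".docx, .pptx, .msg" example in the description.
    return ", ".join(normalized)


indexed = join_extensions(["docx", ".pptx", " msg ", ".docx"])
print(indexed)
```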
markdown_header_depth
Specifies the max header depth that will be considered while grouping markdown content. Default is h6. Known values are: "h1", "h2", "h3", "h4", "h5", and "h6".
markdown_header_depth: str | _models.MarkdownHeaderDepth | None
markdown_parsing_submode
Specifies the submode that will determine whether a markdown file will be parsed into exactly one search document or multiple search documents. Default is oneToMany. Known values are: "oneToMany" and "oneToOne".
markdown_parsing_submode: str | _models.MarkdownParsingSubmode | None
parsing_mode
Represents the parsing mode for indexing from an Azure blob data source. Known values are: "default", "text", "delimitedText", "json", "jsonArray", "jsonLines", and "markdown".
parsing_mode: str | _models.BlobIndexerParsingMode | None
pdf_text_rotation_algorithm
Determines algorithm for text extraction from PDF files in Azure blob storage. Known values are: "none" and "detectAngles".
pdf_text_rotation_algorithm: str | _models.BlobIndexerPDFTextRotationAlgorithm | None
query_timeout
Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss".
query_timeout: str | None
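query_timeout is a string in the format "hh:mm:ss", so a duration held as a timedelta has to be formatted before it is assigned. The sketch below shows one way to do that; `format_query_timeout` is a hypothetical helper, not part of the SDK, and it assumes two-digit zero padding for each field.

```python
from datetime import timedelta


def format_query_timeout(timeout: timedelta) -> str:
    """Hypothetical helper: render a timedelta as "hh:mm:ss"."""
    total_seconds = int(timeout.total_seconds())
    if total_seconds < 0:
        raise ValueError("timeout must not be negative")
    # Split the total into hours, minutes, and seconds.
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


ten_minutes = format_query_timeout(timedelta(minutes=10))
print(ten_minutes)
```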