Configure data sources for knowledge bases

9 minutes

Your knowledge base is only as good as the data it contains. Foundry IQ lets you connect to multiple data sources, enabling your AI agent to access the information it needs to answer questions accurately. You configure these sources when you set up your knowledge base, ensuring your agent has the right context for your specific use case.

Understanding which data source to use depends on where your data lives and how you need to access it. Foundry IQ supports six primary data source types:

Data Source	Access Type	Best For
Azure AI Search Index	Indexed	Enterprise search with custom pipelines
Azure Blob Storage	Direct	Document files in Azure Storage
Web	Real-time	Current, public information via Bing
SharePoint (Remote)	Real-time	Live SharePoint content with Microsoft 365 governance
SharePoint (Indexed)	Indexed	Advanced search on SharePoint with custom pipelines
OneLake	Direct	Unstructured data in Microsoft Fabric

With real-time sources, you get current information. With internal data sources like SharePoint or OneLake, you maintain security and governance while giving your agent access to proprietary knowledge.

Azure AI Search Index

Azure AI Search Index provides enterprise-scale search capabilities for your Foundry IQ knowledge base. This option is ideal when you already invested in Azure AI Search and want to use existing search indexes.

With this source, you connect directly to your Azure AI Search index, which can contain data from multiple origins that you've already processed and indexed. This becomes especially important when you need sophisticated search capabilities like semantic ranking, filters, or custom scoring profiles that Azure AI Search provides.

Tip

Learn more about Azure AI Search and how to create and manage search indexes for your knowledge bases.

Your agent can query this index to retrieve relevant information based on user questions. Key benefits include:

Semantic ranking - Finds contextually relevant results, not just keyword matches
Custom scoring - Prioritizes results based on your business logic
Faceted navigation - Filters results by categories or attributes
Multi-language support - Handles content in different languages

Azure Blob Storage

Azure Blob Storage lets you retrieve documents and files directly from your blob containers. You select specific containers or blobs, and Foundry IQ processes the content to make it available to your agent.

This source works well when you store documents in Azure Blob Storage. Common file types include:

PDF documents
Microsoft Word files (.docx)
Text files (.txt)
Markdown files (.md)
HTML files

Note

Unlike Azure AI Search, which requires you to build and maintain an index, Blob Storage provides a more direct path from your files to your knowledge base.

Building on this concept, you can organize your blobs into containers based on topics or access levels, making it easier to manage what information your agent can access. This organization helps maintain data governance while keeping your knowledge base current.

Web

Web access grounds your agent with real-time content from the internet via Bing. Instead of relying only on static, internal data, your agent can search for current information when answering questions.

This becomes especially important when users ask about:

Recent events or news
Current pricing or availability
Frequently changing information
Topics outside your internal knowledge base

Important

With web grounding, you're relying on Bing's search results, which means less control over the specific sources your agent references. When accuracy and source verification are critical, consider using indexed, controlled data sources instead.

Tip

You can combine web grounding with internal data sources, using web access as a supplementary source when internal knowledge doesn't provide an answer.

Microsoft SharePoint options

Foundry IQ provides two ways to connect to SharePoint, each with distinct advantages. The following table compares these approaches:

Feature	Remote	Indexed
Access method	Real-time queries	Preprocessed index
Response time	Depends on SharePoint	Faster
Maintenance	No index to maintain	Requires index updates
Advanced search	Limited	Full Azure AI Search capabilities
Data freshness	Always current	Depends on indexing schedule
Permission handling	Respects SharePoint permissions	Configured during indexing

SharePoint Remote

SharePoint Remote provides search capabilities with Microsoft 365 governance, retrieving content directly from SharePoint without preindexing. Your agent searches SharePoint sites and libraries in real-time when users ask questions.

Key benefits of remote access:

No index maintenance required
Always accesses current SharePoint content
Automatically respects existing SharePoint permissions
Simpler setup and configuration

Tip

Use SharePoint Remote when you need the simplest path to SharePoint data and don't require advanced search features.

SharePoint Indexed

SharePoint Indexed takes a different approach by indexing SharePoint content into Azure AI Search for custom pipelines. Unlike remote access, which queries SharePoint in real-time, indexing processes your SharePoint content in advance.

This preprocessing means faster response times and more sophisticated search capabilities. With indexed content, you can:

Apply custom analyzers for specialized terminology
Build enrichment pipelines with AI services
Combine SharePoint data with other sources
Create specialized search experiences

Note

Indexed SharePoint works best when you need advanced search features or when you're integrating SharePoint data with other sources in your Azure AI Search index.

Microsoft OneLake

Microsoft OneLake provides access to unstructured data stored in your Microsoft Fabric data lakehouse. You connect to OneLake to retrieve files and documents stored in your lakehouse, making this data available to your knowledge base.

Tip

Learn more about Microsoft Fabric OneLake and how it serves as a unified data lake for your organization.

This option matters when your organization uses Microsoft Fabric for data analytics and storage. Common use cases include:

Business intelligence reports - Reference analytical findings in agent responses
Data documentation - Provide context about datasets and metrics
Analytical findings - Share insights from data science work
Research outputs - Make research accessible through conversational AI

With this connection, your agent can reference this information when answering business questions, providing data-driven responses grounded in your organization's analytical work.

Choose the right data source

Selecting the appropriate data source depends on several factors. Use this decision guide:

If your data is...	And you need...	Choose...
In SharePoint	Simple setup, always current	SharePoint Remote
In SharePoint	Advanced search, custom pipelines	SharePoint Indexed
Files in Azure	Direct file access	Azure Blob Storage
In Microsoft Fabric	Data lakehouse content	OneLake
Already indexed	Existing Azure AI Search investment	Azure AI Search Index
Public, current information	Real-time web content	Web

Important

You can combine multiple sources in a single knowledge base. For example, use internal SharePoint data as the primary knowledge base while enabling web grounding for current events or supplementary information.

Feedback

Was this page helpful?