Configure data sources for knowledge bases
Your knowledge base is only as good as the data it contains. Foundry IQ lets you connect to multiple data sources, enabling your AI agent to access the information it needs to answer questions accurately. You configure these sources when you set up your knowledge base, ensuring your agent has the right context for your specific use case.
Which data source to use depends on where your data lives and how you need to access it. Foundry IQ supports six primary data source types:
| Data Source | Access Type | Best For |
|---|---|---|
| Azure AI Search Index | Indexed | Enterprise search with custom pipelines |
| Azure Blob Storage | Direct | Document files in Azure Storage |
| Web | Real-time | Current, public information via Bing |
| SharePoint (Remote) | Real-time | Live SharePoint content with Microsoft 365 governance |
| SharePoint (Indexed) | Indexed | Advanced search on SharePoint with custom pipelines |
| OneLake | Direct | Unstructured data in Microsoft Fabric |
Real-time sources, such as Web and SharePoint (Remote), give your agent current information. Internal data sources, such as SharePoint or OneLake, let you maintain security and governance while giving your agent access to proprietary knowledge.
Azure AI Search Index
Azure AI Search Index provides enterprise-scale search capabilities for your Foundry IQ knowledge base. This option is ideal when you've already invested in Azure AI Search and want to use existing search indexes.
With this source, you connect directly to your Azure AI Search index, which can contain data from multiple origins that you've already processed and indexed. It matters most when you need sophisticated search capabilities such as semantic ranking, filters, or custom scoring profiles.
Tip
Learn more about Azure AI Search and how to create and manage search indexes for your knowledge bases.
Your agent can query this index to retrieve relevant information based on user questions. Key benefits include:
- Semantic ranking - Finds contextually relevant results, not just keyword matches
- Custom scoring - Prioritizes results based on your business logic
- Faceted navigation - Filters results by categories or attributes
- Multi-language support - Handles content in different languages
Azure Blob Storage
Azure Blob Storage lets you retrieve documents and files directly from your blob containers. You select specific containers or blobs, and Foundry IQ processes the content to make it available to your agent.
This source works well when you store documents in Azure Blob Storage. Common file types include:
- PDF documents
- Microsoft Word files (.docx)
- Text files (.txt)
- Markdown files (.md)
- HTML files
Note
Unlike Azure AI Search, which requires you to build and maintain an index, Blob Storage provides a more direct path from your files to your knowledge base.
You can also organize your blobs into containers by topic or access level, which makes it easier to manage what information your agent can access. This organization helps maintain data governance while keeping your knowledge base current.
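As a rough illustration of the common file types listed above, the following Python sketch checks a list of blob names locally before you point a knowledge base at a container. It doesn't call any Azure or Foundry IQ API. The function name `filter_supported`, the sample blob names, and the `.pdf`, `.html`, and `.htm` extensions are assumptions for illustration, and the file types that Foundry IQ actually processes may differ.

```python
from pathlib import PurePosixPath

# Extensions for the common file types listed above. ".pdf", ".html", and ".htm"
# are assumptions; the text names those types without giving extensions.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".html", ".htm"}


def filter_supported(blob_names):
    """Return the blob names whose extension matches a common file type."""
    return [
        name
        for name in blob_names
        if PurePosixPath(name).suffix.lower() in SUPPORTED_EXTENSIONS
    ]


# Hypothetical blob names; the "topic/file" prefix mirrors organizing by topic.
blobs = ["hr/policy.pdf", "hr/photo.png", "eng/README.MD", "eng/notes.txt"]
supported = filter_supported(blobs)
print(supported)
```

The comparison is case-insensitive, so a blob named `README.MD` is treated the same as `readme.md`.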
Web
Web access grounds your agent with real-time content from the internet via Bing. Instead of relying only on static, internal data, your agent can search for current information when answering questions.
Web grounding is especially useful when users ask about:
- Recent events or news
- Current pricing or availability
- Frequently changing information
- Topics outside your internal knowledge base
Important
With web grounding, you're relying on Bing's search results, which means less control over the specific sources your agent references. When accuracy and source verification are critical, consider using indexed, controlled data sources instead.
Tip
You can combine web grounding with internal data sources, using web access as a supplementary source when internal knowledge doesn't provide an answer.
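The supplementary pattern described in the preceding tip can be sketched in plain Python. This is a conceptual illustration only: `retrieve_with_fallback`, `fake_internal`, and `fake_web` are hypothetical names, not Foundry IQ APIs, and the sample policy text is invented. In practice, Foundry IQ may decide which source to query for you.

```python
from typing import Callable, List

# A retriever is any callable that takes a question and returns text passages.
Retriever = Callable[[str], List[str]]


def retrieve_with_fallback(question: str, internal: Retriever, web: Retriever) -> List[str]:
    """Try the internal source first; use web results only if it returns nothing."""
    passages = internal(question)
    if passages:
        return passages
    return web(question)


# Stand-in retrievers for illustration only; they are not Foundry IQ APIs.
def fake_internal(question: str) -> List[str]:
    known = {"vacation policy": ["Employees accrue 1.5 days per month."]}
    return known.get(question, [])


def fake_web(question: str) -> List[str]:
    return [f"web result for: {question}"]


internal_hit = retrieve_with_fallback("vacation policy", fake_internal, fake_web)
web_hit = retrieve_with_fallback("today's exchange rate", fake_internal, fake_web)
print(internal_hit)
print(web_hit)
```

Keeping internal data first means the agent uses controlled sources whenever they have an answer, and relies on Bing results only for questions outside your internal knowledge.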
Microsoft SharePoint options
Foundry IQ provides two ways to connect to SharePoint, each with distinct advantages. The following table compares these approaches:
| Feature | Remote | Indexed |
|---|---|---|
| Access method | Real-time queries | Preprocessed index |
| Response time | Depends on SharePoint | Faster |
| Maintenance | No index to maintain | Requires index updates |
| Advanced search | Limited | Full Azure AI Search capabilities |
| Data freshness | Always current | Depends on indexing schedule |
| Permission handling | Respects SharePoint permissions | Configured during indexing |
SharePoint Remote
SharePoint Remote provides search capabilities with Microsoft 365 governance, retrieving content directly from SharePoint without preindexing. Your agent searches SharePoint sites and libraries in real time when users ask questions.
Key benefits of remote access:
- No index maintenance required
- Always accesses current SharePoint content
- Automatically respects existing SharePoint permissions
- Simpler setup and configuration
Tip
Use SharePoint Remote when you need the simplest path to SharePoint data and don't require advanced search features.
SharePoint Indexed
SharePoint Indexed takes a different approach: it indexes SharePoint content into Azure AI Search, where you can apply custom pipelines. Unlike remote access, which queries SharePoint in real time, indexing processes your SharePoint content in advance.
This preprocessing means faster response times and more sophisticated search capabilities. With indexed content, you can:
- Apply custom analyzers for specialized terminology
- Build enrichment pipelines with AI services
- Combine SharePoint data with other sources
- Create specialized search experiences
Note
Indexed SharePoint works best when you need advanced search features or when you're integrating SharePoint data with other sources in your Azure AI Search index.
Microsoft OneLake
Microsoft OneLake provides access to unstructured data stored in your Microsoft Fabric data lakehouse. You connect to OneLake to retrieve files and documents stored in your lakehouse, making this data available to your knowledge base.
Tip
Learn more about Microsoft Fabric OneLake and how it serves as a unified data lake for your organization.
This option matters when your organization uses Microsoft Fabric for data analytics and storage. Common use cases include:
- Business intelligence reports - Reference analytical findings in agent responses
- Data documentation - Provide context about datasets and metrics
- Analytical findings - Share insights from data science work
- Research outputs - Make research accessible through conversational AI
With this connection, your agent can reference lakehouse content when answering business questions, providing data-driven responses grounded in your organization's analytical work.
Choose the right data source
Selecting the appropriate data source depends on several factors. Use this decision guide:
| If your data is... | And you need... | Choose... |
|---|---|---|
| In SharePoint | Simple setup, always current | SharePoint Remote |
| In SharePoint | Advanced search, custom pipelines | SharePoint Indexed |
| Files in Azure | Direct file access | Azure Blob Storage |
| In Microsoft Fabric | Data lakehouse content | OneLake |
| Already indexed | Existing Azure AI Search investment | Azure AI Search Index |
| Public, current information | Real-time web content | Web |
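The decision guide above can also be expressed as a small lookup. The following Python sketch is one way to encode it. The `DataLocation` enum, the `choose_source` function, and the `needs_advanced_search` flag are hypothetical names introduced for illustration and aren't part of Foundry IQ.

```python
from enum import Enum


class DataLocation(Enum):
    """Where the data lives, matching the first column of the decision guide."""
    SHAREPOINT = "sharepoint"
    AZURE_FILES = "files in azure"
    FABRIC = "microsoft fabric"
    ALREADY_INDEXED = "already indexed"
    PUBLIC_WEB = "public web"


def choose_source(location: DataLocation, needs_advanced_search: bool = False) -> str:
    """Return the data source the guide suggests.

    needs_advanced_search only affects the SharePoint rows of the table.
    """
    if location is DataLocation.SHAREPOINT:
        return "SharePoint Indexed" if needs_advanced_search else "SharePoint Remote"
    return {
        DataLocation.AZURE_FILES: "Azure Blob Storage",
        DataLocation.FABRIC: "OneLake",
        DataLocation.ALREADY_INDEXED: "Azure AI Search Index",
        DataLocation.PUBLIC_WEB: "Web",
    }[location]


print(choose_source(DataLocation.SHAREPOINT))
print(choose_source(DataLocation.SHAREPOINT, needs_advanced_search=True))
print(choose_source(DataLocation.FABRIC))
```

SharePoint is the only location with two rows in the guide, so it's the only case where the second argument changes the result.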
Important
You can combine multiple sources in a single knowledge base. For example, use internal SharePoint data as the primary source while enabling web grounding for current events or supplementary information.