For many enterprises, the promise of AI-driven insights is hitting a structural wall. While Retrieval-Augmented Generation (RAG) has become the standard for connecting Large Language Models (LLMs) to private data, it is increasingly proving inadequate for complex, real-world business questions.
New research from Databricks suggests that the limitation isn’t the intelligence of the models themselves, but rather the architecture used to query them. The study highlights a critical shift: moving away from single-turn retrieval toward multi-step agentic workflows.
The “Hybrid Data” Problem
Most business intelligence requires connecting two different worlds:
1. Structured Data: Precise numbers, sales figures, and relational tables (SQL).
2. Unstructured Data: Customer reviews, academic papers, and support documents.
A standard RAG system is designed for the latter. It excels at finding text that “sounds like” a query, but it struggles to apply precise numeric filters or to join data across different formats.
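To make the distinction concrete, here is a minimal, self-contained sketch (all names, embeddings, and sales figures are invented for illustration): a toy vector search can surface text that resembles the question, but only an exact, SQL-style predicate can answer the numeric part of it.

```python
# Hypothetical illustration of the "hybrid data" problem. The documents,
# embeddings, and sales rows below are made up for this sketch.
import math

# Unstructured side: a tiny "vector index" of document -> embedding.
docs = {
    "Q3 sales dipped in the EMEA region": [0.9, 0.1],
    "Customers praised the new dashboard": [0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = [0.8, 0.2]  # pretend embedding of "why are sales declining?"
best_doc = max(docs, key=lambda d: cosine(docs[d], query_vec))
# Vector search finds the passage that "sounds like" the query...

# Structured side: the precise part of the question needs an exact filter.
sales = [("EMEA", 120_000), ("APAC", 340_000), ("AMER", 510_000)]
below_target = [region for region, total in sales if total < 200_000]
# ...but only a SQL-style numeric predicate can identify which regions
# actually missed the target. Similarity scores cannot do this reliably.
```

The two halves answer different kinds of questions; a single-turn retriever that only has the first mechanism cannot reproduce the second.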
“RAG works, but it doesn’t scale,” says Michael Bendersky, Research Director at Databricks. “If you want to understand why you have declining sales, you have to help the agent see the tables and look at the sales data. Your RAG pipeline will become incompetent at that task.”
Architecture vs. Intelligence: The 21% Gap
To prove that the issue lies in how data is accessed rather than how smart the model is, Databricks conducted a series of tests using the STaRK benchmark (covering Amazon products, Microsoft Academic Graph, and biomedical data).
They compared a high-performing, state-of-the-art single-turn RAG system against a multi-step agentic approach. Even when the single-turn system was given a significantly stronger foundation model, it trailed the agentic approach by:
- 21% in the academic domain.
- 38% in the biomedical domain.
This performance gap demonstrates that even the most “intelligent” model cannot compensate for a retrieval architecture that is fundamentally unable to bridge the gap between a spreadsheet and a text document.
How the “Supervisor Agent” Works
Databricks’ solution, the Supervisor Agent, moves away from the idea of “hybrid retrieval” (trying to merge embeddings and tables) and instead treats the problem as tool orchestration. The agent functions through three core capabilities:
- Parallel Tool Decomposition: Instead of one massive search, the agent simultaneously triggers SQL queries for numbers and vector searches for text. It then analyzes the combined results to form a coherent answer.
- Self-Correction: If an initial search yields no results (such as looking for a specific author with a precise publication count), the agent doesn’t give up. It reformulates the query, performs a SQL JOIN, and verifies the result through a second search.
- Declarative Configuration: Unlike traditional pipelines that require engineers to “flatten” or normalize data into text chunks, this agent uses plain-language descriptions. To add a new data source, an engineer simply describes what the data is; the agent learns how to use it.
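The parallel tool decomposition described above can be sketched as follows. This is a hedged illustration, not the Databricks implementation: the tool names (`run_sql`, `vector_search`), the supervisor function, and all data are invented stubs, and the real system would route through an LLM rather than hard-coded calls.

```python
# Minimal sketch of parallel tool decomposition: the supervisor issues a
# structured (SQL) call and an unstructured (vector search) call at the
# same time, then combines the results. All tools below are stubs.
from concurrent.futures import ThreadPoolExecutor

def run_sql(query):
    # Stub: pretend this hits a warehouse and returns rows.
    return [("EMEA", 120_000)]

def vector_search(question):
    # Stub: pretend this hits a vector index and returns passages.
    return ["Customers in EMEA cited pricing as a concern."]

def supervisor_answer(question):
    # Decompose the question into one structured and one unstructured
    # sub-query, run both in parallel, then merge into a single answer.
    with ThreadPoolExecutor() as pool:
        sql_future = pool.submit(
            run_sql, "SELECT region, total FROM sales WHERE total < 200000"
        )
        text_future = pool.submit(vector_search, question)
        rows, passages = sql_future.result(), text_future.result()
    region, total = rows[0]
    return f"{region} sales are {total}; context: {passages[0]}"

print(supervisor_answer("Why are sales declining?"))
```

The point of the pattern is that neither tool alone answers the question: the SQL call supplies the precise number, the vector search supplies the explanation, and the supervisor composes them.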
The Shift from Engineering to Configuration
The implications for data engineering are significant. In a traditional RAG setup, every new data source requires a massive amount of “data plumbing”—converting JSON, normalizing tables, and managing embeddings. This creates a bottleneck that grows as an enterprise expands.
The agentic approach flips this model: “Just bring the agent to the data.”
Key Takeaways for Implementation:
- Scalability: The agentic model is more sustainable for growing datasets because adding a source is a configuration task, not a coding task.
- Complexity Limits: While powerful, the approach works best with 5 to 10 data sources. Connecting too many contradictory sources at once can degrade speed and reliability.
- Data Integrity: While the agent can navigate different formats, it cannot fix “garbage in, garbage out.” The source data must be factually accurate for the agent to be effective.
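The “configuration, not coding” idea can be illustrated with a short sketch. The registry format, field names, and routing logic below are all hypothetical; in the real system an LLM would read the plain-language descriptions to decide which tool to invoke, whereas this stand-in just keyword-matches for demonstration.

```python
# Hedged sketch of declarative configuration: each data source is
# registered with a plain-language description instead of a custom
# ingestion pipeline. All names and fields are invented.
data_sources = [
    {
        "name": "sales_warehouse",
        "kind": "sql",
        "description": "Quarterly sales totals by region and product line.",
    },
    {
        "name": "support_tickets",
        "kind": "vector_index",
        "description": "Free-text customer support tickets since 2022.",
    },
]

def pick_tools(question, sources):
    # Naive routing stand-in: the real agent would let an LLM read the
    # descriptions; here we simply keyword-match against them.
    words = question.lower().split()
    return [
        s["name"]
        for s in sources
        if any(w in s["description"].lower() for w in words)
    ]

print(pick_tools("Which region had declining sales?", data_sources))
```

Adding a third source under this scheme means appending one more described entry to the registry, which is exactly the scalability property the takeaways above describe.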
Conclusion
The transition from RAG to multi-step agents represents a fundamental evolution in enterprise AI: moving from systems that merely find information to systems that can reason across diverse data ecosystems. By treating data sources as tools rather than just text chunks, companies can finally begin to answer the complex, cross-functional questions that drive business decisions.