Product

As Tool Counts Grow, Prompts Need Layered Management

Jun 24, 2026·4 min read·QVeris Team

As tool counts grow, prompts need layered management

QVeris · Product Thinking

When a system starts integrating a large number of external tools, prompts quickly run into a practical problem: although the tools are all exposed as APIs, the semantics behind them are not the same.

The most typical case is field meaning. The same field, symbol, may refer to a security, index, futures contract, or foreign exchange pair in a market data tool. In a financial reporting tool, it may refer to the company entity. In a news search tool, the fields that actually determine query intent may instead be q, country, timeframe, or category. If every tool shares a single generic prompt, the amount of information soon becomes unmanageable: field rules for different tools, provider conventions, and exception-handling patterns all get packed into the same context. Many rules apply only to one class of tools. Putting them into a global prompt not only adds noise, but may also conflict with rules for other tools.

A generic prompt can handle basic issues, such as misspelled field names, parameter type mismatches, or invalid enum formats. But once tool-specific details keep getting appended to it, several maintenance problems appear:

Information bloat: the more tools there are, the longer the prompt becomes, and the truly relevant rules get buried.
Unclear scope: a rule may apply only to one provider, but the model may mistakenly apply it to other tools.
Conflicting rules: the same field may mean different things in different tools, so global rules can easily impose opposite constraints.
Hard-to-reuse experience: if all historical cases are placed into one prompt, it becomes difficult to reuse them precisely for the current tool.

So as the number of tools keeps growing, the prompt itself needs to evolve from “one fixed instruction block” into “context organized by tool scenario.” It needs to load the right rules, examples, and historical experience based on the current provider and specific tool.

This points to a more reasonable direction: layered prompts.

Layered Prompts: Let the Model Know Which Tool It Is Fixing

We can split the context required for tool repair and evaluation into three layers.

The first layer is global rules.

This layer applies to all tools, for example:

Do not use sample parameters to pretend that a repair succeeded.
Do not replace the core object specified by the user.
Repairs should prioritize low-risk adjustments such as field names, formats, types, and enum values.
If core query conditions must be changed, provide an info notice or refuse automatic repair.

The second layer is provider rules.

This layer is loaded based on the provider prefix. For example:

In tool A, converting certain market code suffixes may simply reflect differences in provider conventions.
In tool B, removing an incompatible symbol prefix may be a format correction, but replacing the core query object with another object is an intent shift.
In tool C, q, timeframe, country, and language all affect search scope, so judgment cannot be based only on field validity.
In tool D, sample values from documentation are not necessarily wrong by themselves; they should be marked high risk only when they replace the user’s original core object and cause the query intent to clearly diverge.

The third layer is exact tool rules.

This layer is loaded based on the full tool_id. For example:

Quote tools: symbol is the core entity field.
Balance sheet tools: symbol represents the financial reporting query object, while limit may only restrict the number of results.
News search tools: q carries the core query semantics, and deleting keywords changes intent.
Company overview tools: replacing an invalid symbol with a sample symbol usually should not count as a real repair.

The priority of these three layers should be:

exact tool rules > provider rules > global rules

This way, the model no longer uses one large, generic prompt to handle every tool. Instead, it loads context that fits the current tool.

RAG: Dynamically Bringing Rules and Historical Experience Into Context

Layered prompts solve the problem of rule organization, but in a real system, rules are not written once and finished.

As the number of tools grows, we will keep encountering new providers, new parameter conventions, and new patterns of incorrect repair. If all rules are hardcoded into the prompt, the prompt will become longer and harder to maintain.

A better approach is to introduce RAG.

Before tool repair or evaluation, the system first retrieves relevant context based on the current provider_id, tool_id, error type, and parameter fields:

The parameter definition of the current tool
Provider-specific rules
Historical repair examples
Sample parameters from documentation and their risk labels

The retrieved content is then assembled into the prompt for this run, so the model can make judgments based on that context.

What This Design Solves

The goal of layered prompts plus RAG is not only to improve the success rate of automatic repair.

Its more important value is that it:

Allows special rules for different providers to be expressed explicitly.
Allows risk-sensitive fields at the tool level to be constrained more carefully.
Allows historical incorrect repairs to constrain future repairs.

Ultimately, a tool-calling system should not only pursue “it runs,” but “it runs reliably.”

Summary

When there are only a small number of tools, writing rules into one prompt is feasible. But once tools start growing at scale, a prompt is no longer just a “prompt.” It becomes a maintainable way to organize knowledge.

Layered prompts solve the question of where rules should live: general principles belong in the global layer, provider-related conventions belong in the provider layer, and special judgments for specific tools belong in the tool layer. RAG solves the question of how rules continue to be updated: new documentation, examples, and human experience do not all need to be stuffed into a fixed prompt; they can be retrieved according to the current calling scenario.

The prompt no longer grows linearly with the number of tools, and conflicts between rules become easier to isolate and manage. For a tool-calling system that will evolve over the long term, this is more sustainable than simply writing an even longer prompt.

#QVeris#Agent

PreviousI Used QVeris as an AI Stock Research Assistant: Start with Industry Valuation, Then Filter Companies with a Screener

Back to all posts