Most AI agent systems are built once and then repeat the same patterns indefinitely, like an employee who insists every problem can be solved with the same spreadsheet. New research from Peking University proposes a more effective model: agents that accumulate working solutions as executable code, refine them with each new task, and become more capable over time.

## The static agent problem

Most agent systems today follow a simple pattern: an agent receives a task, follows a prompt-defined process, produces output, and moves on. When it encounters a similar task later, it often starts again from scratch, having apparently learned nothing from the experience.

Frameworks like AgentFactory, developed by researchers at Peking University and the Beijing Academy of Artificial Intelligence, take a different approach. Instead of storing successful solutions as prompt tweaks or textual reflections, this kind of framework saves them as executable Python subagents: reusable pieces of code with standardized documentation that can be retrieved, adapted, and redeployed for future tasks.

The result is a system that improves over time:

- Its library of tools expands
- Repeated tasks require less effort
- Performance improves through reuse and refinement

Without further ado, here are seven signs your current setup would benefit from this type of architecture.

## 1. Your agents solve the same problems repeatedly from scratch

If your agent handles similar tasks across sessions and rebuilds its approach every time from a blank prompt, you are spending compute on work that already exists. Modern architectures address this directly.
When a new task arrives without a relevant subagent, the system builds one and saves it for future use. The next time a similar task appears, the groundwork already exists.

## 2. Successful runs leave nothing behind

In many frameworks, a successful execution is a one-time event: the output is produced, logged, and forgotten.

💡 Ideally, you need a framework that treats successful executions as reusable assets. The subagent responsible for the result is stored as runnable Python code with documentation attached.

If successful runs are not creating reusable capability, accumulated knowledge disappears after every session.

## 3. Your agents get better prompts but not better tools

Prompt engineering remains the default response when agent performance needs improvement:

- Adjust the instructions
- Add more context
- Refine the examples
- Repeat until morale improves

That can help, but it improves the reasoning layer rather than the execution layer. Self-evolving frameworks modify the tools themselves: using execution feedback, they improve existing subagents over time, making them more robust and reusable.

## 4. Re-running a task type requires rebuilding context every time

One of the hidden costs of static agent systems is context overhead.

💡 Without a persistent skill library, every task run must reconstruct the knowledge needed to solve the problem. In practice, this means the agent keeps forgetting how it solved the same issue last Tuesday.

AgentFactory, for example, reduces this overhead by retrieving saved subagents for similar tasks, cutting down on repeated reasoning and setup work.

## 5. Your agent's capabilities are tied to one platform

If your agent's tools are tightly coupled to one framework or runtime, the work invested in building them stays trapped there. Subagents stored as plain, documented Python can be exported and run in any Python-capable environment. Portable, documented code turns a tool into a long-term asset.

## 6. Performance plateaus after deployment

A well-calibrated static agent often performs strongly at launch and then levels off, much like a New Year's gym membership. Without a feedback loop that converts new tasks into improved tooling, capability growth slows quickly.

If your deployment curve flattens after the first few weeks, the absence of a self-improvement loop is probably the reason.

## 7. Scaling task volume means scaling manual maintenance

In many agent architectures, handling more task types requires:

- More prompt engineering
- More configuration
- More human oversight

The goal is to reverse that relationship. As task volume increases, the subagent library should expand alongside it, so that a growing share of tasks can be handled by retrieving and adapting existing subagents. The system scales its own capabilities.

## What this means in practice

This shift is not incremental; it introduces a different mental model for agent systems. Instead of a pipeline that runs tasks, you need infrastructure that learns from them.

For teams building agentic AI systems, the practical takeaway is straightforward:

- Static agents are a starting point
- Reusable capability compounds value over time
- Systems improve fastest when every run produces something reusable

The most capable agent systems will not simply complete tasks. They will build a growing library of proven, portable capabilities that improve with every task they encounter.

Of course, self-improving agent systems come with tradeoffs. Persisting and modifying executable tools introduces questions around verification, security, version control, and failure propagation.
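One common way to contain these risks is to gate every refinement behind stored verification checks. The sketch below is an illustration of that guardrail, not AgentFactory's actual design: the class names, the version list, and the `(input, expected_output)` check format are all assumptions.

```python
"""Illustrative guardrail for a self-improving tool library: a refined
subagent is promoted only if it passes its stored verification checks.
Names and structure are assumptions, not AgentFactory's actual design."""

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SubagentVersion:
    version: int
    run: Callable[[str], str]


@dataclass
class VersionedSubagent:
    name: str
    versions: list[SubagentVersion] = field(default_factory=list)

    @property
    def current(self) -> SubagentVersion:
        return self.versions[-1]

    def propose(self, run: Callable[[str], str],
                checks: list[tuple[str, str]]) -> bool:
        """Promote a refined implementation only if it passes every
        (input, expected_output) check; otherwise keep the old version."""
        if all(run(inp) == expected for inp, expected in checks):
            self.versions.append(SubagentVersion(self.current.version + 1, run))
            return True
        return False


agent = VersionedSubagent("upper", [SubagentVersion(1, lambda s: s.upper())])

# A "refinement" that silently breaks behavior is rejected...
agent.propose(lambda s: s.lower(), checks=[("ok", "OK")])  # rejected
# ...while a genuine improvement is promoted as a new version.
agent.propose(lambda s: s.strip().upper(), checks=[("ok", "OK"), (" ok ", "OK")])
print(agent.current.version)  # → 2
```

Keeping earlier versions in the list also gives the system a trivial rollback path when a promoted refinement later misbehaves in production.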
A system that can improve its own tooling can also reinforce bad patterns if the feedback loop is poorly designed.

## Final thought

💡 The most important shift in agent design may not be bigger models or more sophisticated prompts. It may be the move from systems that simply execute tasks to systems that accumulate capability over time.

Static agents eventually hit a ceiling. Systems that create reusable tools, refine them through feedback, and carry those improvements forward operate very differently. Every completed task becomes part of the system's future capability instead of disappearing into the void five seconds later.

The long-term winners in AI may not be the systems that sound the smartest in a demo. They may be the ones that quietly get better every week while nobody is looking.

## Source

Zhang, Z., Lu, S., Qian, H., He, D. and Liu, Z. (2026), "AgentFactory: A Self-Evolving Framework Through Executable Subagent Accumulation and Reuse", arXiv:2603.18000. Published 18 March 2026.