An Architect’s Guide to Agentic AI: From Boredom to Brilliance
I’ve been itching to write this article for a while, but in the wild world of RAG (Retrieval-Augmented Generation) I was waiting for the perfect storm, or maybe just for everyone to get as bored as I am with AI’s greatest hits: summaries, searches, and Q&As. Let’s be honest, the industry has had its fill of these one-trick ponies, and folks are now asking, “What else ya got, AI?” Funny thing is, Agentic AI has been lurking in the shadows since day one of this LLM saga, but it got ghosted thanks to the hype around zero-shot, few-shot, and RAG. But guess what? Over the past few months, Agentic AI has finally been getting its moment in the sun as everyone starts to realize just how much we need it.
What is an Agent? (If you are a pro, skip this)
A piece of software that can complete a specific task using AI (an LLM) and a set of tools.
Tools?
An agent uses a tool or a set of tools to complete the task. A tool can be an API, a native function, or just a pre-engineered prompt.
If you are familiar with these concepts, you are probably aware of another component, called the Planner.
Planner?
The Planner is the output from the LLM that tells the agent how, and in which order, the tools need to be invoked to complete a task.
Over the last few years, various frameworks have appeared that help developers build agents using planners. These frameworks let you create an agent, register some tools with it, and execute the plan given by the LLM while performing a task.
There are a few other agentic frameworks that use planners; you can always Google them.
I always use this example to explain a single-agent system.
Here is a Portfolio Agent, which has 4 tools at its disposal: Market Data, Position, News, and Sentiment Analyzer. It can answer your query by invoking the correct set of tools, in the correct sequence, using the LLM.
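To make the agent, tools, and planner idea concrete, here is a minimal sketch of such a Portfolio Agent. Everything in it is an assumption for illustration: the four tool functions return canned strings instead of calling real APIs, and the `plan` method returns a hard-coded plan where a real agent would ask the LLM for one.

```python
from typing import Callable, Dict, List

# Hypothetical tools: plain functions standing in for real API calls.
def market_data(ticker: str) -> str:
    return f"{ticker}: price 100.0"

def position(ticker: str) -> str:
    return f"{ticker}: 50 shares held"

def news(ticker: str) -> str:
    return f"{ticker}: quarterly results beat estimates"

def sentiment_analyzer(text: str) -> str:
    return "positive" if "beat" in text else "neutral"

class PortfolioAgent:
    def __init__(self) -> None:
        self.tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, tool: Callable[[str], str]) -> None:
        self.tools[name] = tool

    def plan(self, query: str) -> List[dict]:
        # Assumption: a real agent would send the query and the tool list
        # to an LLM and parse the returned plan. Here the plan is fixed.
        return [
            {"tool": "position", "input": "ticker"},
            {"tool": "news", "input": "ticker"},
            {"tool": "sentiment_analyzer", "input": "news"},
        ]

    def run(self, query: str, ticker: str) -> Dict[str, str]:
        # Each step reads its input from the context and stores its
        # output under the tool's own name for later steps to use.
        context = {"ticker": ticker}
        for step in self.plan(query):
            tool = self.tools[step["tool"]]
            context[step["tool"]] = tool(context[step["input"]])
        return context

agent = PortfolioAgent()
for name, tool in [("market_data", market_data), ("position", position),
                   ("news", news), ("sentiment_analyzer", sentiment_analyzer)]:
    agent.register(name, tool)

result = agent.run("How is my ACME holding doing?", "ACME")
print(result)
```

Note that the plan is just data: an ordered list of tool names and the inputs each one needs. That is the part the LLM has to get right, which is also where this design starts to strain.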
If you have made it this far and feel lost, this is a good video for understanding the Agent, Tools, and Planner concepts. If you are already familiar with them, please skip it.
The world was a happy place with this single-agent, multiple-tool concept.
However, the single-agent, multiple-tool approach has major limitations.
- It depends completely on the LLM to understand and determine the flow.
- It cannot scale beyond 3–4 tools; past that, it starts hallucinating.
- If the LLM creates a wrong plan, iterating on it is a problem.
As always, to solve this complicated problem, we complicate it further by introducing a Multi-Agent system.
Fortunately, Multi-Agent systems are not new; in fact, they became popular with the launch of ChatGPT. AutoGPT was popular too.
This year-old video from Andrej Karpathy describes Multi-Agent systems nicely.
If you have seen this video before, you can re-listen from the 24-minute mark onwards for the agent concepts; otherwise, spending 40 minutes on this video will be a good investment of your time.
I want to focus on the concepts of a Multi-Agent system rather than on a full implementation; once you get the concepts, there are various ways of implementing them.
Multi-Agent system
The Portfolio Agent that I discussed above can be reimplemented as a Multi-Agent system.
Here is a Multi-Agent system with 5 agents: 4 task agents and a Supervisor.
- Service Discovery Agent: Based on the requirement, it can query the enterprise service registry and retrieve information about the APIs that need to be invoked. Here, it’ll pull the Position API and News API from the service registry.
- Call API Agent: A simple agent that can invoke any API, provided the API spec is made available to it. Here, it’ll invoke the Position and News APIs in sequence (facilitated by the Supervisor Agent).
- Utility Agent: It has general-purpose tools at its disposal. Here, it’ll invoke the Sentiment Analyzer tool.
- Evaluator Agent: It evaluates the user query and the final answer, and decides whether the Supervisor needs to repeat the process, whether more human input is required, or whether the answer can be returned to the user.
- Supervisor Agent: It takes the query and coordinates the other agents to get the response.
Now you see it’s a major design change: agents are no longer functionality-specific; they are task-specific. This set of agents is not limited to providing the functionality of the Portfolio Agent. It can handle any query where API invocation is required, or any task covered by the Utility Agent’s toolset, e.g. Draft Mail, Sentiment Analysis, etc.
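As a rough illustration of how these five agents could fit together, here is a minimal sketch. All of it is assumed for the sake of the example: the service registry is an in-memory dictionary of stub functions, each agent is a plain function over a shared state dictionary, and simple `if` rules stand in for the LLM decisions the Supervisor and Evaluator would make in a real system.

```python
from typing import Callable, Dict

# Hypothetical enterprise service registry: API name -> stub callable.
SERVICE_REGISTRY: Dict[str, Callable[[str], str]] = {
    "position": lambda ticker: f"{ticker}: 50 shares held",
    "news": lambda ticker: f"{ticker}: quarterly results beat estimates",
}

def service_discovery_agent(state: dict) -> dict:
    # Stand-in for an LLM picking the APIs this query needs.
    state["apis"] = ["position", "news"]
    return state

def call_api_agent(state: dict) -> dict:
    # Invokes each discovered API in sequence.
    state["api_results"] = {
        name: SERVICE_REGISTRY[name](state["ticker"]) for name in state["apis"]
    }
    return state

def utility_agent(state: dict) -> dict:
    # General-purpose toolset; here only a toy sentiment analyzer.
    text = state["api_results"]["news"]
    state["sentiment"] = "positive" if "beat" in text else "neutral"
    return state

def evaluator_agent(state: dict) -> dict:
    # Decides whether the answer can be returned or needs another pass.
    complete = "sentiment" in state and "position" in state["api_results"]
    state["verdict"] = "return" if complete else "retry"
    return state

AGENTS: Dict[str, Callable[[dict], dict]] = {
    "service_discovery": service_discovery_agent,
    "call_api": call_api_agent,
    "utility": utility_agent,
    "evaluator": evaluator_agent,
}

def supervisor_agent(query: str, ticker: str, max_steps: int = 10) -> dict:
    state = {"query": query, "ticker": ticker, "trace": []}
    for _ in range(max_steps):
        # Rule-based routing stands in for an LLM choosing the next agent.
        if "apis" not in state:
            next_agent = "service_discovery"
        elif "api_results" not in state:
            next_agent = "call_api"
        elif "sentiment" not in state:
            next_agent = "utility"
        else:
            next_agent = "evaluator"
        state["trace"].append(next_agent)
        state = AGENTS[next_agent](state)
        if state.get("verdict") == "return":
            break
        if state.get("verdict") == "retry":
            # Drop intermediate results so the next round starts over.
            for key in ("apis", "api_results", "sentiment", "verdict"):
                state.pop(key, None)
    return state

final_state = supervisor_agent("How is my ACME holding doing?", "ACME")
print(final_state["trace"], final_state["sentiment"])
```

None of the task agents knows anything about portfolios. Swapping the registry entries or the utility tool is enough to serve a different kind of query.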
Multi-Agent System Tech Stack:
To build a Multi-Agent system, a quick option is the OpenAI Assistants API. The Assistants API, along with the AutoGen framework, can help you get started quickly. AutoGen provides all the bells and whistles you need for communication and state management between the Assistants API and your application.
If you are skeptical about uploading or sharing files with the Assistants API, you can build it on your own end as well, using custom code or a framework like AutoGen, Swarm, LangGraph, etc. IMO all these frameworks are at a very early stage when it comes to Multi-Agent systems, but using a framework definitely makes the implementation much more structured.
Common Implementation Patterns:
Supervisor Pattern
This is the same as the example I gave earlier.
There is a central agent that assigns tasks to all the other agents and manages the state. You will find this implementation in various blogs and videos; some call it Supervisor, some call it Router, some Manager. Bottom line: your manager decides which agent to call next.
Network Pattern
There is no central agent in this pattern; every agent decides which agent to call next.
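A minimal sketch of this hand-off style, under the same toy assumptions as before: each agent is a function that updates a shared state and returns the name of the agent it wants to run next, or `None` to stop. The agent names, the `HOLDINGS` data, and the `max_hops` guard are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

HOLDINGS = {"ACME": 50}  # hypothetical positions

def position_agent(state: dict) -> Optional[str]:
    state["shares"] = HOLDINGS.get(state["ticker"], 0)
    # With no holding there is nothing to analyse, so hand off to the answer.
    return "news" if state["shares"] > 0 else "answer"

def news_agent(state: dict) -> Optional[str]:
    state["news"] = f"{state['ticker']}: quarterly results beat estimates"
    return "sentiment"

def sentiment_agent(state: dict) -> Optional[str]:
    state["sentiment"] = "positive" if "beat" in state["news"] else "neutral"
    return "answer"

def answer_agent(state: dict) -> Optional[str]:
    if state["shares"] == 0:
        state["answer"] = f"You hold no {state['ticker']} shares."
    else:
        state["answer"] = (
            f"You hold {state['shares']} {state['ticker']} shares; "
            f"news sentiment is {state['sentiment']}."
        )
    return None  # nobody to hand off to, so the run ends

AGENTS: Dict[str, Callable[[dict], Optional[str]]] = {
    "position": position_agent,
    "news": news_agent,
    "sentiment": sentiment_agent,
    "answer": answer_agent,
}

def run_network(start: str, state: dict, max_hops: int = 10) -> List[str]:
    # There is no supervisor: the loop only follows whatever each agent returns.
    path: List[str] = []
    current: Optional[str] = start
    while current is not None and len(path) < max_hops:
        path.append(current)
        current = AGENTS[current](state)
    return path

held = {"ticker": "ACME"}
path_held = run_network("position", held)
not_held = {"ticker": "ZZZ"}
path_not_held = run_network("position", not_held)
print(path_held, path_not_held)
```

The `max_hops` limit matters more here than in the supervisor pattern, because without a central agent nothing else stops two agents from handing off to each other forever.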
Hierarchical Pattern
This is another version of the supervisor pattern, where you have supervisors at multiple levels, and they are responsible for assigning tasks to the end agents.
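One way to picture the multiple levels is as a tree in which supervisors and end agents share the same small interface. The sketch below assumes keyword matching as a stand-in for LLM routing, and the `Worker` and `Supervisor` classes, team names, and keywords are made up for illustration.

```python
from typing import List, Union

class Worker:
    """End agent that handles tasks mentioning its keyword."""
    def __init__(self, name: str, keyword: str) -> None:
        self.name = name
        self.keyword = keyword

    def can_handle(self, task: str) -> bool:
        return self.keyword in task.lower()

    def handle(self, task: str) -> str:
        return f"{self.name} handled: {task}"

class Supervisor:
    """Routes a task to the first member (worker or supervisor) that accepts it."""
    def __init__(self, name: str, members: List[Union["Supervisor", Worker]]) -> None:
        self.name = name
        self.members = members

    def can_handle(self, task: str) -> bool:
        return any(member.can_handle(task) for member in self.members)

    def handle(self, task: str) -> str:
        # Assumption: a real supervisor would ask an LLM which member to pick.
        for member in self.members:
            if member.can_handle(task):
                return member.handle(task)
        return f"{self.name}: no agent available for: {task}"

api_team = Supervisor("api_supervisor", [
    Worker("position_api", "position"),
    Worker("news_api", "news"),
])
utility_team = Supervisor("utility_supervisor", [
    Worker("sentiment_analyzer", "sentiment"),
    Worker("draft_mail", "mail"),
])
top = Supervisor("top_supervisor", [api_team, utility_team])

mail_reply = top.handle("Draft a mail to the client")
news_reply = top.handle("Fetch the latest news for ACME")
unknown_reply = top.handle("Book a flight")
print(mail_reply, news_reply, unknown_reply, sep="\n")
```

Because a supervisor looks like any other member to the level above it, adding a new team is just one more entry in the parent’s list.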
Here is a quick and nice video from LangChain explaining these concepts.
If you want to start some development, here is a good prompt that will help you get started with Multi-Agent code. Just open ChatGPT or Gemini and paste the following prompt to start coding.
I want to develop a program using GPT-4 Turbo or GPT-4o that operates on a CSV file. The program should utilize two autonomous agents that interact to perform the desired operation:
Agent 1: Writes Python code to load the CSV file and perform a specific operation as requested by the user.
Agent 2: Executes the generated Python code and checks for any execution errors. In case of an error, Agent 2 should provide detailed feedback to Agent 1 to correct the code.
The agents will run autonomously, interacting and refining the code for a specified number of iterations (n) or until the code executes successfully without errors. The process should stop either when the operation is completed successfully or after reaching the maximum number of iterations.
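For a feel of the control flow this prompt asks for, here is a minimal sketch of the two-agent loop. It makes several assumptions: `writer_agent` is a stub that returns hard-coded code (buggy on the first try, corrected once it receives feedback) where a real version would call the LLM, the CSV and its `price` column are invented sample data, and `exec` is used only for brevity. Generated code should run in a sandbox in anything beyond a toy.

```python
import os
import tempfile
import traceback
from typing import Optional, Tuple

def writer_agent(task: str, csv_path: str, feedback: Optional[str]) -> str:
    """Agent 1 stand-in: a real version would send task and feedback to an LLM."""
    # The first attempt uses a wrong column name on purpose, to trigger feedback.
    column = "amount" if feedback is None else "price"
    return (
        "import csv\n"
        f"with open({csv_path!r}, newline='') as f:\n"
        "    rows = list(csv.DictReader(f))\n"
        f"result = sum(float(r[{column!r}]) for r in rows)\n"
    )

def executor_agent(code: str) -> Tuple[bool, object]:
    """Agent 2: runs the code and returns (success, result or error details)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # unsafe for untrusted code; sandbox in practice
        return True, namespace.get("result")
    except Exception:
        return False, traceback.format_exc()

def run_agents(task: str, csv_path: str, max_iterations: int = 3) -> dict:
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        code = writer_agent(task, csv_path, feedback)
        ok, output = executor_agent(code)
        if ok:
            return {"success": True, "iterations": iteration, "result": output}
        feedback = str(output)  # the traceback goes back to Agent 1
    return {"success": False, "iterations": max_iterations, "result": feedback}

# Sample CSV written to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("ticker,price\nACME,10.5\nZZZ,20.0\n")
    sample_path = tmp.name

try:
    outcome = run_agents("Sum the price column", sample_path)
finally:
    os.remove(sample_path)

print(outcome)
```

The loop stops on the first successful execution or after `max_iterations`, which mirrors the stopping rule in the prompt.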
Happy Prompting …