AI agents are autonomous or human-in-the-loop programs that take human input, process it, take action, and return the result to the end user. Familiar chatbots usually work in a single step, answering one query at a time, whereas agents can carry out multi-step processes, as the examples below show.

OpenAI Operator or browser-use

OpenAI Operator takes a user query, opens a browser, performs the task on behalf of the user, and returns the end result. The task can be anything from shopping on Amazon to buying flight tickets or applying for jobs. The user can interrupt the agent at any point, which makes this a human-in-the-loop system.
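The human-in-the-loop idea can be sketched in a few lines. This is only an illustration and not how Operator is implemented: the function `run_with_human_in_loop`, the `approve` callback, and the shopping plan are all hypothetical names made up for this example. The agent asks for approval before each step and stops as soon as the human declines.

```python
from typing import Callable, List


def run_with_human_in_loop(
    steps: List[str],
    approve: Callable[[str], bool],
) -> List[str]:
    """Execute steps one by one, stopping as soon as the human declines one."""
    completed = []
    for step in steps:
        # Ask the human before acting; a real agent would drive a browser here.
        if not approve(step):
            break
        completed.append(step)
    return completed


# Hypothetical plan for a shopping task.
plan = ["open shop", "search for headphones", "add to cart", "pay"]

# Stand-in for a human who approves everything except the payment step.
done = run_with_human_in_loop(plan, approve=lambda step: step != "pay")
print(done)
```

In an interactive setting the `approve` callback could simply wrap `input()`, so the user decides step by step whether the agent may continue.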

Manus AI agent

Manus is another general-purpose agent. It takes user queries, runs them in a Linux VM, and returns the completed task to the end user. The output could be a research report, a presentation, or even website code.

Now that we have seen some practical examples, we can look at the architecture of AI agents, that is, how an agent is built.

AI agents definition and architecture

Many developers dream of an AI that does all the work and delivers the finished tasks, so that they never have to work again. While some think this is good, others see it as a way of replacing software developers with AI. Turing Award winner and AI pioneer Yoshua Bengio has already warned against the massive replacement of talent. Andrej Karpathy has tweeted that English is the new programming language.

Amidst all these changes in the software world, the core idea is that of AI agents. They are not like Agent Smith from The Matrix, but autonomous programs that can think (reason), plan, and act on a given cue to complete tasks, either autonomously or with human involvement.
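The reason, plan, and act cycle can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `fake_llm` is a hard-coded stand-in for a real model call, and the `TOOLS` table, `run_agent`, and the task string are hypothetical names invented for illustration. A real agent would send the task and the history to an LLM and parse its reply into an action.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools the agent may call; real agents would wrap APIs or a browser.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}


def fake_llm(task: str, history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call: returns (action, argument).

    The decisions are hard-coded here; a real model would derive them
    from the task and the observations collected so far.
    """
    if not history:
        return "add", "2+3"
    if len(history) == 1:
        return "upper", f"total is {history[0]}"
    return "finish", history[-1]


def run_agent(task: str, max_steps: int = 5) -> str:
    history: List[str] = []
    for _ in range(max_steps):
        action, arg = fake_llm(task, history)  # reason and plan the next step
        if action == "finish":
            return arg
        history.append(TOOLS[action](arg))  # act, then record the observation
    return "stopped: step limit reached"


result = run_agent("add 2 and 3, then shout the total")
print(result)
```

The `max_steps` limit is a design choice worth keeping in any real loop, since a model that never emits a finishing action would otherwise run forever.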

Underneath this agency, the brain of the agent is the large language model (LLM), which provides the know-how for performing the tasks. Different frameworks provide access to the model and the agent, and they support different setups, such as a single agent or multiple agents. I have tried LangGraph and CrewAI through the deeplearning.ai courses.

CrewAI and LangGraph are multi-agent frameworks that help developers build multi-agent applications. For example, a writing task can involve a writer agent, an editor agent, and a publisher agent that coordinate to produce the article.
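The writer, editor, and publisher example can be sketched in plain Python. This deliberately does not use the CrewAI or LangGraph APIs: the `Agent` dataclass, `run_crew`, and the lambda functions are hypothetical stand-ins, where each lambda takes the place of an LLM call with a role-specific prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    role: str
    work: Callable[[str], str]  # stand-in for an LLM call with a role prompt


def run_crew(agents: List[Agent], topic: str) -> str:
    """Pass the output of each agent to the next one, in order."""
    artifact = topic
    for agent in agents:
        artifact = agent.work(artifact)
    return artifact


crew = [
    # The writer produces a rough draft with sloppy spacing.
    Agent("writer", lambda topic: f"draft about {topic}  with extra  spaces"),
    # The editor normalizes whitespace and adds a final period.
    Agent("editor", lambda draft: " ".join(draft.split()) + "."),
    # The publisher marks the text as released.
    Agent("publisher", lambda text: f"PUBLISHED: {text}"),
]

article = run_crew(crew, "AI agents")
print(article)
```

This sketch is a strictly sequential hand-off. Real frameworks also support other coordination patterns, such as branching or agents handing work back for revision.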

Then there are evals (evaluations), which measure how well the agent performs.
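A very small eval can be sketched as a list of test cases and a pass rate. The names `evaluate`, `toy_agent`, and the cases below are hypothetical and only for illustration. Exact string matching is the simplest possible scoring rule, and real eval suites often use more tolerant checks.

```python
from typing import Callable, List, Tuple


def evaluate(agent: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases where the agent output matches exactly."""
    passed = sum(1 for prompt, expected in cases if agent(prompt).strip() == expected)
    return passed / len(cases)


def toy_agent(prompt: str) -> str:
    # Stand-in for a real agent: it only knows how to add two integers.
    try:
        a, b = prompt.split("+")
        return str(int(a) + int(b))
    except ValueError:
        return "unknown"


cases = [
    ("1+2", "3"),
    ("10+5", "15"),
    ("capital of France", "Paris"),  # outside what the toy agent can do
    ("7+7", "14"),
]

score = evaluate(toy_agent, cases)
print(f"pass rate: {score:.2f}")
```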

The final, lower-level layer is memory and the database. Memory systems include MemGPT, and vector databases such as Pinecone and Chroma are widely used.
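The idea behind vector-based memory can be sketched with a toy in-memory store. This is not the API of Pinecone, Chroma, or MemGPT: the `Memory` class, `embed`, and `cosine` are hypothetical names, and the word-count "embedding" is an assumption made so that the example runs without any model. Real systems use learned embeddings and approximate nearest-neighbor search.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy embedding: word counts. Real systems use a learned embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Memory:
    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Store the text together with its vector.
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        # Rank stored items by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = Memory()
memory.add("the user prefers window seats on flights")
memory.add("the user is allergic to peanuts")
memory.add("the project deadline is friday")

top = memory.search("which seats does the user like on flights")
print(top)
```

An agent would typically call `search` before each model call and place the retrieved facts in the prompt, which is how it appears to remember earlier conversations.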

Depending on the level of detail required, one could work with all of these layers, or with none of them and simply use an agent externally as a tool.

IDE-based coding agents

For developers, several IDEs and coding environments have emerged, such as Cursor, Windsurf, VS Code, Replit, and Firebase Studio. Each provides a coding agent that can modify code files, create web apps, and run commands on the command line. Currently, most of them use Claude 3.7 by default, except Firebase Studio, which uses Gemini. Similar agents are emerging in other domains as well.
