AI Agents: One Ring To Rule Them All

In the ever-evolving landscape of Artificial Intelligence, a fascinating breed of digital entities has emerged: AI Agents. These agents are more than mere lines of code; they are pushing the boundaries of how we use Generative AI. But what exactly are these agents, and why do we need them?

What are Autonomous AI Agents and why do we need them?

In essence, Autonomous AI Agents are like self-directed workers. Once you give these computer programs an objective, they can autonomously figure out what needs to be done, create a plan to achieve that objective and continuously learn from their experiences while working towards it. To do so, they might use different tools, browse the web or call Large Language Models. Just like the One Ring forged by Sauron in The Lord of the Rings saga, AI Agents are often seen as the "missing piece" in the search for AGI, as they have the potential to instruct, or "rule", any LLM or other tool to achieve their goals. In this post, we'll take a look at these Rulers of the Generative AI World.

Famous AI Agents

There are many different AI Agents, some of which have become very popular in a very short time. That is because these agents often seem able to perform a wide array of tasks autonomously, leading AI enthusiasts to believe that AGI is "just around the corner". Typically, however, as time goes by and the agent's limitations become clearer, the hype around the agent fades, and so does the hype around AGI. That is, until the next big AI Agent is introduced to the world and the hopes for AGI are reset. In this section, we'll look at some of these agents, how they work and what their limitations are.

One famous AI Agent is AutoGPT. In essence, AutoGPT works as follows: after receiving a goal expressed in natural language, it breaks that goal down into subtasks and automatically uses the internet and other tools to achieve it. When you work with an LLM like GPT-4 yourself, you have to keep sending prompts to the model to get answers. AutoGPT, by contrast, works towards its objective independently: it keeps track of the tasks it still has to finish and continues working on them. As a user, you only have to define the agent's name, its role, its objective and up to 5 ways to achieve that objective. One cool demo of AutoGPT shows it writing its own Python code to solve a problem and then executing that code on its own. We've added a link to that demo in the resources of this Insight.
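The task loop described above can be sketched in a few lines of Python. This is a simplified illustration, not AutoGPT's actual code: `plan` and `execute` are hypothetical stand-ins for the LLM call that breaks a goal into subtasks and for the tools (web search, code execution, ...) that carry them out, and `max_steps` is an assumed safeguard against endless loops. A real agent would also let the LLM add or revise tasks based on intermediate results, which this sketch leaves out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    # The user-defined settings: name, role, objective and up to 5 goals
    # ("ways to achieve the objective").
    name: str
    role: str
    objective: str
    goals: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.goals) > 5:
            raise ValueError("at most 5 goals are allowed")


def plan(goal: str) -> list[str]:
    """Hypothetical stand-in for an LLM call that splits a goal into subtasks."""
    return [f"research: {goal}", f"act: {goal}", f"verify: {goal}"]


def execute(task: str) -> str:
    """Hypothetical stand-in for a tool call (web search, running code, ...)."""
    return f"done({task})"


def run_agent(config: AgentConfig, max_steps: int = 50) -> list[str]:
    # Queue of tasks the agent still has to finish.
    todo: deque[str] = deque()
    for goal in config.goals:
        todo.extend(plan(goal))
    memory: list[str] = []  # results of the tasks completed so far
    steps = 0
    # Keep working without further prompts until the queue is empty
    # or the (assumed) step budget runs out.
    while todo and steps < max_steps:
        task = todo.popleft()
        memory.append(execute(task))
        steps += 1
    return memory


config = AgentConfig(
    name="MarketBot",
    role="market analyst",
    objective="Write a short competitor overview",
    goals=["find competitors", "summarize their pricing"],
)
for result in run_agent(config):
    print(result)  # first line: done(research: find competitors)
```

The queue is what separates this from a plain chat session: the user states the objective once, and the loop keeps pulling the next unfinished task on its own.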

Another AI Agent is SuperAGI, often referred to as "AutoGPT on steroids" because in certain ways it is a step up from AutoGPT. For starters, you can extend it with a range of tools right out of the box, including Slack, email, Google Search, GitHub, DALL-E and Power BI. It also has a graphical user interface and can run agents concurrently. Furthermore, everything is packaged in a Docker image, which makes it very easy to install.

Finally, while there are more interesting agents to cover, the last AI Agent we'd like to introduce is the Self Operating Computer. This is an open-source system published on GitHub by HyperwriteAI that uses GPT-4 with vision to operate a computer the way a human would. Let's say you ask the Self Operating Computer to move some files on your computer from one folder to another. It will take a screenshot of your screen, figure out where to click, and repeat this process until the task is completed. The system comes in 2 flavours: the open-source version on the one hand and a Google Chrome plugin on the other. Without diving too deep into the nitty-gritty, the Chrome plugin currently performs better than the open-source version because it uses metadata it gets from Google Chrome rather than taking screenshots. However, the open-source version has more potential, as it can operate your computer beyond the boundaries of the Chrome browser.
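The screenshot, decide, act cycle can be illustrated with a toy simulation. Nothing here comes from the Self Operating Computer codebase: `FakeDesktop` is a hypothetical simulated screen with two folders, its `screenshot` method returns a text snapshot instead of pixels, and `decide` is a hypothetical stand-in for the GPT-4 with vision call that works out the next action.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str         # "move" or "done"
    target: str = ""  # hypothetical: the file the action applies to


class FakeDesktop:
    """Hypothetical simulated screen with a source and a destination folder."""

    def __init__(self, files: list[str]) -> None:
        self.folders = {"source": list(files), "destination": []}

    def screenshot(self) -> dict[str, list[str]]:
        # Stands in for capturing pixels: returns a copy of what is "visible".
        return {name: list(items) for name, items in self.folders.items()}

    def perform(self, action: Action) -> None:
        # Stands in for the mouse clicks / drags the real system would make.
        if action.kind == "move":
            self.folders["source"].remove(action.target)
            self.folders["destination"].append(action.target)


def decide(objective: str, screen: dict[str, list[str]]) -> Action:
    """Hypothetical stand-in for the vision model (ignores `objective` here)."""
    if screen["source"]:
        return Action("move", screen["source"][0])
    return Action("done")


def operate(objective: str, desktop: FakeDesktop, max_steps: int = 20) -> int:
    # Look at the screen, pick one action, perform it, and repeat until done.
    for step in range(max_steps):
        action = decide(objective, desktop.screenshot())
        if action.kind == "done":
            return step  # number of actions that were performed
        desktop.perform(action)
    raise RuntimeError("objective not reached within max_steps")


desktop = FakeDesktop(["report.pdf", "notes.txt"])
steps = operate("Move all files from source to destination", desktop)
print(steps, desktop.folders)
# 2 {'source': [], 'destination': ['report.pdf', 'notes.txt']}
```

Because the model only ever sees the latest screenshot, each iteration re-checks the state of the screen, which is what lets the loop notice when the task is finished.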


What's in it for you?

So, reading all this, you might wonder: do I have to start using one of these agents? Or do I have to create my own? Well, at element61 we take a "start small, dream big" approach, because there are actually 2 pieces to the puzzle here. First of all, it's worth understanding that Generative AI projects often implement agents without you even realizing it. Take a "simple" chatbot that lets your employees ask questions about your own data in a typical RAG (Retrieval-Augmented Generation) setup. Even in such solutions, agents are typically used to manage the flow between the different components of the setup in order to generate the answers to these questions.

The second piece of the puzzle can involve more complex setups. Let's say that you want to automate a more complex workflow within your company: you are a bank and you want to personalize the investments you suggest to your customers. You could build your own agent that first uses an LLM to summarize everything you know about a specific customer, then uses another LLM to find the best match between that customer profile and the entire range of products your bank offers, and finally uses a third LLM to suggest to your investment advisors how to persuade the customer to take up the offer.
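The bank workflow above boils down to a chain of three steps in which each output feeds the next. The sketch below uses hypothetical functions (`summarize_customer`, `match_products`, `advise`) and a made-up product catalogue; in a real agent each function would wrap an LLM call, which is replaced here by a simple rule so the example runs offline.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    risk: str  # "low", "medium" or "high"


def summarize_customer(records: list[str]) -> dict[str, str]:
    """Step 1 (hypothetical LLM #1): condense customer records into a profile."""
    text = " ".join(records).lower()
    # Crude keyword rule standing in for the LLM's summary.
    risk = "low" if "cautious" in text else "high" if "aggressive" in text else "medium"
    return {"risk_appetite": risk}


def match_products(profile: dict[str, str], catalogue: list[Product]) -> list[Product]:
    """Step 2 (hypothetical LLM #2): pick the products that fit the profile."""
    return [p for p in catalogue if p.risk == profile["risk_appetite"]]


def advise(profile: dict[str, str], matches: list[Product]) -> str:
    """Step 3 (hypothetical LLM #3): draft talking points for the advisor."""
    names = ", ".join(p.name for p in matches) or "no matching products"
    return f"Customer has a {profile['risk_appetite']} risk appetite; suggest: {names}."


def recommend(records: list[str], catalogue: list[Product]) -> str:
    # The agent chains the three steps, feeding each output into the next.
    profile = summarize_customer(records)
    matches = match_products(profile, catalogue)
    return advise(profile, matches)


catalogue = [
    Product("Savings Bond Fund", "low"),
    Product("Balanced Fund", "medium"),
    Product("Global Equity Fund", "high"),
]
print(recommend(["Prefers cautious investments", "Retiring in 5 years"], catalogue))
# Customer has a low risk appetite; suggest: Savings Bond Fund.
```

Keeping each step behind its own function makes it easy to swap one LLM (or prompt) for another without touching the rest of the chain.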

So, we encourage you to reflect on your current business processes, dissect them and organize them into a step-by-step flow that an AI Agent could automate. That's plenty of food for thought! If you're interested in discovering what Generative AI and Agents can do for your company, make sure to reach out to element61!