Thanks for your response @WhiteFang_Jr. I'm pretty sure it does. I mean, it's possible to build a reasoning loop using LlamaIndex components, or use the agent component "ReActAgent" that I've been testing too.
The problem with the "ReActAgent" component is that I can't edit the system prompt to add more reasoning steps or constraints, for example.
I think you'll have to customize this a bit to change the system instruction in ReActAgent
from llama_index.agent import ReActAgent
from llama_index.agent.react.formatter import ReActChatFormatter

# Extend the default ReAct system header with your extra instructions
rchf = ReActChatFormatter()
rchf.system_header = rchf.system_header + "\nAdd your extra instructions here"

agent = ReActAgent.from_tools(
    tools=query_engine_tools,
    llm=llm,
    verbose=False,
    react_chat_formatter=rchf,
)
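Then a quick sanity check that the extra instructions took effect (assuming query_engine_tools and llm are already defined; the question is just illustrative):

# Hypothetical smoke test of the customized agent
response = agent.chat("Which steps will you follow to answer my question?")
print(response)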
This should work
Great, thanks for your help, it seems like it could help. Let me try this and I'll come back with any news
@Erick Calle Castillo I’m also working on using Llama for IT purposes and what I think you may end up finding is that it is more suitable to build your own agent based off of the ReAct one
It’ll start by you subclassing the existing one and then you’ll realize you need 100 new functionalities and build it from scratch 😆
Hi @isaackogan, thanks for the insights! Lol, that's what I thought: a high level of customization is required for this kind of agent (many steps, commands, output files...)
If you want to share some progress or ideas from your project I'll be happy to hear about it! I'm building tooling to automate IT work and currently working on a PoC for the System Design stage (diagrams and IaC) powered by LLMs
We’ve got pretty different goals, but basically I’m working on a scalable chatbot system that ingests data, re-syncs it on a daily basis, and lets you perform QA on it
I work at a university in 🇨🇦, and the idea behind agents is that for very large bots we need to split the indexes
So an agent is needed to pick which index is relevant based on a description of its topic 🙂
So a router query engine but with an agent, right?
Yes, but using the router would complicate the stack, this was easier
The system is very modular, i.e. a service for managing indexes, and then one for managing bots, etc.
It makes development much easier to decouple, but it also means that I can’t just stick a router in there without breaking that decoupling principle
Interesting!
So is it like a private bot for the University, or do you have the code available on GitHub?
Interested to check the code out
I think the eventual plan is to make it public OSS, but it’s not my call so I can’t riiiiiight now
The stack is pretty cool though, I’ll get some screenshots
I see, I see 👀. No worries
It’s a series of servers to create and manage many chatbots, but it’s not one singular bot
Ideally it will scale to several hundred bots, managed by individuals at the school
Then use GPT-4 Vision to read the code and let GPT-4 write the rest of the code 😆
The “agent” we built actually combines several “bots” (each bot has its own index) into one contiguous unit that “looks” seamless within the chat UI
Sounds really amazing 🤩!
One question: how's the response speed?
Very interesting. I hope to finish this PoC this week, so if you're interested in exchanging experiences later it'd be great to stay in touch. In my case the agent ingests from indices with embedded information from many specialized document sources to generate code for Mermaid diagrams and produce high-quality IaC (currently focused on AWS CDK)
Small overhead for the HTTP request to pick which bot. Other than that, nothing
We use a Qdrant vector DB backend
Nothing is ever stored in memory, so there is no “load phase” for bots. When you use a bot, a transient bot instance is created, queried, and quickly discarded
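Roughly like this, as a minimal sketch assuming the legacy llama_index and qdrant_client APIs (the endpoint, collection name, and answer_with_bot helper are illustrative, not Criadex code):

from qdrant_client import QdrantClient
from llama_index.vector_stores import QdrantVectorStore
from llama_index import VectorStoreIndex

client = QdrantClient(url="http://localhost:6333")  # assumed Qdrant endpoint

def answer_with_bot(collection_name: str, question: str) -> str:
    # Build a transient index view over an existing Qdrant collection;
    # nothing is held in memory between requests
    store = QdrantVectorStore(client=client, collection_name=collection_name)
    index = VectorStoreIndex.from_vector_store(vector_store=store)
    # Query, return the answer, and let the transient objects be discarded
    return str(index.as_query_engine().query(question))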
Got that idea when we visited Microsoft and they showed us their semantic kernel
Nooo, Canada has a “Microsoft Technology Centre” in their HQ and they bring in partners to help them get started with Azure OpenAI. The university is a partner and I was very lucky to tag along to that meeting
I have to say, your project is getting cooler and cooler as we speak 🤩
I made a bot for my company as well. I'll start exploring this now 🙌
I’m no expert but I’ve gone through three iterations before landing on my final architecture that we think is ready for piloting
So if you have any questions or ideas let me know cause I love to learn new stuff 😁
Sure, will do.
This is actually interesting!!
I’ve never had as much fun on a project as this one
I hope your project goes live soon and they open source it as well 😅
Same here, but let me write unit tests first LOL
I hope to see those results very soon!
@isaackogan I have some thoughts about the loading stage: my solution needs to adapt to high-compliance environments (banks, for example) where teams are always concerned about sharing information to be stored in external vector stores. Do you have any experience with constraints when loading data locally?
Thinking of a kind of basic business model: the idea is to offer a sort of community version where you only work with and load the required knowledge docs in your local environment or local vector store engine. And with the business version, customers can load into their projects our “embedded knowledge” (actually stored in the cloud) as exposed tools that enhance the agent responses
Hmm @Erick Calle Castillo we’re not too worried about data security because we have on premises servers
Right now our pilot is running in a VM on a not-so-VM ~300m from my desk
Depending on how strict of a concern, you could implement it so that the vector store is local
For example your app spins up a Vector DB on each client computer running your solution
They’re very lightweight so memory or CPU footprint wouldn’t be an issue
That would be a really decentralized approach
Some centralized data everyone has access to (or even role-based access) and some local data per-client
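For instance, a minimal sketch of that local setup using qdrant-client's embedded on-disk mode (the path, collection name, and vector size are illustrative assumptions):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Embedded mode: runs in-process on the client machine, persisting to disk,
# so no document data ever leaves the host
client = QdrantClient(path="./local_qdrant_data")
client.recreate_collection(
    collection_name="local_knowledge",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),  # e.g. ada-002 dims
)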
@isaackogan It sounds really interesting; is there documentation covering the basis of your work? Or did you create and implement the code on your own?
It's been my own work since May of this year
@isaackogan Let me see if I understood you: you have created an immense amount of information stored in indexes; is that in vectors or nodes? Then, to find the best match, you search the indexes, creating for each search an agent whose life ends when the search ends, and passing the information to another agent to show it to the user. Of course, since it is a university, the information is divided into multiple parts according to specialty. My question would be: you say there are several bots; is a bot (chatbot) an agent for you?
I definitely won't take credit for making the information; it's a huge corpus of questions mapped to answers on over 550 topics around the university
"Then, to find the best match, you search the indexes, creating for each search an agent whose life ends when the search ends, and passing the information to another agent to show it to the user."
Correct, sort of.
The design of a "Bot" in our system is that each bot covers 1 topic. That's a design principle we chose
This works great for some cases, but we want to build a general-purpose bot with knowledge on everything about the university, like we currently have using "traditional AI"
But we want to stick to our principle that 1 bot = 1 topic
Under the hood, we're going to route users to the appropriate bot relevant to their topic and present it as if it's just one to the end-user
@isaackogan When you say a topic, is it a simple search or a search by topic, where the topic can be electronics, computing, etc.?
We use the LLM to decide which topic is relevant, and the structure of a topic is this:
"content_areas": [
{
"name": "Electronics",
"description": Question about electronics at the university
}
]
The general idea is that we use the LLM to perform a ranking of the most relevant topic to the prompt, and then that topic has a bot associated with it
So we feed the user's prompt to the bot that we ranked highest (assuming it meets a minimum threshold) and voila
It's sort of like a LlamaIndex agent in that we use the LLM to pick, but rather than picking a tool, we're picking a topic
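As a minimal sketch of that ranking step, assuming a LlamaIndex-style llm.complete() interface (the content_areas entries, pick_topic helper, and 0.7 threshold are illustrative, not the actual implementation):

import json

content_areas = [
    {"name": "Electronics", "description": "Questions about electronics at the university"},
    {"name": "Admissions", "description": "Questions about applying to the university"},
]

def pick_topic(llm, prompt: str, min_score: float = 0.7):
    # Ask the LLM to rank the topics against the user's prompt
    instructions = (
        "Given the user question and the topic list below, reply with JSON "
        'like {"name": "<topic>", "score": <0-1 relevance>}.\n'
        f"Topics: {json.dumps(content_areas)}\n"
        f"Question: {prompt}"
    )
    result = json.loads(llm.complete(instructions).text)
    # Only route to a bot if the top topic meets the minimum threshold
    return result["name"] if result["score"] >= min_score else None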
@isaackogan That's what I imagined, thanks.
So far the results are freaking insane
But I want to get some quantitative data on how much better this method is than if we were to just dump everything into 1 index
@WhiteFang_Jr sorry I took so long
That's the docker-compose stack
Here's the indexing server, Criadex
And here's the downstream "bot server" that utilizes the indexes and packages them into bots
Criadex is effectively an in-house replica of Azure Cognitive Search
Yeah, WhiteFang wanted to see it earlier and I promised I'd send the architecture
And yes it was a shit ton of work over months of trial and error
It's almost offensive that it fits into a simple docker-compose
You can just spin one up in a second in K8s
Sorry bro, I have to get back to work. Good luck and congratulations
Thanks for the architecture, it's really something that you have built.
Haha, but I'm sure your learning curve would have grown exponentially working on this bot.
@isaackogan Good morning Isaac, here in my country it is 8 in the morning. I know you have worked hard on your project and dedicated a lot of time to it, but have you thought, after finishing it, about making a voice assistant that connects to your infrastructure and lets you ask questions by voice and get spoken answers? This is just a suggestion.
Was just thinking about that yesterday at work; there’s some great TTS offered in Azure
The benefit of the infrastructure design is that what you’re suggesting would simply be a new downstream app that hooks into the bot server
@isaackogan I think it is possible to go through agents
I would want to; I’d like it if bots are just bots