Yeah, we're working on making the agent service stateless. It's on the to-do list for this week.
It may require some StatefulSets, particularly for memory and state. How is LlamaIndex thinking about it? Like a sidecar pattern?
Something like that. Essentially, the agent service would store its state and memory in a KV store. That way, you can horizontally scale the service.
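A minimal sketch of the "store state and memory in a KV store" idea, assuming a plain dict as a stand-in for Redis or MongoDB. `InMemoryKVStore`, `StatelessAgentService` and the `memory:<task_id>` key layout are hypothetical illustrations, not llama-agents APIs. The point it shows: because each request loads memory from the shared store and writes it back before returning, any replica can serve the next message for the same task.

```python
import json
from typing import Dict, List, Optional


class InMemoryKVStore:
    """Hypothetical stand-in for Redis/MongoDB: one dict shared by all replicas."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class StatelessAgentService:
    """Hypothetical agent service that keeps no per-task state in process memory."""

    def __init__(self, name: str, store: InMemoryKVStore) -> None:
        self.name = name
        self.store = store

    def handle(self, task_id: str, message: str) -> List[str]:
        # Load this task's memory from the shared store (empty for a new task).
        raw = self.store.get(f"memory:{task_id}")
        memory: List[str] = json.loads(raw) if raw else []
        memory.append(f"{self.name} saw: {message}")
        # Persist before returning so any other replica can continue the task.
        self.store.put(f"memory:{task_id}", json.dumps(memory))
        return memory


store = InMemoryKVStore()
replica_a = StatelessAgentService("replica-a", store)
replica_b = StatelessAgentService("replica-b", store)

replica_a.handle("task-1", "hello")
history = replica_b.handle("task-1", "follow-up")  # a different replica sees A's memory
print(history)
```

With a real external store, the replicas would be separate pods behind a load balancer; only the store itself needs persistent storage.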
I looked at the pipeline orchestrator; it also uses internal data structures to store the sequence of modules. If the controller service fails, we will have the same problem again. It looks like the framework needs some significant changes to include resiliency patterns; otherwise it won't be usable for a distributed architecture like microservices.
This may not work. If I have multiple instances of the same agent service, how will you sequence the writes without StatefulSets? The writes need to go through a master process.
I think this will work fine. If you look at how an agent's state works, it "forks" the memory and then "commits" when the agent is done.
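One way "fork, then commit" can sequence writes from several replicas without a master process is optimistic concurrency: each fork records the version it started from, and a commit succeeds only if the stored version is still the same. This is a sketch under that assumption; the thread does not say llama-agents uses version checks, and `VersionedMemoryStore` and `ConflictError` are hypothetical names. In a real deployment the check-and-write has to be atomic inside the store (for example Redis WATCH/MULTI/EXEC or a conditional update in MongoDB); here a `threading.Lock` plays that role within one process.

```python
import copy
import threading
from typing import Dict, List, Tuple


class ConflictError(Exception):
    """Raised when another instance committed after this fork was taken."""


class VersionedMemoryStore:
    """Hypothetical KV store supporting compare-and-set on a version number."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, List[str]]] = {}  # key -> (version, memory)
        self._lock = threading.Lock()  # makes check-and-write atomic in this process

    def fork(self, key: str) -> Tuple[int, List[str]]:
        with self._lock:
            version, memory = self._data.get(key, (0, []))
            # Hand the agent a private copy plus the version it was based on.
            return version, copy.deepcopy(memory)

    def commit(self, key: str, base_version: int, memory: List[str]) -> int:
        with self._lock:
            current, _ = self._data.get(key, (0, []))
            if current != base_version:
                raise ConflictError(f"{key}: forked at v{base_version}, store is at v{current}")
            self._data[key] = (current + 1, memory)
            return current + 1


store = VersionedMemoryStore()
v_a, mem_a = store.fork("task-1")  # instance A forks
v_b, mem_b = store.fork("task-1")  # instance B forks the same version
mem_a.append("answer from instance A")
mem_b.append("answer from instance B")

new_version = store.commit("task-1", v_a, mem_a)  # first commit wins
try:
    store.commit("task-1", v_b, mem_b)
    b_conflicted = False
except ConflictError:
    b_conflicted = True  # B must re-fork from the new version and retry
```

The trade-off versus a single writer is that losers of a race redo their work, which is cheap when conflicts on the same task are rare.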
I think the orchestrator is fine? The state is stored in a KV store. You can horizontally scale it just fine, or reload from some existing state.
For example, with Kubernetes, assuming state is stored in a MongoDB or Redis KV store, you can reload and scale the control plane fine.
The orchestrator itself is technically stateless. The state is managed by the control plane
Let me take a look again. I thought I saw that the pipeline orchestrator has a data structure to which it adds the modules to run, and it retrieves them through the run_key. If the pipeline orchestrator process fails and a new process starts, I'm wondering whether it will still know which modules have completed and what comes next.
Yes, but that data structure is returned to the control plane, which stores everything in a KV store.
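A minimal sketch of why returning the data structure to a KV-backed control plane lets a fresh orchestrator process resume: if progress is persisted under the run_key after every step, deciding "what comes next" needs nothing but the stored record. The dict `KV`, the state layout (`modules`, `completed`) and the module names are hypothetical; this is not the llama-agents pipeline orchestrator code.

```python
import json
from typing import Dict, List, Optional

# Hypothetical stand-in for a Redis/MongoDB KV store, keyed by run_key.
KV: Dict[str, str] = {}


def save_state(run_key: str, state: dict) -> None:
    KV[run_key] = json.dumps(state)


def load_state(run_key: str) -> dict:
    return json.loads(KV[run_key])


def next_module(state: dict) -> Optional[str]:
    """Stateless decision: everything needed is in the persisted state."""
    done = len(state["completed"])
    return state["modules"][done] if done < len(state["modules"]) else None


def run_one_step(run_key: str) -> Optional[str]:
    state = load_state(run_key)
    module = next_module(state)
    if module is None:
        return None  # pipeline finished
    # ... the module would actually be executed here ...
    state["completed"].append(module)
    save_state(run_key, state)  # persist progress before the next step
    return module


save_state("run-42", {"modules": ["retriever", "reranker", "synthesizer"], "completed": []})
first = run_one_step("run-42")  # orchestrator process 1 runs "retriever", then dies

# A brand-new orchestrator process needs only the run_key to resume.
resumed: List[str] = []
while (module := run_one_step("run-42")) is not None:
    resumed.append(module)
```

One caveat of this design: if a process dies after a module runs but before `save_state`, that module runs again on resume, so modules should be idempotent (or a separate "started" marker should be persisted).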
I looked at the code again and I do see the KV store. But I'm still struggling to understand how this framework will help build a production-grade service-oriented architecture. It looks to me like we are reinventing something that already works quite well with K8s. The current framework also seems to lack resiliency. For example, ControlPlaneServer looks like a single point of failure: if it fails, the entire system fails, and it doesn't look like I can set up an HPA for ControlPlaneServer. For that we would need a cluster management tool like ZooKeeper. Wouldn't it be much easier to follow traditional microservices patterns here?
I'm not sure I follow tbh.
If it fails, you can configure Docker or k8s to restart it 🤷‍♂️ It can also have replicas.
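If several control-plane replicas share one KV store, a time-limited lease stored in that KV store is one common way to keep exactly one replica active without running ZooKeeper: the active replica keeps renewing the lease, and a standby takes over once it expires. This is a sketch of that general technique, not something the thread says llama-agents implements; `LeaseStore`, `try_acquire` and the key name are hypothetical, and an injectable clock is used so the failover can be simulated. In production the expiry should be enforced by the store itself (for example Redis `SET` with the `NX` and `PX` options) rather than by local clocks, and fencing tokens are needed if a paused leader might keep acting after its lease lapses.

```python
import time
from typing import Callable, Dict, Tuple


class LeaseStore:
    """Hypothetical KV store with an atomic 'acquire if free or expired' operation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._leases: Dict[str, Tuple[str, float]] = {}  # key -> (holder, expires_at)

    def try_acquire(self, key: str, holder: str, ttl: float) -> bool:
        now = self._clock()
        current = self._leases.get(key)
        if current is None or current[1] <= now or current[0] == holder:
            # Lease is free, has expired, or is being renewed by its holder.
            self._leases[key] = (holder, now + ttl)
            return True
        return False


fake_now = [0.0]
leases = LeaseStore(clock=lambda: fake_now[0])
a_is_leader = leases.try_acquire("control-plane-leader", "replica-a", ttl=10.0)
b_is_leader = leases.try_acquire("control-plane-leader", "replica-b", ttl=10.0)

fake_now[0] = 15.0  # replica-a crashed and stopped renewing its lease
b_takes_over = leases.try_acquire("control-plane-leader", "replica-b", ttl=10.0)
```

Each replica would call `try_acquire` periodically (well within the TTL) and only serve requests while it holds the lease.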
If you have actual suggestions for the code, I'd welcome a PR.
If the control plane restarted, it would just pick up where it left off, based on its stored state.
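A minimal sketch of "pick up where it left off": a control plane that keeps every task record in the KV store can rebuild its in-memory work list on startup, so a replacement process (for example a pod restarted by k8s) continues with unfinished tasks. `HypotheticalControlPlane`, the `task:<id>` key layout and the status values are illustrative assumptions, not the llama-agents ControlPlaneServer API.

```python
import json
from typing import Dict, List


class HypotheticalControlPlane:
    """Holds no task state of its own; every task record lives in the KV store."""

    def __init__(self, kv: Dict[str, str]) -> None:
        self.kv = kv
        # On startup, rebuild the pending list from whatever was persisted.
        self.pending: List[str] = [
            key.split(":", 1)[1]
            for key, raw in sorted(kv.items())
            if key.startswith("task:") and json.loads(raw)["status"] != "done"
        ]

    def create_task(self, task_id: str, payload: str) -> None:
        self.kv[f"task:{task_id}"] = json.dumps({"status": "queued", "payload": payload})
        self.pending.append(task_id)

    def mark_done(self, task_id: str) -> None:
        record = json.loads(self.kv[f"task:{task_id}"])
        record["status"] = "done"
        self.kv[f"task:{task_id}"] = json.dumps(record)
        self.pending.remove(task_id)


kv: Dict[str, str] = {}  # stand-in for Redis/MongoDB that outlives the process
control_plane = HypotheticalControlPlane(kv)
control_plane.create_task("t1", "summarize doc A")
control_plane.create_task("t2", "summarize doc B")
control_plane.mark_done("t1")

del control_plane  # simulate a crash: the in-process object is gone
restarted = HypotheticalControlPlane(kv)  # e.g. the replacement pod k8s starts
```

After the restart, `restarted.pending` contains only `"t2"`, because `"t1"` was already marked done in the store.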
It's also at version 0.0.5,
so take that as you will.
Thanks, Logan. I will continue to dive deeper as the project adds more functionality. I totally agree with the approach and strongly believe that multi-agent systems should follow a service-oriented structure, but I'm still unfamiliar with some of the implementation details of llama-agents. I'll look into the code and learn. Right now I'm trying to figure out what happens when a message is pushed to an agent and the agent fails to process it: will the message get lost? I'm looking into all of this as part of an evaluation of open-source multi-agent frameworks such as AutoGen, CrewAI, LangGraph, etc.
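For the lost-message question, a common answer in message queues (for example Amazon SQS visibility timeouts or RabbitMQ manual acknowledgements) is at-least-once delivery: a received message stays "in flight" until the consumer acknowledges it, and is redelivered if no ack arrives before a timeout. This sketch only illustrates that pattern; it does not describe how llama-agents' message queue currently behaves, and `AckQueue` and its methods are hypothetical. The clock is injectable so a crashed consumer can be simulated.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple


class AckQueue:
    """Hypothetical in-memory queue with acknowledgements and a visibility timeout."""

    def __init__(self, visibility_timeout: float, clock: Callable[[], float]) -> None:
        self._timeout = visibility_timeout
        self._clock = clock
        self._ready: List[Tuple[int, str]] = []
        self._in_flight: Dict[int, Tuple[str, float]] = {}  # id -> (message, deadline)
        self._ids = itertools.count()

    def publish(self, message: str) -> None:
        self._ready.append((next(self._ids), message))

    def _requeue_expired(self) -> None:
        now = self._clock()
        for msg_id, (message, deadline) in list(self._in_flight.items()):
            if deadline <= now:  # consumer never acked: make the message visible again
                del self._in_flight[msg_id]
                self._ready.append((msg_id, message))

    def receive(self) -> Optional[Tuple[int, str]]:
        self._requeue_expired()
        if not self._ready:
            return None
        msg_id, message = self._ready.pop(0)
        self._in_flight[msg_id] = (message, self._clock() + self._timeout)
        return msg_id, message

    def ack(self, msg_id: int) -> None:
        # Only an acknowledged message is removed for good.
        self._in_flight.pop(msg_id, None)


now = [0.0]
queue = AckQueue(visibility_timeout=30.0, clock=lambda: now[0])
queue.publish("run tool X")
first = queue.receive()        # agent A takes the message, then crashes without acking
nothing_yet = queue.receive()  # None: the message is still in flight
now[0] = 31.0
retry = queue.receive()        # timeout passed, so the message is redelivered
queue.ack(retry[0])            # agent B finishes and acknowledges
now[0] = 100.0
final = queue.receive()        # None: acked messages are never redelivered
```

Because a message can be delivered more than once, agent handlers in this pattern should be idempotent.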