Add thread-level persistence

Hey, having experimented with langgraph previously, I was hoping/expecting to see something like thread-level memory in LlamaIndex. However, I notice in the create-llama-app example that the entire conversation history is being passed from the frontend to the backend on each interaction. Is that the expected paradigm for chat engines in LlamaIndex?
I did see these docs, but they're not very comprehensive tbh, and don't cover, for example, how to create a memory per user/conversation.
So, that create-llama example is built to be stateless. The chat history has to go somewhere (Redis, MongoDB, some other DB) -- in this case, it's just being managed by the frontend.

There are many chat store options to use with memory:
https://docs.llamaindex.ai/en/stable/module_guides/storing/chat_stores/
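For memory per user/conversation, you key the chat store by user or thread ID. Here's a minimal sketch using SimpleChatStore (the "user1" key, the token limit, and the `index` variable are just placeholders; swap in RedisChatStore or similar for a real backend):

Python
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore

# one chat store for the whole app; keys separate users/conversations
chat_store = SimpleChatStore()

# memory scoped to a single conversation via chat_store_key
memory = ChatMemoryBuffer.from_defaults(
    token_limit=3000,
    chat_store=chat_store,
    chat_store_key="user1",  # e.g. f"{user_id}:{thread_id}"
)

# assumes `index` is an existing index, e.g. a VectorStoreIndex
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)

# SimpleChatStore can also be persisted/reloaded between runs
chat_store.persist(persist_path="chat_store.json")
chat_store = SimpleChatStore.from_persist_path(persist_path="chat_store.json")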
If this example were using workflows, you could maintain history by storing it in the workflow context and serializing/deserializing the context between runs:

Python
from llama_index.core.workflow import Context, Workflow
# JsonPickleSerializer also exists, for state that isn't JSON-serializable
from llama_index.core.workflow.context_serializers import JsonSerializer

# assumes `workflow` is an instance of your own Workflow subclass
handler = workflow.run()
result = await handler
serialized_context = handler.ctx.to_dict(serializer=JsonSerializer())

# resume at a later point
ctx = Context.from_dict(workflow, serialized_context, serializer=JsonSerializer())
handler = workflow.run(ctx=ctx)
final_result = await handler


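To turn that into thread-level persistence, you just stash the serialized context under a thread ID and restore it on the next request. A minimal sketch with a local JSON file (the file path and function names are made up; in practice you'd use Redis/Mongo/etc.):

Python
import json

def save_thread(thread_id: str, ctx_dict: dict, path: str = "threads.json") -> None:
    # load existing threads, overwrite this one, write back
    try:
        with open(path) as f:
            threads = json.load(f)
    except FileNotFoundError:
        threads = {}
    threads[thread_id] = ctx_dict
    with open(path, "w") as f:
        json.dump(threads, f)

def load_thread(thread_id: str, path: str = "threads.json") -> dict | None:
    # returns the serialized context for this thread, or None if it's new
    try:
        with open(path) as f:
            return json.load(f).get(thread_id)
    except FileNotFoundError:
        return None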
This is something that llama-deploy automates for you.

If you haven't used workflows yet, this doc would be helpful (it's roughly the equivalent of LangGraph):
https://docs.llamaindex.ai/en/stable/module_guides/workflow/#workflows
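For reference, a minimal workflow that keeps chat history in the context could look like this (ChatWorkflow and the "history" key are illustrative, not a library API):

Python
from llama_index.core.workflow import (
    Context,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class ChatWorkflow(Workflow):
    @step
    async def chat(self, ctx: Context, ev: StartEvent) -> StopEvent:
        # history lives in the context, so it survives serialize/deserialize
        history = await ctx.get("history", default=[])
        history.append(ev.message)  # `message` is passed via workflow.run(message=...)
        await ctx.set("history", history)
        return StopEvent(result=history)

workflow = ChatWorkflow()
result = await workflow.run(message="hello")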
Thanks again, Logan. I haven't looked into llama-deploy, but will check it out soon.