Add thread-level persistence

Hey, having experimented with langgraph previously, I was hoping/expecting to see something like thread-level memory in LlamaIndex. However, I notice in the create-llama-app example that the entire conversation history is being passed from the frontend to the backend on each interaction. Is that the expected paradigm for chat engines in LlamaIndex?
I did see these docs, but they're not very comprehensive tbh, and don't cover, for example, how to create a memory per user/conversation.
So, that create-llama example is built to be stateless. The chat history has to go somewhere (Redis, MongoDB, some other DB) -- in this case, it's just being managed by the frontend.

There are many chat store options to use with memory:
https://docs.llamaindex.ai/en/stable/module_guides/storing/chat_stores/
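For memory per user/conversation, you key the chat store by user or thread ID. Here's a minimal sketch using SimpleChatStore (the "user1" key, the token limit, and the `index` variable are just placeholders; swap in RedisChatStore or similar for a real backend):

Python
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore

# one chat store for the whole app; keys separate users/conversations
chat_store = SimpleChatStore()

# memory scoped to a single conversation via chat_store_key
memory = ChatMemoryBuffer.from_defaults(
    token_limit=3000,
    chat_store=chat_store,
    chat_store_key="user1",  # e.g. f"{user_id}:{thread_id}"
)

# assumes `index` is an existing index, e.g. a VectorStoreIndex
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)

# SimpleChatStore can also be persisted/reloaded between runs
chat_store.persist(persist_path="chat_store.json")
chat_store = SimpleChatStore.from_persist_path(persist_path="chat_store.json")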
If this example were using workflows, you could maintain history by storing it in the workflow context and serializing/deserializing the context between runs:

Python
from llama_index.core.workflow import Context, Workflow
# JsonPickleSerializer also exists, for state that isn't JSON-serializable
from llama_index.core.workflow.context_serializers import JsonSerializer

# assumes `workflow` is an instance of your own Workflow subclass
handler = workflow.run()
result = await handler
serialized_context = handler.ctx.to_dict(serializer=JsonSerializer())

# resume at a later point
ctx = Context.from_dict(workflow, serialized_context, serializer=JsonSerializer())
handler = workflow.run(ctx=ctx)
final_result = await handler


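To turn that into thread-level persistence, you just stash the serialized context under a thread ID and restore it on the next request. A minimal sketch with a local JSON file (the file path and function names are made up; in practice you'd use Redis/Mongo/etc.):

Python
import json

def save_thread(thread_id: str, ctx_dict: dict, path: str = "threads.json") -> None:
    # load existing threads, overwrite this one, write back
    try:
        with open(path) as f:
            threads = json.load(f)
    except FileNotFoundError:
        threads = {}
    threads[thread_id] = ctx_dict
    with open(path, "w") as f:
        json.dump(threads, f)

def load_thread(thread_id: str, path: str = "threads.json") -> dict | None:
    # returns the serialized context for this thread, or None if it's new
    try:
        with open(path) as f:
            return json.load(f).get(thread_id)
    except FileNotFoundError:
        return None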
This is something that llama-deploy automates for you.

If you haven't used workflows yet, this doc would be helpful (it's roughly the equivalent of LangGraph):
https://docs.llamaindex.ai/en/stable/module_guides/workflow/#workflows
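For reference, a minimal workflow that keeps chat history in the context could look like this (ChatWorkflow and the "history" key are illustrative, not a library API):

Python
from llama_index.core.workflow import (
    Context,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class ChatWorkflow(Workflow):
    @step
    async def chat(self, ctx: Context, ev: StartEvent) -> StopEvent:
        # history lives in the context, so it survives serialize/deserialize
        history = await ctx.get("history", default=[])
        history.append(ev.message)  # `message` is passed via workflow.run(message=...)
        await ctx.set("history", history)
        return StopEvent(result=history)

workflow = ChatWorkflow()
result = await workflow.run(message="hello")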
Thanks again, Logan. I haven't looked into llama-deploy, but will check it out soon.