
Updated 6 months ago

ContextChatEngine

At a glance

The community member has built a LlamaIndex-based app with FastAPI APIs, including one to return a ContextChatEngine object and another to return a response when the user queries. However, the community member is unable to return the ContextChatEngine object because it is not serializable/deserializable, resulting in a "TypeError: cannot pickle 'builtins.CoreBPE' object" error.

The community members discuss potential solutions, such as serializing the main settings needed to reconstruct the chat engine on the client side or providing the chat interface itself over the API. The latter option is suggested as a better design, where the backend manages all the conversations and engines, and the client interacts with API endpoints like "/chat/{user_id}" to post messages to a specific chat engine.

The community members also discuss that this issue is not specific to FastAPI, but would likely occur with any API framework, as serializing complex objects like chat engines or indexes can be challenging. The suggested solution is to either cache active conversations in memory and load/reload the index and chat engine as needed, or send the information required to reconstruct the chat engine on the client side.

Need your help again!! 🥺

So I have built this LlamaIndex-based app and used FastAPI to create the relevant APIs.

My app has a feature to chat with the video transcript. For this I have exposed 2 APIs - one to return the ContextChatEngine object and the other to return a response whenever the user types a query, by calling query_engine.query() with this object.

But I am not able to return this ContextChatEngine object because the ContextChatEngine class is not serializable/deserializable. Calling the API throws "TypeError: cannot pickle 'builtins.CoreBPE' object", and creating a custom response class throws an error too. Any idea how to fix this?
9 comments
hmm, why do you need to return the ContextChatEngine?
could you provide some snippet?
So the experience is like this - The moment you submit the video link, I create the ContextChatEngine object by calling the "/get_chat_engine" endpoint and then on subsequent submissions of query inputs I call the "/chat" endpoint. This is the code :-

@app.get("/get_chat_engine", response_model=None)
def get_chat_engine(yt_video_link: str):
    index = initialize_index(yt_video_link)

    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=2,
    )
    response_synthesizer = get_response_synthesizer(
        response_mode="tree_summarize", use_async=True, streaming=True
    )

    system_prompt = """You are a friendly and helpful mentor whose task is to \
use ONLY the context information and no other sources to answer the question being asked. \
If you don't find an answer within the context, SAY 'Sorry, I could not find the answer within the context.' \
and DO NOT provide a generic response."""

    chat_engine = ContextChatEngine.from_defaults(
        system_prompt=system_prompt,
        retriever=retriever,
        response_synthesizer=response_synthesizer,
    )
    return chat_engine


@app.get("/chat")
def chat(chat_engine: ContextChatEngine, query: str):
    response_stream = chat_engine.stream_chat(query)
    return StreamingResponse(response_stream.response_gen)
I think you will have a hard time properly serializing the chat engine or an index.

You could either serialize the main settings needed to re-construct it on the other end, or provide that chat interface itself over the api

Tbh the second option seems like a better design
Didn't understand the second option. What do you mean by "providing the chat interface" over the API?
Also is this a FastAPI restriction, or will I have this problem with any such API framework?
I think you will have this problem with any API framework

By providing the chat interface over the API, I mean the backend should probably be managing all the conversations/engines. Then you could have API endpoints like
@app.post("/chat/{user_id}") -- where messages get posted to a specific chat engine, managed by some ID? 🤔
Ohh so you mean bringing in a storage layer/DB to store the index/engine objects by user_id or something else, right?
yea possibly! or it could even be managed in-memory depending on the scale

Basically, serializing this stuff is pretty hard. It's probably better to cache active conversations in memory and, longer term, load/re-load the index and chat engine after a certain period of inactivity

Otherwise, you'll need to send over information needed to re-construct the chat engine on the client side
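For that second route - reconstructing on the client side - one hedged sketch is to serialize just the settings needed to rebuild the engine, rather than the engine itself. The field names below mirror the snippet earlier in the thread but are illustrative; on the receiving end you would feed them back into VectorIndexRetriever and ContextChatEngine.from_defaults:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ChatEngineConfig:
    # Only plain, JSON-safe settings - everything needed to re-run the
    # engine construction on the other end.
    yt_video_link: str
    similarity_top_k: int = 2
    response_mode: str = "tree_summarize"
    system_prompt: str = "You are a friendly and helpful mentor..."


def to_json(cfg: ChatEngineConfig) -> str:
    return json.dumps(asdict(cfg))


def from_json(payload: str) -> ChatEngineConfig:
    return ChatEngineConfig(**json.loads(payload))
```

This avoids the pickle error entirely, since the tokenizer and other non-picklable internals are rebuilt from scratch instead of being shipped over the wire.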