Hi there! Is there any way to get the corresponding nodes BEFORE the call to OpenAI is made? Currently, I'm doing this:
Plain Text
query_engine = index.as_chat_engine(chat_mode='context',
                                    similarity_top_k=similarity_top_k,
                                    llm=llm_engine,
                                    system_prompt=prepared_system_prompt)
response = query_engine.chat(query_text, chat_history=chat_history)

Thanks!
You could use a custom node postprocessor. Or run retrieval outside of your chat engine
May I ask how to run retrieval? Any example, like this one?

Plain Text
nodes = index.as_retriever().retrieve("test query str")
Yeah, exactly (you can also pass similarity_top_k in here).
Cool, thanks! One more question, if you don't mind. How can I pass these nodes to query_engine.chat later (to avoid retrieving twice)?
hmmm, I don't think you can.

Probably, if you want to intercept these nodes, I would use a custom node-postprocessor instead
Otherwise, you'd have to define your own custom chat engine
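For example, a minimal "capture" postprocessor might look roughly like this. This is an untested sketch: it assumes the newer llama_index.core import paths, the NodeCapturePostprocessor name is just for illustration, and I think as_chat_engine forwards node_postprocessors to the context chat engine, but double-check on your version:

Plain Text
from typing import List, Optional

from llama_index.core import QueryBundle
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore


class NodeCapturePostprocessor(BaseNodePostprocessor):
    """Passes nodes through unchanged, but keeps a copy for inspection."""

    captured_nodes: List[NodeWithScore] = []

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        # Stash the retrieved nodes before they are handed to the LLM
        self.captured_nodes = list(nodes)
        return nodes


capture = NodeCapturePostprocessor()
query_engine = index.as_chat_engine(chat_mode='context',
                                    similarity_top_k=similarity_top_k,
                                    llm=llm_engine,
                                    system_prompt=prepared_system_prompt,
                                    node_postprocessors=[capture])
response = query_engine.chat(query_text, chat_history=chat_history)
# capture.captured_nodes now holds the nodes retrieved for this query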
I found this approach; do you think it would work?
Plain Text
from llama_index.core.llms import ChatMessage  # or llama_index.llms on older versions

nodes = index.as_retriever(similarity_top_k=similarity_top_k).retrieve(query_text)
context_str = "\n\n".join([n.node.get_content() for n in nodes])
full_prompt = system_prompt + 'Below is the provided context: \n\n' + context_str
chat_history.append(ChatMessage(role="system", content=full_prompt))
chat_history.append(ChatMessage(role="user", content=query_text))
response = llm_engine.chat(chat_history)

The main question is: is this the same as the original code?
Plain Text
query_engine = index.as_chat_engine(chat_mode='context',
                                    similarity_top_k=similarity_top_k,
                                    llm=llm_engine,
                                    system_prompt=prepared_system_prompt)
response = query_engine.chat(query_text, chat_history=chat_history)
It's roughly the same as what the chat engine is doing. The main thing is that in your custom version, you need to manage the chat history yourself (either with a memory module, or however else you want to manage it).
Thanks! I will look at the memory module.
Just a small clarification... were you talking about ChatMemoryBuffer or something else?
Yes, that's what I meant.
Thanks, gotcha!
But I can't figure out how to use it in this case. When using query_engine, I just pass it to as_chat_engine, but where should I pass it in my case, and how do I connect it to chat_history?
So, the memory is there as a way to manage the chat history, mostly so that it doesn't get too big.

So for example, I might do

Plain Text
from llama_index.core.memory import ChatMemoryBuffer  # or llama_index.memory on older versions

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

...

system_message = ChatMessage(role="system", content=full_prompt)
user_message = ChatMessage(role="user", content=query_text)

prev_messages = memory.get()

response = llm.chat([system_message, *prev_messages, user_message])

memory.put(user_message)
memory.put(response.message)


This way, the chat history is included in each llm.chat() call (up to the token limit), rather than just the most recent message + context
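Putting it together with your retrieval snippet, a single chat turn could look something like this. Just a sketch: the chat_turn helper name is made up, it reuses the variables from your snippet (index, llm_engine, system_prompt, similarity_top_k), and it assumes the llama_index.core import paths:

Plain Text
from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
retriever = index.as_retriever(similarity_top_k=similarity_top_k)

def chat_turn(query_text):
    # Retrieve yourself, so the nodes can be inspected/logged right here
    nodes = retriever.retrieve(query_text)
    context_str = "\n\n".join(n.node.get_content() for n in nodes)

    system_message = ChatMessage(
        role="system",
        content=system_prompt + "\n\nBelow is the provided context:\n\n" + context_str,
    )
    user_message = ChatMessage(role="user", content=query_text)

    # Prior turns, trimmed to the token limit
    prev_messages = memory.get()
    response = llm_engine.chat([system_message, *prev_messages, user_message])

    # Record this turn for the next call
    memory.put(user_message)
    memory.put(response.message)
    return response, nodes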
Ahhh interesting thanks! I will try to use this approach
Hi! One more question on this piece of code. What exactly does token_limit do here? Thanks!
Plain Text
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
It's limiting how many tokens the messages can use when running memory.get() -- it will fetch as many of the latest messages as fit into that limit.
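Rough illustration of the trimming (the exact cutoff depends on the tokenizer, so the numbers here are just indicative):

Plain Text
from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer

# Tiny limit, just to show the behaviour
memory = ChatMemoryBuffer.from_defaults(token_limit=50)

for i in range(20):
    memory.put(ChatMessage(role="user", content=f"message {i}: " + "filler " * 5))

# Only the most recent messages that fit under ~50 tokens come back;
# older ones are dropped from the result (they still exist in the buffer)
print(len(memory.get()))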
Well, I'm asking because I noticed that even though I pass a pretty big number there (like 10,000), the context is very small for some reason. I have a feeling it cuts the system message a lot.
🤔 It has nothing to do with the system message, since you are creating that outside of the memory:
system_message = ChatMessage(role="system", content=full_prompt)
You can change your retriever to retrieve more or fewer nodes when you create it.
Okay, I see, it may be my own bug 🤦‍♀️