I'm encountering an issue where my retrieved context doesn't seem to reach the LLM correctly, and to debug it I need to inspect the entire prompt that was sent to the LLM (after the chat messages are transformed and the context is inserted). Basically, I want a way to see all of the text that gets sent to the LLM when I call stream_chat like this:
query_stream = chat_service.stream_chat(
    messages=all_messages,
    use_context=True,
)
The response of chat_service.stream_chat() is of type CompletionGen, which only contains a list of the sources. I'd like to keep a copy of the whole prompt that is sent to the LLM for each invocation of stream_chat, for debugging purposes.
Does anyone know how this might be done in llama-index without serious modifications to the framework code?
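The closest thing I've found so far is llama-index's callback instrumentation. Here is a sketch of what I'm imagining, assuming the LLM behind chat_service actually picks up the global Settings.callback_manager (which I haven't verified):

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core.callbacks.schema import CBEventType, EventPayload

# Register a debug handler globally so LLM calls get traced.
# (Imports assume llama-index >= 0.10; older versions expose these
# under llama_index and llama_index.callbacks.)
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([llama_debug])

query_stream = chat_service.stream_chat(
    messages=all_messages,
    use_context=True,
)
# ... consume the stream here, since the LLM end event is likely only
# recorded once streaming finishes ...

# Pull the recorded LLM events and dump their inputs. For chat-style
# LLMs the fully composed message list should be in the start payload.
for start_event, end_event in llama_debug.get_event_pairs(CBEventType.LLM):
    print(start_event.payload.get(EventPayload.MESSAGES))

But I don't know whether the service constructs its LLM with this callback manager attached, so this may print nothing.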
Alternatively, if there's some way to know for sure how the nodes and messages got composed into the prompt, that would also be sufficient.
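On that composition question, llama-index's built-in "simple" observability handler is supposed to print every LLM input and output to stdout, which might show exactly how the context nodes and chat messages end up in the final prompt. Again, whether chat_service's LLM honors the global handler is an assumption on my part:

from llama_index.core import set_global_handler

# Prints each LLM input/output (including the final prompt text) to stdout.
set_global_handler("simple")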