
Hi everyone,

I'm using PrivateGPT (https://github.com/imartinez/privateGPT) and right now I'm struggling with a weird error: llama_index.chat_engine.types: Encountered exception writing response to history: 'token'

I'm only getting this when using a model that runs on SageMaker while the exact same configuration runs well locally.

For more info, here's my issue on the PrivateGPT repo: https://github.com/imartinez/privateGPT/issues/1367.

Does any of you have a lead on this?
Oof, that's a rough one to debug

The error is somewhere in this block
[Attachment: image.png]
I don't really know where token is coming from... nothing uses/mentions a variable by that name 🤔
is_function() and put_in_queue() and memory.put() are also all very simple 1-2 line functions (and also no mention of token)
I'm not 100% sure how privateGPT implements sagemaker LLMs, but it might be related to that? Something to do with how the LLM is streaming is my guess
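The bare 'token' reads like a KeyError on a dict key. Just as an illustration of how that message can appear (these names are made up, not PrivateGPT's actual code), a streaming-response parser that expects a TGI-style payload would surface exactly this:

Python
import json

def parse_stream_chunk(raw_bytes: bytes) -> str:
    """Hypothetical parser for one chunk of a SageMaker streaming response."""
    payload = json.loads(raw_bytes)
    # Expects TGI-style chunks like {"token": {"text": "..."}}; any other shape
    # raises KeyError: 'token', which shows up as the bare 'token' in the log.
    return payload["token"]["text"]

try:
    parse_stream_chunk(b'{"generated_text": "Hello"}')
except KeyError as exc:
    print(f"Encountered exception writing response to history: {exc}")  # -> 'token'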
Did some digging, updated the github issue with the likely source
I've answered on the issue with more details based on your help 🙂
hmmm, is the input to messages_to_prompt() empty as well?
You can test that messages_to_prompt() function is working by trying

Plain Text
from llama_index.llms import ChatMessage

print(llm.messages_to_prompt([ChatMessage(role="user", content="Test")]))
@Logan M actually messages_to_prompt seems to be a class attribute declared as a pydantic Field

Iโ€™ll try your code to see if this output is empty.

If it is empty, do you have an idea of the source?
If it is empty, it feels like a bug in llama index or the llm implementation in private gpt
Just checked and fairly confident it's not an issue in llama-index, at least not the latest version
It seems you're right: when I run your code I see what looks like a correct output.

I'll try to dig a bit on the LLM side, but if you have any clue, that would be great! 🙂
(In any case, you've already helped a lot, thanks!)
FYI: I think I've found the real issue. It seems the user question is simply not passed to the LLM; we only pass the system message (i.e. the RAG context). That's why the LLM claims the input is empty.
Definitely an issue on the PrivateGPT side
@Logan M I've dug deeper and I think I may have found the cause. I've been going through the llama_index code used by PrivateGPT, in particular llama_index/chat_engine/context.py.

If I understand correctly, this is where the RAG information is retrieved and added to the context before my message, in the stream_chat method: https://github.com/run-llama/llama_index/blob/e7090975a1807b6c30c132c65464cb51dba3804a/llama_index/chat_engine/context.py#L181

However, in my case the context is quite long. The context is correctly retrieved, but after that my message can't be retrieved by self._memory.get() because the initial_token_count seems to be greater than the memory's token_limit.

(I've tried to print out all the important details in the attachment)
[Attachment: image.png]
So the result is that the message passed to the LLM contains only the context and not my question anymore
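In other words, the flow looks roughly like this (simplified pseudocode of what I described, not the actual llama_index source):

Python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    content: str

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for the real tokenizer

def get_history(history, token_limit, initial_token_count):
    # Mimics memory.get(initial_token_count=...): only return as much chat
    # history as still fits under token_limit once the context is counted.
    budget = token_limit - initial_token_count
    kept = []
    for msg in reversed(history):
        cost = count_tokens(msg.content)
        if cost > budget:
            break
        budget -= cost
        kept.insert(0, msg)
    return kept

context = " ".join(["ctx"] * 2000)  # long retrieved context (~2000 "tokens")
question = [Message("user", "What does the document say about X?")]

# The context alone already exceeds the limit, so the question is dropped
# and only the system/context message reaches the LLM.
print(get_history(question, token_limit=1500, initial_token_count=count_tokens(context)))  # -> []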
Does that make sense?
If so, are there any options to increase this token_limit? Or any other solution that could be used?
I may be missing something, I'm not an expert in LLMs, and even less in web UIs
Ah that makes a lot of sense!

You can definitely increase the token limit, however, there is a limit to how much you can increase it

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

Otherwise, you need to reduce how much context is being retrieved
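For reference, with llama_index directly the wiring looks roughly like this (just a sketch: PrivateGPT builds its ContextChatEngine internally, so where to plug this in on their side may differ, and index here stands for an already-built VectorStoreIndex):

Python
from llama_index.chat_engine import ContextChatEngine
from llama_index.memory import ChatMemoryBuffer

# Give the context chat engine a larger history budget.
# `index` is assumed to be an existing VectorStoreIndex.
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)
chat_engine = ContextChatEngine.from_defaults(
    retriever=index.as_retriever(),
    memory=memory,
)

response = chat_engine.stream_chat("What does the document say about X?")
for token in response.response_gen:
    print(token, end="")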
Do you have any guidance on how to reduce the retrieved context?
(maybe some documentation or similar)
hmmm I would need to dive into what private gpt is doing

Generally, the idea would be to either decrease the top k, the chunk size, or both
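For example, driving llama_index directly (outside of PrivateGPT's config), those two knobs look roughly like this; the values and the "data" folder are just illustrative:

Python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# "data" is a placeholder folder of documents.
documents = SimpleDirectoryReader("data").load_data()

# Smaller chunks -> less text per retrieved node (512 is just an example value).
service_context = ServiceContext.from_defaults(chunk_size=512)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Fewer retrieved nodes -> less context injected ahead of the question.
retriever = index.as_retriever(similarity_top_k=2)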
Ok thanks 🙂

For now I'll stick with my increased memory limit, and I'll update my GitHub issue with all the findings we have so far.

Thank you very much for your help, really appreciated !