Oof, that's a rough one to debug
The error is somewhere in this block. I don't really know where token is coming from... nothing uses/mentions a variable by that name
is_function(), put_in_queue(), and memory.put() are also all very simple 1-2 line functions (and also no mention of token)
I'm not 100% sure how privateGPT implements SageMaker LLMs, but it might be related to that? Something to do with how the LLM streams is my guess
Did some digging, updated the github issue with the likely source
I've answered on the issue with more details based on your help
hmmm, is the input to messages_to_prompt() empty as well?
You can test that the messages_to_prompt() function is working by trying:
from llama_index.llms import ChatMessage
print(llm.messages_to_prompt([ChatMessage(role="user", content="Test")]))
@Logan M actually messages_to_prompt seems to be a class attribute declared as a pydantic Field
I'll try your code to see if the output is empty.
If it is empty, do you have any idea of the source?
If it is empty, it feels like a bug in llama-index or the LLM implementation in privateGPT
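(Side note on the pydantic Field point: on 0.9.x-era llama-index LLMs, messages_to_prompt is just a field that stores a callable, so you can call it standalone. A minimal sketch, with a made-up formatter standing in for whatever privateGPT actually passes:)
from llama_index.llms import ChatMessage

# Made-up formatter standing in for whatever privateGPT wires into the
# messages_to_prompt field; llama-index stores it as a plain callable.
def toy_messages_to_prompt(messages):
    return "\n".join(f"{m.role.value}: {m.content or ''}" for m in messages) + "\nassistant: "

print(toy_messages_to_prompt([ChatMessage(role="user", content="Test")]))
# For a constructed llm object, print(llm.messages_to_prompt) shows which callable ended up in the field.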
Just checked, and I'm fairly confident it's not an issue in llama-index, at least not in the latest version
It seems that you're right: when I run your code I can see (what looks like) a correct output.
I'll try to dig a bit on the LLM side, but if you have any clue, that would be great!
(in any case, you've already helped a lot, thanks!)
FYI: I think I've found the real issue: it seems that the user question is simply not passed to the LLM, and we only pass the system message (i.e. the RAG context). That's why the LLM claims an empty input
definitely an issue on the privateGPT side
@Logan M I've dug more and I think I may have found the cause. I've been going through the llama_index code used by privateGPT, in particular llama_index/chat_engine/context.py.
If I understand correctly, this is where the RAG information is retrieved and added to my context before my message, in the stream_chat method:
https://github.com/run-llama/llama_index/blob/e7090975a1807b6c30c132c65464cb51dba3804a/llama_index/chat_engine/context.py#L181
However, I have quite a long context here. My context is correctly retrieved, but after that my message can't be retrieved by self._memory.get because the initial_token_count seems to be greater than the token_limit of the memory.
(I've tried to print out all the important details in the attachments)
So the result is that the message passed to the LLM contains only the context and not my question anymore
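For anyone following along, here is a self-contained sketch of that flow (paraphrased from the linked stream_chat method, not the exact llama-index source; the message contents are placeholders):
from llama_index.llms import ChatMessage
from llama_index.memory import ChatMemoryBuffer

# The memory buffer holds the chat history, including the latest user question.
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)
memory.put(ChatMessage(role="user", content="What does the report say about X?"))

# The retrieved RAG context becomes a system-style prefix; its token count is
# handed to the buffer as initial_token_count.
prefix_messages = [ChatMessage(role="system", content="<long retrieved context here>")]
initial_token_count = len(
    memory.tokenizer_fn(" ".join(m.content or "" for m in prefix_messages))
)

# If initial_token_count eats most of token_limit, no history fits anymore and
# the user question is missing from all_messages, which is exactly the symptom.
all_messages = prefix_messages + memory.get(initial_token_count=initial_token_count)
print(all_messages)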
If so, are there any options to increase this token_limit? Or any other solution that could be used?
I may be missing something; I'm not an expert in LLMs, and even less in web UIs
Ah that makes a lot of sense!
You can definitely increase the token limit; however, there is a cap on how far you can push it (it still has to fit within the LLM's context window)
from llama_index.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)
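If it helps, a hedged sketch of where that bigger buffer plugs in (retriever is a placeholder for whatever privateGPT already builds, and the actual wiring there may differ):
from llama_index.chat_engine import ContextChatEngine
from llama_index.memory import ChatMemoryBuffer

# `retriever` stands in for the retriever privateGPT already constructs;
# in a real setup you'd also pass its service_context / LLM here.
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    memory=memory,
    system_prompt="Answer using only the provided context.",
)
streaming_response = chat_engine.stream_chat("What does the document say about X?")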
Otherwise, you need to reduce how much context is being retrieved
Do you have any guidance on how to reduce the context retrieved?
(maybe some documentation or similar)
hmmm, I would need to dive into what privateGPT is doing
Generally, the idea would be to decrease the top k, the chunk size, or both
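A rough illustration of those two knobs on the llama-index side (documents and the rest are placeholders; privateGPT layers its own config on top of this, so treat it as a sketch):
from llama_index import ServiceContext, VectorStoreIndex

# Smaller chunks at indexing time -> each retrieved node carries fewer tokens.
# `documents` is a placeholder for whatever privateGPT ingests.
service_context = ServiceContext.from_defaults(chunk_size=512)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Lower top k at query time -> fewer chunks get stuffed into the context.
retriever = index.as_retriever(similarity_top_k=2)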
Ok, thanks
For now I'll stick with the increased memory limit, and I'll answer on my GitHub issue with all the findings we have so far.
Thank you very much for your help, really appreciated!