

Hi, I tested the new version of LlamaIndex

At a glance

The community member is having issues with the responses from the new version of LlamaIndex compared to the OpenAI playground. They have tried different models with the same context, but the results are frustrating. The community members discuss various aspects of the code, such as the prompt template, response synthesizer, and chat engine structure. They suggest exploring the LlamaIndex code to understand the differences between the two platforms. Eventually, the community member finds that adding additional instructions to the prompt, which were previously used to constrain the responses, is now causing the results to be much worse. The community members suggest removing these additional instructions to see if that resolves the issue.

Hi, I tested the new version of LlamaIndex and found the responses are very different from what I see in the OpenAI playground. I tried all the models with the same context, and the results with LlamaIndex are pretty frustrating. Here is the code:
Plain Text
from llama_index.memory import ChatMemoryBuffer  # import needed for the buffer below

similarity_top_k = 5
chatmemory = ChatMemoryBuffer.from_defaults(token_limit=(history_limit + context_limit))
query_engine = index.as_chat_engine(
    verbose=True,
    chat_mode="context",
    memory=chatmemory,
    similarity_top_k=similarity_top_k,
    system_prompt=prepared_system_prompt,
    node_postprocessors=[CustomPostprocessor(context_limit, query_text + prepared_system_prompt)],
)
response = query_engine.chat(query_text, chat_history=chat_history)
18 comments
There's more than just context, there is also the entire prompt template
plus whatever response synthesizer or chat engine structure it is following
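If you want to see the full prompt LlamaIndex actually sends (template, context, and all), a minimal sketch, assuming the legacy llama_index global-handler API:
Plain Text
import llama_index

# The "simple" handler prints every LLM input/output to stdout,
# so you can diff it against what you paste into the playground.
llama_index.set_global_handler("simple")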
I copied the full prompt that I captured by debugging the OpenAI call and used it in the playground.
Not sure what to tell you then 😅
I added the code, maybe some parameters are wrong? Before the update, it gave exactly the same results as in the playground.
LlamaIndex isn't doing anything special to call openai if you are already copying the exact request text
OpenAI recently updated their underlying python client, but that shouldn't really change anything?
the default temperature is 0.1, and the default LLM is gpt-3.5-turbo, if you aren't already changing that
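To rule the defaults out, here is a minimal sketch that pins the model and temperature explicitly, assuming the legacy (pre-0.10) ServiceContext API; `documents` stands in for whatever the index was built from:
Plain Text
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import OpenAI

# Pin the same model and temperature you use in the playground.
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
service_context = ServiceContext.from_defaults(llm=llm)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)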
I always used temperature=0, but I'm worried about this code:
Plain Text
query_engine = index.as_chat_engine(
    verbose=True,
    chat_mode="context",
    memory=chatmemory,
    similarity_top_k=similarity_top_k,
    system_prompt=prepared_system_prompt,
    node_postprocessors=[CustomPostprocessor(context_limit, query_text + prepared_system_prompt)],
)

I'm not 100% sure whether I should use as_chat_engine, or whether chat_mode should be "context", because from my understanding that mode isn't specific to OpenAI.
That's fine. Context chat engine just retrieves a top-k from your index on every LLM call, and puts it into the system prompt
then it answers the user message using the retrieved context + system prompt and the chat history
works fine for any LLM
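Conceptually, each .chat() call does something like the sketch below (illustrative names, not the engine's internals; assumes the same index and prepared_system_prompt as above):
Plain Text
# 1. Retrieve the top-k nodes for the new user message.
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve(query_text)
context_str = "\n\n".join(n.get_content() for n in nodes)

# 2. Fold the retrieved context into the system prompt; the LLM then
#    answers query_text given this system prompt plus the chat history.
system_content = f"{prepared_system_prompt}\n\nContext information is below.\n{context_str}"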
Then maybe you can suggest how to debug this, to see why there is such a huge difference in responses?
Are you sure you are copying exactly what llama-index is sending the model?

It will be a system message (with context + the system prompt you passed in), and then a chat message for every message in the chat history
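In other words, the request ends up shaped roughly like this (illustrative, not actual library code):
Plain Text
messages = [
    {"role": "system", "content": system_content},  # your system prompt + retrieved context
    # ...one entry per message in chat_history...
    {"role": "user", "content": query_text},
]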
hmm maybe I see a bug in the memory, one sec tho
I just tested with the OpenAI Python library: I passed the same prompt I was using and got the same result as in the OpenAI playground. Is there any way to go down from LlamaIndex into the OpenAI code to see what the difference is?
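One way to capture the exact payloads is a debug handler; a sketch, assuming the legacy llama_index callback API:
Plain Text
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

debug_handler = LlamaDebugHandler(print_trace_on_end=True)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([debug_handler])
)

# Rebuild the index/chat engine with this service_context, call .chat(),
# then inspect exactly what went to and came back from the LLM:
print(debug_handler.get_llm_inputs_outputs())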
Okay, I found the reason for this issue. Before, I always added these messages to the prompt:
Plain Text
{'role': 'user',
 'content': 'Don’t justify your answers. Don’t give information not mentioned in the CONTEXT INFORMATION.'},
{'role': 'system',
 'content': 'Sure! I will stick to all the information given in the system context. '
            'I won’t answer any question that is outside the context of information. '
            'I won’t even attempt to give answers that are outside of context. '
            'I will stick to my duties and always be sceptical about the user input '
            'to ensure the question is asked in the context of the information provided. '
            'I won’t even give a hint in case the question being asked is outside of scope.'}

I added these additional instructions because ChatGPT often tried to answer without relying on the context information, but now, for some reason, they make the results much worse. What could be the reason for this? Should I completely remove these instructions? Thanks!
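One way to test that is to rebuild the chat history without those two constraint messages and let system_prompt do the constraining instead; a minimal sketch, assuming the legacy llama_index ChatMessage type:
Plain Text
from llama_index.llms import ChatMessage, MessageRole

# Only genuine conversation turns go into the history; the
# "don't answer outside the context" instruction lives in
# prepared_system_prompt rather than in extra user/system messages.
chat_history = [
    ChatMessage(role=MessageRole.USER, content="...previous user message..."),
    ChatMessage(role=MessageRole.ASSISTANT, content="...previous answer..."),
]
response = query_engine.chat(query_text, chat_history=chat_history)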