Max tokens

At a glance

The community member is running into an issue where setting service_context=service_context on the query_engine prevents the ChatGPT model from accessing general knowledge, while not setting it causes longer responses to be cut off. The comments explain that the LLM always has access to external knowledge and that the issue is largely one of prompt engineering. The community members discuss modifying the text_qa_template and refine_template, as well as raising the maximum output tokens, as ways to get both access to general knowledge and uncut longer responses.

Hi! πŸ™‚

I am running into an issue where if I set service_context=service_context in the query_engine like so:
Plain Text
llm_predictor = ChatGPTLLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", streaming=False))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
query_engine = index.as_query_engine(text_qa_template=CHAT_QA_PROMPT,
                                     refine_template=CHAT_REFINE_PROMPT,
                                     similarity_top_k=3,
                                     streaming=False,
                                     service_context=service_context)

then ChatGPT does NOT have access to General Knowledge.

However, when I do NOT set service_context=service_context in query_engine like so:
Plain Text
query_engine = index.as_query_engine(text_qa_template=CHAT_QA_PROMPT,
                                     refine_template=CHAT_REFINE_PROMPT,
                                     similarity_top_k=3)

then I do have access to General Knowledge, but the ChatGPT response gets cut off when it writes a longer text response.

How do I achieve both access to General Knowledge AND no cut off of longer text responses?

Thank you!
8 comments
The LLM technically always has access to external knowledge; it's just a matter of prompt engineering πŸ‘Œ

In any case, when you don't set the service context, it's actually using a completely different model (text-davinci-003)

All OpenAI models default to 256 max output tokens. You can change this by setting max_tokens:


https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-changing-the-number-of-output-tokens-for-openai-cohere-ai21
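As a rough sketch (assuming the ChatGPTLLMPredictor / ChatOpenAI setup from the question above, and an older llama_index version where ServiceContext.from_defaults takes an llm_predictor), you would raise max_tokens on the chat model itself and build the service context from it:
Python
from langchain.chat_models import ChatOpenAI
from llama_index import ServiceContext

# ChatGPTLLMPredictor is assumed to be imported/defined as in the original snippet.
# max_tokens caps the output length; 512 here is just an example value.
llm_predictor = ChatGPTLLMPredictor(
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=512)
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)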
@Logan M @Maker I seem to have the same problem of not having access to external knowledge. And if I take the service_context out, I can't stream. What could be the solution here?
Yeah, as I mentioned above, the LLM always has access to external knowledge; it's just a matter of prompt engineering.

You'll get the best results if you create a custom text_qa_template and refine_template like @Maker has above
Here's a link to an example, where I added a system prompt for gpt-3.5

You can probably skip the system prompt and just modify the instructions

https://discord.com/channels/1059199217496772688/1109906051727364147/1109972300578693191

If you aren't using gpt-3.5, it will be slightly different, so let me know
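Roughly, a custom chat QA prompt looks like this (a minimal sketch based on how the built-in chat prompts were constructed in llama_index around that version; the system-prompt wording is illustrative, not the exact text from the linked message):
Python
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from llama_index.prompts.prompts import QuestionAnswerPrompt

# Illustrative system prompt: explicitly tell the model it may combine
# the retrieved context with its own general knowledge.
system_msg = SystemMessagePromptTemplate.from_template(
    "You are a helpful assistant. Use the provided context when it is relevant, "
    "but you may also draw on your general knowledge to answer."
)

human_msg = HumanMessagePromptTemplate.from_template(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the question: {query_str}\n"
)

CHAT_QA_PROMPT = QuestionAnswerPrompt.from_langchain_prompt(
    ChatPromptTemplate.from_messages([system_msg, human_msg])
)

# A custom refine_template (RefinePrompt) can be built the same way.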
I modified the text_qa_template (attachment: image.png) and also added a piece of text to the system prompt, and that fixed it for me. The GPT now has access to external knowledge.
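So the working setup ends up looking roughly like this (a sketch combining the pieces from the earlier comments; the exact prompt wording is whatever you put in your custom templates):
Python
query_engine = index.as_query_engine(
    text_qa_template=CHAT_QA_PROMPT,     # custom chat prompt with the system message
    refine_template=CHAT_REFINE_PROMPT,  # custom refine prompt, built the same way
    similarity_top_k=3,
    service_context=service_context,     # keeps gpt-3.5-turbo + the higher max_tokens
)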