
Updated last year

How can I ensure that the context length does not exceed the maximum context length when considering chat history?

At a glance
The post asks how to ensure that the context length does not exceed the model's maximum context length once chat history is included. Community members suggest that LlamaIndex should manage this, that the values can be adjusted manually (how much is retrieved and how much memory is kept), and that a model with a larger context window can be used. They share code and debug repeated maximum-context-length errors, trying different approaches such as reducing the similarity top k and the memory token limit and switching to the gpt-3.5-turbo-16k model. Eventually, they determine that the issue was that the service context was not being passed correctly to the chat engine.
How can I ensure that the context length does not exceed the maximum context length when considering chat history?
20 comments
LlamaIndex should manage this?
You can also manually adjust those values (how much is retrieved and how much memory is kept)
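For example, a minimal sketch (assuming the pre-0.10 llama-index imports used elsewhere in this thread, and that index and service_context are already defined; the values are illustrative, not recommendations):

Plain Text
from llama_index.memory import ChatMemoryBuffer
from llama_index.retrievers import VectorIndexRetriever
from llama_index.chat_engine import ContextChatEngine

# Retrieve fewer chunks and keep a smaller rolling chat history
retriever = VectorIndexRetriever(index=index, similarity_top_k=3)
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    memory=memory,
    service_context=service_context,  # make sure the engine uses your configured LLM
)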
I am constantly getting the exceeded-context error.
Could you send your code?
You can try using a model with a larger context window / adjusting the parameters
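For instance, a minimal sketch of swapping in a larger-window model (same assumed pre-0.10 imports as above):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# gpt-3.5-turbo-16k has a ~16k-token window vs. ~4k for base gpt-3.5-turbo
llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0.2)
service_context = ServiceContext.from_defaults(llm=llm)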
Here is the code:
Plain Text
llm = OpenAI(
    temperature=0.2,
    model="gpt-4",
    streaming=True,
)
vector_store = FaissVectorStore.from_persist_dir("./faissMarkdown")
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./faissMarkdown"
)
service_context = ServiceContext.from_defaults(llm=llm)
evaluator = ResponseEvaluator(service_context=service_context)
index = load_index_from_storage(storage_context=storage_context)
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=7,
)
response_synthesizer = get_response_synthesizer(
    service_context=service_context,
    response_mode="compact",
    text_qa_template=CHAT_TEXT_QA_PROMPT,
)
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    verbose=True,
)


# Define a function to choose and use the appropriate chat engine
def chatbot(input_text):
    try:
        response = chat_engine.chat(input_text)
        top_urls = []
        for source in response.source_nodes:
            metadata = source.node.metadata
            if "url" in metadata:
                url = metadata["url"]
                top_urls.append(url)
                print(url, source.score)
        top_urls = "\n".join(top_urls)
        join_response = f"{response.response}\n\n\nFuentes:\n{top_urls}"
        return join_response
    except Exception as e:
        print(f"Error: {e}")
        return ["Error occurred"]
I get Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 4121 tokens. Please reduce the length of the messages. using gpt-4
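Side note: a 4097-token limit matches gpt-3.5-turbo rather than gpt-4, which hints the engine may not be using the configured LLM (this is confirmed later in the thread). To see where the tokens go, one option is to count them with tiktoken (a sketch, assuming tiktoken is installed; gpt-4 and gpt-3.5-turbo use the cl100k_base encoding):

Plain Text
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

# Illustrative budget: 7 retrieved chunks of ~512 tokens already use ~3.5k
# of a 4k window, before the system prompt and chat history are added.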
I tried with:
Plain Text
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    verbose=True,
    memory=memory,
    memory_cls=memory,
)
Any ideas? @Logan M @Teemu @bmax
Looks like you're passing the model incorrectly? gpt-4 has an 8k context size
Also, your similarity top k and memory token limit are quite large; you can try reducing those
I did print(llm._get_model_name()) and received gpt-4 in the log
I also tried with memory = ChatMemoryBuffer.from_defaults(token_limit=500)
I am going to reduce the similarity_top_k now
It's not breaking with k=4
I am going to check tomorrow how to use gpt-4 properly with the library version I have installed
Good morning
I ran the following code to try the 16k-token model version:
Plain Text
llm = OpenAI(
    model="gpt-3.5-turbo-16k",
    temperature=0.2,
    streaming=True,
    max_tokens=16383,
)
print(llm._get_model_name(), llm._get_max_token_for_prompt("hello"))
And it printed gpt-3.5-turbo-16k 16383.
Then I forced the error and still got: Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 4313 tokens. Please reduce the length of the messages.
This is weird, I am using the latest openai and llama-index versions.
Any ideas? @Teemu @bmax @Logan M
I think it could be an error in the OpenAI API or in llama-index
You need to pass the service context to the chat engine like this:

Plain Text
chat_engine = index.as_chat_engine(
    similarity_top_k=3, service_context=service_context
)
Then remember to define it here:

Plain Text
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager,
    llm=OpenAI(model="gpt-3.5-turbo-16k", temperature=0, max_tokens=1000),
    chunk_size=1024,
    node_parser=node_parser,
)
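Putting the pieces together, a minimal sketch of the corrected wiring (the memory line carries over the earlier ChatMemoryBuffer attempt, and chat_mode="context" mirrors the ContextChatEngine used above; both are assumptions, not part of the original answer):

Plain Text
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo-16k", temperature=0, max_tokens=1000)
)
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    similarity_top_k=3,
    memory=memory,  # assumption: passed through to the chat engine
    service_context=service_context,  # without this, a default ~4k LLM is used
)
response = chat_engine.chat("hello")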
Perfect, my mistake
thanks a lot
No worries, happy to help πŸ‘πŸΌ