
Updated last year

How can I ensure that the context length does not exceed the maximum context length when considering chat history?

At a glance
The post asks how to ensure that the context length does not exceed the model's maximum context length once chat history is included. Community members suggest that LlamaIndex should manage this, that the values can be adjusted manually (how much is retrieved and how much memory is kept), and that a model with a larger context window can be used. They share code and debug repeated maximum-context-length errors, trying different approaches such as reducing the similarity top k and the memory token limit and switching to the gpt-3.5-turbo-16k model. Eventually, they determine that the issue was that the service context was not being passed correctly to the chat engine.
How can I ensure that the context length does not exceed the maximum context length when considering chat history?
20 comments
LlamaIndex should manage this?
You can also manually adjust those values (how much is retrieved and how much memory is kept)
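For example, a minimal sketch (assuming the pre-0.10 llama-index imports used elsewhere in this thread, and that index and service_context are already defined; the values are illustrative, not recommendations):

Plain Text
from llama_index.memory import ChatMemoryBuffer
from llama_index.retrievers import VectorIndexRetriever
from llama_index.chat_engine import ContextChatEngine

# Retrieve fewer chunks and keep a smaller rolling chat history
retriever = VectorIndexRetriever(index=index, similarity_top_k=3)
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    memory=memory,
    service_context=service_context,  # make sure the engine uses your configured LLM
)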
I am constantly getting the exceeded-context error.
Could you send your code?
You can try using a model with a larger context window / adjusting the parameters
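For instance, a minimal sketch of swapping in a larger-window model (same assumed pre-0.10 imports as above):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# gpt-3.5-turbo-16k has a ~16k-token window vs. ~4k for base gpt-3.5-turbo
llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0.2)
service_context = ServiceContext.from_defaults(llm=llm)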
Here is the code:
Plain Text
llm = OpenAI(
    temperature=0.2,
    model="gpt-4",
    streaming=True,
)
vector_store = FaissVectorStore.from_persist_dir("./faissMarkdown")
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./faissMarkdown"
)
service_context = ServiceContext.from_defaults(llm=llm)
evaluator = ResponseEvaluator(service_context=service_context)
index = load_index_from_storage(storage_context=storage_context)
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=7,
)
response_synthesizer = get_response_synthesizer(
    service_context=service_context,
    response_mode="compact",
    text_qa_template=CHAT_TEXT_QA_PROMPT,
)
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    verbose=True,
)


# Define a function to choose and use the appropriate chat engine
def chatbot(input_text):
    try:
        response = chat_engine.chat(input_text)
        top_urls = []
        for source in response.source_nodes:
            metadata = source.node.metadata
            if "url" in metadata:
                url = metadata["url"]
                top_urls.append(url)
                print(url, source.score)
        top_urls = "\n".join(top_urls)
        join_response = f"{response.response}\n\n\nFuentes:\n{top_urls}"
        return join_response
    except Exception as e:
        print(f"Error: {e}")
        return ["Error occurred"]
I get Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 4121 tokens. Please reduce the length of the messages. using gpt-4
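Side note: a 4097-token limit matches gpt-3.5-turbo rather than gpt-4, which hints the engine may not be using the configured LLM (this is confirmed later in the thread). To see where the tokens go, one option is to count them with tiktoken (a sketch, assuming tiktoken is installed; gpt-4 and gpt-3.5-turbo use the cl100k_base encoding):

Plain Text
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

# Illustrative budget: 7 retrieved chunks of ~512 tokens already use ~3.5k
# of a 4k window, before the system prompt and chat history are added.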
I tried with:
Plain Text
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    verbose=True,
    memory=memory,
    memory_cls=memory,
)
Any ideas? @Logan M @Teemu @bmax
Looks like you're passing the model incorrectly? gpt-4 has an 8k context size
Also, your similarity top k and memory token limit are quite large; you can try reducing those
I did print(llm._get_model_name()) and received gpt-4 in the log
I also tried with memory = ChatMemoryBuffer.from_defaults(token_limit=500)
I am going to reduce the similarity_top_k now
It's not breaking with k=4
I am going to check tomorrow how to use gpt-4 properly with the library version I have installed
Good morning
I ran the following code to try the 16k-token model version:
Plain Text
llm = OpenAI(
    model="gpt-3.5-turbo-16k",
    temperature=0.2,
    streaming=True,
    max_tokens=16383,
)
print(llm._get_model_name(), llm._get_max_token_for_prompt("hello"))
And it printed gpt-3.5-turbo-16k 16383.
Then I forced the error and still got: Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 4313 tokens. Please reduce the length of the messages.
This is weird, I am using the latest openai and llama-index versions.
Any ideas? @Teemu @bmax @Logan M
I think it could be an error in the OpenAI API or in llama-index
You need to pass the service context to the chat engine like this:

Plain Text
chat_engine = index.as_chat_engine(
    similarity_top_k=3, service_context=service_context
)
Then remember to define it here:

Plain Text
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager,
    llm=OpenAI(model="gpt-3.5-turbo-16k", temperature=0, max_tokens=1000),
    chunk_size=1024,
    node_parser=node_parser,
)
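Putting the pieces together, a minimal sketch of the corrected wiring (the memory line carries over the earlier ChatMemoryBuffer attempt, and chat_mode="context" mirrors the ContextChatEngine used above; both are assumptions, not part of the original answer):

Plain Text
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo-16k", temperature=0, max_tokens=1000)
)
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    similarity_top_k=3,
    memory=memory,  # assumption: passed through to the chat engine
    service_context=service_context,  # without this, a default ~4k LLM is used
)
response = chat_engine.chat("hello")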
Perfect, my mistake
thanks a lot
No worries, happy to help πŸ‘πŸΌ