How can I ensure that the context length does not exceed the model's maximum context length when chat history is included?
LlamaIndex should manage this?
You can also manually adjust those values (how much is retrieved and how much memory is kept)
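For example, a minimal sketch of both knobs (assuming the usual ChatMemoryBuffer, VectorIndexRetriever and ContextChatEngine classes and an already-loaded index; the exact numbers are just illustrative):

Plain Text
from llama_index.chat_engine import ContextChatEngine
from llama_index.memory import ChatMemoryBuffer
from llama_index.retrievers import VectorIndexRetriever

# Retrieve fewer chunks per question, so less context is stuffed into the prompt
retriever = VectorIndexRetriever(index=index, similarity_top_k=3)

# Cap how much chat history gets replayed into each request
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = ContextChatEngine.from_defaults(retriever=retriever, memory=memory)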
I am constantly getting the exceeded context error.
Could you send your code?
You can try using a model with a larger context window / adjusting the parameters
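For instance (a sketch; the model name and settings are just examples of a larger-context option):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# gpt-3.5-turbo-16k gives a 16k context window instead of the 4k default
llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0.2)
service_context = ServiceContext.from_defaults(llm=llm)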
Here is the code:
Plain Text
llm = OpenAI(
    temperature=0.2,
    model="gpt-4",
    streaming=True,
)
vector_store = FaissVectorStore.from_persist_dir("./faissMarkdown")
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./faissMarkdown"
)
service_context = ServiceContext.from_defaults(llm=llm)
evaluator = ResponseEvaluator(service_context=service_context)
index = load_index_from_storage(storage_context=storage_context)
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=7,
)
response_synthesizer = get_response_synthesizer(
    service_context=service_context,
    response_mode="compact",
    text_qa_template=CHAT_TEXT_QA_PROMPT,
)
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    verbose=True,
)

# Define a function to choose and use the appropriate chat engine
def chatbot(input_text):
    try:
        response = chat_engine.chat(input_text)
        top_urls = []
        for source in response.source_nodes:
            metadata = source.node.metadata
            if "url" in metadata:
                url = metadata["url"]
                top_urls.append(url)
                print(url, source.score)
        top_urls = "\n".join(top_urls)
        join_response = f"{response.response}\n\n\nFuentes:\n{top_urls}"
        return join_response
    except Exception as e:
        print(f"Error: {e}")
        return ["Error occurred"]
Using gpt-4, I get: Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 4121 tokens. Please reduce the length of the messages.
I tried with:
Plain Text
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    verbose=True,
    memory=memory,
    memory_cls=memory,
)
Any ideas? @Logan M @Teemu @bmax
Looks like you're incorrectly passing the model? gpt-4 has an 8k context size.
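One quick way to sanity-check what the engine is actually running on is to print the LLM metadata (assuming the llm object from your snippet; context_window is what llama-index thinks the model supports):

Plain Text
# gpt-4 should report a context window of 8192;
# a 4097-token error suggests the default gpt-3.5-turbo is still being used somewhere.
print(llm.metadata.model_name, llm.metadata.context_window)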
Also, you have quite a large similarity_top_k and memory token limit; you can try reducing those.
I did print(llm._get_model_name()) and received gpt-4 in the log.
I also tried with memory = ChatMemoryBuffer.from_defaults(token_limit=500)
I am going to reduce the similarity_top_k now
It's not breaking with similarity_top_k=4.
I am going to check tomorrow how to use gpt-4 properly with the library version I have installed.
Good morning
I ran the following code to try the 16k-token model version:
Plain Text
llm = OpenAI(
    model="gpt-3.5-turbo-16k",
    temperature=0.2,
    streaming=True,
    max_tokens=16383,
)
print(llm._get_model_name(), llm._get_max_token_for_prompt("hello"))
And I got gpt-3.5-turbo-16k 16383 back.
Then I forced the error and still get: Error: This model's maximum context length is 4097 tokens. However, your messages resulted in 4313 tokens. Please reduce the length of the messages.
This is weird, I am using the latest openai and llama-index versions.
Any ideas? @Teemu @bmax @Logan M
I think it could be an error in the OpenAI API or in llama-index.
You need to pass the service context to the chat engine like this:

Plain Text
chat_engine = index.as_chat_engine(
    similarity_top_k=3, service_context=service_context
)
Then remember to define it here:

Plain Text
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager,
    llm=OpenAI(model="gpt-3.5-turbo-16k", temperature=0, max_tokens=1000),
    chunk_size=1024,
    node_parser=node_parser,
)
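If you'd rather keep building the engine explicitly instead of going through as_chat_engine, the same fix should also work by handing the service context straight to ContextChatEngine (a sketch reusing the retriever and service_context defined earlier; from_defaults accepting service_context is an assumption about your installed version):

Plain Text
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    service_context=service_context,  # so the 16k model is actually used
    memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
    verbose=True,
)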
Perfect, my mistake
thanks a lot
No worries, happy to help πŸ‘πŸΌ