Updated last year

Can the context retrieved be so large that it exceeds the context window?

At a glance

The post asks whether the retrieved context can be so large that it exceeds the context window. Community members initially suggest this should be handled automatically, since LlamaIndex ensures the retrieved context does not overflow the context window. However, the poster hits a token limit error with a chat engine, shares a code snippet, and the thread works through potential solutions.

The main points discussed in the comments are:

  • Reducing the amount of context retrieved to avoid exceeding the token limit
  • Trying different chunk sizes and using a reranking step to filter down the retrieved context
  • Switching to gpt-3.5-turbo-0125, which has a 16k context window instead of 4k
  • Needing a custom retriever so that reranking is only applied when the context would otherwise exceed the window

There is no explicitly marked answer in the comments, but the community members provide suggestions and discuss potential solutions to the issue of exceeding the context window.

Can the context retrieved be so large that it exceeds the context window?
40 comments
It should be handled automatically
LlamaIndex automatically makes sure the retrieved context won't overflow the context window
I am getting token limit error
Could you share your code/error?
Plain Text
initial token count exceed token limit window
sure. just a sec
Plain Text
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            chat_engine = CondensePlusContextChatEngine.from_defaults(
                st.session_state.query_engine,
                memory=memory,
                system_prompt="You're a helpful and friendly chatbot that uses provided context to answer user queries. Context: {context_str}",
            )
            response = chat_engine.chat(str(prompt))
            history = memory.get()
            st.write(history)
            validating_prompt = (
                " Validate the provided response by ensuring it directly addresses the user's query or provides relevant information. Return False if the response does not contribute to answering the user's question."
                "Query: {query}"
                "Response: {bot_response}"
            )
            feedback = llm.complete(validating_prompt.format(query=prompt, bot_response=response.response))
            if feedback == False:
                st.write("DISTANCE APPROACH")
                response = answer_question(prompt)
                st.write(response)
                message = {"role": "assistant", "content": response}
                st.session_state.messages.append(message)
            else:
                st.write(response.response)
                message = {"role": "assistant", "content": response.response}
                st.session_state.messages.append(message)
Plain Text
ValueError: Initial token count exceeds token limit
Ah yea, not as easily handled in a chat engine
Any hacks to solve? I have gone through memory folder on github and found nothing of use.
Reduce how much context you are retrieving
I'm not sure how you created your index or chat engine, but usually this happens if you changed the top k or chunk size
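
For reference, a minimal sketch of where those two knobs usually live (assuming llama_index >= 0.10; older releases configure chunk size through a ServiceContext instead of Settings, and the "./data" path is a placeholder):

Python
# Sketch only: smaller chunks and a lower top_k both shrink the retrieved context.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

Settings.chunk_size = 1024  # chunk size used when the index is built

documents = SimpleDirectoryReader("./data").load_data()  # placeholder data directory
index = VectorStoreIndex.from_documents(documents)

# similarity_top_k controls how many chunks are retrieved per query
retriever = index.as_retriever(similarity_top_k=2)
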
Yeah but if I reduce the top_k I don't get the relevant chunk
Sounds like you should have a reranking step then
That happens after retrieval no?
retrieval -> rerank -> insert into system prompt for context chat engine
the chat engine shoouuuuld have a node_postprocessors kwarg
Yeah. But the chunk does not get retrieved. Reranking would be useless.
? not sure what you mean
its not useless
retrieve top k 15, filter down to 2 or 3 with reranking
works pretty well imo
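
A minimal sketch of that retrieve -> rerank -> context flow (assuming llama_index >= 0.10 with the sentence-transformers reranker installed; retriever and memory stand for whatever the app already builds):

Python
# Sketch only: cast a wide net at retrieval, then let a reranker keep the best few.
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.postprocessor import SentenceTransformerRerank

reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2",  # small local cross-encoder
    top_n=3,                                       # keep only the best 3 chunks
)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,                       # e.g. built with similarity_top_k=15
    memory=memory,
    node_postprocessors=[reranker],  # runs after retrieval, before the context is inserted
    system_prompt="You're a helpful chatbot that uses the provided context. Context: {context_str}",
)
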
It exceeds the limit even with 2 for some queries
what llm are you using? What is your chunk size?
sounds kinda wacky tbh
Gpt 3.5 turbo. Chunk size 2000
2000 chunk size is not really optimal no? usually 512 or 1024 give best results. Especially if you add reranking
I've tried those. With those chunk sizes I get poor results. It's HTML files scraped from a site
If you use gpt-3.5-turbo-0125, it also has a 16k context window instead of 4k
so there's options
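
If you do switch models, a minimal sketch of pointing LlamaIndex at the 16k-context variant (assuming llama_index >= 0.10 with the OpenAI integration; older versions pass the LLM through a ServiceContext):

Python
# Sketch only: use the 16k-context gpt-3.5-turbo-0125 as the default LLM.
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-3.5-turbo-0125", temperature=0)
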
But did you try reranking?
Not that rich mate 😅
I'll try and let you know.
Pricing also got reduced by 50% today
[Attachment: image.png]
just a heads up
Hey @Logan M
Reranking works fine. Thanks for the help. But I don't want to rerank the documents if the context window is not being exceeded. For that, will I need a custom retriever, engine, etc.?
Hmm yea you'd need a custom retriever then in that case 👀
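
One hedged sketch of that idea: a retriever wrapper that only falls back to reranking when the retrieved context would blow a token budget (the class name, the 3000-token budget, and the base_retriever/reranker arguments are all placeholders; the API assumes llama_index >= 0.10):

Python
# Sketch only: rerank/filter only when the plain retrieval result is too large.
from typing import List

import tiktoken
from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore


class BudgetAwareRetriever(BaseRetriever):
    """Return the base retriever's nodes as-is unless they exceed a token budget."""

    def __init__(self, base_retriever, reranker, max_tokens=3000):
        self._base_retriever = base_retriever  # e.g. index.as_retriever(similarity_top_k=15)
        self._reranker = reranker              # e.g. SentenceTransformerRerank(top_n=3)
        self._max_tokens = max_tokens
        self._enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = self._base_retriever.retrieve(query_bundle)
        total = sum(len(self._enc.encode(n.node.get_content())) for n in nodes)
        if total <= self._max_tokens:
            return nodes  # fits comfortably, skip reranking
        return self._reranker.postprocess_nodes(nodes, query_bundle=query_bundle)

The wrapper could then be passed to CondensePlusContextChatEngine.from_defaults in place of the plain retriever.
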