Hi, can anyone help me solve the error below?
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 36614 tokens. Please reduce the length of the messages.
What did you do to get this message? Generally this shouldn't happen unless something wacky is going on lol
I have 100 invoices. I indexed them in a single index, and when I try to chat with it, I get this error.
Can you share the code, though?
import time

from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext, StorageContext, load_index_from_storage
from llama_index.memory import ChatMemoryBuffer
from llama_index.vector_stores import ChromaVectorStore


def load_index(collection, user_query, index_id, doc_no, from_ws=False):
    memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

    # get_chroma_client() and get_index_store() are helpers defined elsewhere in my code
    chroma_client = get_chroma_client()
    collection = chroma_client.get_collection(collection)
    load_storage_context = StorageContext.from_defaults(
        vector_store=ChromaVectorStore(chroma_collection=collection),
        index_store=get_index_store(),
    )

    llm_predictor = LLMPredictor(llm=ChatOpenAI(
        temperature=0.2, max_tokens=512, model_name='gpt-3.5-turbo'))

    load_service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor)

    timea = time.time()
    load_indexs = load_index_from_storage(
        service_context=load_service_context,
        storage_context=load_storage_context,
        index_id=index_id,
    )
    timeb = time.time()
    print("3:", timeb - timea)

    query = load_indexs.as_chat_engine(
        chat_mode="context",
        memory=memory,
        similarity_top_k=doc_no * 5,
        verbose=True,
    )
    if from_ws:
        return query

    # res was undefined in the paste; assumed to come from running the user's query
    res = str(query.chat(user_query))
    return res
similarity_top_k=doc_no*5

So, the context chat engine doesn't do anything special to reduce token usage. It retrieves the top_k chunks, inserts them into the system prompt, and sends the question.

This top_k value is likely waaaay too big
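For scale (an illustrative sketch, not the poster's code): with doc_no=100 that engine retrieves 500 chunks, and you can measure what they cost before sending anything:

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def context_tokens(chunks):
    # Tokens the retrieved chunks alone would add to the prompt.
    return sum(len(enc.encode(c)) for c in chunks)

# At similarity_top_k = 100 * 5 = 500 chunks, even ~70 tokens per chunk is
# ~35,000 tokens, far past gpt-3.5-turbo's 4,097-token limit (and in line
# with the 36,614 tokens reported in the error above).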
A normal query engine would work with that by making multiple LLM calls to refine a response. And you could use that query engine in an agent instead to get chat history, if you need the top_k to be that big.
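A minimal sketch of that suggestion, reusing load_indexs, doc_no, and user_query from the function above (the response_mode choice is an assumption, not the poster's code):

# Refine mode iterates over the retrieved chunks with repeated LLM calls,
# so a large top_k no longer has to fit into a single prompt.
query_engine = load_indexs.as_query_engine(
    similarity_top_k=doc_no * 5,
    response_mode="refine",
)
response = query_engine.query(user_query)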
So I will use an OpenAIAgent with a QueryEngineTool wrapping the query engine.
Yea, exactly, might be better πŸ™‚
index = get_indexes(collection).as_query_engine(similarity_top_k=10)
query_engine_tool = QueryEngineTool(
    query_engine=index,
    metadata=ToolMetadata(
        name=collection.file_name,
        description=collection.description + "Use a detailed plain text question as input to the tool.",
    ),
)
indices.append(query_engine_tool)
timed = time.time()
print("getting index", timed - timec)
timee = time.time()
print("getting all indices", timee - timeb)
return indices


The updated code will look like this:

self.agent = OpenAIAgent.from_tools(self.index, verbose=True)  # assignment assumed, since self.agent is used below
# print("here")
res = str(self.agent.chat(message, chat_history=self.messages))
uhhh maybe, as long as self.index is a list of query engine tools, it should work
It is working, but sometimes I don't get the desired output.
I think I need to fine-tune it.
You may have to configure the tool description a bit more, or give a system prompt to help guide its purpose.

OpenAIAgent.from_tools(..., system_prompt="....")
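Spelled out, a fuller setup might look like this (a sketch: the prompt text is illustrative, and "indices" is the tools list built earlier):

from llama_index.agent import OpenAIAgent

agent = OpenAIAgent.from_tools(
    indices,  # the list of QueryEngineTool objects from above
    verbose=True,
    system_prompt=(
        "You are an invoice analyst. Answer questions using the invoice tools; "
        "if no tool returns anything relevant, say so instead of guessing."
    ),
)
print(agent.chat("What is the total amount across all invoices from March?"))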
Thanks, I will test and update you
Hi, I am creating a business analyst that uses Excel sheet data. Do you think a RAG model will work for it? I have doubts because the data set is big and the similarity top_k is small. What architecture should I follow?
You'll probably want to use text2sql, especially for highly numerical data
https://gpt-index.readthedocs.io/en/latest/end_to_end_tutorials/structured_data/sql_guide.html
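A minimal text-to-SQL sketch along the lines of that guide (the file, table, and question are made-up examples; the imports match the legacy llama_index API used in this thread):

import pandas as pd
from sqlalchemy import create_engine
from llama_index import SQLDatabase
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine

engine = create_engine("sqlite:///analyst.db")
# Load the spreadsheet into a SQL table so the LLM can answer with real SQL
# aggregations instead of retrieval over a handful of chunks.
pd.read_excel("sales.xlsx").to_sql("sales", engine, index=False, if_exists="replace")

sql_database = SQLDatabase(engine, include_tables=["sales"])
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["sales"])
print(query_engine.query("What was the total revenue per region last quarter?"))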
I want to create a chat engine, and it should also be able to generate charts.
I want to use the LangChain Python REPL tool. How do I integrate it with LlamaIndex?
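One possible wiring (a sketch, not an official integration: the tool name and description are invented, and PythonREPLTool's import path varies between langchain and langchain_experimental versions):

from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain_experimental.tools.python.tool import PythonREPLTool

# Expose a LlamaIndex query engine (like the ones above) as a plain LangChain Tool.
data_tool = Tool(
    name="spreadsheet_data",
    func=lambda q: str(query_engine.query(q)),
    description="Answers plain-text questions about the indexed spreadsheet data.",
)

agent = initialize_agent(
    [data_tool, PythonREPLTool()],
    ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
)
agent.run("Get monthly totals from spreadsheet_data, then plot them with matplotlib.")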