Hi, can anyone help me solve the error below?
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 36614 tokens. Please reduce the length of the messages.
What did you do to get this message? Generally this shouldn't happen unless something wacky is going on lol
I have 100 invoices. I indexed them in a single index, and when I try to chat with it, I get this error.
Can you share the code, though?
import time

from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext, StorageContext, load_index_from_storage
from llama_index.memory import ChatMemoryBuffer
from llama_index.vector_stores import ChromaVectorStore


def load_index(collection, user_query, index_id, doc_no, from_ws=False):
    memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

    # get_chroma_client() and get_index_store() are helpers defined elsewhere in my code
    chroma_client = get_chroma_client()
    collection = chroma_client.get_collection(collection)
    load_storage_context = StorageContext.from_defaults(
        vector_store=ChromaVectorStore(chroma_collection=collection),
        index_store=get_index_store(),
    )

    llm_predictor = LLMPredictor(llm=ChatOpenAI(
        temperature=0.2, max_tokens=512, model_name='gpt-3.5-turbo'))

    load_service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor)

    timea = time.time()
    load_indexs = load_index_from_storage(
        service_context=load_service_context,
        storage_context=load_storage_context,
        index_id=index_id,
    )
    timeb = time.time()
    print("3:", timeb - timea)

    query = load_indexs.as_chat_engine(
        chat_mode="context",
        memory=memory,
        similarity_top_k=doc_no * 5,
        verbose=True,
    )
    if from_ws:
        return query

    # res was undefined in the paste; assumed to come from running the user's query
    res = str(query.chat(user_query))
    return res
similarity_top_k=doc_no*5

So, the context chat engine doesn't do anything special to reduce token usage. It retrieves the top_k chunks, inserts them into the system prompt, and sends the question.

This top_k value is likely waaaay too big
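For scale (an illustrative sketch, not the poster's code): with doc_no=100 that engine retrieves 500 chunks, and you can measure what they cost before sending anything:

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def context_tokens(chunks):
    # Tokens the retrieved chunks alone would add to the prompt.
    return sum(len(enc.encode(c)) for c in chunks)

# At similarity_top_k = 100 * 5 = 500 chunks, even ~70 tokens per chunk is
# ~35,000 tokens, far past gpt-3.5-turbo's 4,097-token limit (and in line
# with the 36,614 tokens reported in the error above).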
A normal query engine would work with that by making multiple LLM calls to refine a response. And you could use that query engine in an agent instead to get chat history, if you need the top_k to be that big.
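A minimal sketch of that suggestion, reusing load_indexs, doc_no, and user_query from the function above (the response_mode choice is an assumption, not the poster's code):

# Refine mode iterates over the retrieved chunks with repeated LLM calls,
# so a large top_k no longer has to fit into a single prompt.
query_engine = load_indexs.as_query_engine(
    similarity_top_k=doc_no * 5,
    response_mode="refine",
)
response = query_engine.query(user_query)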
So I will use an OpenAIAgent with a QueryEngineTool wrapping the query engine.
Yea, exactly, might be better πŸ™‚
index = get_indexes(collection).as_query_engine(similarity_top_k=10)
query_engine_tool = QueryEngineTool(
    query_engine=index,
    metadata=ToolMetadata(
        name=collection.file_name,
        description=collection.description + "Use a detailed plain text question as input to the tool.",
    ),
)
indices.append(query_engine_tool)
timed = time.time()
print("getting index", timed - timec)
timee = time.time()
print("getting all indices", timee - timeb)
return indices


The updated code will look like this:

self.agent = OpenAIAgent.from_tools(self.index, verbose=True)  # assignment assumed, since self.agent is used below
# print("here")
res = str(self.agent.chat(message, chat_history=self.messages))
uhhh maybe, as long as self.index is a list of query engine tools, it should work
It is working, but sometimes I don't get the desired output.
I think I need to fine-tune it.
You may have to configure the tool description a bit more, or give a system prompt to help guide its purpose.

OpenAIAgent.from_tools(..., system_prompt="....")
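Spelled out, a fuller setup might look like this (a sketch: the prompt text is illustrative, and "indices" is the tools list built earlier):

from llama_index.agent import OpenAIAgent

agent = OpenAIAgent.from_tools(
    indices,  # the list of QueryEngineTool objects from above
    verbose=True,
    system_prompt=(
        "You are an invoice analyst. Answer questions using the invoice tools; "
        "if no tool returns anything relevant, say so instead of guessing."
    ),
)
print(agent.chat("What is the total amount across all invoices from March?"))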
Thanks, I will test and update you
Hi, I am creating a business analyst that uses Excel sheet data. Do you think a RAG model will work for it? I have doubts because the data set is big and the similarity top_k is small. What architecture should I follow?
You'll probably want to use text2sql, especially for highly numerical data
https://gpt-index.readthedocs.io/en/latest/end_to_end_tutorials/structured_data/sql_guide.html
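A minimal text-to-SQL sketch along the lines of that guide (the file, table, and question are made-up examples; the imports match the legacy llama_index API used in this thread):

import pandas as pd
from sqlalchemy import create_engine
from llama_index import SQLDatabase
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine

engine = create_engine("sqlite:///analyst.db")
# Load the spreadsheet into a SQL table so the LLM can answer with real SQL
# aggregations instead of retrieval over a handful of chunks.
pd.read_excel("sales.xlsx").to_sql("sales", engine, index=False, if_exists="replace")

sql_database = SQLDatabase(engine, include_tables=["sales"])
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["sales"])
print(query_engine.query("What was the total revenue per region last quarter?"))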
I want to create a chat engine, and it should also be able to generate charts.
I want to use the LangChain Python REPL tool. How do I integrate it with LlamaIndex?
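One possible wiring (a sketch, not an official integration: the tool name and description are invented, and PythonREPLTool's import path varies between langchain and langchain_experimental versions):

from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain_experimental.tools.python.tool import PythonREPLTool

# Expose a LlamaIndex query engine (like the ones above) as a plain LangChain Tool.
data_tool = Tool(
    name="spreadsheet_data",
    func=lambda q: str(query_engine.query(q)),
    description="Answers plain-text questions about the indexed spreadsheet data.",
)

agent = initialize_agent(
    [data_tool, PythonREPLTool()],
    ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
)
agent.run("Get monthly totals from spreadsheet_data, then plot them with matplotlib.")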