gpt-3.5-turbo-0125 is available with a 16K context window, and comes with that nice price decrease
(I'm assuming this is in a context chat engine?)
Yeah, but that one isn't out yet, right?
(You might have to update llama-index)
I thought they said two weeks
Ahh, we were using gpt-3.5-turbo-1106 before, which has the same context window, so this problem will remain
Either way, is it possible to limit the number of tokens sent to OpenAI?
Are you using a context chat engine? Or what's the setup right now?
from llama_index.core import PromptTemplate, VectorStoreIndex
from llama_index.core.chat_engine.types import BaseChatEngine, ChatMode
from llama_index.core.llms import ChatMessage
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

def initialize_chat_engine(
    index: VectorStoreIndex,
    document_uuid: str,
    chat_history: list[ChatMessage],
) -> BaseChatEngine:
    # Restrict retrieval to chunks belonging to this document
    filters = MetadataFilters(
        filters=[ExactMatchFilter(key="doc_id", value=document_uuid)],
    )
    return index.as_chat_engine(
        chat_mode=ChatMode.CONTEXT,
        # CHAT_PROMPT_TEMPLATE is our own prompt constant, defined elsewhere
        condense_question_prompt=PromptTemplate(CHAT_PROMPT_TEMPLATE),
        chat_history=chat_history,
        agent_chat_response_mode="StreamingAgentChatResponse",
        similarity_top_k=20,
        filters=filters,
    )
(top_k = 20 is the problem)
Yeaaaaa
Tbh the only option is lowering the top k
My advice would be to keep the top k high, and maybe add reranking to filter it down?
Ok! We are working on a solution now to detect the very broad summary queries that forced us to use a top_k of 20 in the first place, and to add relevant document keywords to the query instead of using such a high top_k
Because, if I'm correct, a lower top_k with a more detailed query that includes the relevant keywords is probably better, right?
Maybe. Hard to say. Reranking is quite powerful as well
Is that something that's easy to implement?
As far as I understand it, top_k only refers to the number of chunks similar to the input query that get added to the OpenAI call in our case. If a user says something like "Summarize this for me", we will fetch chunks similar to that specific query, which might not be good, since the keywords of that query don't say which parts of the document are relevant
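(As a rough illustration of that at the retriever level, reusing the index and filters from the setup above; the keyword-enriched query string is just a made-up example:)

# The retriever only sees the raw query text, so a vague prompt fetches
# chunks that look like the words "Summarize this for me".
retriever = index.as_retriever(similarity_top_k=20, filters=filters)
vague_nodes = retriever.retrieve("Summarize this for me")

# Enriching the query with document keywords steers the similarity search
# toward the sections that actually matter (hypothetical keywords here).
enriched_nodes = retriever.retrieve(
    "Summarize this for me: pricing changes, context window, token limits"
)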
It's as easy as inputting node_postprocessors=[reranker]
in the kwargs for the query engine
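(Concretely, something like this; SentenceTransformerRerank is one built-in option, the model and top_n here are just example choices, and the same kwarg works when building the context chat engine:)

from llama_index.core.postprocessor import SentenceTransformerRerank

# Keep retrieval broad (top_k=20), then let a cross-encoder rerank and
# keep only the best 5 chunks before they are sent to the LLM.
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2",
    top_n=5,
)

chat_engine = index.as_chat_engine(
    chat_mode=ChatMode.CONTEXT,
    similarity_top_k=20,
    node_postprocessors=[reranker],
    filters=filters,
)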
I agree. Something like this is what an agent or router is good for (i.e., do I route to a vector index for top-k retrieval, or to a summary index/custom query engine that fetches everything?)
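(A rough sketch of that router idea, assuming a vector_index and summary_index have been built elsewhere; the tool descriptions are made up:)

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

# An LLM selector reads the tool descriptions and picks one engine per query,
# so broad summary requests stop hitting the top-k vector retriever.
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(similarity_top_k=5),
    description="Useful for specific questions about parts of the document.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    description="Useful for broad requests to summarize the whole document.",
)
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)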
Damn, so many options to choose from. We don't even use a query engine, but build a chat engine directly from a vector store index