The community member is seeing high prompt token counts even for tiny prompts: sending just the word "hello" reports over 1000 tokens via the token counting handler from LlamaIndex. Other community members suggest this is because the standard query engine retrieves similar records from the indexed documents and passes them to the LLM along with the prompt, inflating the count. To reduce it, they recommend adding a similarity postprocessor to trim the retrieved context, or using an agent for normal chat interactions.
I'm getting high prompt token counts even for small prompts. For example, just passing the word "hello" gives a token count of over 1000. Surely the token count should be 2 or 3? I'm using the token counting handler from LlamaIndex.
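For context, here's roughly how that handler is usually wired up (a minimal sketch assuming the current llama-index `Settings` API, a tiktoken tokenizer, and a hypothetical `./data` directory):

```python
import tiktoken
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Count tokens with the same encoding the target model uses
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# "./data" is a placeholder for wherever your documents live
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data()
)
query_engine = index.as_query_engine()
query_engine.query("hello")

# The prompt count includes the retrieved document chunks and the
# internal prompt template, not just the one-word query
print("prompt tokens:", token_counter.prompt_llm_token_count)
print("completion tokens:", token_counter.completion_llm_token_count)
```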
Are you using a simple chat engine (no retrieval)? If not, the query engine must be passing similar records from your indexed documents to the LLM along with your prompt. That is why the token count is getting so high.
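One way to shrink that retrieved context is to fetch fewer chunks and filter out weak matches with a similarity postprocessor (a sketch against an existing `index`; the 0.75 cutoff is an arbitrary example value):

```python
from llama_index.core.postprocessor import SimilarityPostprocessor

# Retrieve at most 2 chunks, and drop any whose similarity score falls
# below the cutoff, so only closely matching nodes reach the LLM
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
)
response = query_engine.query("hello")
```

Note that for an off-topic query like "hello", the cutoff may leave zero nodes: the prompt stays small, but the answer may be empty, which is why chit-chat is better routed elsewhere.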
I'm assuming "hello" isn't part of your documents, so greetings and conversation beyond the scope of the documents aren't a good fit for a query engine.
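If you want greetings and general chat handled without pulling in document context, one option is LlamaIndex's `SimpleChatEngine`, which talks to the LLM directly (a sketch; an agent that only calls the index as a tool when a question actually needs it would achieve the same effect):

```python
from llama_index.core.chat_engine import SimpleChatEngine

# No retrieval step: the prompt is just the chat history plus your message,
# so the token count stays close to the raw message size
chat_engine = SimpleChatEngine.from_defaults()
response = chat_engine.chat("hello")
print(response)
```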