The community member is seeing high prompt token counts even for tiny prompts: sending just the word "hello" reports over 1000 tokens via the token counting handler from LlamaIndex. Other community members suggest this is because the standard query engine retrieves similar records from the indexed documents and passes them to the LLM along with the prompt, inflating the count. To reduce it, they recommend adding a similarity postprocessor to trim the retrieved context, or using an agent for normal chat interactions.
I'm getting high prompt token counts even for small prompts. For example, just passing the word "hello" gives a token count of over 1000. Surely the token count should be 2 or 3? I'm using the token counting handler from LlamaIndex.
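For context, here's roughly how that handler is usually wired up (a minimal sketch assuming the current llama-index `Settings` API, a tiktoken tokenizer, and a hypothetical `./data` directory):

```python
import tiktoken
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Count tokens with the same encoding the target model uses
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# "./data" is a placeholder for wherever your documents live
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data()
)
query_engine = index.as_query_engine()
query_engine.query("hello")

# The prompt count includes the retrieved document chunks and the
# internal prompt template, not just the one-word query
print("prompt tokens:", token_counter.prompt_llm_token_count)
print("completion tokens:", token_counter.completion_llm_token_count)
```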
Are you using a simple chat engine (no retrieval)? If not, the query engine must be passing similar records from your indexed documents to the LLM along with your prompt. That is why the token count is getting so high.
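One way to shrink that retrieved context is to fetch fewer chunks and filter out weak matches with a similarity postprocessor (a sketch against an existing `index`; the 0.75 cutoff is an arbitrary example value):

```python
from llama_index.core.postprocessor import SimilarityPostprocessor

# Retrieve at most 2 chunks, and drop any whose similarity score falls
# below the cutoff, so only closely matching nodes reach the LLM
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
)
response = query_engine.query("hello")
```

Note that for an off-topic query like "hello", the cutoff may leave zero nodes: the prompt stays small, but the answer may be empty, which is why chit-chat is better routed elsewhere.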
I'm assuming "hello" isn't part of your documents, so greetings and conversation beyond the scope of the documents aren't a good fit for a query engine.
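If you want greetings and general chat handled without pulling in document context, one option is LlamaIndex's `SimpleChatEngine`, which talks to the LLM directly (a sketch; an agent that only calls the index as a tool when a question actually needs it would achieve the same effect):

```python
from llama_index.core.chat_engine import SimpleChatEngine

# No retrieval step: the prompt is just the chat history plus your message,
# so the token count stays close to the raw message size
chat_engine = SimpleChatEngine.from_defaults()
response = chat_engine.chat("hello")
print(response)
```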