
I'm getting high prompt token counts

I'm getting high prompt token counts even for small prompts. For example, just passing the word "hello" gives a token count of over 1,000 tokens. Surely the token count should be 2 or 3? I'm using the token counting handler from LlamaIndex.
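For reference, a minimal sketch of how the token counting handler is typically wired up with the pre-0.10, ServiceContext-style API the docs linked below describe; the model name and the "./data" path are assumptions:

```python
import tiktoken
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Count tokens with the same encoding the LLM uses (gpt-3.5-turbo assumed).
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter])
)

documents = SimpleDirectoryReader("./data").load_data()  # assumed data directory
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

response = index.as_query_engine().query("hello")
# The prompt count covers everything sent to the LLM: the retrieved context
# and the prompt template, not just the two-token query. That is why even
# "hello" can register 1,000+ prompt tokens.
print("prompt tokens:", token_counter.prompt_llm_token_count)
print("completion tokens:", token_counter.completion_llm_token_count)
```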
Are you using a simple context engine? If not, it must be passing similar records from your indexed documents to the LLM. That is why the token count is getting higher.
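Continuing the sketch above, you can inspect exactly which retrieved nodes get stuffed into the prompt; each node carries a chunk of document text, so a handful of them easily adds up to 1,000+ tokens (the similarity_top_k value here is illustrative):

```python
# Inspect the nodes the retriever attached to the prompt.
response = index.as_query_engine(similarity_top_k=2).query("hello")
for source in response.source_nodes:
    print(f"score={source.score:.3f}  text={source.node.get_content()[:80]!r}")
```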
I'm using the standard query engine. Is there a way to reduce the token count in this case?
I'm assuming "hello" isn't part of your documents, so greetings and conversational queries beyond the scope of the documents aren't a good fit here.

One way to reduce the token count is to introduce a similarity postprocessor (see the sketch below): https://gpt-index.readthedocs.io/en/stable/core_modules/query_modules/node_postprocessors/usage_pattern.html#using-with-a-query-engine

But this could also cause the LLM to generate nothing in some cases.
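Under that suggestion, a rough sketch of attaching the postprocessor to the query engine, reusing the index from above; the 0.75 cutoff is an assumption to tune for your embeddings:

```python
from llama_index.indices.postprocessor import SimilarityPostprocessor

# Drop retrieved nodes scoring below the cutoff before they reach the LLM.
query_engine = index.as_query_engine(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)]  # assumed cutoff
)
response = query_engine.query("hello")
```

If the cutoff filters out every node, the response synthesizer has nothing to answer from, which is the "LLM may generate nothing" case mentioned above.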
Yeah @James_ws, querying with "hello" is still going to pull relevant documents from the index. If you are trying to add normal chat interactions, you should probably use an agent:
https://gpt-index.readthedocs.io/en/stable/examples/agent/openai_agent_with_query_engine.html
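Following that example, a rough sketch of wrapping the query engine as an agent tool; the tool name and description are placeholders, and the index is reused from above:

```python
from llama_index.agent import OpenAIAgent
from llama_index.tools import QueryEngineTool, ToolMetadata

# Expose the query engine as a tool the agent calls only when needed.
doc_tool = QueryEngineTool(
    query_engine=index.as_query_engine(),
    metadata=ToolMetadata(
        name="docs",  # placeholder tool name
        description="Answers questions about the indexed documents.",
    ),
)
agent = OpenAIAgent.from_tools([doc_tool], verbose=True)

# A plain greeting is answered by the LLM directly, with no retrieval,
# so the prompt stays small; document questions trigger the tool instead.
print(agent.chat("hello"))
```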