Hi team llama! We have an issue with the token count predictor. Specifically, we have integrated LlamaIndex in FastAPI and use dependency injection (request-scoped) to create the context with the token counter. However, when we use the aquery method on the index and multiple requests are processed at the same time, the token counter does not work as expected. It returns inconsistent counts and seems to accumulate token counts from different requests and return them together in one. E.g. if we fire the same request 20 times, it reports that zero tokens were used for the first few requests, and then suddenly reports 15k tokens for the 7th request. This seems strange to us, since the instances of the token counter and context are passed into all the pipeline steps, scoped via dependency injection.
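To give a rough idea of our setup, here's a simplified sketch (names, paths, and exact kwargs are illustrative rather than our real code, and we assume the TokenCountingHandler callback here; details may differ slightly by version):

```python
# Simplified sketch of our setup; names, paths and kwargs are illustrative
# and may differ from our actual code and llama_index version.
from fastapi import Depends, FastAPI
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager, TokenCountingHandler

app = FastAPI()

# The index is built once at startup and shared by all requests.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)


def get_token_counter() -> TokenCountingHandler:
    # One fresh handler per request (request-scoped dependency).
    return TokenCountingHandler()


def get_service_context(
    token_counter: TokenCountingHandler = Depends(get_token_counter),
) -> ServiceContext:
    # The request-scoped counter is wrapped in a per-request context.
    return ServiceContext.from_defaults(
        callback_manager=CallbackManager([token_counter])
    )


@app.get("/query")
async def query(
    q: str,
    token_counter: TokenCountingHandler = Depends(get_token_counter),
    service_context: ServiceContext = Depends(get_service_context),
):
    # The per-request context is passed into the query pipeline.
    query_engine = index.as_query_engine(service_context=service_context)
    response = await query_engine.aquery(q)
    # Under concurrent requests this is sometimes 0 and sometimes the
    # accumulated total of several requests.
    return {
        "answer": str(response),
        "llm_tokens": token_counter.total_llm_token_count,
    }
```

We then fire ~20 identical requests at this endpoint concurrently and compare the reported token counts.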
Does this sound like a familiar issue to you? If not, I can provide a minimal reproduction plus a script that sends multiple requests, along with the output. Thanks in advance, we're looking forward to a reply!
@Logan M I saw you left a thinking-face reaction :p Did you have a chance to check whether this sounds familiar? Or do you need a minimal reproduction? We'd love to hear from you!