Hi team llama! We have an issue with the token counter. Specifically, it occurs in an environment where we have integrated LlamaIndex into FastAPI. We use scoped dependency injection to create the context with the token counter.
However, when we use the aquery method on the index and multiple requests are processed at the same time, the token counter does not behave as expected. It returns inconsistent counts: it seems to accumulate token counts from different requests and return them together in one.
E.g. if we fire the same request 20 times, it will report zero tokens used for the first few requests, and then suddenly report 15k tokens for the 7th request.
This seems strange to us, since the instances of the token counter and the context are passed into all the pipeline steps, scoped via dependency injection.
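For context, here is a stripped-down sketch of roughly how we wire it up (names and the endpoint are illustrative, not our exact code; the real app passes the counter into the pipeline steps via scoped dependencies):

```python
# Illustrative sketch only -- the real app differs, but the shape is the same.
import tiktoken
from fastapi import Depends, FastAPI
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

app = FastAPI()

# Small index built once at startup and shared across requests.
index = VectorStoreIndex.from_documents(
    [Document(text="LlamaIndex token counting reproduction document.")]
)


def get_token_counter() -> TokenCountingHandler:
    # Scoped dependency: a fresh counter per request, so counts should stay isolated.
    return TokenCountingHandler(
        tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
    )


@app.post("/query")
async def query(
    question: str,
    token_counter: TokenCountingHandler = Depends(get_token_counter),
):
    # Attach the per-request counter through a callback manager
    # (the documented pattern goes through the global Settings object).
    Settings.callback_manager = CallbackManager([token_counter])
    response = await index.as_query_engine().aquery(question)
    return {
        "answer": str(response),
        "llm_tokens": token_counter.total_llm_token_count,
        "embedding_tokens": token_counter.total_embedding_token_count,
    }
```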

Does this issue sound familiar to you? If not, I can provide a minimal reproduction plus a script that sends multiple requests, together with the output.
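For what it's worth, the request script would look roughly like this (illustrative; assumes httpx and the /query endpoint from the sketch above):

```python
# Fire the same request N times concurrently and print the reported token counts.
import asyncio

import httpx

URL = "http://localhost:8000/query"  # assumed endpoint from the sketch above
N = 20


async def main() -> None:
    async with httpx.AsyncClient(timeout=120.0) as client:
        tasks = [
            client.post(URL, params={"question": "What is in the document?"})
            for _ in range(N)
        ]
        responses = await asyncio.gather(*tasks)

    for i, resp in enumerate(responses, start=1):
        data = resp.json()
        print(f"request {i:02d}: llm_tokens={data['llm_tokens']}")


if __name__ == "__main__":
    asyncio.run(main())
```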
Thanks in advance, we're looking forward to a reply πŸ™‚
3 comments
@Logan M I saw you left a thinking face behind :p Did you have a chance to verify if this sounds familiar? Or do you need a minimal reproduction? We'd love to hear from you πŸ™‚
Yea I have no idea lol it's on my backlog to look at. But also feel free to tackle this in a PR if it's something you need urgently fixed πŸ˜…
Alright! Thanks for the reply, we'll see who gets to it first πŸ˜›