
At a glance

The community member is building a cost tracker for their Llama Index application, which uses the OpenAI API and therefore incurs costs. They ask whether the TokenCountingHandler reports the exact number of tokens used in a query, and whether it also tracks the tokens used during index creation, retrieval, embeddings, and other steps. Another community member confirms that token counting is accurate in the latest versions of Llama Index: the counter sees every LLM input and output and every embedding input. The original poster then notes that the token counts reported by Llama Index differ from those on the OpenAI API usage dashboard, which the other member explains is less granular and groups multiple requests into single counts. The discrepancy is on the order of hundreds of prompt and completion tokens and thousands of embedding tokens, which could matter for applications with many users, though it can be addressed with a threshold.

Hi @Logan M . Since all Llama Index applications involve use of an OpenAI API key, and it therefore costs money to develop this kind of application, I want to build a cost tracker. I've seen in the documentation that the TokenCountingHandler gives the exact number of tokens used in a query. A few days ago it wasn't exact, it was an estimation. Am I right?

Also, I have these questions related to cost tracking:
  1. During the creation of an index (a TreeIndex, for example), does the TokenCountingHandler count the tokens used for that index creation?
  2. If we use the 'tree_summarize' option to obtain a response, does the TokenCountingHandler also take into account the API calls made during this process?
In the end, what I want to know is whether the TokenCountingHandler keeps track of all the OpenAI API calls made during a query --> index creation, retrieval, embeddings, node post-processing, response synthesis, etc.

Thanks in advance!
Also, why does token_counter.total_embedding_token_count always give 0?
I just ran through the token counting demo notebook -- seems to work fine for me. What version of llama-index do you have?

In the latest versions of llama-index, the token counter sees EVERY llm input and output, and every embedding input. The token counting is accurate
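For reference, the wiring described above looks roughly like this. This is a sketch based on the token-counting docs of that era, assuming the legacy `ServiceContext` API, an installed `llama-index` with `tiktoken`, a valid `OPENAI_API_KEY`, and a placeholder `./data` directory of documents:

```python
import tiktoken
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Count tokens with the same tokenizer the target model uses.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)

# The handler only sees calls routed through this service context, so it
# must be attached BEFORE the index is built -- otherwise the embedding
# tokens from index construction are never counted (a likely cause of
# total_embedding_token_count staying at 0).
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter])
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

response = index.as_query_engine(response_mode="tree_summarize").query(
    "What is this document about?"
)

# Every LLM call and embedding call routed through the context is counted:
print("embedding tokens: ", token_counter.total_embedding_token_count)
print("prompt tokens:    ", token_counter.prompt_llm_token_count)
print("completion tokens:", token_counter.completion_llm_token_count)
print("total LLM tokens: ", token_counter.total_llm_token_count)

token_counter.reset_counts()  # start fresh before the next query
```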
Hi @Logan M . Thank you for your fast response 😄 .

The thing is, looking at the Usage section of the OpenAI API dashboard, the number of tokens Llama Index reports (via the tiktoken callback) is different from the number that appears there. Why is that?
OpenAI's dashboard is not very... granular? It likes to group multiple requests into single counts, which is a little annoying

How different is it?
On the order of hundreds for the prompt and completion tokens, and on the order of thousands for the embedding tokens. It is not a lot, I know, but in the long run, with multiple users of an application, it could be more expensive than it should be. Nevertheless, this can be handled with some threshold, so it is not a very complicated issue. Thank you very much, Logan!
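The cost tracker and threshold described above can be plain arithmetic on top of the handler's counts. A minimal sketch; the per-1K-token prices and the `estimate_cost`/`over_budget` helpers are illustrative placeholders, not current OpenAI pricing or Llama Index API, so substitute your model's real rates:

```python
# Illustrative USD-per-1K-token prices (placeholders -- check the real
# OpenAI pricing page for your models before relying on these numbers).
PRICES = {
    "prompt": 0.0015,      # e.g. chat model input tokens
    "completion": 0.002,   # e.g. chat model output tokens
    "embedding": 0.0001,   # e.g. embedding model tokens
}


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  embedding_tokens: int, prices: dict = PRICES) -> float:
    """Return the estimated USD cost for the given token counts."""
    return (prompt_tokens * prices["prompt"]
            + completion_tokens * prices["completion"]
            + embedding_tokens * prices["embedding"]) / 1000


def over_budget(cost: float, threshold: float) -> bool:
    """Flag when a user's accumulated cost crosses a budget threshold."""
    return cost >= threshold


# Feed in the counts read from the TokenCountingHandler after a query:
cost = estimate_cost(prompt_tokens=1200, completion_tokens=300,
                     embedding_tokens=5000)
print(f"estimated cost: ${cost:.4f}")
print("over budget:", over_budget(cost, threshold=0.01))
```

Because the dashboard discrepancy runs in the hundreds-to-thousands of tokens, a per-user threshold like this gives a safety margin without needing the counts to match OpenAI's aggregated numbers exactly.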