----------------------
To count the tokens from a response in LlamaIndex, you can use the TokenCountingHandler callback. Here's an example of how you can set it up and use it:
import tiktoken
from llama_index.llms import Anthropic
from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
    set_global_service_context,
)
from llama_index.callbacks import CallbackManager, TokenCountingHandler
import os

os.environ["ANTHROPIC_API_KEY"] = "YOUR_API_KEY"

# Set up the token-counting callback and the service context.
# Note: tiktoken's gpt-3.5-turbo encoding only approximates Anthropic's
# tokenization, so the LLM token counts here are estimates.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

llm = Anthropic()
service_context = ServiceContext.from_defaults(
    llm=llm, callback_manager=callback_manager, embed_model="local"
)

# Set the global default so every index and query engine uses this context
set_global_service_context(service_context)
Once this is set up, you can count the tokens from a response. The query snippet below assumes you already have a VectorStoreIndex named index.
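If you haven't built one yet, a minimal sketch looks like this (the ./data path is just an assumed location for your documents; the index picks up the global service context set above):

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

Once the index exists, reset the counter, run a query, and read the counts: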
# Reset the counts
token_counter.reset_counts()

# Query
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("Your query here")

# Print the counts
print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
Please replace "Your query here" with your actual query.
This code will print the count of embedding tokens, LLM prompt tokens, LLM completion tokens, and the total LLM token count.
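If you want a per-call breakdown rather than running totals, the handler also records individual events. Here is a small sketch assuming the TokenCountingHandler shown above; the event attribute names follow LlamaIndex's token counting documentation:

# Inspect each LLM call recorded by the handler
for event in token_counter.llm_token_counts:
    print(
        "Prompt tokens:", event.prompt_token_count,
        "Completion tokens:", event.completion_token_count,
        "Total:", event.total_token_count,
    )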