
When creating a new index, I got terminal info like this:

INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 212901 tokens

My question is: how do I store the embedding token usage in a variable?
Please stand by @V as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:
----------------------
You can store the embedding token usage in a variable by accessing the last_token_usage attribute of the LLM predictor. Here's an example based on the documentation:

Plain Text
from llama_index import (
    GPTSimpleVectorIndex, 
    MockLLMPredictor, 
    MockEmbedding, 
    SimpleDirectoryReader,
    ServiceContext
)

documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTSimpleVectorIndex.load_from_disk('../paul_graham_essay/index_simple_vec.json')

# specify both a MockLLMPredictor as well as MockEmbedding
llm_predictor = MockLLMPredictor(max_tokens=256)
embed_model = MockEmbedding(embed_dim=1536)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

response = index.query(
    "What did the author do after his time at Y Combinator?",
    service_context=service_context
)

# Store the embedding token usage into a variable
embedding_token_usage = index.llm_predictor.last_token_usage


In this example, the embedding_token_usage variable will store the total embedding token usage. Note that this example uses MockLLMPredictor and MockEmbedding for demonstration purposes. You can replace them with actual LLM predictors and embedding models as needed.

Source: Cost Analysis - Token Predictor Usage (https://gpt-index.readthedocs.io/en/latest/how_to/analysis/cost_analysis.html)
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
no, it doesn't work. index.llm_predictor.last_token_usage returns 0
@kapa.ai llm_predictor.last_token_usage returns 0
Let me check the knowledge base for you. One moment! :books:
----------------------
I apologize for the confusion. To store the embedding token usage in a variable, read the last_token_usage attribute of the embedding model object (MockEmbedding here) instead of the LLM predictor: the predictor only tracks LLM token usage, which is why it returned 0. Here's the corrected example:

Plain Text
from llama_index import (
    GPTSimpleVectorIndex, 
    MockLLMPredictor, 
    MockEmbedding, 
    SimpleDirectoryReader,
    ServiceContext
)

documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTSimpleVectorIndex.load_from_disk('../paul_graham_essay/index_simple_vec.json')

# specify both a MockLLMPredictor as well as MockEmbedding
llm_predictor = MockLLMPredictor(max_tokens=256)
embed_model = MockEmbedding(embed_dim=1536)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

response = index.query(
    "What did the author do after his time at Y Combinator?",
    service_context=service_context
)

# Store the embedding token usage into a variable
embedding_token_usage = embed_model.last_token_usage


In this example, the embedding_token_usage variable will store the total embedding token usage. Note that this example uses MockLLMPredictor and MockEmbedding for demonstration purposes. You can replace them with actual LLM predictors and embedding models as needed.
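
If you also want the build-time number from your original log (the 212901 embedding tokens reported during index construction) rather than a per-query figure, the same attribute works after building the index with a real embedding model. A minimal sketch, assuming the same old llama_index API as the examples above (the OpenAIEmbedding import path is an assumption for this API era, './data' is a placeholder, and an OPENAI_API_KEY must be set):

Plain Text
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding  # assumed import path

# pass a real embedding model through the service context so its counter is tracked
embed_model = OpenAIEmbedding()
service_context = ServiceContext.from_defaults(embed_model=embed_model)

documents = SimpleDirectoryReader('./data').load_data()  # placeholder data directory
index = GPTSimpleVectorIndex(documents, service_context=service_context)

# tokens consumed by the embedding calls during the build
embedding_token_usage = embed_model.last_token_usage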

Source: Cost Analysis - Token Predictor Usage (https://gpt-index.readthedocs.io/en/latest/how_to/analysis/cost_analysis.html)
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai when creating the index, the code is:

Plain Text
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
index = GPTSimpleVectorIndex(
    documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
)

My question is: which model is used for embedding, gpt-3.5-turbo or ada-002? Why is gpt-3.5-turbo declared here?
Let me check the knowledge base for you. One moment! :books:
----------------------
In the provided snippet, the GPTSimpleVectorIndex is created with an LLMPredictor wrapping the "gpt-3.5-turbo" chat model. That model is only used for LLM calls (such as response synthesis at query time), not for embeddings. Because the snippet does not specify an embedding model, LlamaIndex falls back to its default, OpenAI's text-embedding-ada-002, so your embeddings are produced by ada-002 even though only gpt-3.5-turbo is declared.

To specify the embedding model explicitly, create an instance of the desired embedding class and pass it to the ServiceContext. For example, to use the ada-002 model for embeddings, you can wrap Langchain's OpenAIEmbeddings in the LangchainEmbedding class.

Here's an example of how to create a GPTSimpleVectorIndex with a specific embedding model:

Plain Text
from gpt_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    LangchainEmbedding,
    ServiceContext
)
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

# Create the LLMPredictor with the gpt-3.5-turbo model
# (num_outputs, documents, and prompt_helper come from your existing code)
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))

# Create the LangchainEmbedding wrapper around OpenAI's text-embedding-ada-002 model
embed_model = LangchainEmbedding(OpenAIEmbeddings(model="text-embedding-ada-002"))

# Create the ServiceContext with the specified LLM predictor and embedding model
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

# Create the GPTSimpleVectorIndex with the specified ServiceContext
index = GPTSimpleVectorIndex(documents, prompt_helper=prompt_helper, service_context=service_context)

In this example, the "gpt-3.5-turbo" model is used for the LLM predictor, while the "ada-002" model is used for embeddings. The ServiceContext is created with the specified embedding model and passed to the GPTSimpleVectorIndex.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------