You can track embedding token usage through the `last_token_usage` attribute of the LLM predictor. Here's an example based on the information extracts:

```python
from llama_index import (
    GPTSimpleVectorIndex,
    MockLLMPredictor,
    MockEmbedding,
    SimpleDirectoryReader,
    ServiceContext,
)

documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTSimpleVectorIndex.load_from_disk('../paul_graham_essay/index_simple_vec.json')

# Specify both a MockLLMPredictor as well as a MockEmbedding
llm_predictor = MockLLMPredictor(max_tokens=256)
embed_model = MockEmbedding(embed_dim=1536)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

response = index.query(
    "What did the author do after his time at Y Combinator?",
    service_context=service_context,
)

# Store the embedding token usage into a variable
embedding_token_usage = index.llm_predictor.last_token_usage
```
The `embedding_token_usage` variable will store the total embedding token usage. Note that this example uses `MockLLMPredictor` and `MockEmbedding` for demonstration purposes; you can replace them with actual LLM predictors and embedding models as needed.
Correction: the embedding token usage should be read from the `last_token_usage` attribute of the `MockEmbedding` object instead of the `llm_predictor`. The example is otherwise unchanged; only the last line differs:

```python
# Store the embedding token usage into a variable
embedding_token_usage = embed_model.last_token_usage
```
As before, the `embedding_token_usage` variable will store the total embedding token usage, and the mock classes can be swapped for actual LLM predictors and embedding models as needed.
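Because the mock classes never call an external API, this pattern is handy for estimating token usage (and therefore cost) before running real queries. Here is a minimal sketch that continues the example above; the per-1K-token prices are placeholder assumptions, not real rates:

```python
# Continues the example above: both mocks expose last_token_usage
# after index.query(...) has run.
llm_tokens = llm_predictor.last_token_usage      # tokens the LLM call would consume
embedding_tokens = embed_model.last_token_usage  # tokens the embedding call would consume

# Placeholder prices per 1K tokens -- substitute your models' actual rates.
LLM_PRICE_PER_1K = 0.002
EMBED_PRICE_PER_1K = 0.0004

estimated_cost = (
    llm_tokens / 1000 * LLM_PRICE_PER_1K
    + embedding_tokens / 1000 * EMBED_PRICE_PER_1K
)
print(f"LLM tokens: {llm_tokens}, embedding tokens: {embedding_tokens}")
print(f"Estimated cost: ${estimated_cost:.4f}")
```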
In the snippet you shared, the `GPTSimpleVectorIndex` is created using an `LLMPredictor` with the `ChatOpenAI` model, specifically "gpt-3.5-turbo". However, the snippet does not specify the embedding model; that is done through the `ServiceContext`. For example, if you want to use OpenAI's "text-embedding-ada-002" model for embeddings, you can use the `LangchainEmbedding` class to wrap the appropriate embedding model from Langchain. Here's an example of how to create the `GPTSimpleVectorIndex` with a specific embedding model:

```python
from llama_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    LangchainEmbedding,
    ServiceContext,
    SimpleDirectoryReader,
)
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
num_outputs = 256

# Create the LLMPredictor with the gpt-3.5-turbo model
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs)
)

# Create the LangchainEmbedding wrapping Langchain's OpenAI embeddings
# (OpenAIEmbeddings uses text-embedding-ada-002 by default)
embed_model = LangchainEmbedding(OpenAIEmbeddings())

# Create the ServiceContext with the specified LLM predictor and embedding model
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

# Create the GPTSimpleVectorIndex with the specified ServiceContext
index = GPTSimpleVectorIndex(documents, service_context=service_context)
```
This way, the `ServiceContext` is created with the specified embedding model and passed to the `GPTSimpleVectorIndex`.
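If you would rather run embeddings locally instead of calling OpenAI, the same `LangchainEmbedding` wrapper also accepts Langchain's `HuggingFaceEmbeddings`. A minimal sketch, assuming the `sentence-transformers` package is installed (the model name below is just one common choice, not something mandated by llama_index):

```python
from llama_index import LangchainEmbedding, ServiceContext
from langchain.embeddings import HuggingFaceEmbeddings

# Wrap a local sentence-transformers model instead of an OpenAI endpoint.
# "sentence-transformers/all-MiniLM-L6-v2" is an example checkpoint; any
# sentence-transformers model name works here.
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)

# Use it exactly like the OpenAI-based embed_model above.
service_context = ServiceContext.from_defaults(embed_model=embed_model)
```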