You can pass the optimizer directly to the query call:

```python
from gpt_index import GPTPineconeIndex
from gpt_index.optimization.optimizer import SentenceEmbeddingOptimizer

index = GPTPineconeIndex(
    [],
    pinecone_index=self.pinecone_index,
    namespace=organisation,
)
response = index.query(
    query_str="What is the difference between sparse and dense vectors?",
    similarity_top_k=3,
    text_qa_template=load_chat_prompt(),  # user-defined helper returning a QA prompt
    service_context=service_context,
    optimizer=SentenceEmbeddingOptimizer(
        percentile_cutoff=0.5,
        threshold_cutoff=0.7,
    ),
)
```
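A note on the two cutoffs: percentile_cutoff=0.5 keeps only the top 50% of sentences ranked by embedding similarity to the query, and threshold_cutoff=0.7 additionally drops any sentence whose similarity score falls below 0.7, so a sentence must pass both filters to survive.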
You can also run the optimizer directly on a piece of text:

```python
optimizer.optimize(QueryBundle(query_str), source_text)
```

You can fetch source_text from response.source_nodes in the above response. This way you can see which specific sources the optimizer trims more tokens from.
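As a minimal sketch of that inspection, assuming `response` is the result of the query above and that each source node exposes its text as `node.source_text` (the attribute name and the QueryBundle import path have varied across gpt_index releases, so adjust to your version):

```python
from gpt_index.indices.query.schema import QueryBundle
from gpt_index.optimization.optimizer import SentenceEmbeddingOptimizer

# Hypothetical inspection loop: optimize each retrieved source individually
# to see where the optimizer trims the most text. `node.source_text` is an
# assumption about the node schema, not a guaranteed field name.
query_str = "What is the difference between sparse and dense vectors?"
optimizer = SentenceEmbeddingOptimizer(percentile_cutoff=0.5, threshold_cutoff=0.7)

for node in response.source_nodes:
    original_text = node.source_text
    shortened_text = optimizer.optimize(QueryBundle(query_str), original_text)
    print(f"kept {len(shortened_text)} of {len(original_text)} characters")
```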
Here is a complete example comparing token usage with and without the optimizer:

```python
from gpt_index.optimization.optimizer import SentenceEmbeddingOptimizer
import logging

# Enable INFO logging so token counts are printed for each query.
logger = logging.getLogger()
logger.setLevel(logging.INFO)

print("Without optimization")
response = city_indices["Boston"].query(
    "Tell me about the arts and culture of Boston",
    service_context=service_context,
)
print(str(response))

print("With optimization")
response = city_indices["Boston"].query(
    "Tell me about the arts and culture of Boston",
    service_context=service_context,
    optimizer=SentenceEmbeddingOptimizer(percentile_cutoff=0.5),
)
print(str(response))
```

The log output shows the difference:
```
Without optimization
INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 4213 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 9 tokens
With optimization
INFO:gpt_index.optimization.optimizer:> [optimize] Total embedding token usage: 0 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 1940 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 9 tokens
```
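The optimized query uses 1940 LLM tokens instead of 4213, roughly a 54% reduction, while embedding token usage for the query itself stays at 9 tokens.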