Hello everyone! I'm new to LlamaIndex. I'm using a VectorIndexRetriever to find similar items in my vector store for a Q&A project. Looking at the token counts, the query embedding uses only about 25 tokens, but the prompt sent to the LLM comes out to roughly 1,200 tokens, which is driving up the cost with GPT-3.5. Any tips on how to optimize this would be greatly appreciated. Thanks!
Below is the relevant code (from the function where I build the query engine).
# Imports (assuming a pre-0.10 llama_index; on 0.10+ these live under llama_index.core)
from llama_index import ServiceContext
from llama_index.indices.postprocessor import SimilarityPostprocessor
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import ResponseMode, get_response_synthesizer
from llama_index.retrievers import VectorIndexRetriever

# Initialize the service context (LLM, embedding model, and chunking settings)
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=256,
    chunk_overlap=10,
    callback_manager=callback_manager,
)
# Configure the retriever: fetch only the single most similar node
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=1,
)
# Configure the response synthesizer
response_synthesizer = get_response_synthesizer(
    streaming=False,
    response_mode=ResponseMode.COMPACT,
    # verbose=True,
)
# Assemble the query engine, dropping retrieved nodes below 0.7 similarity
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
return query_engine
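
For context, here is roughly how I'm measuring the token counts (a sketch of my callback setup using TokenCountingHandler; the gpt-3.5-turbo tokenizer choice and the sample question are just placeholders):

import tiktoken
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Count embedding and LLM tokens separately
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

# After a query, this is where the 25 vs. 1,200 numbers come from
response = query_engine.query("What is the refund policy?")  # placeholder question
print("Embedding tokens:", token_counter.total_embedding_token_count)  # ~25
print("Prompt tokens:", token_counter.prompt_llm_token_count)          # ~1200
print("Completion tokens:", token_counter.completion_llm_token_count)

So even with similarity_top_k=1 and chunk_size=256, the prompt the synthesizer builds (system/QA template plus the retrieved chunk) is far larger than the query itself, and that's the part I'd like to shrink.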