
Updated 12 months ago

Hello everyone! I'm new to using LlamaIndex. I'm working with a vector index retriever to find similar items in the vector store. My project is a Q&A application, and I noticed that the query embedding uses only 25 tokens, but the prompt runs to about 1,200 tokens. This drives up the cost when using GPT-3.5. Any tips on how to optimize this would be greatly appreciated. Thanks!
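For scale, here is a back-of-envelope cost comparison using the two token counts from the post. The per-1K prices are illustrative assumptions (roughly the text-embedding-ada-002 and gpt-3.5-turbo input rates at the time), not quoted figures:

```python
# Rough cost comparison between the embedding call and the completion prompt.
# Prices are assumptions for illustration, not official rates.
EMBED_PRICE_PER_1K = 0.0001   # assumed ada-002 embedding rate, USD per 1K tokens
PROMPT_PRICE_PER_1K = 0.0015  # assumed gpt-3.5-turbo input rate, USD per 1K tokens

embed_tokens = 25     # query embedding tokens reported in the post
prompt_tokens = 1200  # completion prompt tokens reported in the post

embed_cost = embed_tokens / 1000 * EMBED_PRICE_PER_1K
prompt_cost = prompt_tokens / 1000 * PROMPT_PRICE_PER_1K

# The prompt dominates: most of those 1200 tokens are retrieved context
# plus the prompt template, not the 25-token query itself.
print(f"embedding: ${embed_cost:.6f}  prompt: ${prompt_cost:.6f}")
```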

Below are the code snippets.

Python
# Initialize Service Context
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=256,
    chunk_overlap=10,
    callback_manager=callback_manager,
)


Python
# imports (legacy llama_index 0.9-style paths; adjust for newer versions)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.response_synthesizers import ResponseMode, get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.postprocessor import SimilarityPostprocessor
# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=1,
)
# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    streaming=False,
    response_mode=ResponseMode.COMPACT,
    # verbose=True,
)
# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
return query_engine
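As a rough planning aid, the prompt budget can be sketched as template overhead plus top_k × chunk_size. This is a simplification, not LlamaIndex API: COMPACT mode packs nodes together, real counts come from the tokenizer, and the 150-token template figure below is an invented placeholder. Still, it shows the two levers (chunk_size, similarity_top_k) that shrink the prompt:

```python
# Hypothetical planning sketch, not LlamaIndex API: approximate the prompt
# budget as template tokens plus retrieved context tokens.

def estimate_prompt_tokens(template_tokens: int, chunk_size: int, top_k: int) -> int:
    """Upper-bound estimate assuming every retrieved chunk arrives at full size."""
    return template_tokens + chunk_size * top_k

# With the settings from the snippets above (template size is a made-up guess):
current = estimate_prompt_tokens(template_tokens=150, chunk_size=256, top_k=1)
# Halving chunk_size is one lever for cutting per-query cost:
smaller = estimate_prompt_tokens(template_tokens=150, chunk_size=128, top_k=1)
print(current, smaller)
```

If the real prompt is much larger than the estimate, the extra tokens are coming from somewhere else (chat formatting, system messages, or more nodes than expected), which is worth checking before tuning chunk sizes.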
4 comments
The query is only one piece of the total input to the model.

There is also
a) the query prompt template
b) the retrieved data
The entire prompt can be found below.
Python
QUERY_PROMPT_TEMPLATE_STR = """
Context information is below.
---------------------
{context_str}
---------------------
Act like an AI customer support executive
handling escalations over text for HDFC Securities,
a stock brokerage firm.
You have to respond to customer queries and complaints.
You have to be concise, polite and respectful in your responses.
If the customer is aggravated you have to try to de-escalate the situation as best as possible.
Your responses have to be meaningful and relevant to the information being provided to you.
Customer query: {query_str}
Customer support response: 
"""
Correct. So context_str is filled in by your retrieved nodes, and query_str is filled in by your query.
You can check response.source_nodes to see what it grabbed.