Hello everyone! I'm new to LlamaIndex. I'm using a VectorIndexRetriever to find similar items in my vector store for a Q&A project. Looking at the token counts, the query embedding uses only about 25 tokens, but the prompt sent to the LLM comes out to roughly 1,200 tokens, which is driving up the cost with GPT-3.5. Any tips on how to optimize this would be greatly appreciated. Thanks!
Below is the relevant code (from the function where I build the query engine).
# Imports (assuming a pre-0.10 llama_index; on 0.10+ these live under llama_index.core)
from llama_index import ServiceContext
from llama_index.indices.postprocessor import SimilarityPostprocessor
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import ResponseMode, get_response_synthesizer
from llama_index.retrievers import VectorIndexRetriever

# Initialize the service context (LLM, embedding model, and chunking settings)
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=256,
    chunk_overlap=10,
    callback_manager=callback_manager,
)
# Configure the retriever: fetch only the single most similar node
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=1,
)
# Configure the response synthesizer
response_synthesizer = get_response_synthesizer(
    streaming=False,
    response_mode=ResponseMode.COMPACT,
    # verbose=True,
)
# Assemble the query engine, dropping retrieved nodes below 0.7 similarity
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
return query_engine
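
For context, here is roughly how I'm measuring the token counts (a sketch of my callback setup using TokenCountingHandler; the gpt-3.5-turbo tokenizer choice and the sample question are just placeholders):

import tiktoken
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Count embedding and LLM tokens separately
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

# After a query, this is where the 25 vs. 1,200 numbers come from
response = query_engine.query("What is the refund policy?")  # placeholder question
print("Embedding tokens:", token_counter.total_embedding_token_count)  # ~25
print("Prompt tokens:", token_counter.prompt_llm_token_count)          # ~1200
print("Completion tokens:", token_counter.completion_llm_token_count)

So even with similarity_top_k=1 and chunk_size=256, the prompt the synthesizer builds (system/QA template plus the retrieved chunk) is far larger than the query itself, and that's the part I'd like to shrink.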