Hmm, I am working on a project where I am very limited in GPU/CPU resources, and I want to limit my LLM calls as much as possible, focusing more on retrieval.
How can I pass my retrievals to the LLM without using the response_synthesizer, which calls the LLM several times depending on the retrieved results?
3 comments
I did this using nodes. The line response = llm.complete(fmt_qa_prompt) takes in a prompt, which should contain your retrieved information:
Plain Text
def generate_response(retrieved_nodes, query_str, patient_information, qa_prompt, llm):
    # Join the retrieved node contents into a single context string
    context_str = "\n\n".join([r.get_content() for r in retrieved_nodes])
    # Fill the prompt template with the context, query, and patient info
    fmt_qa_prompt = qa_prompt.format(context_str=context_str, query_str=query_str, information=patient_information)
    # Exactly one LLM call, regardless of how many nodes were retrieved
    response = llm.complete(fmt_qa_prompt)
    return str(response), fmt_qa_prompt
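For reference, a minimal sketch of how this could be wired up end to end. The retriever setup, the prompt template, and the variable names below are assumptions for illustration, not from the original answer:
Plain Text
# Hypothetical wiring -- index, query_str, patient_information, and llm are
# assumed to exist already; the template placeholders match the
# qa_prompt.format(...) call in generate_response above.
retriever = index.as_retriever(similarity_top_k=3)  # small top-k to keep the prompt short
retrieved_nodes = retriever.retrieve(query_str)

qa_prompt = (
    "Patient information:\n{information}\n\n"
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\n"
    "Answer:"
)

response, prompt_used = generate_response(
    retrieved_nodes, query_str, patient_information, qa_prompt, llm
)
This keeps the whole pipeline to exactly one llm.complete call, however many nodes come back from retrieval.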
Ty, will try this out. Any ideas on node compression, while we are at the topic? 😉
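One low-cost option, sketched under the assumption that a rough character budget is acceptable instead of exact token counting (compress_nodes and max_chars are illustrative names, not LlamaIndex API):
Plain Text
# Sketch: trim the retrieved nodes to a rough character budget before
# building the prompt, so the single LLM call stays within context.
# max_chars is an assumed knob -- tune it to your model's context window.
def compress_nodes(retrieved_nodes, max_chars=4000):
    parts = []
    used = 0
    for node in retrieved_nodes:  # assumed sorted by relevance, best first
        remaining = max_chars - used
        if remaining <= 0:
            break
        text = node.get_content()[:remaining]
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)
You could pass the result in as context_str instead of joining the full node contents. Unlike embedding-based postprocessors, this adds no extra model calls, which fits the resource constraint here.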
Any idea on how I add the system prompt here?
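One way this could look, assuming a LlamaIndex LLM that supports the chat interface (the import path can vary by version, and generate_response_with_system is an illustrative name):
Plain Text
# Sketch: pass a system prompt via the chat API instead of llm.complete.
# Still a single LLM call per query.
from llama_index.core.llms import ChatMessage

def generate_response_with_system(fmt_qa_prompt, system_prompt, llm):
    messages = [
        ChatMessage(role="system", content=system_prompt),
        ChatMessage(role="user", content=fmt_qa_prompt),
    ]
    response = llm.chat(messages)  # one call, system prompt included
    return str(response)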