Hmm, I am working on a project where I am very limited in GPU/CPU resources, and I want to limit my LLM calls as much as possible, focusing more on retrieval.
How can I pass my retrievals to the LLM without using the response_synthesizer, which calls the LLM several times depending on the retrieved results?
3 comments
I did this using nodes. The line response = llm.complete(fmt_qa_prompt) takes in a prompt, which should contain your retrieved information:
Plain Text
def generate_response(retrieved_nodes, query_str, patient_information, qa_prompt, llm):
    # Join the retrieved node contents into a single context string
    context_str = "\n\n".join([r.get_content() for r in retrieved_nodes])
    # Fill the prompt template with the context, query, and patient info
    fmt_qa_prompt = qa_prompt.format(context_str=context_str, query_str=query_str, information=patient_information)
    # Exactly one LLM call, regardless of how many nodes were retrieved
    response = llm.complete(fmt_qa_prompt)
    return str(response), fmt_qa_prompt
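For reference, a minimal sketch of how this could be wired up end to end. The retriever setup, the prompt template, and the variable names below are assumptions for illustration, not from the original answer:
Plain Text
# Hypothetical wiring -- index, query_str, patient_information, and llm are
# assumed to exist already; the template placeholders match the
# qa_prompt.format(...) call in generate_response above.
retriever = index.as_retriever(similarity_top_k=3)  # small top-k to keep the prompt short
retrieved_nodes = retriever.retrieve(query_str)

qa_prompt = (
    "Patient information:\n{information}\n\n"
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\n"
    "Answer:"
)

response, prompt_used = generate_response(
    retrieved_nodes, query_str, patient_information, qa_prompt, llm
)
This keeps the whole pipeline to exactly one llm.complete call, however many nodes come back from retrieval.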
Ty, will try this out. Any ideas on node compression, while we are at the topic? 😉
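One low-cost option, sketched under the assumption that a rough character budget is acceptable instead of exact token counting (compress_nodes and max_chars are illustrative names, not LlamaIndex API):
Plain Text
# Sketch: trim the retrieved nodes to a rough character budget before
# building the prompt, so the single LLM call stays within context.
# max_chars is an assumed knob -- tune it to your model's context window.
def compress_nodes(retrieved_nodes, max_chars=4000):
    parts = []
    used = 0
    for node in retrieved_nodes:  # assumed sorted by relevance, best first
        remaining = max_chars - used
        if remaining <= 0:
            break
        text = node.get_content()[:remaining]
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)
You could pass the result in as context_str instead of joining the full node contents. Unlike embedding-based postprocessors, this adds no extra model calls, which fits the resource constraint here.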
Any idea on how I add the system prompt here?
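One way this could look, assuming a LlamaIndex LLM that supports the chat interface (the import path can vary by version, and generate_response_with_system is an illustrative name):
Plain Text
# Sketch: pass a system prompt via the chat API instead of llm.complete.
# Still a single LLM call per query.
from llama_index.core.llms import ChatMessage

def generate_response_with_system(fmt_qa_prompt, system_prompt, llm):
    messages = [
        ChatMessage(role="system", content=system_prompt),
        ChatMessage(role="user", content=fmt_qa_prompt),
    ]
    response = llm.chat(messages)  # one call, system prompt included
    return str(response)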