I am calling an LLM provider via LangChain (CerebriumAI), and my model endpoint needs to receive both a prompt and the top vector retrieved from the vector store for that prompt. I have the following code snippet:
from langchain.llms import CerebriumAI
from llama_index import ServiceContext, VectorStoreIndex

llm = CerebriumAI(endpoint_url="https://run.cerebrium.ai/v2/p-ed25ab21/finetuned-lama/predict")
# Wrap the LangChain LLM in a LlamaIndex service context and build the index.
service_context = ServiceContext.from_defaults(llm=llm)
index = VectorStoreIndex.from_documents(all_docs, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query("Can you give me a code snippet of what deploying a hugging face model would look like?")
What I would like to send to the Cerebrium endpoint is the following payload:
{
    "prompt": "Can you give me a code snippet of what deploying a hugging face model would look like?",
    "input": <vector retrieved from the vector store based on the prompt>
}
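For reference, here is a minimal sketch of one way to build that payload by hand, bypassing the query engine and calling the endpoint directly with requests. The retriever and embedding calls are standard LlamaIndex APIs, but the "prompt"/"input" field names and what the endpoint expects in "input" are assumptions based on the payload above:

import requests

prompt = "Can you give me a code snippet of what deploying a hugging face model would look like?"

# Retrieve the single best-matching node from the vector store.
retriever = index.as_retriever(similarity_top_k=1)
top_node = retriever.retrieve(prompt)[0]

# Node embeddings are not populated on retrieval results by default, so
# re-embed the retrieved text with the service context's embedding model
# (an assumption about what the "top vector" should contain).
vector = service_context.embed_model.get_text_embedding(top_node.node.get_content())

payload = {
    "prompt": prompt,
    "input": vector,
}
response = requests.post(
    "https://run.cerebrium.ai/v2/p-ed25ab21/finetuned-lama/predict",
    json=payload,
)
print(response.json())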