Hello, I've had a lot of success using Llama-Index in a RAG context, but now I'm trying to reuse some of my RAG code to build a simple non-RAG tool that sends questions directly to a locally running instance of the new CodeLlama-Instruct model.
So I do not actually have any underlying index that I want to pull context from, but I'm trying to reuse some of my code that uses `index.as_query_engine`, and it seems that I'm running into issues with an empty index.
I feel like there's a better way to do this, but I'm a bit stuck. Here's a snippet of my code right now:
```python
from flask import Response, request
from llama_index import VectorStoreIndex
from llama_index.llms import ChatMessage, MessageRole
from llama_index.prompts import ChatPromptTemplate

# Empty index, since there are no documents to retrieve from
index = VectorStoreIndex([])

# `app` is my existing Dash app, so app.server is the underlying Flask server
@app.server.route("/code-llama/streaming-chat", methods=["POST"])
def streaming_chat():
    user_prompt = request.json["prompt"]
    user_question = request.json["question"]
    # Build a single-message chat template from the user's prompt
    user_prompt = ChatMessage(role=MessageRole.USER, content=user_prompt)
    text_qa_template = ChatPromptTemplate(message_templates=[user_prompt])
    query_engine = index.as_query_engine(streaming=True, text_qa_template=text_qa_template)

    def response_stream():
        yield from query_engine.query(user_question).response_gen

    return Response(response_stream(), mimetype="text/response-stream")
```
At `query_engine.query(user_question).response_gen` I am getting an `AttributeError: 'Response' object has no attribute 'response_gen'`.
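For reference, here's roughly what I imagine the non-RAG version should look like, calling the LLM directly instead of going through a query engine. This is just my guess, assuming CodeLlama is served locally via Ollama (the model name below is a placeholder); for a different local server I'd presumably swap in `LlamaCPP` or similar:

```python
from llama_index.llms import ChatMessage, MessageRole, Ollama

# Assumption: CodeLlama-Instruct is running locally behind Ollama;
# adjust the model name (or the LLM class) to the actual setup.
llm = Ollama(model="codellama:13b-instruct")

def response_stream(question: str):
    messages = [ChatMessage(role=MessageRole.USER, content=question)]
    # stream_chat yields ChatResponse objects; .delta holds the newest tokens
    for chunk in llm.stream_chat(messages):
        yield chunk.delta
```

Is dropping the index and calling the LLM directly like this the intended approach, or is there a way to get a streaming query engine to work without any documents?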