
hi guys, one question: i am building a composed graph query engine but it takes a lot of time to respond (20 seconds). is it possible to reduce that time? what are the best practices? i will provide the code:
thanks Logan for the help
which of these 3 do you suggest for my case?
i also read about agents
router -- uses the LLM to decide which sub-index to send the query to

retriever router -- uses embeddings to decide which sub-index to send the query to

ensemble retriever -- combines the retrieved nodes from all sub-indexes to write an answer
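As a rough illustration of the ensemble option, here is a minimal sketch (not an official implementation): mongo_index and docs_index are hypothetical names for the two already-built sub-indexes, and SimpleEnsembleRetriever is a hand-rolled class, using the same legacy llama_index imports as the code pasted later in this thread.
Python
from typing import List

from llama_index import QueryBundle
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import BaseRetriever
from llama_index.schema import NodeWithScore


class SimpleEnsembleRetriever(BaseRetriever):
    """Retrieve from every sub-retriever and merge the results into one list."""

    def __init__(self, retrievers):
        self._retrievers = retrievers
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        merged: List[NodeWithScore] = []
        seen_ids = set()
        for retriever in self._retrievers:
            for node_with_score in retriever.retrieve(query_bundle):
                # Skip nodes that another sub-index already returned.
                if node_with_score.node.node_id not in seen_ids:
                    seen_ids.add(node_with_score.node.node_id)
                    merged.append(node_with_score)
        return merged


# mongo_index / docs_index: placeholders for the two sub-indexes built elsewhere.
ensemble_retriever = SimpleEnsembleRetriever(
    [
        mongo_index.as_retriever(similarity_top_k=3),
        docs_index.as_retriever(similarity_top_k=3),
    ]
)

# One query engine that answers from the combined node set
# (a service_context can also be passed to from_args if needed).
query_engine = RetrieverQueryEngine.from_args(ensemble_retriever)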
thanks, i think the ensemble is my way because the indexes are not so different (they are the same data, just stored in two different sources: one in mongo and some documents)
so maybe the answer is not in only 1 index but spread across different indexes
That sounds right 👍
really thanks for the help, for now i have a demo for my company; in the next days i will try to implement the ensemble to optimize the whole process
have a nice day
good luck! :dotsCATJAM:
sorry, one more question i didn't mention before
i previously used the graph for the condense chat engine:
Python
from config import load_envs

load_envs()

from flask import Flask, request, jsonify
from index_manager import initialize_index, get_service_context
from flask_cors import CORS, cross_origin
from llama_index.prompts import PromptTemplate
from llama_index.chat_engine import CondenseQuestionChatEngine
from chat_history_parser import retrieve_chat_history
from mongodb.db import insert_message_in_chat

import os

app = Flask(__name__)

cors = CORS(app)

app.config['CORS_HEADERS'] = 'Content-Type'

# Build the query engine once at startup so it is reused across requests
query_engine = initialize_index()

custom_prompt = PromptTemplate("""\
    Given a conversation (between Human and Assistant), a context, a history, and a follow up message from Human, \
    rewrite the message to be a standalone question that captures all relevant context \
    from the conversation. Always reply in Italian; do not provide responses outside the context or the chat history.

    <Chat History> 
    {chat_history}

    <Follow Up Message>
    {question}

    <Standalone question>
    """)

@app.route("/chat/<chatId>/answer", methods=["GET"])
def query_index(chatId):

  query_text = request.args.get("text")

  if query_text is None:
    return "No text found, please include a ?text=example parameter in the URL", 400

  service_context = get_service_context()

  history = retrieve_chat_history(chatId)

  # Rebuild the chat engine per request with this chat's stored history
  chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine, 
    condense_question_prompt=custom_prompt,
    chat_history=history,
    service_context=service_context,
    verbose=True
  )

  response = chat_engine.chat(query_text)
  
  # Persist both sides of the exchange in MongoDB
  insert_message_in_chat(chatId, query_text, 'user')
  insert_message_in_chat(chatId, str(response), 'assistant')

  return jsonify(response = str(response)), 200

if __name__ == "__main__":
    app.config['MONGO_URI'] = os.getenv("MONGODB_URI")

    app.run(host="0.0.0.0", port=5601)
is it possible to achieve the same result with an ensemble retriever
(not the code but only the theory)
and if not, what is the best practice to have a multi-index chat
Yea it's possible, just create the query engine with ensemble retriever and pass it into CondenseQuestionChatEngine
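In rough terms it could look like this: a sketch that reuses the hypothetical ensemble_retriever from the earlier snippet and the custom_prompt / retrieve_chat_history / query_text pieces from the Flask app pasted above; any name not in that pasted code is an assumption, not a confirmed API.
Python
from llama_index.chat_engine import CondenseQuestionChatEngine
from llama_index.query_engine import RetrieverQueryEngine

# ensemble_retriever is the hypothetical retriever sketched earlier in the thread.
ensemble_query_engine = RetrieverQueryEngine.from_args(ensemble_retriever)

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=ensemble_query_engine,
    condense_question_prompt=custom_prompt,      # same prompt as in the Flask app
    chat_history=retrieve_chat_history(chatId),  # same history loader as the app
    verbose=True,
)

response = chat_engine.chat(query_text)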
really thanks again, and will the performance (response time) improve?
(now with CondenseGraph it take 20 seconds)
It shouuuuld be slightly faster yea