
hi guys, one question: i am building a composed graph query engine but it takes a lot of time to respond (20 seconds). is it possible to reduce that time? what are the best practices? i will provide the code:
thanks Logan for the help
which of these 3 do you suggest for my case?
i also read about agents
router -- uses the LLM to decide which sub-index to send the query to

retriever router -- uses embeddings to decide which sub-index to send the query to

ensemble retriever -- combines the retrieved nodes from all sub-indexes to write an answer
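As a rough illustration of the ensemble option, here is a minimal sketch (not an official implementation): mongo_index and docs_index are hypothetical names for the two already-built sub-indexes, and SimpleEnsembleRetriever is a hand-rolled class, using the same legacy llama_index imports as the code pasted later in this thread.
Python
from typing import List

from llama_index import QueryBundle
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import BaseRetriever
from llama_index.schema import NodeWithScore


class SimpleEnsembleRetriever(BaseRetriever):
    """Retrieve from every sub-retriever and merge the results into one list."""

    def __init__(self, retrievers):
        self._retrievers = retrievers
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        merged: List[NodeWithScore] = []
        seen_ids = set()
        for retriever in self._retrievers:
            for node_with_score in retriever.retrieve(query_bundle):
                # Skip nodes that another sub-index already returned.
                if node_with_score.node.node_id not in seen_ids:
                    seen_ids.add(node_with_score.node.node_id)
                    merged.append(node_with_score)
        return merged


# mongo_index / docs_index: placeholders for the two sub-indexes built elsewhere.
ensemble_retriever = SimpleEnsembleRetriever(
    [
        mongo_index.as_retriever(similarity_top_k=3),
        docs_index.as_retriever(similarity_top_k=3),
    ]
)

# One query engine that answers from the combined node set
# (a service_context can also be passed to from_args if needed).
query_engine = RetrieverQueryEngine.from_args(ensemble_retriever)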
thanks, i think the ensemble is my way because the indexes are not so different (they are the same data, just stored in two different sources: one in mongo and some documents)
so maybe the answer is not in only 1 index but spread across different indexes
That sounds right 👍
really thanks for the help, for now i have a demo for my company; in the next days i will try to implement the ensemble to optimize the whole process
have a nice day
good luck! :dotsCATJAM:
sorry, one more question i didn't mention before
i previously used the graph for the condense chat engine:
Python
from config import load_envs

load_envs()

from flask import Flask, request, jsonify
from index_manager import initialize_index, get_service_context
from flask_cors import CORS, cross_origin
from llama_index.prompts import PromptTemplate
from llama_index.chat_engine import CondenseQuestionChatEngine
from chat_history_parser import retrieve_chat_history
from mongodb.db import insert_message_in_chat

import os

app = Flask(__name__)

cors = CORS(app)

app.config['CORS_HEADERS'] = 'Content-Type'

# Build the query engine once at startup so it is reused across requests
query_engine = initialize_index()

custom_prompt = PromptTemplate("""\
    Given a conversation (between Human and Assistant), a context, a history, and a follow up message from Human, \
    rewrite the message to be a standalone question that captures all relevant context \
    from the conversation. Always reply in Italian; do not provide responses outside the context or the chat history.

    <Chat History> 
    {chat_history}

    <Follow Up Message>
    {question}

    <Standalone question>
    """)

@app.route("/chat/<chatId>/answer", methods=["GET"])
def query_index(chatId):

  query_text = request.args.get("text")

  if query_text is None:
    return "No text found, please include a ?text=example parameter in the URL", 400

  service_context = get_service_context()

  history = retrieve_chat_history(chatId)

  # Rebuild the chat engine per request with this chat's stored history
  chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine, 
    condense_question_prompt=custom_prompt,
    chat_history=history,
    service_context=service_context,
    verbose=True
  )

  response = chat_engine.chat(query_text)
  
  # Persist both sides of the exchange in MongoDB
  insert_message_in_chat(chatId, query_text, 'user')
  insert_message_in_chat(chatId, str(response), 'assistant')

  return jsonify(response = str(response)), 200

if __name__ == "__main__":
    app.config['MONGO_URI'] = os.getenv("MONGODB_URI")

    app.run(host="0.0.0.0", port=5601)
is it possible to achieve the same result with an ensemble retriever
(not the code but only the theory)
and if not, what is the best practice to have a multi-index chat
Yea it's possible, just create the query engine with ensemble retriever and pass it into CondenseQuestionChatEngine
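In rough terms it could look like this: a sketch that reuses the hypothetical ensemble_retriever from the earlier snippet and the custom_prompt / retrieve_chat_history / query_text pieces from the Flask app pasted above; any name not in that pasted code is an assumption, not a confirmed API.
Python
from llama_index.chat_engine import CondenseQuestionChatEngine
from llama_index.query_engine import RetrieverQueryEngine

# ensemble_retriever is the hypothetical retriever sketched earlier in the thread.
ensemble_query_engine = RetrieverQueryEngine.from_args(ensemble_retriever)

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=ensemble_query_engine,
    condense_question_prompt=custom_prompt,      # same prompt as in the Flask app
    chat_history=retrieve_chat_history(chatId),  # same history loader as the app
    verbose=True,
)

response = chat_engine.chat(query_text)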
really thanks again, and will the performance (response time) improve?
(now with CondenseGraph it take 20 seconds)
It shouuuuld be slightly faster yea