
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, SummaryPrompt
from llama_index import Prompt, ListIndex, SimpleDirectoryReader

# Define your custom prompt
SUMMARY_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)
SUMMARY_PROMPT = SummaryPrompt(SUMMARY_PROMPT_TMPL)

# Load your documents
documents = SimpleDirectoryReader(input_files=["/content/Example.pdf"]).load_data()

# Build the index with the custom prompt
index_with_query = GPTVectorStoreIndex(documents, summary_template=SUMMARY_PROMPT)

# Configure the query engine with the custom prompt
query_engine = index_with_query.as_query_engine(text_qa_template=SUMMARY_PROMPT)
response = query_engine.query("Based on the given context, what is the difference between Bird and technical documentation?")
print(response)

The response is taking 20-30 minutes to generate even though the document is only 3 pages. What can I do to make the response faster?
7 comments
Please stand by @Ftm as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The response time can be influenced by several factors, including the complexity of the document, the size of the index, and the specific configuration of the query engine. Here are a few suggestions to improve the response time:

  1. Use a smaller model: If you're using a large language model like gpt-3.5-turbo, you might want to consider using a smaller one. Smaller models usually generate responses faster, although the quality of the responses might be slightly lower.
  2. Reduce the context window: If you're using a large context window, reducing it might speed up the response time. However, this might also affect the quality of the responses, especially if the relevant information is located outside the reduced context window. (A configuration sketch for points 1 and 2 follows at the end of this answer.)
  3. Use streaming: If you're not already using it, consider enabling streaming. This allows the response to be generated in chunks, which might speed up the overall response time. Here's how you can enable streaming:
Plain Text
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What did the author do growing up?")
response.print_response_stream()

Source ()

  4. Use a different response mode: If you're using a complex response mode like tree_summarize, consider using a simpler one like retrieve. Simpler response modes usually generate responses faster, although the quality of the responses might be slightly lower.
Plain Text
query_engine = index.as_query_engine(response_mode='retrieve')
response = query_engine.query("What did the author do growing up?")
print(response)

Source ()
Please note that these are general suggestions and the actual impact on response time might vary depending on the specific circumstances.
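For points 1 and 2, the model and the context window are typically set on the service context rather than on the query engine. The snippet below is a minimal sketch only, assuming the legacy pre-0.10 llama_index API used in the question and an OpenAI API key in the environment; the model name and window size are illustrative placeholders, not recommendations:
Plain Text
from llama_index import ServiceContext, GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import OpenAI

# Sketch: choose a fast hosted model and a smaller context window (values are illustrative)
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0),
    context_window=2048,
)

documents = SimpleDirectoryReader(input_files=["/content/Example.pdf"]).load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()

Passing the LLM explicitly also makes it obvious which model is actually answering, which is relevant to the follow-up question below about OpenAI versus llama.cpp.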
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Are you using OpenAI? Or is it defaulting to llama.cpp?
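If the concern is that the pipeline has silently fallen back to a local llama.cpp model (which is much slower on CPU than a hosted OpenAI model), a quick check of the environment can help. This is only a sketch and assumes OpenAI auth is provided via the standard environment variable rather than set in code:
Plain Text
import os

# If this prints False (and the key is not set elsewhere in code),
# llama_index cannot be calling OpenAI and may be using a local model,
# which would explain very long response times.
print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))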
@kapa.ai

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, SummaryPrompt
from llama_index import Prompt, ListIndex, SimpleDirectoryReader

# Define your custom prompt
SUMMARY_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)
SUMMARY_PROMPT = SummaryPrompt(SUMMARY_PROMPT_TMPL)

# Load your documents
documents = SimpleDirectoryReader(input_files=["/content/Example.pdf"]).load_data()

# Build the index with the custom prompt
index_with_query = GPTVectorStoreIndex(documents, summary_template=SUMMARY_PROMPT)

# Configure the query engine with the custom prompt, then enable streaming
query_engine = index_with_query.as_query_engine(text_qa_template=SUMMARY_PROMPT)
query_engine = index_with_query.as_query_engine(streaming=True)
response = query_engine.query("What did the author do growing up?")
response.print_response_stream()

The response is taking 20 minutes to be shown. How can I reduce the response time to 2-3 minutes?
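One thing worth noting about the snippet above: the second as_query_engine call replaces the first, so the custom text_qa_template is dropped. Both options can be passed in a single call; a minimal sketch based only on the code already shown:
Plain Text
# Keep the custom prompt and enable streaming on the same query engine
query_engine = index_with_query.as_query_engine(
    text_qa_template=SUMMARY_PROMPT,
    streaming=True,
)
response = query_engine.query("What did the author do growing up?")
response.print_response_stream()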
Let me check the knowledge base for you. One moment! :books:
----------------------
The response time can be influenced by several factors, including the complexity of the document, the size of the index, and the specific configuration of the query engine. Here are a few suggestions to improve the response time:

  1. Use a smaller model: If you're using a large language model like gpt-3.5-turbo, you might want to consider using a smaller one. Smaller models usually generate responses faster, although the quality of the responses might be slightly lower.
  2. Reduce the context window: If you're using a large context window, reducing it might speed up the response time. However, this might also affect the quality of the responses, especially if the relevant information is located outside the reduced context window.
  3. Use a different response mode: If you're using a complex response mode like tree_summarize, consider using a simpler one like retrieve. Simpler response modes usually generate responses faster, although the quality of the responses might be slightly lower.
Plain Text
query_engine = index_with_query.as_query_engine(response_mode='retrieve')
response = query_engine.query("What did the author do growing up?")
print(response)

Source ()

Please note that these are general suggestions and the actual impact on response time might vary depending on the specific circumstances.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------