Updated 2 months ago

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, SummaryPrompt
from llama_index import Prompt, ListIndex, SimpleDirectoryReader
# Define your custom prompt
SUMMARY_PROMPT_TMPL = (
"We have provided context information below. \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given this information, please answer the question: {query_str}\n"
)
# Load your documents
documents = SimpleDirectoryReader(input_files=["/content/Example.pdf"]).load_data()

SUMMARY_PROMPT = SummaryPrompt(SUMMARY_PROMPT_TMPL)

# Use the custom prompt when building the index
index_with_query = GPTVectorStoreIndex(documents, summary_template=SUMMARY_PROMPT)

# Configure the query engine with the custom prompt
query_engine = index_with_query.as_query_engine(text_qa_template=SUMMARY_PROMPT)
response = query_engine.query("based on the given context, what is the difference between Bird and technical documentation?")
print(response)

The response is taking 20-30 minutes to generate even though the document is only 3 pages. What can I do to make the response faster?
7 comments
Please stand by @Ftm as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The response time can be influenced by several factors, including the complexity of the document, the size of the index, and the specific configuration of the query engine. Here are a few suggestions to improve the response time:

  1. Use a smaller model: If you're using a large language model like gpt-3.5-turbo, you might want to consider using a smaller one. Smaller models usually generate responses faster, although the quality of the responses might be slightly lower. (See the configuration sketch after this list.)
  2. Reduce the context window: If you're using a large context window, reducing it might speed up the response time. However, this might also affect the quality of the responses, especially if the relevant information is located outside the reduced context window.
  3. Use streaming: If you're not already using it, consider enabling streaming. The answer is generated in chunks, so output starts appearing much sooner even if the total generation time is similar. Here's how you can enable streaming:
Plain Text
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What did the author do growing up?")
response.print_response_stream()

Source ()

  4. Use a different response mode: If you're using a complex response mode like tree_summarize, consider using a simpler one like retrieve. Simpler response modes usually generate responses faster, although the quality of the responses might be slightly lower.
Plain Text
query_engine = index.as_query_engine(response_mode='retrieve')
response = query_engine.query("What did the author do growing up?")
print(response)

Source ()
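
For the first suggestion, here is a minimal sketch of how a smaller or faster hosted model could be plugged in, assuming a pre-0.10 llama_index where ServiceContext is available; the model name and temperature are illustrative, not taken from this thread.
Plain Text
from llama_index import GPTVectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI

# Assumption: an OpenAI API key is configured; gpt-3.5-turbo stands in for any fast hosted model
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0))
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()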
Please note that these are general suggestions and the actual impact on response time might vary depending on the specific circumstances.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Are you using OpenAI? Or is it defaulting to llama.cpp?
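
(A quick way to check this, as a hedged aside: in some llama_index versions, if no OpenAI key is configured the library falls back to a local llama.cpp model, which is extremely slow on CPU and would explain 20-30 minute responses.)

import os

# If this prints False, llama_index cannot reach OpenAI and may fall back
# to a local llama.cpp model (very slow on CPU).
print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))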
@kapa.ai

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, SummaryPrompt
from llama_index import Prompt, ListIndex, SimpleDirectoryReader

# Define your custom prompt

SUMMARY_PROMPT_TMPL = (
"We have provided context information below. \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given this information, please answer the question: {query_str}\n"
)

# Load your documents

documents = SimpleDirectoryReader(input_files=["/content/Example.pdf"]).load_data()

SUMMARY_PROMPT = SummaryPrompt(SUMMARY_PROMPT_TMPL)

# Use the custom prompt when building the index

index_with_query = GPTVectorStoreIndex(documents, summary_template=SUMMARY_PROMPT)

# Configure the query engine with the custom prompt

# Pass streaming=True together with the custom prompt in a single call;
# a second as_query_engine() call would otherwise discard the custom prompt
query_engine = index_with_query.as_query_engine(text_qa_template=SUMMARY_PROMPT, streaming=True)
response = query_engine.query("What did the author do growing up?")
response.print_response_stream()

The response is taking 20 minutes to be shown. How can I reduce the response time to 2-3 minutes?
Let me check the knowledge base for you. One moment! :books:
----------------------
The response time can be influenced by several factors, including the complexity of the document, the size of the index, and the specific configuration of the query engine. Here are a few suggestions to improve the response time:

  1. Use a smaller model: If you're using a large language model like gpt-3.5-turbo, you might want to consider using a smaller one. Smaller models usually generate responses faster, although the quality of the responses might be slightly lower.
  2. Reduce the context window: If you're using a large context window, reducing it might speed up the response time. However, this might also affect the quality of the responses, especially if the relevant information is located outside the reduced context window. (See the sketch after this list.)
  3. Use a different response mode: If you're using a complex response mode like tree_summarize, consider using a simpler one like retrieve. Simpler response modes usually generate responses faster, although the quality of the responses might be slightly lower.
Plain Text
query_engine = index_with_query.as_query_engine(response_mode='retrieve')
response = query_engine.query("What did the author do growing up?")
print(response)

Source ()
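
For the context-window suggestion, a minimal sketch of one way to shrink the amount of text sent to the LLM per query, again assuming a pre-0.10 llama_index; the chunk size, context window, and top-k values are illustrative assumptions, not tuned recommendations.
Plain Text
from llama_index import GPTVectorStoreIndex, ServiceContext

# Smaller chunks and a smaller context window mean less text is stuffed into each LLM call
service_context = ServiceContext.from_defaults(chunk_size=512, context_window=2048)
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

# similarity_top_k limits how many chunks are retrieved and sent to the LLM
query_engine = index.as_query_engine(similarity_top_k=1, streaming=True)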

Please note that these are general suggestions and the actual impact on response time might vary depending on the specific circumstances.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------