Howdy. Is there a way to lower response times and increase the loading speed of document searches?
4 comments
Please stand by @Dan as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:
----------------------
Yes, LlamaIndex provides several ways to optimize response times and increase the loading speed of document searches.

  1. Token Usage Optimizers: A token usage optimizer refines the retrieved Nodes to reduce token usage during response synthesis, which can speed up response times. The optimizer interface takes in the QueryBundle and a text chunk str, and outputs a refined text chunk str that yields a more optimized response. An example is the Sentence Embedding Optimizer (source); see the first sketch after the example below.
  2. Response Synthesis: LlamaIndex offers different methods of synthesizing a response from the relevant context. For instance, the 'compact' mode combines text chunks into larger consolidated chunks that more fully utilize the available context window, then refines answers across them. This mode is faster than 'refine' since it makes fewer calls to the LLM (source); see the second sketch below.
  3. Persisting Data: By default, LlamaIndex stores data in memory, but it can be explicitly persisted to disk with the storage_context.persist(persist_dir="<persist_dir>") method, which speeds up loading documents for later searches (source); see the third sketch below.
  4. Customizing Storage: LlamaIndex supports a swappable storage layer that lets you customize where ingested documents (i.e., Node objects), embedding vectors, and index metadata are stored. This can also increase the loading speed of document searches (source); see the fourth sketch below.
Here's an example of how you can use LlamaIndex to query your data:

Plain Text
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load every file in the 'data' folder and build an in-memory vector index.
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# Retrieve the most relevant chunks and synthesize an answer with the LLM.
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the documents.")


Remember to replace 'data' with the path to your documents and "Summarize the documents." with your query.
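
Building on that example, here is a minimal sketch of item 1, the Sentence Embedding Optimizer, applied as a node postprocessor. The import path and the percentile_cutoff value are assumptions that vary across LlamaIndex versions, so check the docs for the release you have installed:

Plain Text
from llama_index import VectorStoreIndex, SimpleDirectoryReader
# Import path varies by llama_index version; this follows the legacy layout.
from llama_index.indices.postprocessor import SentenceEmbeddingOptimizer

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# Drop retrieved sentences whose embedding similarity to the query falls
# below the 50th percentile, shrinking the prompt sent to the LLM.
query_engine = index.as_query_engine(
    node_postprocessors=[SentenceEmbeddingOptimizer(percentile_cutoff=0.5)]
)
response = query_engine.query("Summarize the documents.")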
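
For item 2, the synthesis mode is a keyword argument when building the query engine; a minimal sketch:

Plain Text
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# 'compact' packs retrieved chunks into as few LLM calls as possible;
# 'refine' makes one call per chunk, so 'compact' is usually faster.
query_engine = index.as_query_engine(response_mode="compact")
response = query_engine.query("Summarize the documents.")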
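
For item 3, a sketch of persisting the index to disk and reloading it on later runs; "./storage" is just a placeholder path:

Plain Text
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# First run: build the index and persist it to disk.
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")

# Later runs: reload the persisted index instead of re-parsing and
# re-embedding every document, which is much faster.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)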
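
And for item 4, a sketch of swapping storage components through StorageContext. The Simple* classes below are the in-memory defaults (import paths follow the legacy llama_index layout); in practice you would plug in a persistent backend such as a vector database:

Plain Text
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore

# Each component is swappable: docstore holds Node objects, vector_store
# holds embeddings, and index_store holds index metadata.
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore(),
    index_store=SimpleIndexStore(),
    vector_store=SimpleVectorStore(),
)

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)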
----------------------
Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon:
----------------------
@Dan a lot of it depends on the setup of your index and your service context settings. Was there anything specific you modified, or just defaults?

Furthermore, enabling streaming often helps responses "feel" faster, since the response usually gets printed out faster than you can read it.
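
For example, a minimal sketch using the streaming flag on the query engine (whether tokens actually stream depends on the LLM you have configured):

Plain Text
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

# Print tokens as the LLM generates them instead of waiting for the full
# answer; total latency is unchanged, but perceived latency drops.
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Summarize the documents.")
response.print_response_stream()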