Hi all I utilize vector store

Hi all, I use vector_store = PGVectorStore.from_params(). When doing Retrieval-Augmented Generation (RAG) for Q&A, what's the optimal approach: sending the full document(s) to the LLM instead of doing a similarity search with similarity_top_k? How should this be implemented, and what's the most effective approach? Thank you
I'm not sure what you mean. You want to fetch all data from your vector store?
I have an index for each document (one document has 150-200 pages) and I want to support two types of queries:
  1. semantic search using embeddings
  2. fetching all the data and sending the entire document (150-200 pages)
So for case 2, what's the optimal way to fetch all the data and send the entire document to the LLM? What's the most effective approach to use?
For use-case 2, I would use a SummaryIndex (it used to be called ListIndex):

Python
from llama_index import SummaryIndex

# `documents` are the loaded Document objects (e.g. from SimpleDirectoryReader)
index = SummaryIndex.from_documents(documents)
query_engine = index.as_query_engine(response_mode="tree_summarize", use_async=True)


It will send all nodes to the LLM -- but be warned, this will be slow for something that large.
The above are the fastest settings possible for this.
In case I need to query questions across multiple documents (a different index per document), would a TreeIndex be better, i.e. building indices on top of other document indices? Would it be a good approach to compose a graph made up of indices and query the graph? What would do a better job: an agent, a sub-question engine, or a router engine?
I think the sub-question engine would be the best pick, and then put that in an agent if you need chat history.

But again -- sending 150-200 pages to the LLM will not be fast 😅
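For what it's worth, here is a rough sketch of the sub-question setup; the `doc_indices` mapping and the tool names/descriptions are placeholders, not something from this thread:

Python
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# assumed: `doc_indices` maps a document name to its per-document index
query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(similarity_top_k=5),
        metadata=ToolMetadata(
            name=name,
            description=f"Answers questions about the '{name}' document",
        ),
    )
    for name, index in doc_indices.items()
]

# the sub-question engine splits a query into per-document sub-questions
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)
response = query_engine.query("Compare topic X across the documents")

If chat history is needed, the same tools can also be handed to an agent (e.g. OpenAIAgent.from_tools(query_engine_tools)) instead of calling the engine directly.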
Is there a better option to do that, to be able to answer a question that requires the entire document?
Using a vector index would be the option for that, with a similarity top k
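For example, reusing your existing PGVectorStore (the connection parameters below are just placeholders):

Python
from llama_index import VectorStoreIndex
from llama_index.vector_stores import PGVectorStore

# placeholder connection parameters -- use your real Postgres settings
vector_store = PGVectorStore.from_params(
    database="vectordb",
    host="localhost",
    port="5432",
    user="postgres",
    password="password",
    table_name="my_docs",
    embed_dim=1536,  # must match the embedding model's dimension
)

index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What does the document say about X?")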
But that will not cover all relevant nodes, because it will be limited by the configured similarity_top_k.
If I already have an index created via VectorStoreIndex.from_vector_store, how can I use a SummaryIndex? Do I need to create a new index? How do I save this new index in the database? Can a VectorStoreIndex be used both with similarity_top_k and as a SummaryIndex?
I think you need to decide whether you want all nodes or only the relevant ones 😅

You can use a router engine to switch between a summary index or a vector index as needed. Typically you only want to use the summary index for queries that require reading the entire index.
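
Something like this -- a sketch that assumes `vector_index` is your existing index loaded via VectorStoreIndex.from_vector_store, and that `nodes` are the same parsed nodes (the SummaryIndex stores them separately):

Python
from llama_index import SummaryIndex
from llama_index.query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector
from llama_index.tools import QueryEngineTool

# a separate summary index over the same nodes as the vector index
summary_index = SummaryIndex(nodes)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(similarity_top_k=5),
    description="Useful for specific questions about parts of the document.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(
        response_mode="tree_summarize", use_async=True
    ),
    description="Useful only for questions that require reading the entire document.",
)

# the selector picks vector vs. summary based on the query and the descriptions
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)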

You can store the summary index in MongoDB, Redis, S3, a Google Cloud bucket, etc.

Two main ways -- either using fsspec or a docstore/index_store integration
https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/storage/save_load.html#using-a-remote-backend

https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/storage/docstores.html

https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/storage/index_stores.html
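
As a rough example with the MongoDB docstore/index_store integration (the URI below is a placeholder):

Python
from llama_index import StorageContext, SummaryIndex, load_index_from_storage
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.index_store import MongoIndexStore

MONGO_URI = "mongodb://localhost:27017"  # placeholder connection string

storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI),
    index_store=MongoIndexStore.from_uri(uri=MONGO_URI),
)

# building the index writes the nodes and index metadata to MongoDB
summary_index = SummaryIndex(nodes, storage_context=storage_context)

# later (or in another process): re-attach the stores and load the index back
storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI),
    index_store=MongoIndexStore.from_uri(uri=MONGO_URI),
)
summary_index = load_index_from_storage(storage_context)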
To reduce the time, is there any way to distribute the load with parallel processing -- sending each chunk as a separate call to the LLM and combining the answers?

use_async=True is not working as expected
We've been avoiding parallel processing, since it generally creates hard-to-debug/maintain code.

use_async=True should be working fine -- there is a ton of room for concurrency when waiting on API calls.
use_async=True may not work for all models -- I'm using Bedrock models via from langchain.llms.bedrock import Bedrock.
ah yea, langchain probably didn't implement async, classic
We should probably implement Bedrock properly at some point so that we can support async there.
Do you have a timeline for when Bedrock will be implemented?
nope, I think you are the first person to ask about it from what I remember lol
most integrations are community driven
Because of course there are about 60,000 LLM services lol
Bedrock is in limited preview and not yet GA; once it is, there will be more interest in using it. From my point of view, it would be good to have native support implemented so there are no limitations.
Since you have access, if you are up for it, I would love a PR. Happy to review/merge 🙂

LLM integrations are fairly easy to add -- it's basically just a single file.
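
In the meantime, a custom LLM wrapper is one possible stopgap. Below is a rough, untested sketch that routes llama_index completions through the LangChain Bedrock client; the model id, context window, and the wrapper class itself are assumptions, not an official integration:

Python
from typing import Any

from langchain.llms.bedrock import Bedrock
from llama_index.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.llms.base import llm_completion_callback


class BedrockWrapper(CustomLLM):
    """Hypothetical wrapper exposing the LangChain Bedrock client to llama_index."""

    model_id: str = "anthropic.claude-v2"  # assumed model id
    context_window: int = 100_000  # assumed; depends on the model
    num_output: int = 512

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_id,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # synchronous call through the LangChain client; a native integration
        # would call the Bedrock API directly and add a real async path
        text = Bedrock(model_id=self.model_id)(prompt)
        return CompletionResponse(text=text)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # naive "streaming": yield the full completion as a single chunk
        yield self.complete(prompt, **kwargs)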