Hi All,

I am building a hybrid Q&A RAG pipeline (using semantic and keyword search) over a set of documents. Currently, it takes too long to answer a question. I want to persist the StorageContext ahead of time to reduce processing time. Is that good practice? What should I keep in mind when doing this? Some other questions I have:

1) I understand that StorageContext has 4 components: index_store, vector_store, graph_store, and docstore. My use case has no graph_store. Where can I store the remaining 3 stores? Is it best practice to store all of them in a vector database?

2) I am using SimpleKeywordTableIndex for keyword search. Where can I store this index if I want to build it in advance? Can it also be stored in a vector database?

I would really appreciate it if you could point me to documentation on this use case. Thanks!
A vector DB is generally faster, but you can also use other storage options such as S3 buckets.
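The general "build once, load many times" pattern behind persisting a StorageContext can be sketched with the standard library alone. This is a minimal sketch, not LlamaIndex code: `build_index` is a hypothetical stand-in for the expensive chunking/embedding step, and the result is cached as a local JSON file. In LlamaIndex the rough equivalents are `storage_context.persist(persist_dir=...)` and `load_index_from_storage(StorageContext.from_defaults(persist_dir=...))`; an S3 bucket or a vector DB would take the place of the local file.

```python
import json
import os
import tempfile


def build_index(documents):
    """Hypothetical expensive step (e.g. chunking and embedding documents).

    Stand-in only: maps each document id to its lowercase tokens.
    """
    return {doc_id: text.lower().split() for doc_id, text in documents.items()}


def load_or_build(documents, persist_path):
    """Load a previously persisted index, or build and persist it on first run."""
    if os.path.exists(persist_path):
        # Fast path: reuse the work done in an earlier run.
        with open(persist_path, "r", encoding="utf-8") as f:
            return json.load(f), "loaded"
    # Slow path: build once, then write it out for later runs.
    index = build_index(documents)
    with open(persist_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index, "built"


if __name__ == "__main__":
    docs = {"d1": "Revenue grew in Q3", "d2": "Costs fell"}
    path = os.path.join(tempfile.mkdtemp(), "index.json")
    first, how1 = load_or_build(docs, path)
    second, how2 = load_or_build(docs, path)
    print(how1, how2, first == second)  # built loaded True
```

The first call pays the build cost and writes the file; later calls (for example, each time the Q&A service starts) only read it back.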

Have you looked at the secinsights infrastructure? The repo is open source and has some good lessons on how to handle this:

https://github.com/run-llama/sec-insights
You can also just use the hybrid search offered by some vector DB providers.
Thanks, @Teemu!


I referred to secinsights.ai but it uses Postgres for vector database and AWS S3 bucket to store StorageContext. The difference in my use case is that I am using both semantic and keyword search vs just semantic search in secinsights. Will I need to use a vector database for semantic search and store StorageContext separately in a s3 bucket? Is that the most efficient option?

I don't want to use the hybrid search options offered by vector DB providers such as Pinecone and Weaviate, because they come with extra cost. I want to keep costs to a minimum and use open-source, free options as much as I can. Does that make sense?
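One free way to get hybrid search without a provider's built-in feature is to run the semantic and keyword retrievers separately and fuse their ranked results in your own code. The sketch below uses reciprocal rank fusion (RRF), a common fusion technique chosen here for illustration; it is not something prescribed in this thread. The node ids are hypothetical, and `k=60` is the conventional RRF constant, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of node ids into one ranking.

    Each id scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1 for the top result of each list.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, node_id in enumerate(results, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda nid: (-scores[nid], nid))


if __name__ == "__main__":
    semantic_hits = ["n3", "n1", "n7"]  # e.g. top results from a vector retriever
    keyword_hits = ["n1", "n9", "n3"]   # e.g. top results from a keyword retriever
    print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
    # ['n1', 'n3', 'n9', 'n7']
```

Nodes found by both retrievers (`n1`, `n3`) rise to the top, which is usually the behavior you want from hybrid search, and no paid provider feature is involved.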
SimpleKeywordTableIndex keeps the keyword table in the index_store. You can keep that in memory (and then persist it to disk) or use one of the cloud storage options (more info here): https://docs.llamaindex.ai/en/stable/core_modules/data_modules/storage/index_stores.html
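To make this concrete, here is a simplified, standard-library model of a keyword table: a mapping from each keyword to the set of node ids whose text contains it, which serializes naturally to JSON on disk. The stopword list and regex extraction are simplified assumptions and do not reproduce SimpleKeywordTableIndex's actual extractor; with the real index, persisting the StorageContext writes the table out as part of the index_store.

```python
import json
import os
import re
import tempfile
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "in", "of", "and", "to", "is"}


def extract_keywords(text):
    """Simplified regex keyword extraction; not LlamaIndex's implementation."""
    words = re.findall(r"\w+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def build_keyword_table(nodes):
    """Map each keyword to the set of node ids whose text contains it."""
    table = defaultdict(set)
    for node_id, text in nodes.items():
        for kw in extract_keywords(text):
            table[kw].add(node_id)
    return dict(table)


def save_table(table, path):
    # JSON has no set type, so each set of node ids is stored as a sorted list.
    with open(path, "w", encoding="utf-8") as f:
        json.dump({kw: sorted(ids) for kw, ids in table.items()}, f)


def load_table(path):
    # Convert the lists back into sets after loading.
    with open(path, "r", encoding="utf-8") as f:
        return {kw: set(ids) for kw, ids in json.load(f).items()}


def keyword_lookup(table, query):
    """Return node ids matching any query keyword, most matches first."""
    counts = defaultdict(int)
    for kw in extract_keywords(query):
        for node_id in table.get(kw, ()):
            counts[node_id] += 1
    return sorted(counts, key=lambda nid: (-counts[nid], nid))


if __name__ == "__main__":
    nodes = {"n1": "Revenue grew in the third quarter",
             "n2": "Operating costs and revenue"}
    path = os.path.join(tempfile.mkdtemp(), "keyword_table.json")
    save_table(build_keyword_table(nodes), path)  # build once, ahead of time
    table = load_table(path)                      # reload at query time
    print(keyword_lookup(table, "revenue costs"))  # ['n2', 'n1']
```

Because the table is built ahead of time and only read at query time, the keyword half of the hybrid pipeline no longer pays the extraction cost per question.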