I am building a hybrid Q&A RAG pipeline (using both semantic and keyword search) over a set of documents. Currently it takes too long to answer a question, so I want to persist the StorageContext in advance (instead of rebuilding the indexes on every run) to cut down the per-question processing time. Is that good practice? What are some things I need to keep in mind? Some other questions I have:
1) I understand that StorageContext has 4 components: index_store, vector_store, graph_store, and docstore. For my use case, there's no graph_store. Where can I store the remaining 3 stores? Is it a best practice to store all of them in a vector database?
2) I am using SimpleKeywordTableIndex for keyword search. Where can I persist this index if I want to build it ahead of time? Can it also be stored in a vector database? (A rough sketch of what I currently have in mind is below.)
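For reference, here is roughly what I have in mind for (1) and (2): building both indexes over a shared StorageContext once, offline, and persisting everything to local disk. This is only a sketch; the ./data and ./storage paths, the index IDs, and the node parser are placeholders on my part, and I'm assuming a recent llama_index release that uses the llama_index.core imports.

```python
from llama_index.core import (
    SimpleDirectoryReader,
    SimpleKeywordTableIndex,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core.node_parser import SentenceSplitter

# One-time ingestion step, run offline before serving any queries.
documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
nodes = SentenceSplitter().get_nodes_from_documents(documents)

# Both indexes share a single StorageContext, so the docstore/index_store
# are written once and both indexes reference the same nodes.
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)

# Give the indexes stable IDs so they can be loaded back individually later.
vector_index.set_index_id("vector_index")
keyword_index.set_index_id("keyword_index")

# Persist the docstore, index_store, and default (in-memory) vector store to disk.
storage_context.persist(persist_dir="./storage")  # placeholder directory
```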
I would really appreciate it if you could point me to documentation covering this use case. Thanks!
I looked at secinsights.ai, but it uses Postgres as the vector database and an AWS S3 bucket to store the StorageContext. The difference in my use case is that I am using both semantic and keyword search, whereas secinsights uses only semantic search. Will I need a vector database for the semantic search plus a separate S3 bucket for the StorageContext? Is that the most efficient option?
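And this is the loading side I'm considering at query time, assuming everything was persisted to a local directory as in the sketch above. My understanding is that the same persist/from_defaults calls also accept an fsspec filesystem (e.g. s3fs), which I believe is how secinsights pushes the StorageContext to S3, but please correct me if that's wrong.

```python
from llama_index.core import StorageContext, load_index_from_storage

# Query-time startup: rebuild the StorageContext from the persisted files
# instead of re-ingesting and re-embedding the documents.
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load each index back by the ID it was saved under.
vector_index = load_index_from_storage(storage_context, index_id="vector_index")
keyword_index = load_index_from_storage(storage_context, index_id="keyword_index")

# If the persisted files lived in S3 instead of local disk, I believe the same
# calls take an fsspec filesystem, e.g.:
#   import s3fs
#   fs = s3fs.S3FileSystem()
#   storage_context = StorageContext.from_defaults(
#       persist_dir="my-bucket/storage", fs=fs  # placeholder bucket/path
#   )
```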
I don't want to use the hybrid search options offered by some vector DB providers such as Pinecone and Weaviate, because they come with extra cost. I want to keep costs to a minimum and use open-source and free options as much as I can. Does that make sense?
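To make concrete what I mean by doing hybrid search myself rather than paying for a provider's hybrid option: here is a rough sketch of merging the results of the two retrievers in application code. The top-k value, the example question, and the dedupe-by-node-id merge are just illustrative choices on my part.

```python
# Plain application-side hybrid retrieval: query both indexes separately,
# then merge and dedupe the results by node ID.
query = "What does the contract say about termination?"  # example question

vector_nodes = vector_index.as_retriever(similarity_top_k=5).retrieve(query)
keyword_nodes = keyword_index.as_retriever().retrieve(query)

merged = {}
for node_with_score in vector_nodes + keyword_nodes:
    # Keep the highest-scoring copy of each node (keyword scores may be None).
    node_id = node_with_score.node.node_id
    if node_id not in merged or (node_with_score.score or 0) > (merged[node_id].score or 0):
        merged[node_id] = node_with_score

hybrid_results = list(merged.values())
```

I think LlamaIndex also ships a QueryFusionRetriever that does reciprocal-rank fusion over multiple retrievers, which might be a cleaner way to achieve the same thing, but the manual merge above is the minimal version of what I'm after.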