Mike
Joined September 25, 2024
Hello, I'm trying to develop an app with a RAG pipeline that has two parts:
1) Knowledge base builder - takes documents from storage and extracts embeddings from them.
2) Client part - takes the user's question and returns a list of matching documents.
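
The two-part design above can be sketched without any library as follows. This is a toy illustration only, not LlamaIndex code: the bag-of-words `embed` function stands in for a real embedding model, and `build_knowledge_base`, `find_documents`, and the `node_id`/`doc_id` fields are hypothetical names chosen for the example. It shows one way a builder can keep the link from each chunk back to its source document so the client can return documents rather than chunks.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_knowledge_base(documents, chunk_size=5):
    """Part 1: split each document into chunks and embed every chunk.

    Each entry records the id of the document it came from.
    """
    entries = []
    for doc_id, text in documents.items():
        words = text.split()
        for start in range(0, len(words), chunk_size):
            chunk = " ".join(words[start:start + chunk_size])
            entries.append({
                "node_id": f"{doc_id}#{start // chunk_size}",  # hypothetical id scheme
                "doc_id": doc_id,
                "vector": embed(chunk),
            })
    return entries


def find_documents(entries, question, top_k=2):
    """Part 2: rank chunks by similarity and return distinct source document ids."""
    query = embed(question)
    ranked = sorted(entries, key=lambda e: cosine(query, e["vector"]), reverse=True)
    doc_ids = []
    for entry in ranked[:top_k]:
        if entry["doc_id"] not in doc_ids:
            doc_ids.append(entry["doc_id"])
    return doc_ids


documents = {
    "doc-a": "milvus stores vectors for similarity search over embeddings",
    "doc-b": "bananas are yellow and apples are red or green",
}
kb = build_knowledge_base(documents)
print(find_documents(kb, "similarity search vectors", top_k=1))
```

In a real pipeline the list of entries would live in the vector store and the similarity search would be delegated to it; the sketch only illustrates the data flow between the two parts.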

docstore = SimpleDocumentStore()
docstore.add_documents(documents)
vector_store = MilvusVectorStore(**config.get("vector_store"))
index_store = SimpleIndexStore()
storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    index_store=index_store,
    docstore=docstore,
)
embed_model = OpenAIEmbedding() if os.environ.get("OPENAI_API_KEY") else "local"
service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    service_context=service_context,
    show_progress=True,
)
index.storage_context.persist(config.get("storage_context"))
docstore.persist(config.get("documents_store"))

Questions:
1) Do I need to parse my documents into nodes and add both docs and nodes to the docstore?
2.a) Does VectorStoreIndex.from_documents() parse nodes by default?
2.b) Can I force it to use documents?
2.c) Am I right that the only way to keep relations between nodes and documents is to parse them explicitly?
3) Do I need to pass documents to VectorStoreIndex even if storage_context is aware of docstore?
4) Does the docstore update a document if I add a document with the same id but different contents?
5) Am I right that node_id is used to identify vectors in my MilvusVectorStore?
6) Why do I need to persist storage_context if the client part loads the index using VectorStoreIndex.from_vector_store()? What is the correct way to save the index and use it for querying?