
Updated 3 months ago

Hello! I am having trouble integrating a VectorStore (Milvus) with a document management pipeline. I do not want to store a docstore.json file and load it from disk every time; I want something closer to a production-ready approach to performing upserts in my index.

As I understand it (I might be wrong), the VectorStore integration does NOT save the docstore, even though Milvus, for example, stores the text corresponding to each node.

How are people approaching this issue? Every tutorial or piece of documentation I see covers only very basic cases (document management when the index is not persisted in a vector database, or using in-memory databases like Redis to store the documents).

Any ideas on how I can tweak Milvus to store my docstore, if that is viable? If not, are there any workarounds?

I find the documentation lacks detail on how the vector store and document management interact.
5 comments
there are remote docstores (redis, mongodb, firestore, postgres) -- use one of those
Thanks! I couldn't find any tutorial on persisting the index, vectors, and docstore separately. Is there any reading material or repo I could take a peek at for such a complex pipeline?
Also, thank you so much. I see your name popping up on almost every PR on GitHub, and you still take the time to answer questions over here.
if you are using a remote docstore, it's persisted automatically (same with your vector store)

imo I would attach a docstore + vector store to an ingestion pipeline to properly handle upserts, it's the most straightforward approach

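# import paths assume llama-index >= 0.10; adjust for older versions
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

# <docstore> and <vector_store> are placeholders for your remote stores
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(), OpenAIEmbedding()],
    docstore=<docstore>,
    vector_store=<vector_store>,
)

# very important that documents have a static ID, like a file path
documents = SimpleDirectoryReader(..., filename_as_id=True).load_data()

# chunks and embeds the documents, then writes the nodes to the vector store
pipeline.run(documents=documents)

# build the index on top of the already-populated vector store
index = VectorStoreIndex.from_vector_store(<vector_store>)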
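To make it clearer why the static ID matters, here is a rough pure-Python sketch of the upsert idea: keep a doc_id -> content hash map (the docstore's role) next to the stored chunks (the vector store's role), skip documents whose hash is unchanged, and replace those whose hash changed. `ToyUpserter`, `content_hash`, and the naive sentence split are hypothetical stand-ins made up for illustration, not LlamaIndex or Milvus APIs, and the real pipeline may differ in its details.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyUpserter:
    """Hypothetical in-memory stand-in for a docstore + vector store pair."""

    def __init__(self) -> None:
        self.doc_hashes: dict[str, str] = {}    # docstore role: doc_id -> hash
        self.chunks: dict[str, list[str]] = {}  # vector store role: doc_id -> chunks

    def run(self, documents: dict[str, str]) -> list[str]:
        """Ingest documents keyed by a static ID; return the IDs (re)processed."""
        processed = []
        for doc_id, text in documents.items():
            new_hash = content_hash(text)
            if self.doc_hashes.get(doc_id) == new_hash:
                continue  # unchanged document: skip, nothing to re-embed
            # new or changed document: drop stale chunks, then insert fresh ones
            self.chunks.pop(doc_id, None)
            self.chunks[doc_id] = text.split(". ")  # naive stand-in for a splitter
            self.doc_hashes[doc_id] = new_hash
            processed.append(doc_id)
        return processed


store = ToyUpserter()
first = store.run({"docs/a.txt": "Old text. More text", "docs/b.txt": "Hello"})
second = store.run({"docs/a.txt": "New text. More text", "docs/b.txt": "Hello"})
print(first)   # both documents are ingested on the first run
print(second)  # only the changed document is processed again
```

If the IDs were random on every load, each run would look like a set of brand-new documents and nothing could be skipped or replaced, which is why a file path (or another stable key) is the usual choice.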
Thank you very much! Have a nice day!