
Updated 11 months ago

Query

At a glance

The community member asks how to query only the leaf nodes from an external vector database, in the context of a tutorial on auto-merging retrievers. The comments suggest that typically only the leaf nodes are stored in the vector store, while the other (parent) nodes are pulled from the docstore when merging needs to happen. The process becomes more complicated if the whole workflow runs inside an IngestionPipeline. One reply suggests a custom transformation step that separates the leaf nodes from the rest, so that non-leaf nodes are not indexed and all nodes can be written to a docstore at the same point. The original poster questions whether the approach is worth it, since it may load all the nodes into memory. The comments do not provide a definitive answer, but suggest that memory use depends on whether a remote docstore and vector store are used.

Hey! I have a question, regarding this tutorial: https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever
So, how would I go about querying the leaf nodes from an external vector database?
Query all the nodes, and get the leaf nodes? That sounds a bit expensive in terms of computation.
7 comments
I think typically only your leaf nodes would be in the vector store
The other nodes are pulled from the docstore when merging needs to happen
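To make the split described above concrete, here is a minimal, library-free sketch. It is not the LlamaIndex API: `Node`, `build_stores`, and `merge_retrieved` are hypothetical names, the "vector store" is just a list of leaf IDs, and the 0.5 merge ratio is an assumed threshold chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a hierarchical chunk; not a LlamaIndex class.
    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


def build_stores(nodes: List[Node]) -> Tuple[Dict[str, Node], List[str]]:
    """Put every node in the docstore, but only leaf IDs in the 'vector store'."""
    docstore = {n.node_id: n for n in nodes}
    leaf_ids = [n.node_id for n in nodes if not n.child_ids]
    return docstore, leaf_ids


def merge_retrieved(
    retrieved_leaf_ids: List[str], docstore: Dict[str, Node], ratio: float = 0.5
) -> List[str]:
    """Replace retrieved leaves with their parent when enough siblings were hit.

    The parent is looked up in the docstore; it never needs to be embedded.
    """
    hits_by_parent: Dict[str, List[str]] = {}
    result: List[str] = []
    for leaf_id in retrieved_leaf_ids:
        parent_id = docstore[leaf_id].parent_id
        if parent_id is None:
            result.append(leaf_id)  # top-level node, nothing to merge into
        else:
            hits_by_parent.setdefault(parent_id, []).append(leaf_id)
    for parent_id, hits in hits_by_parent.items():
        parent = docstore[parent_id]
        if len(hits) / len(parent.child_ids) > ratio:
            result.append(parent_id)  # merge: return the parent instead
        else:
            result.extend(hits)  # too few siblings retrieved, keep the leaves
    return result
```

The point of the sketch is that the similarity search only ever touches leaf entries, so there is no need to query all nodes and filter afterwards. Parents are fetched by ID from the docstore only when a merge is triggered.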
Oh, I see. What if the whole thing happens in an IngestionPipeline?
It's a bit more complicated
Yeah, definitely more complicated -- not sure I would recommend that 😅 You could have a custom transformation step to sort out the leaf nodes vs. everything else, to avoid indexing non-leaf nodes (and it would also give you a chance to throw the nodes into a docstore?)
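A rough, library-free sketch of what such a filtering step could look like. Everything here is hypothetical: `Node`, `LeafOnlyFilter`, and `run_pipeline` are illustrative names, and the docstore is a plain dict. In LlamaIndex itself, the equivalent would presumably be a custom transformation placed after the hierarchical node parser, but that wiring is an assumption and is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    # Hypothetical stand-in for a parsed chunk; not a LlamaIndex class.
    node_id: str
    text: str
    child_ids: List[str] = field(default_factory=list)


class LeafOnlyFilter:
    """Pipeline step: save every node to the docstore, pass on only the leaves."""

    def __init__(self, docstore: Dict[str, Node]) -> None:
        self.docstore = docstore

    def __call__(self, nodes: List[Node]) -> List[Node]:
        for node in nodes:
            self.docstore[node.node_id] = node  # parents and leaves alike
        # Later steps (embedding, vector store insert) only see leaf nodes.
        return [n for n in nodes if not n.child_ids]


def run_pipeline(
    nodes: List[Node], steps: List[Callable[[List[Node]], List[Node]]]
) -> List[Node]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        nodes = step(nodes)
    return nodes
```

Placed before the embedding step, a filter like this keeps non-leaf nodes out of the vector store while still making them available for merging later.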
Okay.. but I'm thinking it might not even be worth it to use this approach 😄
I mean, it loads all the nodes into memory, right?
Depends if you use a remote docstore and vector store or not