Query

At a glance

A community member asks how to query leaf nodes from an external vector database, in the context of the LlamaIndex tutorial on auto-merging retrievers. The replies explain that typically only the leaf nodes are stored in the vector store, while the other nodes are pulled from the docstore when merging needs to happen. The setup becomes more complicated if everything happens inside an IngestionPipeline. One reply suggests a custom transformation step that separates the leaf nodes from the rest, both to avoid indexing non-leaf nodes and to write the nodes to a docstore, although the same reply does not recommend the pipeline route. The original poster questions whether the approach is worth it, since it may load all the nodes into memory. The thread gives no definitive answer, but notes that memory use depends on whether a remote docstore and vector store are used.

TheDorsan

Hey! I have a question, regarding this tutorial: https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever
So, how would I go about querying the leaf nodes from an external vector database?
Query all the nodes, and get the leaf nodes? That sounds a bit expensive in terms of computation.

7 comments

Logan M

I think typically only your leaf nodes would be in the vector store

Logan M

The other nodes are pulled from the docstore when merging needs to happen
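
To make the split described above concrete, here is a minimal, dependency-free sketch. It does not use the LlamaIndex classes: `Node`, `is_leaf`, `merge_retrieved`, and the `ratio_threshold` value are hypothetical names and settings chosen for illustration, and the plain dict stands in for a docstore. The idea is that only leaf IDs go to the vector store, every node goes to the docstore, and parents are looked up in the docstore only when enough of their children were retrieved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    # Hypothetical minimal node, not the LlamaIndex node class.
    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


def is_leaf(node: Node) -> bool:
    # A leaf is a node with no children.
    return not node.child_ids


def merge_retrieved(
    retrieved_ids: List[str],
    docstore: Dict[str, Node],
    ratio_threshold: float = 0.5,  # assumed value, for illustration only
) -> List[str]:
    """Swap retrieved leaves for their parent when enough siblings were hit."""
    # Group the retrieved leaf IDs by their parent ID.
    by_parent: Dict[Optional[str], List[str]] = {}
    for node_id in retrieved_ids:
        by_parent.setdefault(docstore[node_id].parent_id, []).append(node_id)

    merged: List[str] = []
    for parent_id, child_ids in by_parent.items():
        if parent_id is None:
            merged.extend(child_ids)  # no parent to merge into
            continue
        parent = docstore[parent_id]  # parent is fetched from the docstore
        if len(child_ids) / len(parent.child_ids) > ratio_threshold:
            merged.append(parent_id)
        else:
            merged.extend(child_ids)
    return merged


# One parent with three leaf chunks.
parent = Node("p1", "full section", child_ids=["c1", "c2", "c3"])
children = [Node(f"c{i}", f"chunk {i}", parent_id="p1") for i in (1, 2, 3)]
all_nodes = [parent, *children]

docstore = {n.node_id: n for n in all_nodes}  # every node
vector_store_ids = [n.node_id for n in all_nodes if is_leaf(n)]  # leaves only
```

With this layout the vector search only ever sees leaf chunks, so there is no need to query all nodes and filter afterwards.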

TheDorsan

Oh, I see. What if the whole thing happens in an IngestionPipeline?

TheDorsan

It's a bit more complicated

Logan M

Yeah, definitely more complicated. Not sure I would recommend that 😅 You could have a custom transformation step that separates the leaf nodes from everything else, to avoid indexing non-leaf nodes (it would also give you a chance to put the nodes into a docstore).
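
A rough sketch of what such a transformation step could look like, written in plain Python so it runs without any library. Everything here is an assumption for illustration: `Node`, `LeafOnlyFilter`, and `run_pipeline` are hypothetical names, and the dict stands in for a docstore. In LlamaIndex itself the step would presumably be a subclass of the library's transformation base class placed after the hierarchical parser, which this sketch only imitates through the `__call__(nodes, **kwargs)` shape.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    # Hypothetical minimal node, not the LlamaIndex node class.
    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


class LeafOnlyFilter:
    """Hypothetical pipeline step: store every node, pass on only the leaves."""

    def __init__(self, docstore: Dict[str, Node]) -> None:
        self.docstore = docstore

    def __call__(self, nodes: List[Node], **kwargs) -> List[Node]:
        # Persist all nodes so parents can be fetched later for merging.
        for node in nodes:
            self.docstore[node.node_id] = node
        # Only nodes without children continue to embedding and indexing.
        return [node for node in nodes if not node.child_ids]


def run_pipeline(
    nodes: List[Node], steps: List[Callable[..., List[Node]]]
) -> List[Node]:
    # Stand-in for a pipeline runner: apply each step in order.
    for step in steps:
        nodes = step(nodes)
    return nodes


docstore: Dict[str, Node] = {}
nodes = [
    Node("p1", "full section", child_ids=["c1", "c2"]),
    Node("c1", "chunk 1", parent_id="p1"),
    Node("c2", "chunk 2", parent_id="p1"),
]
to_index = run_pipeline(nodes, [LeafOnlyFilter(docstore)])
```

Note that in this sketch the step still receives the full node list in memory; if the docstore were a remote store, only the write inside `__call__` would change.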

TheDorsan

Okay, but I'm thinking it might not even be worth it to use this approach 😄
I mean, it loads all the nodes into memory, right?

Logan M

Depends if you use a remote docstore and vector store or not
