A community member asks how to query leaf nodes from an external vector database, in the context of a tutorial on auto-merging retrievers. The comments suggest that typically only the leaf nodes are in the vector store, while the other (non-leaf) nodes are pulled from the docstore when merging needs to happen. However, the process becomes more complicated if the whole workflow runs inside an IngestionPipeline. Some community members recommend a custom transformation step that separates out the leaf nodes so that non-leaf nodes are not indexed, but others question whether this approach is worth it, as it may load all the nodes into memory. The comments do not provide a definitive answer, but suggest that the right approach depends on whether a remote docstore and a remote vector store are used.
Yea definitely more complicated -- not sure I would recommend that. You could have a custom transformation step to sort out the leaf nodes vs. everything else to avoid indexing non-leaf nodes (and that would also give you a chance to throw the nodes into a docstore?)
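For anyone wanting to see the shape of that idea, here is a rough, library-free sketch of such a transformation step. The `Node` dataclass, the `LeafFilter` class, and the plain dict used as a docstore are all hypothetical stand-ins made up for illustration, not LlamaIndex classes. In a real pipeline the step would presumably subclass LlamaIndex's `TransformComponent` and could lean on `get_leaf_nodes`, but treat that mapping as an assumption to check against the docs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a hierarchical node (not the LlamaIndex class)."""
    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


class LeafFilter:
    """Pipeline step: save every node to a docstore, pass only leaf nodes on."""

    def __init__(self, docstore: Dict[str, Node]) -> None:
        # A plain dict stands in for a (possibly remote) docstore.
        self.docstore = docstore

    def __call__(self, nodes: List[Node]) -> List[Node]:
        # Keep all nodes retrievable so parents can be merged in at query time.
        for node in nodes:
            self.docstore[node.node_id] = node
        # A node with no children is a leaf; only these go on to be indexed.
        return [node for node in nodes if not node.child_ids]


docstore: Dict[str, Node] = {}
nodes = [
    Node("root", "full section", child_ids=["a", "b"]),
    Node("a", "first half", parent_id="root"),
    Node("b", "second half", parent_id="root"),
]
leaves = LeafFilter(docstore)(nodes)
```

Note that this still holds every node in memory for the duration of the step, which is the concern raised in the thread, so it only sidesteps indexing the non-leaf nodes, not loading them.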