I am trying to implement a sentence-window retriever. In the example, the indexing of the document and the retriever are created in the same instance, so the output of the indexer is sentence_index and it is passed directly to the retriever. In my use case, however, I would like to create the sentence_index in one instance and the retriever in another one. How can I import the sentence_index for the vector_store?
I built a script to insert chunks into my vector store with LlamaIndex. I would like to use an external API for the embeddings. How can I do that with ServiceContext?
But when I query my Qdrant client, it appears that the collection "RAG_llama_index_small_to_big" doesn't exist. Should I create it before trying to insert, or does QdrantVectorStore create it if it doesn't exist yet?
Before building the VectorStoreIndex with the Qdrant client, should I create the collection, or does QdrantVectorStore create the collection if it doesn't already exist in the Qdrant cluster?
@kapa.ai I use a sentence-window retriever in this method:

```python
def retrieve(self, query):
    return self.query_engine.retrieve(query)
```

It uses a Qdrant vector store. I would like to apply some filters based on the metadata. How can I do that?
There is something I don't understand: the metadata are not embedded, so they shouldn't impact the split process. However, when I try to implement a small-to-big retriever with a small chunk size, I get this error message:
"Metadata length (130) is longer than chunk size (128). Consider increasing the chunk size or decreasing the size of your metadata to avoid this."
Can you explain why this happens, and how to make the metadata not affect the splitting process?
Here is the piece of code I use:

```python
sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SentenceSplitter.from_defaults(chunk_size=c, chunk_overlap=20)
    for c in sub_chunk_sizes
]

all_nodes = []
for base_node in tqdm(base_nodes):
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
```