I create my sparse retriever like this:

    sparse_retriever = BM25Retriever.from_defaults(docstore=docstore, similarity_top_k=5)

Does BM25Retriever take the metadata of nodes into account or not? If yes, how can I prevent that and make it consider only the text of the nodes?

Can I build a BM25Retriever directly from PDF documents without chunking at all? Also, if I add more documents to my system, I have to build this retriever from scratch on the entire set of documents again. Is there a better solution to this? I am working on hybrid search, so maybe Qdrant's default hybrid search might help?

I use VectorStoreIndex for the nodes and their embeddings, and I have tried this snippet:

    nodes = index.vector_store.get_nodes()
    for node in nodes:
        print(f"Node ID: {node.node_id}, Embedding: {node.embedding}, Metadata: {node.metadata}")

It prints Embedding: None in all of them.

I created the SimpleDocumentStore using this:

    docstore = SimpleDocumentStore()
    docstore.add_documents(nodes)
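
To make the metadata question concrete: BM25 only scores the string it is handed, so the question comes down to whether that string holds just the node text or the text plus metadata. The sketch below is a small pure-Python Okapi BM25, not LlamaIndex's implementation; the function names, the k1 and b defaults, the whitespace tokenizer, and the dict-based nodes are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real retrievers usually stem and drop stopwords.
    return text.lower().split()


def bm25_scores(query, texts, k1=1.5, b=0.75):
    """Score each string in `texts` against `query` with Okapi BM25."""
    docs = [tokenize(t) for t in texts]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many texts each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical nodes: plain dicts standing in for real node objects.
nodes = [
    {"text": "qdrant supports hybrid search", "metadata": {"file_name": "bm25_notes.pdf"}},
    {"text": "bm25 ranks documents by term frequency", "metadata": {"file_name": "intro.pdf"}},
]

text_only = [n["text"] for n in nodes]
text_plus_meta = [n["text"] + " " + " ".join(n["metadata"].values()) for n in nodes]

# Querying for a file name only matches when metadata is part of the scored string.
print(bm25_scores("bm25_notes.pdf", text_only))
print(bm25_scores("bm25_notes.pdf", text_plus_meta))
```

With text-only input, a query for the file name matches nothing; once metadata is concatenated into the scored string, the first node gets a positive score. That is the behaviour I would like to be able to switch off.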
I have a retriever which takes both BM25Retriever and VectorIndexRetriever into account, and I now want to evaluate this search mechanism. Right now, all evaluations are done with question-and-answer pairs, which do not fit my case. What are possible ways of evaluating this in my case? Any suggestions are welcome.

In the index.insert() method there is a parameter of type Document, but I don't know how it splits my new document. If I want to manually define chunk_size and chunk_overlap for newly inserted documents, how do I do it? Or will it just take those parameters directly from the already defined index?

retrievers too
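
On the evaluation question, one option that needs no answers is to score the retriever on its own: label each query with the node ids that should come back, then compute hit rate and mean reciprocal rank over the ranked results. The sketch below assumes such labelled queries can be produced; the query ids, node ids, and function names are hypothetical, and this is plain Python rather than any LlamaIndex evaluation API.

```python
def hit_rate(retrieved_ids, expected_ids):
    """1.0 if any expected node id appears in the retrieved list, else 0.0."""
    return 1.0 if any(i in expected_ids for i in retrieved_ids) else 0.0


def reciprocal_rank(retrieved_ids, expected_ids):
    """1 / rank of the first expected id in the retrieved list, 0.0 if none."""
    for rank, node_id in enumerate(retrieved_ids, start=1):
        if node_id in expected_ids:
            return 1.0 / rank
    return 0.0


def evaluate(run, labels):
    """Average both metrics over all labelled queries."""
    hits = [hit_rate(run[q], labels[q]) for q in labels]
    rrs = [reciprocal_rank(run[q], labels[q]) for q in labels]
    return {"hit_rate": sum(hits) / len(hits), "mrr": sum(rrs) / len(rrs)}


# Hypothetical labels: query -> node ids that should come back (no answers needed).
labels = {"q1": {"n3"}, "q2": {"n7", "n8"}}
# Hypothetical output of the hybrid retriever: query -> ranked node ids.
run = {"q1": ["n1", "n3", "n5"], "q2": ["n2", "n4", "n6"]}

print(evaluate(run, labels))
```

Running the same labels against the sparse retriever, the dense retriever, and the combined one would show whether the combination actually improves ranking.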
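
To be explicit about what chunk_size and chunk_overlap mean in the insert question, here is a toy word-based splitter. It only illustrates the two parameters: split_text is a made-up helper, it counts whitespace-separated words, and it is not how LlamaIndex splits documents internally.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Split `text` into windows of `chunk_size` words that overlap by `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


new_document = "one two three four five six seven eight nine ten"
for chunk in split_text(new_document, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each chunk repeats the last word of the previous one, which is the overlap I would like to control for documents added after the index is built.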