$h0gun
Joined September 25, 2024
If my embedding model has a 512-token input limit, what happens when I pass it a larger chunk of text to embed? How does LlamaIndex handle it?
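A minimal sketch of the two usual outcomes, assuming whitespace-separated words stand in for model tokens (a real model uses its own tokenizer, so counts will differ). Many embedding backends simply truncate input past the limit, which silently drops the tail. Whether that is what happens in a given LlamaIndex setup depends on the embedding wrapper and model in use, so treat this as an illustration and not as a description of LlamaIndex internals. The helper names `truncate_to_limit` and `split_to_limit` are hypothetical.

```python
from typing import List

MAX_TOKENS = 512  # input limit from the question


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace words approximate model tokens.
    return text.split()


def truncate_to_limit(text: str, max_tokens: int = MAX_TOKENS) -> str:
    """Keep only the first max_tokens tokens; everything after is dropped."""
    return " ".join(tokenize(text)[:max_tokens])


def split_to_limit(text: str, max_tokens: int = MAX_TOKENS, overlap: int = 0) -> List[str]:
    """Split text into consecutive windows of at most max_tokens tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    tokens = tokenize(text)
    step = max_tokens - overlap
    chunks: List[str] = []
    start = 0
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

Splitting before embedding keeps every part of the text represented by some vector, whereas truncation only ever embeds the first window.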
1 comment
Plain Text
sparse_retriever = BM25Retriever.from_defaults(docstore=docstore, similarity_top_k=5)

Does BM25Retriever take node metadata into account? If so, how can I prevent that and make it consider only the text of the nodes?
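Whether metadata ends up in the BM25 index depends on which string the retriever tokenizes for each node, and that may differ between LlamaIndex versions, so it is worth checking in the installed source. The sketch below is not LlamaIndex code: it is a small standalone Okapi BM25 scorer (hypothetical `TextOnlyBM25` class and toy `Node` dataclass) that indexes only the `text` field, to show that a term appearing solely in metadata then contributes nothing to the score.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    # Toy stand-in for a retriever node; not the LlamaIndex class.
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class TextOnlyBM25:
    """Okapi BM25 over node.text only; metadata is never tokenized."""

    def __init__(self, nodes: List[Node], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(n.text.lower().split()) for n in nodes]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs) if self.docs else 0.0
        # Document frequency: number of nodes whose text contains each term.
        self.df = Counter(term for d in self.docs for term in d)

    def score(self, query: str, idx: int) -> float:
        doc, length, n = self.docs[idx], self.lengths[idx], len(self.docs)
        total = 0.0
        for term in query.lower().split():
            tf = doc.get(term, 0)
            if tf == 0:
                continue  # term absent from this node's text
            idf = math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += idf * tf * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 5) -> List[int]:
        """Return indices of up to k nodes with a positive score, best first."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        return [i for s, i in ranked if s > 0][:k]
```

The design point is simply that the scorer only sees whatever string is handed to the tokenizer, so controlling that string controls whether metadata can match.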
1 comment
One more thing: I want to build this BM25Retriever directly from PDF documents without any chunking. Also, if I add more documents to my system, I have to rebuild the retriever from scratch over the entire corpus. Is there a better solution? I am working on hybrid search, so maybe Qdrant's default hybrid search would help?
I want to inspect the nodes and their embeddings in a VectorStoreIndex, and I tried this snippet:
Plain Text
nodes = index.vector_store.get_nodes()
for node in nodes:
    print(f"Node ID: {node.node_id}, Embedding: {node.embedding}, Metadata: {node.metadata}")

But I am getting Embedding: None for all of them.
Is there any way to see the generated embeddings of my nodes? Also, I can't add a large number of documents to SimpleDocumentStore using this:
Plain Text
docstore = SimpleDocumentStore()
docstore.add_documents(nodes)

It gives me a memory error after running for some time.
2 comments
Can you tell me more about question->document pair evaluation? How can I do it?
1 comment
I have built search functionality over my dataset using a hybrid retriever that combines BM25Retriever and VectorIndexRetriever. I now want to evaluate this search mechanism. All the evaluations I have seen use question-and-answer pairs, which do not fit my case. What are possible ways of evaluating it? Any suggestions are welcome.
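One common way to evaluate a retriever without answer text is to label each question with the ID of the document (or node) that should come back, then measure how often and how high it appears in the results. The sketch below computes hit rate and mean reciprocal rank (MRR) at k. The `retrieve` callable and the document IDs are hypothetical placeholders for the hybrid retriever and your own labels.

```python
from typing import Callable, Dict, List, Tuple


def evaluate_retriever(
    retrieve: Callable[[str], List[str]],
    labeled_pairs: List[Tuple[str, str]],
    k: int = 5,
) -> Dict[str, float]:
    """Return hit rate and MRR over (question, expected_doc_id) pairs."""
    if not labeled_pairs:
        raise ValueError("labeled_pairs must not be empty")
    hits = 0
    reciprocal_rank_sum = 0.0
    for question, expected_id in labeled_pairs:
        # Assumption: retrieve() returns document IDs ordered best first.
        ranked_ids = retrieve(question)[:k]
        if expected_id in ranked_ids:
            hits += 1
            # Ranks are 1-based: first position scores 1.0, second 0.5, ...
            reciprocal_rank_sum += 1.0 / (ranked_ids.index(expected_id) + 1)
    n = len(labeled_pairs)
    return {"hit_rate": hits / n, "mrr": reciprocal_rank_sum / n}
```

Running the same labeled pairs through the BM25-only, vector-only, and hybrid retrievers gives a like-for-like comparison of the three.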
3 comments
Hey there, the index.insert() method takes a parameter of type Document, but I don't know how it splits my new document. If I want to manually define chunk_size and chunk_overlap for newly inserted documents, how do I do it? Or will it just take those parameters from the already defined index?
3 comments
This question applies to any other retriever too.
1 comment