$h0gun

Handling Larger Text Inputs with LlamaIndex

If my embedding model has a 512-token input limit, what happens if I give it a bigger chunk of text to embed? How does LlamaIndex handle it?
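To make the concern concrete: a common behaviour in embedding backends is to truncate input at the model limit, so anything past the limit never influences the vector. Whether that applies here depends on the model and integration in use, which this post does not specify. The sketch below only measures how much text would be lost in that case. It assumes whitespace-separated words as a stand-in for tokens (a real tokenizer will count differently), and `truncation_report` is a hypothetical helper, not a LlamaIndex function.

```python
def truncation_report(text, max_tokens=512):
    """Return (kept, dropped) token counts if input were cut at max_tokens.

    Assumption: whitespace words approximate tokens; real tokenizers differ.
    """
    tokens = text.split()
    kept = tokens[:max_tokens]
    dropped = tokens[max_tokens:]
    return len(kept), len(dropped)


# A 1200-word chunk against a 512-token limit.
long_chunk = " ".join(f"word{i}" for i in range(1200))
report = truncation_report(long_chunk, max_tokens=512)
print(report)  # (512, 688)
```

If the dropped count is non-zero, splitting the text into chunks below the limit before embedding avoids losing that content.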

1 comment

$$h0gun

Sparse Retriever: Considering Text or Metadata in Retrieval

sparse_retriever = BM25Retriever.from_defaults(docstore=docstore, similarity_top_k=5)

Does BM25Retriever take the metadata of nodes into account? If so, how can I prevent that and make it consider only the text of the nodes?
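Why this matters can be shown without the library: BM25 scores whatever string is indexed for each node, so including metadata in that string changes which nodes match. The sketch below is a plain-Python Okapi BM25 over toy nodes, not the BM25Retriever implementation. `tokenize`, `bm25_scores`, and the node dicts are hypothetical names for illustration, and which string the real retriever reads may depend on the LlamaIndex version.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document string against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy nodes: the second one mentions the query terms only in its metadata.
nodes = [
    {"text": "qdrant supports hybrid search", "metadata": {"title": "notes"}},
    {"text": "installation guide", "metadata": {"title": "hybrid search overview"}},
]

text_only = [n["text"] for n in nodes]
with_metadata = [" ".join(n["metadata"].values()) + " " + n["text"] for n in nodes]

scores_text = bm25_scores("hybrid search", text_only)
scores_meta = bm25_scores("hybrid search", with_metadata)
print(scores_text)
print(scores_meta)
```

With text only, the second node scores zero; once metadata is part of the indexed string, it gets a positive score. Checking which of the two strings the retriever actually indexes answers the question for a given version.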

1 comment

$$h0gun

Improving the Efficiency of BM25Retriever for PDF Documents

One more thing: I want to build this BM25Retriever directly from PDF documents without any chunking. Also, if I add more documents to my system, I have to rebuild the retriever from scratch over the entire document set. Is there a better solution? I am working on hybrid search, so maybe Qdrant's default hybrid search might help?

$$h0gun

Node embeddings not visible in VectorStoreIndex

I want to inspect the nodes and their embeddings stored in a VectorStoreIndex, and I tried this snippet:

nodes = index.vector_store.get_nodes()
for node in nodes:
    print(f"Node ID: {node.node_id}, Embedding: {node.embedding}, Metadata: {node.metadata}")

But I am getting Embedding: None for all of them.
Is there any way to see the generated embeddings of my nodes? Also, I can't add a large number of documents to SimpleDocumentStore using this:

docstore = SimpleDocumentStore()
docstore.add_documents(nodes)

It gives me a memory error after running for some time.

2 comments

$$h0gun

Question-document pair evaluation basics

Can you tell me more about question->document pair evaluation? How can I do it?
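One common way to frame this, sketched under assumptions: each pair holds a question and the ID of the document that should be retrieved for it, and the retriever is scored on hit rate (was the expected document in the top k?) and mean reciprocal rank (how high was it?). `evaluate_retriever`, `toy_retrieve`, and the IDs below are hypothetical; `retrieve` stands for any function that returns ranked document IDs for a question, so the same loop works for a BM25, vector, or hybrid retriever.

```python
def evaluate_retriever(retrieve, pairs, k=5):
    """Compute hit rate and MRR over (question, expected_doc_id) pairs.

    retrieve: callable mapping a question to a ranked list of document IDs.
    """
    hits = 0
    reciprocal_rank_sum = 0.0
    for question, expected_id in pairs:
        ranked_ids = retrieve(question)[:k]
        if expected_id in ranked_ids:
            hits += 1
            # Rank is 1-based, so the first position contributes 1.0.
            reciprocal_rank_sum += 1.0 / (ranked_ids.index(expected_id) + 1)
    n = len(pairs)
    return {"hit_rate": hits / n, "mrr": reciprocal_rank_sum / n}


# Toy retriever: fixed rankings standing in for a real search call.
toy_results = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d9", "d2", "d4"],
    "q3": ["d7", "d8", "d5"],
}


def toy_retrieve(question):
    return toy_results[question]


pairs = [("q1", "d1"), ("q2", "d2"), ("q3", "d6")]
metrics = evaluate_retriever(toy_retrieve, pairs, k=3)
print(metrics)
```

Here two of the three questions find their document (ranks 1 and 2), giving a hit rate of 2/3 and an MRR of 0.5. No generated answers are needed, only the mapping from each question to its source document.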

1 comment

$$h0gun

Evaluating a Hybrid Search Functionality for a Specific Use Case

I have built a search feature over my dataset using a hybrid retriever that combines BM25Retriever and VectorIndexRetriever. I now want to evaluate this search mechanism. The evaluations I have found all rely on question-and-answer pairs, which do not fit my case. What are possible ways to evaluate it? Any suggestions are welcome.

3 comments

$$h0gun

Index.insert() method document splitting parameters

Hey there, the index.insert() method takes a parameter of type Document, but I don't know how it splits my new document. If I want to manually define chunk_size and chunk_overlap for newly inserted documents, how do I do it? Or will it just take those parameters from the already defined index?
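For readers unsure what the two parameters control, here is a minimal sliding-window sketch: each chunk holds at most chunk_size tokens, and consecutive chunks share chunk_overlap tokens. It is not how index.insert() splits documents, which this post leaves open. `chunk_tokens` is a hypothetical helper, and integers stand in for tokens.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


chunks = chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=1)
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each chunk repeats the last token of the previous one, which is the effect a non-zero chunk_overlap is meant to have.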

3 comments

$$h0gun

This question is valid for any other

This question applies to any other retriever too.

1 comment
