
Updated 9 months ago

Can someone provide some tips around

Can someone provide some tips on working with 100+ documents? I have parsed them with SentenceWindowNodeParser and stored the resulting nodes in my vector db.

However, retrieval performs poorly: when I ask a question, the sentences I expect are not retrieved.

TIA!
10 comments
Probably using some form of hybrid retrieval, or setting a larger top-k and using a reranker, will help.
Thanks, I'll give those a try. Is this the right example for hybrid retrieval? https://docs.llamaindex.ai/en/stable/examples/query_engine/CustomRetrievers/
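To make the "larger top-k plus reranker" idea concrete, here is a minimal, library-free sketch. It is not LlamaIndex code: `overlap_score` is a toy stand-in for a real reranker model, and `retrieve_then_rerank`, `wide_k` and `final_k` are hypothetical names chosen for illustration.

```python
def overlap_score(query: str, text: str) -> float:
    """Toy relevance score (assumption): fraction of query words found in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def retrieve_then_rerank(query, candidates, wide_k=10, final_k=2):
    """candidates: list of (text, first_stage_score) pairs from any retriever."""
    # Stage 1: keep a generous number of candidates by first-stage score.
    wide = sorted(candidates, key=lambda c: c[1], reverse=True)[:wide_k]
    # Stage 2: re-score only those candidates with the reranker.
    reranked = sorted(wide, key=lambda c: overlap_score(query, c[0]), reverse=True)
    return [text for text, _ in reranked[:final_k]]
```

The point of the wide first stage is that a relevant chunk with a mediocre embedding score still reaches the reranker. With a small `wide_k` it would be cut before the second stage ever sees it.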
I just used this to overcome the problem with the Paul Graham essay, thanks!
@gamecode8 that's definitely one way of doing it. This is another, arguably slightly better, one imo: https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=queryfusionre
You can use the above to combine any number of retrievers.
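For anyone wondering what score fusion across several retrievers amounts to, here is a rough plain-Python sketch of the general idea: min-max normalize each retriever's scores, then take a weighted sum. This is an illustration under my own assumptions, not the linked retriever's actual implementation; `min_max_normalize` and `fuse` are hypothetical names.

```python
from collections import defaultdict


def min_max_normalize(scores: dict) -> dict:
    """Rescale one retriever's raw scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally good.
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - lo) / (hi - lo) for node_id, s in scores.items()}


def fuse(results: list, weights=None, top_k: int = 3) -> list:
    """results: one {node_id: raw_score} dict per retriever."""
    if not results:
        return []
    # Default to equal weights across retrievers.
    weights = weights or [1.0 / len(results)] * len(results)
    fused = defaultdict(float)
    for weight, scores in zip(weights, results):
        for node_id, score in min_max_normalize(scores).items():
            fused[node_id] += weight * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Normalizing first matters because vector similarity and BM25 scores live on different scales, so summing the raw values would let one retriever dominate.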
Thank you, I will definitely give this a try today
@Logan M I found that the BM25 retriever actually retrieves the document I'm expecting, but it's pulling the entire document from the docstore, which is quite long. Is there a built-in way to have it use the nodes stored in the vector store instead? I'm using Postgres, for context.
Hmm, not really. You could chunk it on the fly, or chunk it before storing it in the docstore.
I see. I was thinking about chunking before storing in the docstore, but then the issue is that each node wouldn't have a unique ID I can assign, since I'm using the ingestion pipeline.
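As a rough idea of what chunking before storing could look like, here is a small sentence-based splitter in plain Python. It is only a sketch: `chunk_text` and `max_chars` are hypothetical names, and in practice a library node parser would normally take its place.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be stored as its own entry, so a keyword retriever hands back a passage instead of the whole document.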

I was thinking Doc1(id=A123) could become TextNode(id=A123-0), TextNode(id=A123-1), ..., TextNode(id=A123-N),
but then I could be left with some nodes not getting deduped if, for example, the document gets shorter.
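One way to picture the "document gets shorter" problem with the A123-0 ... A123-N scheme is the sketch below. It derives the IDs from the document ID and works out which previously stored IDs are now stale. `chunk_ids` and `plan_update` are hypothetical helpers, not part of the ingestion pipeline.

```python
def chunk_ids(doc_id: str, n_chunks: int) -> list:
    """Deterministic IDs: A123 with 3 chunks gives A123-0, A123-1, A123-2."""
    return [f"{doc_id}-{i}" for i in range(n_chunks)]


def plan_update(doc_id: str, new_chunks: list, stored_ids: set):
    """Return (upserts, stale_ids) for one re-ingested document."""
    new_ids = chunk_ids(doc_id, len(new_chunks))
    upserts = dict(zip(new_ids, new_chunks))
    # Assumption: no other document ID starts with f"{doc_id}-".
    prefix = f"{doc_id}-"
    stale = sorted(i for i in stored_ids if i.startswith(prefix) and i not in upserts)
    return upserts, stale
```

For example, if A123 used to produce three chunks and now produces two, the result marks A123-2 for deletion while A123-0 and A123-1 are overwritten in place.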
Each node is automatically assigned a unique ID 👀 And each node also has a node.ref_doc_id that points to the source document ID.

It also has node.prev_node and node.next_node attributes pointing to the IDs of the prev/next nodes.
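To illustrate why prev/next links are handy, here is a toy sketch that follows them to pull in neighbouring text around a retrieved chunk. `Chunk` is a simplified stand-in dataclass, not the real LlamaIndex node class: its `prev_id` and `next_id` fields are assumptions that hold plain ID strings, and `with_neighbors` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    node_id: str
    text: str
    ref_doc_id: str                 # ID of the source document
    prev_id: Optional[str] = None   # ID of the previous chunk, if any
    next_id: Optional[str] = None   # ID of the next chunk, if any


def with_neighbors(store: dict, node_id: str, window: int = 1) -> str:
    """Return the chunk's text plus up to `window` chunks on each side."""
    node = store[node_id]
    before, after = [], []
    cur = node
    for _ in range(window):         # walk backwards through prev links
        if cur.prev_id is None:
            break
        cur = store[cur.prev_id]
        before.insert(0, cur.text)
    cur = node
    for _ in range(window):         # walk forwards through next links
        if cur.next_id is None:
            break
        cur = store[cur.next_id]
        after.append(cur.text)
    return " ".join(before + [node.text] + after)
```

So you can retrieve on small chunks and still hand the LLM a bit of surrounding context, while `ref_doc_id` tells you which document everything came from.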