Updated 11 months ago

Can someone provide some tips around

Can someone provide some tips around working with 100+ documents? I have used SentenceWindowNodeParser and stored them in my vector db.

However this is performing poorly when I ask a question and expect certain sentences to be retrieved.

TIA!

10 comments

Logan M

Probably using some form of hybrid retrieval, or setting a larger top-k and using a reranker will help

gamecode8

Thanks, I'll give those a try. Is this the right example for hybrid retrieval? https://docs.llamaindex.ai/en/stable/examples/query_engine/CustomRetrievers/

axentar

I just used this to overcome the problem of the paul graham essay, thanks!

Logan M

@gamecode8 that's definitely one way of doing it. Here's another, slightly (better?) one imo: https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=queryfusionre

Logan M

Can use the above to combine any number of retrievers
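For readers who want to see the idea behind combining retrievers, here is a minimal pure-Python sketch of relative score fusion: each retriever's scores are min-max scaled to [0, 1] and then combined as a weighted sum. This is not LlamaIndex's implementation; `min_max_normalize`, `fuse`, the node IDs, and the scores are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale a {node_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores are equal, so treat every hit as equally strong.
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - low) / (high - low) for node_id, s in scores.items()}


def fuse(result_sets, weights=None, top_k=5):
    """Combine any number of {node_id: score} result sets into one ranking."""
    if weights is None:
        # Default assumption: every retriever contributes equally.
        weights = [1.0 / len(result_sets)] * len(result_sets)
    fused = defaultdict(float)
    for scores, weight in zip(result_sets, weights):
        for node_id, score in min_max_normalize(scores).items():
            fused[node_id] += weight * score
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical scores: cosine similarities from a vector retriever, raw BM25 scores.
vector_hits = {"n1": 0.82, "n2": 0.79, "n3": 0.40}
bm25_hits = {"n3": 12.0, "n4": 7.0, "n1": 4.0, "n5": 2.0}
fused_ranking = fuse([vector_hits, bm25_hits], top_k=3)
```

Normalizing first matters because BM25 scores and cosine similarities live on different scales; without it the retriever with the larger raw numbers would dominate the sum.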

gamecode8

Thank you, I will definitely give this a try today

gamecode8

@Logan M I found the BM25 retriever actually retrieves the document I'm expecting, but it's pulling the entire document from the docstore, which is quite long. Is there a built-in solution to provide it with the nodes stored in the vector store? I'm using Postgres, for context.

Logan M

hmm not really -- you could chunk it on the fly, or chunk it before storing in the docstore
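A rough sketch of what "chunk it on the fly" could look like: after the retriever returns a long document, split it into small sentence groups and keep the group that shares the most words with the query. This is a simplified illustration only; `split_sentences` and `best_chunk` are hypothetical helpers, the regex splitter is naive, and word overlap stands in for whatever scoring you would really use.

```python
import re


def split_sentences(text):
    """Naive sentence splitter; a real pipeline would use a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def best_chunk(document_text, query, sentences_per_chunk=2):
    """Return the chunk of the document that shares the most words with the query."""
    sentences = split_sentences(document_text)
    chunks = [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]
    query_terms = set(re.findall(r"\w+", query.lower()))

    def overlap(chunk):
        # Count distinct query words that also appear in the chunk.
        return len(query_terms & set(re.findall(r"\w+", chunk.lower())))

    return max(chunks, key=overlap) if chunks else ""


# Hypothetical long document returned by a keyword retriever.
long_doc = (
    "Postgres stores the vectors. The index uses HNSW. "
    "BM25 ranks by term frequency. It ignores embeddings. "
    "Rerankers reorder the top hits. They are slower."
)
snippet = best_chunk(long_doc, "how does bm25 rank term frequency")
```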

gamecode8

I see. I was thinking about chunking before storing in the docstore, but then the issue is that each node wouldn't have a unique ID I can assign, since I'm using the ingestion pipeline.

I was thinking Doc1(id=A123) could become TextNode(id=A123-0), TextNode(id=A123-1)...TextNode(id=A123-N),
but then I could be left with some nodes not getting deduped if the document gets shorter, for example.
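One way to reason about the "document gets shorter" case is to compare the IDs you are about to write with the IDs already stored for that document, and delete the difference. The sketch below is plain Python, not part of the ingestion pipeline; `chunk_ids` and `plan_upsert` are hypothetical helpers, and how you list or delete stored IDs depends on your store.

```python
def chunk_ids(doc_id, num_chunks):
    """Derive stable child IDs such as 'A123-0', 'A123-1', ... from a parent document ID."""
    return [f"{doc_id}-{i}" for i in range(num_chunks)]


def plan_upsert(doc_id, new_chunks, stored_ids):
    """Return (ids to upsert, stale ids to delete) for one re-ingested document.

    stored_ids is the collection of chunk IDs currently stored for this document.
    """
    current = chunk_ids(doc_id, len(new_chunks))
    # Anything stored under this document that is no longer produced is stale.
    stale = sorted(set(stored_ids) - set(current))
    return current, stale


# The document used to produce four chunks and now produces two.
previously_stored = {"A123-0", "A123-1", "A123-2", "A123-3"}
to_upsert, to_delete = plan_upsert("A123", ["first chunk", "second chunk"], previously_stored)
```

Here `to_delete` holds the leftover IDs from the longer version of the document, which is exactly the set that would otherwise escape deduplication.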

Logan M

Each node is automatically assigned a unique ID 👀 And each node also has a node.ref_doc_id that points to the source document ID.

It also has node.prev_node and node.next_node attributes pointing to the IDs of the prev/next nodes.
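To make the relationships described above concrete, here is a small stand-alone sketch of nodes that carry a unique ID, a pointer back to the source document, and prev/next links. The `Node` dataclass is a simplified stand-in, not LlamaIndex's TextNode: it stores plain ID strings for the links, and `build_nodes` and `with_neighbors` are hypothetical helpers.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Simplified stand-in for a chunk node; an assumption, not the library class."""
    text: str
    ref_doc_id: str
    node_id: str
    prev_node: Optional[str] = None
    next_node: Optional[str] = None


def build_nodes(doc_id: str, chunks: List[str]) -> List[Node]:
    """Give every chunk its own random ID and link neighbours together."""
    nodes = [Node(text=c, ref_doc_id=doc_id, node_id=str(uuid.uuid4())) for c in chunks]
    for left, right in zip(nodes, nodes[1:]):
        left.next_node = right.node_id
        right.prev_node = left.node_id
    return nodes


def with_neighbors(node: Node, by_id: Dict[str, Node]) -> str:
    """Expand a retrieved node with the text of its previous and next nodes."""
    parts = []
    if node.prev_node is not None:
        parts.append(by_id[node.prev_node].text)
    parts.append(node.text)
    if node.next_node is not None:
        parts.append(by_id[node.next_node].text)
    return " ".join(parts)


nodes = build_nodes("A123", ["one", "two", "three"])
by_id = {n.node_id: n for n in nodes}
context = with_neighbors(nodes[1], by_id)
```

Because every node keeps `ref_doc_id`, all chunks of a document can be found and replaced together without hand-assigning chunk IDs.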
