
Updated last year

Index

At a glance

A community member built an index with VectorStoreIndex.from_documents and can find the relevant text when manually searching docstore.json, but queries run through the retrieval process keep answering that the information is not available in the given context. Scaling back from the Sub Question Query Engine and chatbot to the simplest possible setup, with fewer documents, did not help. Checking response.source_nodes showed that the retriever was pulling back the wrong context. Adding chunk_overlap=100 fixed the query on a very small test data set, but the community member later reported not having much luck and wondered whether RAG over emails is simply hard. Another community member suggested adjusting the top k or chunk size settings and trying hybrid search, which combines vector and keyword search, and asked whether each document holds one email or an entire email chain. The original poster had not yet tried hybrid search as of the last message.

Hello, I created an index, and when I manually search the docstore.json I can see references to the context I am querying. But when I run my question through the retrieval process, it continually insists that what I am asking about is not available in the given context. I am baffled after trying to fix this for a couple of days and could use some help.
17 comments
How big is your index? Did you check response.source_nodes to see if the proper text was retrieved? Did you play around with top k or chunk size settings?
For bigger indexes, some more complex retrieval methods might be needed
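A minimal sketch of the check suggested above: list what was actually retrieved, with scores, before blaming the LLM. It assumes each entry in response.source_nodes exposes `.score` and `.node.get_content()`, which is how LlamaIndex's node-with-score objects are understood to work; verify against your installed version. The helper name `summarize_sources`, the stand-in response object, and the sample texts are all hypothetical, made up so the snippet runs without an index.

```python
from types import SimpleNamespace


def summarize_sources(response, preview_chars=80):
    """Return (score, text preview) pairs for retrieved nodes, best score first."""
    rows = []
    for item in response.source_nodes:
        text = item.node.get_content()
        rows.append((item.score, text[:preview_chars]))
    # Some retrievers may leave the score unset; sort those last.
    rows.sort(
        key=lambda row: row[0] if row[0] is not None else float("-inf"),
        reverse=True,
    )
    return rows


def _fake_node(text, score):
    # Hypothetical stand-in that mimics the assumed attribute layout.
    return SimpleNamespace(
        score=score,
        node=SimpleNamespace(get_content=lambda: text),
    )


# Made-up sample data, not taken from the thread.
fake_response = SimpleNamespace(
    source_nodes=[
        _fake_node("Meeting notes from the March budget review.", 0.71),
        _fake_node("Dr. Alvarez joined the cardiology team in 2019.", 0.83),
    ]
)

for score, preview in summarize_sources(fake_response):
    print(f"{score:.2f}  {preview}")
```

With a real index, the response would come from something like `index.as_query_engine(similarity_top_k=5).query("Who is Dr. Alvarez?")`. If the expected text never shows up in the list, the problem is in retrieval (chunking, top k, embeddings) rather than in answer generation.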
hi Logan - thanks for your reply! I didn't check response.source_nodes - I will do that... right now this isn't working even w/small subsets of data so I am guessing there is a problem w/my index... will circle back after I check response.source_nodes
ok that was illuminating... it's pulling back the wrong context.
I am not sure where to go from here. I am currently using the simplest setup possible. I've scaled back from the Sub Question Query Engine & Chatbot just bc those were also not finding expected context, so I stripped everything back to the most simple code and reduced document volume to try to troubleshoot. Now I can clearly see it's pulling incorrect context, even when I search with a unique name in the query. Any suggestions on where to go from here? Should I re-index/re-embed? and if so, how... all I did was use the standard VectorStoreIndex.from_documents -- guessing that was my error?
Can you give some examples of queries you are trying, that aren't working?
Certain "categories" of queries might require some more config, beyond the basic vector index
"Who is Dr. Alvarez?"
I just made a change to my index setup on the very small test data set: I added chunk_overlap=100, and now the query works.
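One plausible reason overlap helped, shown as a toy sketch rather than a diagnosis of this particular index: without overlap, a phrase that straddles a chunk boundary is split across two chunks, so neither chunk contains it whole. The function `chunk_words` and the sample sentence are hypothetical. It splits on words for readability, whereas LlamaIndex's splitters count tokens and try to respect sentence boundaries.

```python
def chunk_words(words, chunk_size, overlap=0):
    """Split a list of words into windows of chunk_size that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def any_chunk_contains(chunks, phrase):
    """True if at least one chunk contains the phrase in full."""
    return any(phrase in " ".join(chunk) for chunk in chunks)


# Made-up sample sentence; the name lands exactly on a chunk boundary.
text = "please forward the lab results to Dr. Alvarez before the Friday meeting"
words = text.split()

no_overlap = chunk_words(words, chunk_size=7, overlap=0)
with_overlap = chunk_words(words, chunk_size=7, overlap=2)

print(any_chunk_contains(no_overlap, "Dr. Alvarez"))    # False
print(any_chunk_contains(with_overlap, "Dr. Alvarez"))  # True
```

The trade-off is that overlap stores some text twice, so the index grows and neighbouring chunks can return near-duplicate context.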
Interesting, that works! I was going to suggest smaller chunk size or hybrid search lol
thanks - I am not sure that I know what you mean by Hybrid Search, but... I am going to re-embed the full data set with this overlap param and try again in the chatbot w/Sub Question Query Engine
maybe too ambitious, but I will report back either way. thank you so much for being on the other end of this discord chat!
Hybrid search basically means combining vector and keyword search.

We have a few ways of doing this, if you end up needing it. This one is my favorite
https://docs.llamaindex.ai/en/stable/examples/retrievers/reciprocal_rerank_fusion.html#use-in-a-query-engine
And yea, happy to help. Good luck!
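To make the idea behind the linked reciprocal rerank fusion example concrete, here is a plain-Python sketch of the scoring rule: each document earns 1 / (k + rank) from every ranked list it appears in, and the sums decide the final order. This is not LlamaIndex's implementation. The function name, the document ids, and the two rankings are hypothetical; k=60 is the constant commonly used with this method.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up results for one query from two retrievers.
vector_ranking = ["email_12", "email_07", "email_31"]   # embedding similarity
keyword_ranking = ["email_31", "email_12", "email_44"]  # keyword match, e.g. BM25

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)  # ['email_12', 'email_31', 'email_07', 'email_44']
```

Here email_31 was last in the vector ranking but first in the keyword ranking, so fusion lifts it to second place. That is the effect hybrid search aims for on queries containing a unique name.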
thanks, I will take a look!
I said I would report back: so far I am not having much luck, but I have not tried the hybrid search yet. I am starting to think that doing RAG over emails is really hard and maybe I should start with some other set of data/docs. Either that, or I am doing something wrong when creating the embeddings.
Hmm, I guess emails are a little tricky.

I have a feeling hybrid will help quite a bit.

When you create your documents before embedding, is it one document per email? Or does it include an entire chain of emails per document?