Question: I'm finding that I need to ask questions very precisely to get the proper answer. Is there anything I can do to change this?
Example: I've created an index using ~1000 blog posts about legal tech from the past 3ish years. MyCase is a legal tech company that was acquired last year by a company called Affinipay. This is covered in the document corpus.
However you can see that the right answer isn't returned until I ask a pointed question. Is there anything I might be able to try to improve this?
@matt_a To debug this, you first need to check whether the index retrieves the right paragraphs, because that's what the LLM bases its answers on (unless you're looking at all paragraphs by using ListIndex).
@yoelk I'm wondering if the issue could have to do with the fact that the topic is brought up in multiple documents. For example, there might be 5 different blog posts that mention MyCase and acquisition. I'm not intimately familiar with how the index works, but I assume it's not going so far as to group those documents in the same search space.
@mikan If you indexed all of them into one index, then it should look for the closest top_k paragraphs. Try setting similarity_top_k=5 (I think the default value is one, which means it sends the LLM only the single top-matching paragraph).
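To make the top_k idea concrete, here is a minimal sketch in plain Python of what "closest top_k paragraphs" means. It is not the library's actual implementation: the function names `cosine_similarity` and `top_k_paragraphs`, the toy 2-dimensional embeddings, and the sample texts are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_paragraphs(query_vec, paragraphs, k=5):
    """Return the k (score, text) pairs most similar to the query.

    `paragraphs` is a list of (text, embedding) pairs (hypothetical layout).
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in paragraphs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy embeddings, invented for this example only.
paragraphs = [
    ("MyCase was acquired by Affinipay.", [0.9, 0.1]),
    ("MyCase launches a new billing feature.", [0.7, 0.7]),
    ("Unrelated post about e-discovery.", [0.0, 1.0]),
]
query = [1.0, 0.0]

top1 = top_k_paragraphs(query, paragraphs, k=1)  # only the single best match
top2 = top_k_paragraphs(query, paragraphs, k=2)  # best match plus one more
```

With k=1 the LLM only ever sees one paragraph, so a vague question that happens to match a different post best will never surface the acquisition post. A larger k gives it more candidates to work with.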
If you had a date, you could potentially pick the most recent paragraph out of the top 5, even if it wasn't the most similar. The risk is that you'll send the LLM an irrelevant paragraph that doesn't even discuss the acquisition of MyCase.
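A rough sketch of that idea, assuming each retrieved paragraph comes back with a similarity score and a publish date. The dict layout, the `pick_most_recent` helper, and the `min_similarity` cutoff are all hypothetical; the cutoff is one possible way to limit the risk of picking an irrelevant paragraph, and the right value would need tuning on real data.

```python
from datetime import date


def pick_most_recent(candidates, min_similarity=0.75):
    """Pick the newest paragraph among those that are similar enough.

    `candidates` is a non-empty list of dicts with 'text', 'similarity'
    and 'published' keys (a made-up layout for this sketch).
    """
    relevant = [c for c in candidates if c["similarity"] >= min_similarity]
    if not relevant:
        # Nothing clears the cutoff: fall back to the most similar paragraph.
        return max(candidates, key=lambda c: c["similarity"])
    return max(relevant, key=lambda c: c["published"])


# Invented top-5 results for illustration.
top_5 = [
    {"text": "MyCase raises funding.", "similarity": 0.88, "published": date(2020, 5, 1)},
    {"text": "MyCase acquired by Affinipay.", "similarity": 0.84, "published": date(2022, 6, 1)},
    {"text": "MyCase product update.", "similarity": 0.80, "published": date(2021, 3, 15)},
    {"text": "Legal tech funding roundup.", "similarity": 0.60, "published": date(2023, 1, 10)},
    {"text": "Conference recap.", "similarity": 0.55, "published": date(2022, 11, 2)},
]

chosen = pick_most_recent(top_5)
```

Here the newest paragraph overall (the funding roundup) is skipped because it falls below the cutoff, so the acquisition post is chosen even though it isn't the top similarity match.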
Hi, I'm working on a similar project. Not using Llama yet. At the moment, I've embedded all the docs in Pinecone: over 50k docs, news, etc., spanning more than 20 years. When using cosine similarity, the most recent doc, which contains the answer, is often buried under 15 old news items about the evolution of the subject. And I can barely fit the top 3 in davinci's context... Any advice on how to make the index more aware of freshness?
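One possible approach, sketched here under assumptions rather than as a tested recipe: fetch a larger candidate set from the vector store than you plan to send to the LLM, then re-rank it client-side by multiplying the cosine score by a time-decay factor. The `freshness_score` and `rerank_by_freshness` helpers, the one-year half-life, and the sample matches below are all made up for illustration and are not part of any library.

```python
from datetime import date


def freshness_score(similarity, published, today, half_life_days=365.0):
    """Similarity scaled by an exponential decay on document age.

    A doc that is `half_life_days` old keeps half of its similarity score.
    """
    age_days = max((today - published).days, 0)
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * decay


def rerank_by_freshness(matches, today, keep=3, half_life_days=365.0):
    """Re-rank (text, similarity, published) tuples and keep the best few."""
    ranked = sorted(
        matches,
        key=lambda m: freshness_score(m[1], m[2], today, half_life_days),
        reverse=True,
    )
    return ranked[:keep]


# Invented matches: an old doc with a high cosine score vs. newer ones.
today = date(2023, 3, 1)
matches = [
    ("Old background piece", 0.90, date(2013, 3, 1)),
    ("Recent doc with the answer", 0.80, date(2023, 1, 30)),
    ("Mid-age follow-up", 0.85, date(2021, 3, 1)),
]

reranked = rerank_by_freshness(matches, today, keep=3)
```

The half-life controls how aggressively old docs are pushed down. A short one favors breaking news, while a long one keeps older background material competitive, so it would have to be tuned against the kinds of questions being asked.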