it's a large legal codex. I am not sure what the approach to tweak/tune is
Getting wrong results as in hallucination? Can share code if you like
No, literally irrelevant results in the retrieval phase. I ask a question, and all I get are a few "sort of" matching vectors, but none are relevant
I am working with the simple debugger turned on, just to figure out what is happening
I am using OpenAI for embeddings and for generation
is there a way to make sure OpenAI is being used correctly? Like logging the embedding generation phase for the question?
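One way to check that step is to wrap whatever function produces the query embedding in a small logging shim. This is just a sketch of the pattern: `fake_openai_embed` below is a hypothetical stand-in for the real OpenAI embedding call, so you can see the query text and the resulting vector's shape in the logs.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("embeddings")

def log_embedding(embed_fn):
    """Wrap any embedding function to log its input and output shape."""
    def wrapper(text):
        log.debug("embedding query: %r", text[:80])
        vector = embed_fn(text)
        log.debug("got vector of dim %d, first values: %s", len(vector), vector[:3])
        return vector
    return wrapper

# Hypothetical stand-in for the real OpenAI embedding call:
def fake_openai_embed(text):
    return [0.1, 0.2, 0.3, 0.4]

embed = log_embedding(fake_openai_embed)
vec = embed("What does article 12 of the codex say?")
```

If the logged vector looks sane (right dimensionality, non-degenerate values), the problem is more likely in indexing or chunking than in the embedding call itself.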
at this point I am almost sure I am just doing something wrong πŸ™‚
Stick to the retrieval first and figure out why this is happening. e.g. maybe there's just too much data, maybe the chunk size is not right, etc.
Then start experimenting with other methods, like different indexing, using hierarchy, maximal marginal relevance, etc.
Once you have the retrieval solved (and I really recommend using the evaluation framework developed by LlamaIndex to create questions per chunk, etc.), I would see if GPT is hallucinating or not
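The maximal marginal relevance idea mentioned above can be sketched in a few lines: greedily pick the candidate most relevant to the query while penalizing similarity to chunks already selected. The 2-D vectors and the `lambda_` trade-off value here are made up purely for illustration.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr(query_vec, doc_vecs, k=3, lambda_=0.5):
    """Greedy maximal marginal relevance: balance relevance to the
    query against redundancy with documents already selected."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1 - lambda_) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy example: doc 1 is nearly a duplicate of doc 0, doc 2 is different.
docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
picked = mmr([1.0, 0.0], docs, k=2, lambda_=0.3)
```

With a low `lambda_`, the near-duplicate gets skipped in favor of the more diverse chunk, which is the behavior you want when many chunks of a legal codex look alike.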
Yea, since you have such a large amount of data (35MB is a lot of text lol) you might need to explore different retrieval techniques.

Stuff like hybrid search, splitting the index into multiple indexes per topic and using a router, using a multi step query engine, etc.
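As a sketch of the hybrid-search idea: run both a keyword search and a vector search, then merge the two ranked lists with reciprocal rank fusion. The document IDs and the two hit lists below are invented for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists (e.g. one from keyword search, one
    from vector search) into a single hybrid ranking. Documents that
    rank well in multiple lists float to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hit lists for one query against the codex:
keyword_hits = ["art_12", "art_07", "art_33"]
vector_hits = ["art_12", "art_90", "art_07"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

A document that appears near the top of both lists ("art_12" here) beats one that only one retriever liked, which is exactly the robustness you want with noisy vector matches.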