Hi,
I've built two simple RAG scripts, one in LangChain and one in LlamaIndex:

LlamaIndex:
Plain Text
query_engine = index.as_query_engine()

LangChain:
Plain Text
chain = load_qa_chain(llm, chain_type="stuff")
res = chain.run(input_documents=docs, question=prompt)

Then I have a Chroma DB where the docs have been indexed via LangChain, using the same embedding function in both cases.
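To point LlamaIndex at that existing collection, I do roughly this (a sketch; the path and collection name are placeholders for my real setup):

Plain Text
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./chroma_db")  # placeholder path
collection = client.get_collection("docs")              # placeholder name
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store)

One thing to watch: LangChain and LlamaIndex store document payloads in Chroma slightly differently, so reading a collection built by one library through the other may need care.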
Finally, in another script, I run 100 questions through each pipeline, store the retrieved context and the responses, and use a custom prompt to evaluate the faithfulness of each response given the question and context.
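The evaluation loop looks roughly like this (a sketch; `questions`, `llm`, and `FAITH_PROMPT` stand in for my actual setup):

Plain Text
results = []
for q in questions:
    response = query_engine.query(q)
    # concatenate the retrieved chunks that the answer was grounded on
    context = "\n\n".join(n.get_content() for n in response.source_nodes)
    # ask the judge LLM whether the answer is faithful to that context
    verdict = llm.complete(
        FAITH_PROMPT.format(question=q, context=context, answer=str(response))
    )
    results.append({
        "question": q,
        "context": context,
        "answer": str(response),
        "verdict": str(verdict),
    })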
I got a very disturbing result: 80% faithfulness when using the LangChain retriever, but only 20% when using LlamaIndex.
I would assume it could be because of the document structure in Chroma, so I'm reindexing everything, but the corpus is big and I'll need to wait a day or two before I have a replica DB indexed with LlamaIndex.
Has anyone experienced the same, and could you point me in the right direction to get proper faithfulness from LlamaIndex? I'm trying to migrate away from LangChain, but these results do not help.
7 comments
I mean, I would probably dive into a handful of questions that aren't performing well

Plain Text
response = query_engine.query("...")

Here, you can check response.source_nodes to see if the retrieved nodes make sense
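
e.g. something like this (the question is just an example):

Plain Text
response = query_engine.query("How did Apple perform last quarter?")
for n in response.source_nodes:
    print(n.score, n.get_content()[:200])  # score + first 200 chars of each chunk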

I'm not sure about langchain, but with llama-index, the default top-k is 2. And of course there are a few other things you can do to tweak the performance, but doing the debugging of
a) do my retrieved nodes make sense?
b) does the response make sense for the given nodes?

Will help somewhat to track down the issue
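
If top-k turns out to be the limiter, bumping it is a one-liner (5 is just an example value):

Plain Text
query_engine = index.as_query_engine(similarity_top_k=5)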

You can also just create a retriever to debug retrieval, if that is the issue

Plain Text
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("test")
I would suspect that maybe the retrieved nodes are lacking some information? Or if not, there is probably a way to make the settings more similar to langchain.

What LLM are you using?
GPT-4

We have logged all steps, and across the 115 different questions we asked through LlamaIndex, it always fetched the same context elements. When asking about Apple or Alphabet or any other stock, it returns the same chunk from Bank of America, which is really weird.
yea that sounds pretty strange... I would suspect that might have something to do with the vector db being created without llama-index. But hard to say without getting my hands on it.
Generally I've never had issues like that working with llamaindex alone
oh lol, we just found the issue.
We passed the custom prompt directly into the standard query engine query string, which confuses the LLM πŸ˜‚
We were so focused on looking under the hood that we didn't check the basics haha
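For anyone hitting the same thing, the fix is to keep the query string as just the question and attach the custom prompt as the QA template instead (a sketch assuming a recent llama-index; `CUSTOM_PROMPT` and the template text are placeholders):

Plain Text
from llama_index.core import PromptTemplate

# wrong: the whole custom prompt goes in as the query, so it gets embedded
# for retrieval and then wrapped in the default QA template on top
# response = query_engine.query(CUSTOM_PROMPT.format(question=q))

# instead: register the custom prompt as the QA template; {context_str}
# and {query_str} are the variables llama-index fills in
qa_template = PromptTemplate(
    "Context information is below.\n"
    "{context_str}\n"
    "Answer the question using only the context above.\n"
    "Question: {query_str}\n"
)
query_engine = index.as_query_engine(text_qa_template=qa_template)
response = query_engine.query(q)  # plain question only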
Oh good find! πŸ‘€πŸ‘