Feedback request post: I'm working on a project to pull relevant page references from books based on a question. For example: if I asked "what spells has Harry Potter used?", the AI could respond "check pages 50, 100, and 120 for information on that". I'm using OpenAI text-search embeddings to create the vectors for my index and then passing the full question to the retriever. It works pretty well, but I would love feedback on what I could tweak/test to get better results. I've managed to get the retriever to be accurate about 88% of the time (i.e., 12% of the time it misses a key page I would have expected it to return).
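A minimal sketch of the kind of setup described, assuming llama_index with its legacy OpenAI text-search embedding mode; the directory path, the page_label metadata key, and the top-k value are illustrative, not from the post:

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingMode

# Text-search mode uses the asymmetric doc/query embedding variants
# (the post used the Davinci text-search model; the exact model argument
# depends on your llama_index version)
embed_model = OpenAIEmbedding(mode=OpenAIEmbeddingMode.TEXT_SEARCH_MODE)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# PDF loaders typically attach page_label metadata to each page's node
documents = SimpleDirectoryReader("books/").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

retriever = index.as_retriever(similarity_top_k=5)
for n in retriever.retrieve("what spells has Harry Potter used?"):
    print(n.node.metadata.get("page_label"), n.score)
```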
Are you using pure vector search? Have you considered hybrid search as well?
hmm I'm not sure what those are. I haven't seen those terms referenced in the docs I've been using. Can you expand?
Hybrid search is a term usually used for combining keyword search and vector search.

There are a few ways this can be implemented: either retrieving nodes with both approaches and applying a fusion scoring function (Weaviate does this), or fetching nodes through both methods and re-ranking them to return the true top k

Most popular vector dbs support some form of this, but if they don't you can take a custom approach as well. Using something like BM25 works well
https://gpt-index.readthedocs.io/en/stable/examples/retrievers/bm25_retriever.html#advanced-hybrid-retriever-re-ranking
I use a pretty outdated re-ranker in that example -- I would recommend BAAI/bge-reranker-base these days
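Roughly the pattern from that notebook: retrieve with both methods, de-duplicate by node id, and return the union. Names like `index` and `nodes` are assumed to come from your own indexing code:

```python
from llama_index.retrievers import BM25Retriever, BaseRetriever

vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)

class HybridRetriever(BaseRetriever):
    def __init__(self, vector_retriever, bm25_retriever):
        self.vector_retriever = vector_retriever
        self.bm25_retriever = bm25_retriever
        super().__init__()

    def _retrieve(self, query, **kwargs):
        bm25_nodes = self.bm25_retriever.retrieve(query, **kwargs)
        vector_nodes = self.vector_retriever.retrieve(query, **kwargs)
        # Union of both result sets, de-duplicated by node id
        all_nodes, seen = [], set()
        for n in bm25_nodes + vector_nodes:
            if n.node.node_id not in seen:
                all_nodes.append(n)
                seen.add(n.node.node_id)
        return all_nodes

hybrid_retriever = HybridRetriever(vector_retriever, bm25_retriever)
```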
Gotcha. That makes sense. Does it matter which embedding mode or embedding model I use to generate the vectors that I then plan to use with a hybrid search? I used the Davinci model with OpenAI in text_search mode to generate my most effective vector store index so far. Could I store those vectors in Weaviate and go from there (I might be misunderstanding the use of a "vector database", so apologies if my question doesn't make sense)?
Perfect. Super excited to dive into vector databases! I started exploring Pinecone the other day but wasn't quite sure what my use case would be and now I have that. Have you used Pinecone? Weaviate > Pinecone?
Both are pretty comparable I think, up to you πŸ™‚ Both support hybrid search though
Ok so thanks to your great help, I was able to get hybrid search working with a Weaviate vector store. Unfortunately, it's actually less accurate than my default vector retriever 😒 I've messed around with the alpha and that doesn't seem to help. I'm wondering what I can do to debug from here. I've looked at some of the results of the queries that are inaccurate and really don't understand why they aren't retrieving certain pages. I went ahead and tried using a query string that exactly matches a string in the document and it still didn't find the right page. Any idea how I can debug why it's ranking the pages the way it is?
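For reference, a sketch of the Weaviate hybrid setup being described, assuming llama_index's WeaviateVectorStore and an existing weaviate.Client named `client`; the index name is a placeholder:

```python
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import WeaviateVectorStore

vector_store = WeaviateVectorStore(weaviate_client=client, index_name="BookPages")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# alpha=0 leans fully on keyword (BM25) scoring, alpha=1 fully on vectors
retriever = index.as_retriever(
    vector_store_query_mode="hybrid",
    alpha=0.5,
    similarity_top_k=5,
)
```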
hmm, tbh I'm really not sure πŸ€” I'm not 100% sure how weaviate hybrid search works

One suggestion could be increasing the top k and then adding a re-ranker?
ok I like the idea of a re-ranker. I guess I'll dive down that rabbit hole.
Ok, it took a bit of tinkering, but I think this reranking update is huge. I'm experimenting with LLMRerank and SentenceTransformerRerank and may look at some others as well, but it's all promising. Do you have any resources you can share that explain how rerank algorithms work? They just seem like complete magic haha
Hmmm I think LLM rerank is just a prompt to the LLM to re-order

Sentence transformers usually uses models specifically trained for re-ranking. I personally think bge-reranker-base is an ideal option

I think they are also called cross encoders
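To illustrate the two re-rankers being contrasted; the parameter values here are hypothetical, and depending on your llama_index version these classes may live under llama_index.indices.postprocessor instead:

```python
from llama_index import QueryBundle
from llama_index.postprocessor import LLMRerank, SentenceTransformerRerank

# LLM re-ranker: prompts an LLM to choose and order the most relevant chunks
llm_rerank = LLMRerank(top_n=5, choice_batch_size=5)

# Cross-encoder re-ranker: a model trained to score (query, passage) pairs
ce_rerank = SentenceTransformerRerank(model="BAAI/bge-reranker-base", top_n=5)

query = "what spells has Harry Potter used?"
retrieved = retriever.retrieve(query)  # `retriever` from the earlier setup
reranked = ce_rerank.postprocess_nodes(retrieved, query_bundle=QueryBundle(query))
```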
Alright, Logan! I'm back. I spent a few days going down a re-ranker rabbit hole and gotta say, it wasn't as promising as I had hoped. Both the LLM and bge-reranker-base rerankers performed significantly worse than my most accurate configuration: Weaviate in hybrid mode (default alpha) with similarity_top_k: 5. With that default, the pages that the retriever grabs contain at least 1 correct page 96% of the time (correct page = I manually found the best page), which is pretty good, but the order is still a big issue. Any thoughts on what else I could do to improve these results?
Interesting, surprised it performed worse. I wonder if it has to do with the length of text (for example, bge cuts off text after 512 tokens)

Anyways -- not really sure how else to improve here.

Maybe can you clarify why the order is an issue if the proper nodes are still in the top 5?
hmm yeah, length could be an issue; I'm working with pretty large amounts of text. I can look into that.

The reason the order is an issue is that I really want the list to only contain correct pages. At the moment, when 96% have at least 1 correct page, only 58% of those results list the correct page in the first spot. I'm worried that the issue might be too subjective. In other words, maybe the pages I think are correct, the AI actually doesn't agree with, and maybe some of the AI's pages are "technically" better pages. However, from my review of the results, it seems like the pages the retriever is selecting are objectively not the best?
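The two numbers being tracked here are essentially hit rate and precision at rank 1. A hypothetical harness for measuring them, assuming a hand-labeled list of (question, expected_pages) pairs and page_label metadata on each node:

```python
def evaluate(retriever, examples, k=5):
    """Return (hit rate, precision@1) over (question, expected_pages) pairs."""
    hits = first = 0
    for question, expected_pages in examples:
        nodes = retriever.retrieve(question)[:k]
        pages = [n.node.metadata.get("page_label") for n in nodes]
        if any(p in expected_pages for p in pages):
            hits += 1  # at least one correct page in the top k
        if pages and pages[0] in expected_pages:
            first += 1  # correct page in the first spot
    return hits / len(examples), first / len(examples)

hit_rate, p_at_1 = evaluate(retriever, examples)  # e.g. 0.96, 0.58
```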
Llama index chunks your data (the default chunk size is 1024), which you could lower

And yea, that's fair. Other strategies like sentence window or auto-merging retrievers may help too (but for now, they only work with the base vector store, so not with Weaviate at the moment)

https://gpt-index.readthedocs.io/en/stable/examples/node_postprocessor/MetadataReplacementDemo.html

https://gpt-index.readthedocs.io/en/stable/examples/retrievers/auto_merging_retriever.html
I'm kinda throwing the kitchen sink at you haha
But you seem willing to try out these features πŸ˜…
I am! And keep throwing it! It's definitely appreciated as I just don't have the background to know what to try next.
Where is llama index chunking my data before I pass it to the reranker? Or is llama_index doing that inside the reranker somewhere (if so, I'm not sure where to modify that chunking)?
The data gets chunked when you call from_documents() or insert()

You can adjust the chunk size in the service context (or node parser, but service context is easier)

```python
ServiceContext.from_defaults(chunk_size=512)
```
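For context, the chunk size on the service context takes effect at ingestion, so the index has to be rebuilt for it to apply:

```python
from llama_index import ServiceContext, VectorStoreIndex

service_context = ServiceContext.from_defaults(chunk_size=512)
# Chunking happens here, when the documents are ingested
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```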
So I set up the sentence window, but I ended up with exactly the same results, which makes me think I'm doing it wrong. The docs only show how to pass the post processor in as_query_engine, but I'm not actually doing any querying; I'm just using the vector index retriever. I've passed the node_parser to the Service Context, but is there something else I need to do to pass it to the retriever?
Did you create the index from scratch?

Basically, it needs to parse the nodes when you call from_documents

Each node will be a single sentence (or at least it tries its best). Then the metadata contains a larger window around that sentence

During embeddings, only the single sentence is embedded (not the window). So retrieval will retrieve single sentences

But then, the metadata replacement node-postprocessor is run, which replaces each sentence with its wider window after retrieval.

In a normal query engine, this happens after retrieval, but before response synthesis

Since you are only using the retriever, you would also need to apply the postprocessor yourself

```python
new_nodes = metadata_replacement.postprocess_nodes(retrieved_nodes)
```
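Putting those steps together, a sketch based on the linked MetadataReplacementDemo, with the postprocessor applied by hand since only the retriever is being used; window_size and the metadata keys follow the docs' defaults:

```python
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.postprocessor import MetadataReplacementPostProcessor

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
service_context = ServiceContext.from_defaults(node_parser=node_parser)

# Rebuild the index from scratch so each node is a single sentence
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
retriever = index.as_retriever(similarity_top_k=5)

retrieved_nodes = retriever.retrieve("what spells has Harry Potter used?")

# Swap each retrieved sentence for its wider window
metadata_replacement = MetadataReplacementPostProcessor(target_metadata_key="window")
new_nodes = metadata_replacement.postprocess_nodes(retrieved_nodes)
```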
ohhhh... ok that makes a lot more sense. I'll experiment with generating a new index tomorrow.
Thanks for all your help!
Hey Logan! I have a specific example of a retriever issue I'm seeing related to all of this that might help narrow down ways of improving the algorithm. So I have a CS textbook and the prompt talks about "key loggers". Page 843 talks about "keyloggers" but the space difference is causing the retriever to not find page 843 even with a top_k of 20. Thoughts on how tiny differences like that may be reconciled?
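A toy illustration of the mismatch, using the rank_bm25 package as a stand-in for the keyword side (not the actual Weaviate index): whitespace tokenization gives "key loggers" and "keyloggers" no tokens in common, so keyword scoring can't connect them.

```python
from rank_bm25 import BM25Okapi

corpus = [
    "keyloggers capture every keystroke on the machine".split(),
    "firewalls filter inbound network traffic".split(),
]
bm25 = BM25Okapi(corpus)

print(bm25.get_scores("key loggers".split()))  # no token overlap: both scores 0
print(bm25.get_scores("keyloggers".split()))   # matches the first document
```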
oof that's tough haha

What setup were you testing with that example?