hi all llama peeps 🙂 any chromadb

hi all llama peeps 🙂 any ChromaDB masters out there? What do we do when the nodes returned by a query_engine all come from the wrong documents? I'm creating Chroma-backed VectorStoreIndex objects, and when I build the query_engine and run some sample queries, the results sometimes come from the correct document and sometimes from the wrong ones. All my documents have a metadata key 'source' holding the full URL, which contains a unique code, for example "CVE-2023-36898". The same code is also mentioned in the first few sentences of each document. I don't understand how, with such specific strings in every document, so many of the returned nodes can be wrong.
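
Embeddings capture overall meaning more than exact character strings, and advisories built from the same template probably look nearly identical apart from the ID. An exact identifier is therefore usually better matched through metadata than through similarity. A minimal sketch, assuming the ID always appears in the `source` URL in the `CVE-YYYY-NNNN` form: parse it out into its own metadata key before indexing, so it can later serve as an exact-match filter. The `cve_id` key, the `add_cve_id` helper, and the example.com URLs are hypothetical.

```python
import re
from typing import Dict, List, Optional

# CVE IDs look like "CVE-<4-digit year>-<4 or more digits>".
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def extract_cve_id(text: str) -> Optional[str]:
    """Return the first CVE ID found in text (upper-cased), or None."""
    match = CVE_PATTERN.search(text)
    return match.group(0).upper() if match else None


def add_cve_id(metadata: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: return a copy of metadata with a 'cve_id' parsed from 'source'."""
    enriched = dict(metadata)  # copy so the caller's dict is left untouched
    cve_id = extract_cve_id(enriched.get("source", ""))
    if cve_id is not None:
        enriched["cve_id"] = cve_id
    return enriched


if __name__ == "__main__":
    # Made-up URLs; only the CVE code itself comes from the post.
    docs_metadata: List[Dict[str, str]] = [
        {"source": "https://example.com/update-guide/vulnerability/CVE-2023-36898"},
        {"source": "https://example.com/no-id-here"},
    ]
    for meta in docs_metadata:
        print(add_cve_id(meta))
```

The enriched dict would then become each document's metadata before the index is built; documents without an ID simply get no `cve_id` key.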

Do I need to increase the top_k and then use a reranker?
When I created the Chroma collections, I set the distance metric to cosine.
Is there a way to improve Chroma's retrieval accuracy?
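
Raising `similarity_top_k` and then reranking can help, but only if the right chunk makes it into the larger candidate set. Below is a toy, stdlib-only illustration of that two-stage pattern, not LlamaIndex's reranker API. Stage one ranks chunks by cosine similarity (Chroma's `cosine` space reports a distance of 1 minus this value, so smaller distances are better). Stage two is a hypothetical lexical reranker, `rerank_by_exact_id`, that promotes chunks whose `cve_id` matches an ID named in the query. The vectors, chunk texts, and the `CVE-2099-0001` distractor ID are made up; a real setup would use your embedding model and a proper reranker such as a cross-encoder.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)

Chunk = Dict[str, object]  # {"text": str, "cve_id": str, "vector": list of floats}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; Chroma's 'cosine' space reports 1 - this as a distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], chunks: List[Chunk], top_k: int) -> List[Tuple[float, Chunk]]:
    """Stage 1: dense retrieval, highest cosine similarity first."""
    scored = [(cosine_similarity(query_vec, c["vector"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def rerank_by_exact_id(query: str, candidates: List[Tuple[float, Chunk]], top_n: int) -> List[Chunk]:
    """Stage 2 (hypothetical toy reranker): chunks whose cve_id appears in the query go first."""
    wanted = {m.upper() for m in CVE_PATTERN.findall(query)}
    # Sort key is (exact-ID match, dense score), so matches win and score breaks ties.
    reranked = sorted(
        candidates,
        key=lambda pair: (pair[1].get("cve_id") in wanted, pair[0]),
        reverse=True,
    )
    return [chunk for _, chunk in reranked[:top_n]]


if __name__ == "__main__":
    chunks: List[Chunk] = [
        {"text": "chunk for CVE-2099-0001", "cve_id": "CVE-2099-0001", "vector": [0.9, 0.1, 0.0]},
        {"text": "chunk for CVE-2023-36898", "cve_id": "CVE-2023-36898", "vector": [0.7, 0.3, 0.1]},
    ]
    query = "What is affected by CVE-2023-36898?"
    query_vec = [1.0, 0.2, 0.0]  # made-up query embedding
    candidates = retrieve(query_vec, chunks, top_k=2)  # over-retrieve first
    for chunk in rerank_by_exact_id(query, candidates, top_n=1):
        print(chunk["text"])
```

In the toy data the distractor wins stage one on cosine similarity, and the reranker recovers the chunk that actually carries the requested ID. That only works because the correct chunk was inside the widened candidate set.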

I'm using the following:

```python
# vector_store_indicies and metadata_replace are defined elsewhere in my code
query_engine = vector_store_indicies['msrc_security_update'].as_query_engine(
    similarity_top_k=5, node_postprocessors=[metadata_replace], response_mode="tree_summarize"
)
```
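
If each node carries an ID key, the query can be restricted to that one advisory instead of relying on similarity to find it. A minimal sketch, assuming a hypothetical `cve_id` metadata key: parse the ID out of the user's question and build a Chroma-style `where` dict using `$eq`, falling back to no filter when the question names no CVE. The `matches` function is only a local stand-in to show which nodes the filter keeps. How the filter is handed to the query engine depends on your LlamaIndex version (the retriever accepts a `filters` argument built from `MetadataFilters`), so check that part against the version you have installed.

```python
import re
from typing import Dict, List, Optional

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)

Where = Dict[str, Dict[str, str]]


def build_where_filter(question: str) -> Optional[Where]:
    """Return a Chroma-style where filter for the first CVE ID in the question, or None."""
    match = CVE_PATTERN.search(question)
    if match is None:
        return None  # no ID in the question: fall back to plain similarity search
    return {"cve_id": {"$eq": match.group(0).upper()}}


def matches(metadata: Dict[str, str], where: Optional[Where]) -> bool:
    """Local stand-in for the store's filter; supports only $eq clauses."""
    if where is None:
        return True
    return all(metadata.get(key) == cond["$eq"] for key, cond in where.items())


if __name__ == "__main__":
    nodes_metadata: List[Dict[str, str]] = [
        {"cve_id": "CVE-2023-36898"},
        {"cve_id": "CVE-2099-0001"},  # made-up neighbouring ID
    ]
    where = build_where_filter("Which products does cve-2023-36898 affect?")
    print(where)
    print([m for m in nodes_metadata if matches(m, where)])
```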
