
Updated last year

hi all llama peeps 🙂 any chromadb

Hi all llama peeps 🙂 Any ChromaDB masters out here? What do we do when the nodes returned by a query_engine all come from the wrong documents? I'm creating Chroma-backed VectorStoreIndex instances, and when I build the query_engine and pass in some sample queries, sometimes the results come from the correct document and sometimes from the wrong ones. All my documents have a metadata key 'source' holding the full URL, which contains a unique code, for example "CVE-2023-36898". That code is mentioned again in the first few sentences at the top of each document. I don't understand how, with such specific strings in each document, so many of the returned nodes can be so wrong.
  • Do I need to increase the top_k and then use a reranker?
  • When I created the Chroma collections, I set the distance metric to cosine.
  • Is there a way to improve Chroma's retrieval accuracy?
I'm using the following:
query_engine = vector_store_indicies['msrc_security_update'].as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[metadata_replace],
    response_mode="tree_summarize",
)
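One way to approach the "increase top_k, then rerank" idea from the list above is to over-retrieve and then push nodes whose 'source' metadata names the exact CVE ID from the query to the front. The following is a minimal, library-free sketch of that step, not LlamaIndex or Chroma API: the function names, the dict layout of the candidates, the example.com URLs, and the scores are all hypothetical and only stand in for whatever the retriever actually returns.

```python
import re

# Matches identifiers such as CVE-2023-36898 (case-insensitive).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def extract_cve_ids(text):
    """Return the set of CVE identifiers found in text, upper-cased."""
    return {match.upper() for match in CVE_PATTERN.findall(text)}


def rerank_by_exact_id(query, candidates):
    """Move candidates whose 'source' names a CVE ID from the query to the front.

    candidates is a list of dicts with 'text', 'metadata' and 'score' keys
    (a hypothetical layout). If the query contains no CVE ID, the original
    order is returned unchanged.
    """
    wanted = extract_cve_ids(query)
    if not wanted:
        return list(candidates)

    def is_exact_match(candidate):
        source = candidate["metadata"].get("source", "")
        return bool(wanted & extract_cve_ids(source))

    # sorted() is stable, so the vector-similarity order is preserved
    # inside the matching group and inside the non-matching group.
    return sorted(candidates, key=lambda c: not is_exact_match(c))


# Hypothetical over-retrieved results, ordered by vector similarity.
candidates = [
    {"text": "Summary for 36899", "score": 0.91,
     "metadata": {"source": "https://example.com/vuln/CVE-2023-36899"}},
    {"text": "Summary for 36897", "score": 0.90,
     "metadata": {"source": "https://example.com/vuln/CVE-2023-36897"}},
    {"text": "Summary for 36898", "score": 0.89,
     "metadata": {"source": "https://example.com/vuln/CVE-2023-36898"}},
]

reranked = rerank_by_exact_id("What does CVE-2023-36898 affect?", candidates)
print([c["metadata"]["source"] for c in reranked])
```

In practice this would run on a larger candidate set (for example a higher similarity_top_k) before the nodes are handed to the response synthesizer. Dropping the non-matching nodes entirely, instead of only reordering them, is a stricter variant of the same idea.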
1 comment
It's so bonkers. I query for 'CVE-2023-36898' and the nodes that come back are for 'CVE-2023-36899' and 'CVE-2023-36897', like wut?
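A possible reason for the near-miss results is that identifiers differing by a single digit share almost all of their characters, so a similarity measure built on surface overlap can barely tell them apart. The toy sketch below uses character-trigram Jaccard overlap to show how close two such strings look. This is only an illustration under that assumption, not how the embedding model or Chroma computes similarity, and the helper names are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard overlap of the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# The two IDs differ only in the final digit, so only one trigram differs.
similarity = jaccard("CVE-2023-36898", "CVE-2023-36899")
print(f"{similarity:.2f}")
```

If the real embeddings behave similarly, an exact-match check on the ID (in metadata or in the text) is likely to separate these documents more reliably than vector distance alone.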