
Updated 3 months ago

Hi

Hi,

Please, I really need help:

I use llama_index to index and query my nodes, and then I verify whether the retrieved nodes are the right ones.

# Code 1: vector index

from llama_index import GPTVectorStoreIndex, ServiceContext

# `nodes` is my list of parsed nodes; each node's metadata has a "url" field
# storage_context = mongodb_storage_context()
embed_model = huggingface_embed_model()  # my own helper (not shown) returning a HuggingFace embedding model

# Set up the service context, i.e., the embedding model (and the LLM for completions, if used)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

index_GPTVectorStoreIndex = GPTVectorStoreIndex(
    nodes=nodes,
    service_context=service_context,
    show_progress=True,
)


# Code 2: keyword table index

from llama_index import GPTSimpleKeywordTableIndex, ServiceContext

# storage_context = mongodb_storage_context()
embed_model = huggingface_embed_model()

# Same service context as above. Note: the simple keyword table index extracts
# keywords with a regex, so it does not actually use the embedding model.
service_context = ServiceContext.from_defaults(embed_model=embed_model)

index_GPTSimpleKeywordTableIndex = GPTSimpleKeywordTableIndex(
    nodes=nodes,
    service_context=service_context,
    show_progress=True,
)


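# Verification: each node's metadata has a "url" field

# Reference set: URLs of the nodes whose text contains the query term
# (the leading " " means a term at the very start of a node's text is not matched)
filter_nodes = [x for x in nodes if " " + query_term.lower() in x.text.lower()]
filter_nodes_urls = {x.metadata["url"] for x in filter_nodes}

# URLs of the nodes in retriever_nodes_GPTVectorStoreIndex (the vector index's
# retrieval results), looked up by node id
url_by_id = {_node.id_: _node.metadata["url"] for _node in nodes}
retriever_nodes_GPTVectorStoreIndex_urls = {
    url_by_id[each_node.id_]
    for each_node in retriever_nodes_GPTVectorStoreIndex
    if each_node.id_ in url_by_id
}

# Both are sets, so the comparison below ignores order; comparing two
# list(set(...)) results can give False even when they hold the same URLs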


retriever_nodes_GPTVectorStoreIndex_urls == filter_nodes_urls
False
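A single True/False does not say why the check failed. A minimal sketch of a helper that also reports which URLs are missing from the retrieval results and which are extra; `compare_url_sets` is a hypothetical name and the example URLs are made up. Converting both inputs to sets means the result does not depend on order or duplicates.

```python
def compare_url_sets(expected, retrieved):
    """Compare two URL collections as sets and report the differences."""
    expected, retrieved = set(expected), set(retrieved)
    return {
        "equal": expected == retrieved,
        # keyword matches that the retriever did not return
        "missing": sorted(expected - retrieved),
        # retrieved URLs that had no keyword match
        "extra": sorted(retrieved - expected),
    }


report = compare_url_sets(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    ["https://example.com/a", "https://example.com/d"],
)
print(report)
```

A long "missing" list with a short retrieval result usually points at the retriever returning too few nodes rather than the wrong ones.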
4 comments
Hey, have you considered using a similarity threshold? It's probably the easiest way to achieve what you described. There are also some other options here:

https://gpt-index.readthedocs.io/en/stable/core_modules/query_modules/node_postprocessors/modules.html#similaritypostprocessor
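The idea behind a similarity threshold (what the linked `SimilarityPostprocessor` does with its `similarity_cutoff` setting, as I understand it) is to drop retrieved nodes whose score falls below a cutoff. A pure-Python sketch of that idea; `ScoredNode` and `apply_similarity_cutoff` are hypothetical stand-ins for llama_index's `NodeWithScore` and the postprocessor, and the scores and the 0.75 cutoff are made up.

```python
from dataclasses import dataclass


@dataclass
class ScoredNode:
    node_id: str
    score: float  # similarity between the query and this node


def apply_similarity_cutoff(scored_nodes, cutoff):
    """Keep only nodes whose score is at least `cutoff`, preserving their order."""
    return [n for n in scored_nodes if n.score >= cutoff]


results = [ScoredNode("n1", 0.82), ScoredNode("n2", 0.41), ScoredNode("n3", 0.77)]
print([n.node_id for n in apply_similarity_cutoff(results, 0.75)])  # ['n1', 'n3']
```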
Yeah, there's no way to "return all similar nodes" with a vector index.

Every node is technically similar, but it's across a range of scores.
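To make "every node is technically similar" concrete: a vector retriever scores every node against the query and returns the `k` best. A pure-Python sketch with made-up scores; `top_k` is a hypothetical helper, not llama_index's retriever. In llama_index the count is set with `similarity_top_k` (e.g. `index.as_retriever(similarity_top_k=...)`), and I believe the legacy default is 2, which could explain why the URL comparison in the question comes out False.

```python
def top_k(scores, k):
    """Return the ids of the k highest-scoring nodes.

    `scores` maps node id -> similarity to the query. Every node has a score,
    so the result always has min(k, len(scores)) entries, however low they are.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


scores = {"n1": 0.82, "n2": 0.41, "n3": 0.77, "n4": 0.79, "n5": 0.12}
print(top_k(scores, 2))            # ['n1', 'n4']: n3 scores 0.77 but is cut off
print(top_k(scores, len(scores)))  # ['n1', 'n4', 'n3', 'n2', 'n5']: even n5 comes back
```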

Setting top k to something huge and then applying a similarity threshold can make sense. A keyword index may help too.
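On the keyword side: as far as I know, `GPTSimpleKeywordTableIndex` (Code 2) extracts keywords from each node with a regex rather than embeddings and keeps a keyword-to-node table, so a query matches the nodes that share a keyword with it. A simplified sketch of that data structure; the stopword list and the function names are hypothetical, the library's extraction differs in its details, and its retriever may cap how many chunks it returns, whereas this sketch returns every match.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real index would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def extract_keywords(text: str) -> set[str]:
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"\w+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def build_keyword_table(nodes: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the ids of the nodes whose text contains it."""
    table: dict[str, set[str]] = defaultdict(set)
    for node_id, text in nodes.items():
        for kw in extract_keywords(text):
            table[kw].add(node_id)
    return dict(table)


def keyword_lookup(table: dict[str, set[str]], query: str) -> set[str]:
    """Return every node id that shares at least one keyword with the query."""
    hits: set[str] = set()
    for kw in extract_keywords(query):
        hits |= table.get(kw, set())
    return hits


nodes = {
    "n1": "Pricing for the Pro plan",
    "n2": "How to reset your password",
    "n3": "Pro plan limits and pricing FAQ",
}
table = build_keyword_table(nodes)
print(sorted(keyword_lookup(table, "pricing")))  # ['n1', 'n3']
```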

Taking a step back, embeddings are just numerical representations of text (vectors). Using these representations, we can calculate similarity mathematically, for example with cosine similarity.
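Cosine similarity compares the direction of two vectors: the dot product divided by the product of their lengths, giving a value between -1 and 1. A minimal pure-Python sketch using toy 3-dimensional "embeddings" (made up; real embedding models produce hundreds of dimensions):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|), between -1 and 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 0.8]
doc_far = [0.0, 1.0, 0.0]

print(round(cosine_similarity(query, doc_close), 3))  # 0.995: nearly the same direction
print(cosine_similarity(query, doc_far))              # 0.0: orthogonal vectors
```

Note that every pair of vectors gets some score, which is why a vector index can only rank nodes, not say which ones are "the similar ones".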
Yeah, you kind of need to decide what you consider similar.
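One way to make that decision is to treat the keyword-filtered nodes from the verification step as a rough reference set and watch how precision and recall change as the cutoff moves. A minimal sketch; the scores are made up and `precision_recall` / `sweep_cutoffs` are hypothetical helpers. Keyword matches only approximate relevance, since embeddings can also score paraphrases highly.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved id set against a reference set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def sweep_cutoffs(scores, relevant, cutoffs):
    """For each cutoff, keep nodes scoring >= cutoff and report precision/recall."""
    rows = []
    for cutoff in cutoffs:
        kept = {node_id for node_id, s in scores.items() if s >= cutoff}
        rows.append((cutoff, *precision_recall(kept, relevant)))
    return rows


# Made-up similarity scores; `relevant` plays the role of the keyword matches.
scores = {"n1": 0.82, "n2": 0.41, "n3": 0.77, "n4": 0.79, "n5": 0.12}
relevant = {"n1", "n3"}

for cutoff, p, r in sweep_cutoffs(scores, relevant, [0.8, 0.75, 0.4]):
    print(f"cutoff={cutoff:.2f} precision={p:.2f} recall={r:.2f}")
```

A higher cutoff trades recall for precision; where to stop depends on whether missing a relevant node or including an irrelevant one is worse for your use case.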