Hi

Hi,

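Please, I really need help.

I use llama_index to query my nodes, and then I verify the results by checking whether the retriever returns the same nodes as a plain text search for the query term.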


# Code 1: vector index

from llama_index import GPTVectorStoreIndex, ServiceContext

# storage_context = mongodb_storage_context()
# huggingface_embed_model() is a user-defined helper (not shown) that returns a HuggingFace embedding model
embed_model = huggingface_embed_model()

# Set up the service context, i.e., the embedding model (and the LLM, if one is used)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

index_GPTVectorStoreIndex = GPTVectorStoreIndex(nodes=nodes,
                                service_context=service_context,
                                show_progress=True
                                )


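# Code 2: keyword table index

from llama_index import GPTSimpleKeywordTableIndex, ServiceContext

# storage_context = mongodb_storage_context()
# huggingface_embed_model() is the same user-defined helper as in Code 1
embed_model = huggingface_embed_model()

# Set up the service context, i.e., the embedding model (and the LLM, if one is used);
# note that a keyword table index matches on extracted keywords, not on embeddings
service_context = ServiceContext.from_defaults(embed_model=embed_model)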

index_GPTSimpleKeywordTableIndex = GPTSimpleKeywordTableIndex(nodes=nodes,
                                service_context=service_context,
                                show_progress=True
                                )


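# Verification: each node's metadata has a "url" field.
# Collect the URLs of every node whose text contains the query term
# (the leading space means a match at the very start of a text, or right after a newline, is missed).

filter_nodes = [x for x in nodes if " " + query_term.lower() in x.text.lower()]
filter_nodes_urls = list(set([x.metadata["url"] for x in filter_nodes]))

# retriever_nodes_GPTVectorStoreIndex holds the vector retriever's results (how they were
# obtained is not shown), presumably something like
# index_GPTVectorStoreIndex.as_retriever().retrieve(query_term).
# Map each retrieved node back to its URL by matching node ids.
retriever_nodes_GPTVectorStoreIndex_urls = []
for each_node in retriever_nodes_GPTVectorStoreIndex:
  for _node in nodes:
    if _node.id_ == each_node.id_:
      retriever_nodes_GPTVectorStoreIndex_urls.append(_node.metadata["url"])
      break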

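retriever_nodes_GPTVectorStoreIndex_urls = list(set(retriever_nodes_GPTVectorStoreIndex_urls))

# list(set(...)) has no guaranteed order, so comparing the two lists can give False even
# when they hold the same URLs; set(a) == set(b) is the order-independent check.
retriever_nodes_GPTVectorStoreIndex_urls == filter_nodes_urls
False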
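To see which URLs actually differ, rather than a single True/False, the two sides can be compared as sets and their differences inspected. A minimal sketch, assuming each node is reduced to a hypothetical `(node_id, text, url)` tuple instead of a llama_index node object; the ids, texts, and URLs are made up:

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for the post's nodes: (node_id, text, url) tuples
# instead of llama_index node objects with metadata["url"].
Node = Tuple[str, str, str]


def compare_url_sets(
    nodes: List[Node], retrieved_ids: List[str], query_term: str
) -> Tuple[bool, Set[str], Set[str]]:
    """Compare URLs of term-matching nodes with URLs of retrieved nodes."""
    term = " " + query_term.lower()  # same text filter as the post
    expected = {url for _, text, url in nodes if term in text.lower()}

    # Build an id -> url lookup once instead of rescanning all nodes per hit.
    url_by_id = {node_id: url for node_id, _, url in nodes}
    retrieved = {url_by_id[i] for i in retrieved_ids if i in url_by_id}

    missing = expected - retrieved  # contain the term but were not retrieved
    extra = retrieved - expected  # retrieved but do not contain the term
    return expected == retrieved, missing, extra


nodes = [
    ("n1", "Intro to vector search", "https://example.com/a"),
    ("n2", "Using vector indexes", "https://example.com/b"),
    ("n3", "Keyword tables explained", "https://example.com/c"),
]
same, missing, extra = compare_url_sets(nodes, ["n2", "n3"], "vector")
print(same, sorted(missing), sorted(extra))
# False ['https://example.com/a'] ['https://example.com/c']
```

With a top-k vector retriever, a non-empty `missing` set is expected whenever more nodes contain the term than the retriever returns.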


Teemu

Hey, have you considered using a similarity threshold? This is probably the easiest way to achieve what you outlined. There are also some other options here:

https://gpt-index.readthedocs.io/en/stable/core_modules/query_modules/node_postprocessors/modules.html#similaritypostprocessor
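A similarity threshold keeps only the results whose score clears a cutoff. llama_index's SimilarityPostprocessor (linked above) does this through its `similarity_cutoff` parameter; the plain-Python sketch below only illustrates the idea and is not the library's code. The node ids and scores are made up:

```python
from typing import List, Optional, Tuple

ScoredNode = Tuple[str, Optional[float]]


def apply_similarity_cutoff(
    scored_nodes: List[ScoredNode], cutoff: float
) -> List[ScoredNode]:
    """Keep (node_id, score) pairs whose score is at least `cutoff`.

    Pairs without a score are dropped, since they cannot be compared.
    """
    return [
        (node_id, score)
        for node_id, score in scored_nodes
        if score is not None and score >= cutoff
    ]


# Made-up retrieval results, already sorted by score.
retrieved = [("n1", 0.91), ("n2", 0.78), ("n3", 0.52), ("n4", 0.31)]
print(apply_similarity_cutoff(retrieved, 0.75))  # [('n1', 0.91), ('n2', 0.78)]
```

Picking the cutoff is a judgment call: scores from different embedding models are not on a common scale, so a value that works for one model may not carry over to another.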

Teemu

Also, the keyword table index might fit:

https://gpt-index.readthedocs.io/en/stable/api_reference/indices/table.html
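Unlike top-k vector retrieval, a keyword table returns every node indexed under the query's keyword. A minimal sketch of that idea, splitting text into words with a regex; this is not llama_index's implementation, whose keyword extraction differs in the details (for example, it may store only a limited number of keywords per node, so a real lookup can miss nodes this sketch would find). The node texts are made up:

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_keyword_table(nodes: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the ids of the nodes whose text contains it."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for node_id, text in nodes.items():
        for word in re.findall(r"\w+", text.lower()):
            table[word].add(node_id)
    return dict(table)


def lookup(table: Dict[str, Set[str]], term: str) -> Set[str]:
    """Return every node id indexed under `term` (empty set if none)."""
    return set(table.get(term.lower(), set()))


nodes = {
    "n1": "Intro to vector search",
    "n2": "Vector indexes in practice",
    "n3": "Keyword tables explained",
}
table = build_keyword_table(nodes)
print(sorted(lookup(table, "vector")))  # ['n1', 'n2']
```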

Logan M

Yeah, there's no way to "return all similar nodes" with a vector index.

Every node is technically similar to the query; the nodes just fall across a range of similarity scores.

Setting top k to something very large and then applying a similarity threshold can make sense. A keyword index may help too.

Taking a step back, embeddings are just numerical representations of text. Using these representations, we can calculate similarity mathematically, for example with cosine similarity.
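To make that concrete: cosine similarity gives every node some score, so a vector index only ranks nodes; it never sorts them into "similar" and "not similar" on its own. A toy sketch with hand-made 2-D vectors standing in for embeddings (real embedding vectors have many more dimensions):

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_all(
    query_vec: List[float], node_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every node against the query, highest similarity first."""
    scores = [(nid, cosine_similarity(query_vec, vec)) for nid, vec in node_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-D "embeddings"; the vectors are made up for illustration.
query = [1.0, 0.0]
node_vecs = {"n1": [1.0, 0.0], "n2": [1.0, 1.0], "n3": [0.0, 1.0], "n4": [-1.0, 0.0]}
for nid, score in score_all(query, node_vecs):
    print(nid, round(score, 3))
```

Every node gets a score, even `n4`, which points the opposite way. Setting top k to the number of nodes returns this whole ranking, and only a cutoff decides where "similar" stops.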

Teemu

Yeah, you kind of need to decide what you consider similar.
