Hello everyone,

I recently started working with llama-index and I've run into a very strange issue that I cannot find a solution to.

I have tried three large embedding models: Salesforce/SFR-Embedding-Mistral, GritLM/GritLM-7B, and intfloat/e5-mistral-7b-instruct. Confusingly, the retrieved nodes (top_k=10) all have scores higher than 0.9999999999. However, when I switched to a smaller embedding model like UAE-Large-V1, the highest score is around 0.65, which seems reasonable.

I also tried modifying my query. The results remain the same (the retrieved nodes may differ, but their scores are still very close to 1), even when my query is 'hello', which has nothing to do with the retrieved node text or the content I feed into the model.
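
As a sanity check, here is a minimal sketch of how I can compare raw embeddings outside the full pipeline (it reuses the same HuggingFaceEmbedding settings as my main snippet below; the node text is a made-up placeholder):

Python
import numpy as np
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="Salesforce/SFR-Embedding-Mistral",
    cache_folder="model_cache",
    device="cuda",
    embed_batch_size=1,
    max_length=3072,
)

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder node text -- in practice I would paste the text of a real retrieved node here
query_emb = embed_model.get_query_embedding("hello")
node_emb = embed_model.get_text_embedding("Some sentence copied from one of my nodes.")
print(cosine(query_emb, node_emb))  # if this is also ~0.9999, the problem is at the embedding level, not in the retriever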

I'm confused about where the problem lies. Below is my code snippet:

Python
import tiktoken

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    model="gpt-35-turbo",
    deployment_name="xxxx",
    api_key="xxxxx",
)
embed_model = HuggingFaceEmbedding(
    model_name="Salesforce/SFR-Embedding-Mistral",
    cache_folder="model_cache",
    device="cuda",
    embed_batch_size=1,
    max_length=3072,
)
Settings.llm = llm
Settings.tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo").encode
Settings.embed_model = embed_model

# Load the report and split it into nodes
documents = SimpleDirectoryReader("../2024_report").load_data()
pipeline = IngestionPipeline(
    transformations=[MarkdownNodeParser(include_metadata=True, include_prev_next_rel=True)]
)
nodes = pipeline.run(documents=documents)

# Build the index and retrieve the top-10 nodes for the query
index = VectorStoreIndex(nodes, show_progress=True)
query = "hello"
retriever = VectorIndexRetriever(index=index, similarity_top_k=10)
ret_nodes = retriever.retrieve(query)
for ret_node in ret_nodes:
    print(ret_node.score)

I'm reaching out to see if anyone has experienced similar issues. Any insights or suggestions on how to solve this problem would be greatly appreciated. Thank you.
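
P.S. One thing I have not ruled out yet: these are instruction-tuned embedding models, and (if I understand the llama-index docs correctly) HuggingFaceEmbedding accepts query_instruction / text_instruction arguments for adding the prompt format a model card expects. This is a rough, untested sketch of what I plan to try; the instruction wording is only a guess, not taken verbatim from the model card:

Python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Sketch only: the instruction text below is an assumption, not verified against the model card
embed_model = HuggingFaceEmbedding(
    model_name="Salesforce/SFR-Embedding-Mistral",
    cache_folder="model_cache",
    device="cuda",
    embed_batch_size=1,
    max_length=3072,
    query_instruction="Given a web search query, retrieve relevant passages that answer the query.",
)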

Hi everyone. Every embedding model has a maximum token limit. If the content I want to embed exceeds that limit, what happens? The kapa bot says the model will only consider the first max_length tokens and ignore the rest. Is that the correct answer?
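
For reference, here is a small sketch of how I am checking how many tokens a node produces compared to max_length (it uses the Hugging Face tokenizer for the embedding model; the node text is a placeholder):

Python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/SFR-Embedding-Mistral")

node_text = "..."  # placeholder: text of one of my nodes
max_length = 3072

token_ids = tokenizer(node_text, truncation=False)["input_ids"]
print(len(token_ids), "tokens;", max(0, len(token_ids) - max_length), "tokens beyond max_length")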

Hello everyone, I am currently working on RAG, but the documents (or nodes) I want to retrieve are all very small (each node only contains one or two sentences) and the results are not very satisfactory. Do you have any good suggestions on the choice of embedding model and retriever?
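
In case it helps to show what I mean, here is a rough, untested sketch of the kind of hybrid setup I have been considering: fusing dense retrieval with BM25 keyword retrieval over the same tiny nodes (BM25Retriever comes from the separate llama-index-retrievers-bm25 package, as far as I know; the node texts and top_k are toy stand-ins):

Python
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.schema import TextNode
from llama_index.retrievers.bm25 import BM25Retriever

# Toy stand-ins for my real one/two-sentence nodes
nodes = [TextNode(text=t) for t in ["First tiny snippet.", "Second tiny snippet."]]

index = VectorStoreIndex(nodes)
vector_retriever = index.as_retriever(similarity_top_k=2)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=2)

retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=2,
    num_queries=1,  # no LLM-generated query variations, just fuse the two result lists
)
for result in retriever.retrieve("my question"):
    print(result.score, result.node.get_content()[:80])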