Hello everyone,

I recently started working with llama-index and I've run into a very strange issue that I cannot find a solution to.

I have tried three large embedding models: Salesforce/SFR-Embedding-Mistral, GritLM/GritLM-7B, and intfloat/e5-mistral-7b-instruct. Confusingly, the retrieved nodes (top_k=10) all have scores higher than 0.9999999999. However, when I switched to a smaller embedding model like UAE-Large-V1, the highest score is around 0.65, which seems reasonable.

I also tried modifying my query. The results remain the same (the retrieved nodes may differ, but their scores are still very close to 1), even when my query is 'hello', which has nothing to do with the retrieved node text or the content I feed into the model.
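
As a sanity check, here is a minimal sketch of how I can compare raw embeddings outside the full pipeline (it reuses the same HuggingFaceEmbedding settings as my main snippet below; the node text is a made-up placeholder):

Python
import numpy as np
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="Salesforce/SFR-Embedding-Mistral",
    cache_folder="model_cache",
    device="cuda",
    embed_batch_size=1,
    max_length=3072,
)

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder node text -- in practice I would paste the text of a real retrieved node here
query_emb = embed_model.get_query_embedding("hello")
node_emb = embed_model.get_text_embedding("Some sentence copied from one of my nodes.")
print(cosine(query_emb, node_emb))  # if this is also ~0.9999, the problem is at the embedding level, not in the retriever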

I'm confused about where the problem lies. Below is my code snippet:

Python
import tiktoken

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    model="gpt-35-turbo",
    deployment_name="xxxx",
    api_key="xxxxx",
)
embed_model = HuggingFaceEmbedding(
    model_name="Salesforce/SFR-Embedding-Mistral",
    cache_folder="model_cache",
    device="cuda",
    embed_batch_size=1,
    max_length=3072,
)
Settings.llm = llm
Settings.tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo").encode
Settings.embed_model = embed_model

# Load the report and split it into nodes
documents = SimpleDirectoryReader("../2024_report").load_data()
pipeline = IngestionPipeline(
    transformations=[MarkdownNodeParser(include_metadata=True, include_prev_next_rel=True)]
)
nodes = pipeline.run(documents=documents)

# Build the index and retrieve the top-10 nodes for the query
index = VectorStoreIndex(nodes, show_progress=True)
query = "hello"
retriever = VectorIndexRetriever(index=index, similarity_top_k=10)
ret_nodes = retriever.retrieve(query)
for ret_node in ret_nodes:
    print(ret_node.score)

I'm reaching out to see if anyone has experienced similar issues. Any insights or suggestions on how to solve this problem would be greatly appreciated. Thank you.
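
P.S. One thing I have not ruled out yet: these are instruction-tuned embedding models, and (if I understand the llama-index docs correctly) HuggingFaceEmbedding accepts query_instruction / text_instruction arguments for adding the prompt format a model card expects. This is a rough, untested sketch of what I plan to try; the instruction wording is only a guess, not taken verbatim from the model card:

Python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Sketch only: the instruction text below is an assumption, not verified against the model card
embed_model = HuggingFaceEmbedding(
    model_name="Salesforce/SFR-Embedding-Mistral",
    cache_folder="model_cache",
    device="cuda",
    embed_batch_size=1,
    max_length=3072,
    query_instruction="Given a web search query, retrieve relevant passages that answer the query.",
)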

Hi everyone. Every embedding model has a maximum token limit. If the content I want to embed exceeds that limit, what happens? The kapa bot says the model will only consider the first max_length tokens and ignore the rest. Is that the correct answer?
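
For reference, here is a small sketch of how I am checking how many tokens a node produces compared to max_length (it uses the Hugging Face tokenizer for the embedding model; the node text is a placeholder):

Python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/SFR-Embedding-Mistral")

node_text = "..."  # placeholder: text of one of my nodes
max_length = 3072

token_ids = tokenizer(node_text, truncation=False)["input_ids"]
print(len(token_ids), "tokens;", max(0, len(token_ids) - max_length), "tokens beyond max_length")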

Hello everyone, I am currently working on RAG, but the documents (or nodes) I want to retrieve are all very small (each node only contains one or two sentences) and the results are not very satisfactory. Do you have any good suggestions on the choice of embedding model and retriever?
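
In case it helps to show what I mean, here is a rough, untested sketch of the kind of hybrid setup I have been considering: fusing dense retrieval with BM25 keyword retrieval over the same tiny nodes (BM25Retriever comes from the separate llama-index-retrievers-bm25 package, as far as I know; the node texts and top_k are toy stand-ins):

Python
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.schema import TextNode
from llama_index.retrievers.bm25 import BM25Retriever

# Toy stand-ins for my real one/two-sentence nodes
nodes = [TextNode(text=t) for t in ["First tiny snippet.", "Second tiny snippet."]]

index = VectorStoreIndex(nodes)
vector_retriever = index.as_retriever(similarity_top_k=2)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=2)

retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=2,
    num_queries=1,  # no LLM-generated query variations, just fuse the two result lists
)
for result in retriever.retrieve("my question"):
    print(result.score, result.node.get_content()[:80])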