icsy7867
Offline, last seen 3 months ago
Joined September 25, 2024
Results

Driving me nuts! I can't figure out why the embeddings and sources/documents returned by llama_index are all of a sudden different. The results are bizarre!

I have several articles ingested. The two I will reference as examples are an article about cloud services within my company and an article on how to install matlab. I will call my company name (abbreviated) XYZ.

  • "How do I install matlab?" - Incorrect sources returned
  • "How do I install matlab? XYZ - Correct sources returned
  • "How do I install matlab? G" - Correct Sources returned
  • "How do I install matlab? Flux Capacitor" - Correct Sources returned
  • "How do I install matlab? How do I install matlab?" - Correct Sources returned
Similarly...

  • "Cloud Services" - Incorrect Sources returned
  • "Cloud Services Cloud Services" - Incorrect Sources returned
  • "Cloud Services. Cloud Services" - Incorrect Sources returned
  • "Cloud Services Cloud Services." - Incorrect Sources returned
  • "Cloud Services. Cloud Services." - Correct Sources returned
Driving. Me. Nuts.
Hope someone has a magical solution 😄

Using ollama and nomic-embed-text for embeddings. Using llama_index via https://github.com/zylon-ai/private-gpt

And I should note that I was using the tool happily before; something has changed. I even tried reverting to previously known-working code, with the same result. I can't figure it out.
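
In case it helps anyone reproduce this, here is a minimal sketch for comparing the query variants directly against Ollama, bypassing llama_index entirely. It assumes Ollama is running on its default port (11434) and uses its /api/embeddings endpoint; requests and numpy are the only dependencies:

Python
import numpy as np
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default Ollama port


def embed(text: str) -> np.ndarray:
    """Fetch an embedding for `text` from Ollama's nomic-embed-text model."""
    resp = requests.post(OLLAMA_URL, json={"model": "nomic-embed-text", "prompt": text})
    resp.raise_for_status()
    return np.array(resp.json()["embedding"])


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


base = embed("How do I install matlab?")
for variant in [
    "How do I install matlab? XYZ",
    "How do I install matlab? How do I install matlab?",
    "Cloud Services",
    "Cloud Services. Cloud Services.",
]:
    print(f"{variant!r} -> cosine vs. base query: {cosine(base, embed(variant)):.4f}")

If the variants score nearly identically here but retrieve differently through private-gpt, the problem is more likely in the retrieval/index layer than in the embedding model itself.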
17 comments
Hah... llama_index is using client version 1.7.3, apparently
3 comments
I am having an issue trying to use Open-Orca/OpenOrca-Platypus2-13B. I am getting [/INST] all over the place and the model keeps chatting with itself. I am currently using vLLM as an "openailike" server.

I looked around and found an issue that said to use the stop parameter in the API. This actually made everything work a lot better:

Plain Text
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Open-Orca/OpenOrca-Platypus2-13B",
        "stop": ["[INST]", "[/INST]"],
        "messages": [
            {"role": "user", "content": "What is the square root of two"}
        ]
    }'


But I can't tell whether there is a way for llama_index to do this as well. I have read through the docs and looked at the code, but couldn't figure out if there was an easier way to do this. Any ideas?
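
In case it is useful, here is roughly what I was hoping for: a sketch that passes the stop tokens through llama_index's OpenAILike wrapper. It assumes additional_kwargs gets merged into each request payload (so "stop" reaches vLLM), and uses the llama_index 0.10+ package path; both are worth verifying against your installed version:

Python
from llama_index.llms.openai_like import OpenAILike

# Assumption: additional_kwargs is forwarded with every completion request,
# so the stop sequences reach vLLM's OpenAI-compatible endpoint.
llm = OpenAILike(
    model="Open-Orca/OpenOrca-Platypus2-13B",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM ignores the key by default
    is_chat_model=True,
    additional_kwargs={"stop": ["[INST]", "[/INST]"]},
)

print(llm.complete("What is the square root of two"))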
9 comments
Odd issue with setting ollama kwargs. Looking at the ollama documentation for the possible additional arguments:
https://github.com/ollama/ollama/blob/main/docs/modelfile.md

For num_predict, it says the default is 128. However, if you don't set it in the additional arguments variable in llama_index, you get way more than 128 tokens back.

However, if you do set num_predict = 128 as an additional kwarg in llama_index, it severely limits the length of the response. It is easy enough to set, but I am confused about what this value actually is when you don't set it.
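
For reference, here is how I am setting it; a sketch assuming llama_index's Ollama wrapper forwards additional_kwargs as Ollama request options (llama_index 0.10+ package path, and the model name below is just a placeholder):

Python
from llama_index.llms.ollama import Ollama

# Assumption: additional_kwargs is sent as Ollama "options", which is
# where num_predict lives per the modelfile docs linked above.
llm = Ollama(
    model="llama2",  # placeholder; use whichever model you have pulled
    request_timeout=120.0,
    additional_kwargs={"num_predict": 128},  # caps the number of generated tokens
)

print(llm.complete("What does num_predict control?"))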
3 comments
Random question... I see that qdrant supports filtering results by score, and langchain supports returning a score with its qdrant results, which is useful. I searched the llama_index docs and didn't see anything similar. I was curious whether someone knew if this was something that could be, or already had been, implemented.
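
The closest thing I have found is the SimilarityPostprocessor, which drops retrieved nodes whose score falls below a cutoff. A minimal sketch, assuming the llama_index 0.10+ core package layout and an existing index (the `index` variable below stands in for whatever VectorStoreIndex you already have):

Python
from llama_index.core.postprocessor import SimilarityPostprocessor

# Each retrieved node carries a similarity score; nodes scoring below
# similarity_cutoff are dropped before the response is synthesized.
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)

response = query_engine.query("Cloud Services")
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:80])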
5 comments