icsy7867
Offline, last seen 3 months ago
Joined September 25, 2024
Results

Driving me nuts! I can't figure out why the embeddings and sources/documents returned by llama_index are all of a sudden different. The results are bizarre!

I have several articles ingested. The two I will reference as examples are an article about cloud services within my company and an article on how to install matlab. I will call my company name (abbreviated) XYZ.

  • "How do I install matlab?" - Incorrect sources returned
  • "How do I install matlab? XYZ - Correct sources returned
  • "How do I install matlab? G" - Correct Sources returned
  • "How do I install matlab? Flux Capacitor" - Correct Sources returned
  • "How do I install matlab? How do I install matlab?" - Correct Sources returned
Similarly...

  • "Cloud Services" - Incorrect Sources returned
  • "Cloud Services Cloud Services" - Incorrect Sources returned
  • "Cloud Services. Cloud Services" - Incorrect Sources returned
  • "Cloud Services Cloud Services." - Incorrect Sources returned
  • "Cloud Services. Cloud Services." - Correct Sources returned
Driving. Me. Nuts.
Hope someone has a magical solution 😄

Using ollama and nomic-embed-text for embeddings. Using llama_index via https://github.com/zylon-ai/private-gpt

And I should note that I was using the tool happily before; something has changed. I even tried reverting to previously known-working code, with the same result. I can't figure it out.
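
In case it helps anyone reproduce this, here is a minimal sketch for comparing the query variants directly against Ollama, bypassing llama_index entirely. It assumes Ollama is running on its default port (11434) and uses its /api/embeddings endpoint; requests and numpy are the only dependencies:

Python
import numpy as np
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default Ollama port


def embed(text: str) -> np.ndarray:
    """Fetch an embedding for `text` from Ollama's nomic-embed-text model."""
    resp = requests.post(OLLAMA_URL, json={"model": "nomic-embed-text", "prompt": text})
    resp.raise_for_status()
    return np.array(resp.json()["embedding"])


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


base = embed("How do I install matlab?")
for variant in [
    "How do I install matlab? XYZ",
    "How do I install matlab? How do I install matlab?",
    "Cloud Services",
    "Cloud Services. Cloud Services.",
]:
    print(f"{variant!r} -> cosine vs. base query: {cosine(base, embed(variant)):.4f}")

If the variants score nearly identically here but retrieve differently through private-gpt, the problem is more likely in the retrieval/index layer than in the embedding model itself.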
17 comments
Hah... llama_index is using client version 1.7.3, apparently
3 comments
I am having an issue trying to use Open-Orca/OpenOrca-Platypus2-13B. I am getting [/INST] all over the place and the model keeps chatting with itself. I am currently using vLLM as an "openailike" server.

I looked around and found an issue that said to use the stop parameter in the API. This actually made everything work a lot better:

Plain Text
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Open-Orca/OpenOrca-Platypus2-13B",
        "stop": ["[INST]", "[/INST]"],
        "messages": [
            {"role": "user", "content": "What is the square root of two"}
        ]
    }'


But I can't tell whether there is a way for llama_index to do this as well. I have read through the docs and looked at the code, but couldn't figure out if there was an easier way to do this. Any ideas?
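
In case it is useful, here is roughly what I was hoping for: a sketch that passes the stop tokens through llama_index's OpenAILike wrapper. It assumes additional_kwargs gets merged into each request payload (so "stop" reaches vLLM), and uses the llama_index 0.10+ package path; both are worth verifying against your installed version:

Python
from llama_index.llms.openai_like import OpenAILike

# Assumption: additional_kwargs is forwarded with every completion request,
# so the stop sequences reach vLLM's OpenAI-compatible endpoint.
llm = OpenAILike(
    model="Open-Orca/OpenOrca-Platypus2-13B",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM ignores the key by default
    is_chat_model=True,
    additional_kwargs={"stop": ["[INST]", "[/INST]"]},
)

print(llm.complete("What is the square root of two"))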
9 comments
Odd issue with setting ollama kwargs. Looking at the ollama documentation for the possible additional arguments:
https://github.com/ollama/ollama/blob/main/docs/modelfile.md

For num_predict, it says the default is 128. However, if you don't set it in the additional arguments variable in llama_index, you get way more than 128 tokens back.

However, if you do set num_predict = 128 as an additional kwarg in llama_index, it severely limits the length of the response. It is easy enough to set, but I am confused about what this value actually is when you don't set it.
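
For reference, here is how I am setting it; a sketch assuming llama_index's Ollama wrapper forwards additional_kwargs as Ollama request options (llama_index 0.10+ package path, and the model name below is just a placeholder):

Python
from llama_index.llms.ollama import Ollama

# Assumption: additional_kwargs is sent as Ollama "options", which is
# where num_predict lives per the modelfile docs linked above.
llm = Ollama(
    model="llama2",  # placeholder; use whichever model you have pulled
    request_timeout=120.0,
    additional_kwargs={"num_predict": 128},  # caps the number of generated tokens
)

print(llm.complete("What does num_predict control?"))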
3 comments
Random question... I see that qdrant supports filtering results by score, and langchain supports returning a score with its qdrant results, which is useful. I searched the llama_index docs and didn't see anything similar. I was curious whether someone knew if this was something that could be, or already had been, implemented.
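
The closest thing I have found is the SimilarityPostprocessor, which drops retrieved nodes whose score falls below a cutoff. A minimal sketch, assuming the llama_index 0.10+ core package layout and an existing index (the `index` variable below stands in for whatever VectorStoreIndex you already have):

Python
from llama_index.core.postprocessor import SimilarityPostprocessor

# Each retrieved node carries a similarity score; nodes scoring below
# similarity_cutoff are dropped before the response is synthesized.
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)

response = query_engine.query("Cloud Services")
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:80])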
5 comments