Hi @Logan M,
Can we use a VectorStore in llama_index with Ollama (LLaMA 3 model), using Qdrant?
My task is to chat with my own document, which is uploaded to the Qdrant server, using Ollama (LLaMA 3 model).
Yes you can:
https://docs.llamaindex.ai/en/stable/examples/llm/ollama/?h=ollama

Once you have the LLM, create the VectorStoreIndex object as usual and either pass the LLM in directly or add it to Settings.

Plain Text
from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

# Ollama LLM instance (model name and timeout are examples)
llm = Ollama(model="llama3", request_timeout=120.0)
Settings.llm = llm
# create instance for qdrant (URL and collection name are examples)
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_documents")
# pass it to index
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
Thanks @WhiteFang_Jr,
but how can we insert the document file data into our vector DB?
Yes, once you have the Document objects you can simply insert them into the index.
Plain Text
for doc in docs:
  index.insert(doc)

https://docs.llamaindex.ai/en/stable/module_guides/indexing/document_management/?h=insertion#insertion
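If you are starting from files on disk, one way to get those Document objects is SimpleDirectoryReader (a minimal sketch; the folder path is a placeholder):

Plain Text
from llama_index.core import SimpleDirectoryReader

# read files from a local folder into Document objects (path is a placeholder)
docs = SimpleDirectoryReader("./my_docs").load_data()

# insert each document into the index created above; nodes are embedded and stored in Qdrant
for doc in docs:
    index.insert(doc)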
Hi @WhiteFang_Jr
When I execute this code, I encounter an issue with the following error:
ValueError:
**
Could not load OpenAI embedding model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
Original error:
No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys

Consider using embed_model='local'.
Visit our documentation for more embedding options: https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#modules
**
You'll have to define your embedding model. Either pass it down or set it with Settings at the top (I prefer this).
Plain Text
from llama_index.core import Settings
# After defining your embedding model
Settings.embed_model = embed_model # Your embed model instance here
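For example, a local embedding model avoids the OpenAI key entirely (a minimal sketch, assuming the llama-index-embeddings-huggingface package is installed; the model name is just an example):

Plain Text
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# a local embedding model, so no OpenAI key is required (model name is an example)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model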
Thanks @WhiteFang_Jr, the embeddings were created successfully!
@WhiteFang_Jr
Can you please help me with one more thing?
Now I want to run a query against my uploaded document (the embeddings). When it finds an exact match it should return a result, otherwise it should say nothing. How can I do this? I've tried multiple ways but none of them work.
Did you try modifying the prompt?
Or adding a SimilarityPostprocessor to your query_engine?
It will filter out nodes that score below the similarity threshold you set.
These options will help you limit when the LLM responds.
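A minimal sketch of the second option, reusing the index from above (the 0.75 cutoff is just an example value to tune):

Plain Text
from llama_index.core.postprocessor import SimilarityPostprocessor

# drop retrieved nodes whose similarity score is below the cutoff (example value)
query_engine = index.as_query_engine(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
)
# if every node gets filtered out, the engine returns an empty response instead of answering
response = query_engine.query("your question here")
print(response)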
I used this, but it did not provide the correct answer.
And one more thing: why does it take so long to perform all the operations?
query_engine.query takes a lot of time to generate a response.
If you are using a local model, it will be quite slow
So llama3 is a local model?
If llama3 is a local model, which one should I use instead?
How are you running the llama3 model? If you have it running on your machine, then yes, it is running locally. If you are using a service like Replicate or Groq, then it's not.
Yes, so it's local.
I'm running it with ollama run llama3
Yeah that's why it is taking time
Can you please suggest a way to reduce the query response time?
You can try running this on Colab using Ollama. It will respond faster since you get around a 16 GB GPU there.
Do you have any idea how I can do this?
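Not an exact recipe, but roughly (a sketch; the port and model name below are Ollama's defaults): install Ollama inside the Colab runtime, start the server in the background, pull the model, and then point the Ollama LLM at the local server.

Plain Text
# in a Colab cell: install Ollama, start the server in the background, pull the model
!curl -fsSL https://ollama.com/install.sh | sh
!nohup ollama serve > ollama.log 2>&1 &
!ollama pull llama3

# then use it from llama_index as before, pointing at the local server
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
Settings.llm = Ollama(model="llama3", base_url="http://localhost:11434", request_timeout=120.0)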
Hi, @WhiteFang_Jr