There's, like, everything in the docs
I just followed the starter example, and it worked on my machine, thank you very much.
This is very limited, though, because I was naturally following the previous link, which was about building an agent from local Ollama, and that worked.
I simply could not figure out how to modify the local Ollama code to add this RAG query just by reading the post, because no matter how I set up which LLM to use, the program uses the OpenAI API anyway,
which triggered an error because I do not have a paid OpenAI subscription. Well, I do not.
I think from this starter example I might be able to alter the code from the link I sent to do the same without the OpenAI API.
However, what would be the best way to communicate with the community, or even contribute? Which channel is the most active, with the most people who are actively developing llama-index?
The best option is to just make a PR tbh. I'm the main maintainer. Happy to review and get stuff merged.
There are two models, the embed model and the LLM. They both default to OpenAI unless you modify the Settings or pass them in.
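So to reuse your Ollama setup for the RAG query, you'd override both defaults, roughly like this (a sketch, assuming Ollama is running locally and you've installed llama-index-llms-ollama and llama-index-embeddings-huggingface; the model names are just examples):

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Override the OpenAI defaults globally (example model names)
Settings.llm = Ollama(model="llama3", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Now the starter-example RAG flow never touches the OpenAI API
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("What is this about?")
print(response)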
The code simply won't run and throws the error: gguf_init_from_file: invalid magic characters 'tjgg'
llama_model_load: error loading model: llama_model_loader: failed to load model from /tmp/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "/home/alice7/.dev/llama-index/llama-cpp.py", line 7, in <module>
llm = LlamaCPP(
^^^^^^^^^
File "/home/alice7/.conda/lib/python3.12/site-packages/llama_index/llms/llama_cpp/base.py", line 173, in init
model = Llama(model_path=model_path, **model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alice7/.conda/lib/python3.12/site-packages/llama_cpp/llama.py", line 371, in init
_LlamaModel(
File "/home/alice7/.conda/lib/python3.12/site-packages/llama_cpp/_internals.py", line 55, in init
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: /tmp/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
And clearly this is the issue. I am not sure how to solve it other than trying to download the model manually. What is going on?
ggml is not supported anymore by llama.cpp. Only gguf.
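If you stick with llama.cpp, grab a GGUF build of the model and point LlamaCPP at it, something like this (the path/filename below is just an example):

from llama_index.llms.llama_cpp import LlamaCPP

# Use a GGUF file instead of the old .ggmlv3 .bin (example path)
llm = LlamaCPP(
    model_path="/tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf",
    temperature=0.1,
    context_window=4096,
    verbose=False,
)
print(llm.complete("Hello"))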
Not sure why you'd use llama.cpp though when ollama exists. Ollama can run any model, and automates a ton of stuff for you
Because I want more control? And to potentially learn the framework?
Every post I've tried before, except the starter example, was badly documented and required changes here and there before it could run.
This one, unfortunately, is no exception.
It first asked me to pip install vllm,
then to pip install llama-index-llms-vllm, and then I got:
Traceback (most recent call last):
File "/home/alice7/.conda/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 202, in init
from vllm import LLM as VLLModel
ImportError: cannot import name 'LLM' from partially initialized module 'vllm' (most likely due to a circular import) (/home/alice7/.dev/llama-index/vllm.py)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alice7/.conda/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 202, in init
from vllm import LLM as VLLModel
File "/home/alice7/.dev/llama-index/vllm.py", line 6, in <module>
llm = Vllm(
^^^^^
File "/home/alice7/.conda/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 204, in init
raise ImportError(
ImportError: Could not import vllm python package. Please install it with pip install vllm
.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alice7/.dev/llama-index/vllm.py", line 6, in <module>
llm = Vllm(
^^^^^
File "/home/alice7/.conda/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 204, in init
raise ImportError(
ImportError: Could not import vllm python package. Please install it with pip install vllm
.
/home/alice7/.conda/lib/python3.12/site-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Seems like vllm/cuda doesn't see your GPUs 🤷. Either that, or vllm is not installed properly (if you are in a notebook, you'll need to restart it).
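As a quick sanity check in that same environment (plain torch, nothing llama-index specific):

import torch

# If this prints False or 0, vllm won't see the GPU either
print(torch.cuda.is_available())
print(torch.cuda.device_count())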
Also, I know the docs aren't great, but it's free software lol so take it with a grain of salt my guy
If you see issues, contribute a PR, that's the best way to make a difference.
Most people don't install/use vllm directly -- they launch it as an OpenAI-compatible server and use it with something like OpenAILike.
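Roughly like this (a sketch; the model name and port are just examples, and you'd need pip install llama-index-llms-openai-like):

# Start vLLM as an OpenAI-compatible server first, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # must match what the server is serving
    api_base="http://localhost:8000/v1",  # default vLLM OpenAI endpoint
    api_key="not-used",  # vLLM doesn't check the key by default
    is_chat_model=True,
)
print(llm.complete("Hello"))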