hi

hi
i am trying to use a local llm
service_context = ServiceContext.from_defaults(chunk_size=512, chunk_overlap=10, embed_model='local', llm='local')

but i am getting this error
File ~/miniconda3/lib/python3.11/site-packages/llama_index/llms/llama_cpp.py:168, in LlamaCPP.metadata(self)
164 @property
165 def metadata(self) -> LLMMetadata:
166 """LLM metadata."""
167 return LLMMetadata(
--> 168 context_window=self._model.context_params.n_ctx,
169 num_output=self.max_new_tokens,
170 model_name=self.model_path,
171 )

AttributeError: 'Llama' object has no attribute 'context_params'
Try updating your llama-cpp-python installation
which version should i use?
Whatever is the latest/newest
It was added a few weeks ago
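For reference, grabbing the newest release would just be (assuming pip manages this environment):

Plain Text
pip install --upgrade llama-cpp-python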
got the same error
are you in a notebook? try restarting the kernel
File ~/miniconda3/lib/python3.11/site-packages/llama_cpp/llama.py:2111, in Llama.n_ctx(self)
2109 def n_ctx(self) -> int:
2110 """Return the context window size."""
-> 2111 return self._ctx.n_ctx()

File ~/miniconda3/lib/python3.11/site-packages/llama_cpp/llama.py:428, in _LlamaContext.n_ctx(self)
427 def n_ctx(self) -> int:
--> 428 assert self.ctx is not None
429 return llama_cpp.llama_n_ctx(self.ctx)

AssertionError:
so not the same error 😉
not sure how n_ctx is none, it's definitely defaulting to a value
pip show llama-cpp-python what do you see?
Name: llama_cpp_python
Version: 0.2.18
Summary: Python bindings for the llama.cpp library
Home-page:
Author:
Author-email: Andrei Betlen <abetlen@gmail.com>
License: MIT
Location: /Users/ilanpinto/miniconda3/lib/python3.11/site-packages
Requires: diskcache, numpy, typing-extensions
Required-by:
did a kernel restart in between
@Logan M i am out for 1H but feel free to reply back
i am stuck with this issue 😦
You could just set up the LLM manually, not sure why it's giving issues with the shorthand "local"

https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp.html
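A rough sketch of that manual setup, following the linked docs page and pointing model_url at the same GGUF file that comes up later in this thread (the exact kwargs here are illustrative):

Python
from llama_index import ServiceContext
from llama_index.llms import LlamaCPP

# build the LLM explicitly instead of passing llm="local"
llm = LlamaCPP(
    model_url="https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    verbose=True,
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local",
    chunk_size=512,
    chunk_overlap=10,
)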
now i am getting this
gguf_init_from_file: invalid magic characters tjgg.
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
error loading model: llama_model_loader: failed to load model from /Users/ilanpinto/Library/Caches/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin

llama_load_model_from_file: failed to load model
maybe delete the cache folder
notice that the downloaded model is ggml, but we want a gguf model actually
this /Users/ilanpinto/Library/Caches/llama_index/models one?
yea delete that
and change the url to
what's the difference between gguf and ggml?
Ggml was an old file type they stopped supporting
After 0.1.79 only gguf works
my colleague Erik Jacobs says hi 🙂
Haha hey Erik!
seems to work thank you!!!
got the same err 😦
File ~/miniconda3/lib/python3.11/site-packages/llama_cpp/llama.py:2111, in Llama.n_ctx(self)
2109 def n_ctx(self) -> int:
2110 """Return the context window size."""
-> 2111 return self._ctx.n_ctx()

File ~/miniconda3/lib/python3.11/site-packages/llama_cpp/llama.py:428, in _LlamaContext.n_ctx(self)
427 def n_ctx(self) -> int:
--> 428 assert self.ctx is not None
429 return llama_cpp.llama_n_ctx(self.ctx)

AssertionError:
after kernel restart
mmmmm but why 😅
ugh I hate llama-cpp lol
I have no idea anymore tbh, I would have to spin up llama-cpp and debug, but I don't really have time at the moment 😅 I suggest getting it working without llama-index (loading and creating the model with llama-cpp directly), and then we can figure out the differences from there
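A minimal sketch of that direct llama-cpp check, assuming the GGUF file ended up in the cache folder mentioned earlier (the path below is an assumption), could be:

Python
from llama_cpp import Llama

# load the GGUF file with llama-cpp-python directly, no llama-index involved
llm = Llama(
    model_path="/Users/ilanpinto/Library/Caches/llama_index/models/llama-2-13b-chat.Q4_0.gguf",  # assumed location
    n_ctx=3900,
)

# if this fails too, the problem is in llama-cpp-python / the model file, not llama-index
print(llm("Hello!", max_tokens=64)["choices"][0]["text"])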
could that be related to jupyter?
I mean... maybe? Maybe try running in a .py script and see if that makes a difference 😅
checking, in the meanwhile
another question
are the below versions compatible?
llama-index==0.9.0
llama_cpp_python==0.2.18
it should be fine yea
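For reference, pinning exactly those two versions in a fresh environment (assuming pip) would be:

Plain Text
pip install llama-index==0.9.0 llama-cpp-python==0.2.18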
ok just to sanity check, running this myself now. Downloading the model file currently
in a fresh env I installed llama-index and llama-cpp-python, and this code ran fine (abbreviated for reading)

Plain Text
>>> from llama_index.llms.utils import resolve_llm
>>> llm = resolve_llm("local")
Downloading url https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf to path /tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf
total size (MB): 7365.83
7025it [03:46, 31.02it/s]                                                       
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf (version GGUF V2)
...
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
Then

Plain Text
>>> llm.complete("Hello!")

llama_print_timings:        load time =    9481.09 ms
llama_print_timings:      sample time =      23.21 ms /    78 runs   (    0.30 ms per token,  3361.05 tokens per second)
llama_print_timings: prompt eval time =    9480.97 ms /    67 tokens (  141.51 ms per token,     7.07 tokens per second)
llama_print_timings:        eval time =   29942.15 ms /    77 runs   (  388.86 ms per token,     2.57 tokens per second)
llama_print_timings:       total time =   39609.32 ms
CompletionResponse(text="  Hello! I'm here to assist you with any questions or tasks you may have. Please feel free to ask me anything, and I will do my best to provide a helpful and accurate response. I am programmed to be respectful, honest, and to follow all given instructions. Please go ahead and ask your question or provide the task you would like me to complete.", additional_kwargs={}, raw={'id': 'cmpl-90b687a9-13f8-4428-ac0f-2437c7cd173b', 'object': 'text_completion', 'created': 1700170879, 'model': '/tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf', 'choices': [{'text': "  Hello! I'm here to assist you with any questions or tasks you may have. Please feel free to ask me anything, and I will do my best to provide a helpful and accurate response. I am programmed to be respectful, honest, and to follow all given instructions. Please go ahead and ask your question or provide the task you would like me to complete.", 'index': 0, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 67, 'completion_tokens': 77, 'total_tokens': 144}}, delta=None)
Works 🤔
Maybe start over with a fresh venv 😅
python version?
now i have other errors but i think i can handle it
thank you very much!!! @Logan M
Awesome, sounds good! (Well, not good, but better!)