Try updating your llama-cpp-python installation
which version should i use?
Whatever is the latest/newest
It was added a few weeks ago
are you in a notebook? try restarting the kernel
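after the restart you can double check which version the kernel actually picked up, something like this (I think the package exposes a version string, if not just re-run pip show):

import llama_cpp
print(llama_cpp.__version__)  # should match the version you just installed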
File ~/miniconda3/lib/python3.11/site-packages/llama_cpp/llama.py:2111, in Llama.n_ctx(self)
2109 def n_ctx(self) -> int:
2110 """Return the context window size."""
-> 2111 return self._ctx.n_ctx()
File ~/miniconda3/lib/python3.11/site-packages/llama_cpp/llama.py:428, in _LlamaContext.n_ctx(self)
427 def n_ctx(self) -> int:
--> 428 assert self.ctx is not None
429 return llama_cpp.llama_n_ctx(self.ctx)
AssertionError:
so not the same error
not sure how n_ctx is None, it's definitely defaulting to a value
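for reference, n_ctx comes from context_window on the llama-index side. Going from memory of the 0.9-era API, the wrapper looks roughly like this, so treat it as a sketch (the model path is a placeholder):

from llama_index.llms import LlamaCPP

# sketch only -- swap the placeholder path for your actual gguf file
llm = LlamaCPP(
    model_path="/path/to/llama-2-13b-chat.Q4_0.gguf",
    context_window=3900,   # this is what should end up as n_ctx in llama-cpp
    max_new_tokens=256,
)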
pip show llama-cpp-python
what do you see?
Name: llama_cpp_python
Version: 0.2.18
Summary: Python bindings for the llama.cpp library
Home-page:
Author:
Author-email: Andrei Betlen <abetlen@gmail.com>
License: MIT
Location: /Users/ilanpinto/miniconda3/lib/python3.11/site-packages
Requires: diskcache, numpy, typing-extensions
Required-by:
did a kernel restart in between
@Logan M i am out for 1H but feel free to reply back
i am stuck with this issue
now i am getting this
gguf_init_from_file: invalid magic characters tjgg.
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
error loading model: llama_model_loader: failed to load model from /Users/ilanpinto/Library/Caches/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
maybe delete the cache folder
notice that the downloaded model is ggml, but we want a gguf model actually
this /Users/ilanpinto/Library/Caches/llama_index/models one?
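yep, that one. Something like this should clear it (path copied from your error message, double-check it before deleting anything):

import shutil
from pathlib import Path

# cache dir taken from the traceback above -- verify before removing
shutil.rmtree(Path.home() / "Library" / "Caches" / "llama_index" / "models", ignore_errors=True)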
what's the difference between gguf and ggml?
Ggml was an old file type they stopped supporting
After 0.1.79 only gguf works
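if you ever want to check which one a file is, the first four bytes tell you: gguf files start with the ASCII magic GGUF, the older ggml family uses other magics (that "tjgg" in your error is presumably one of the old ones). Quick check, with a placeholder path:

# read the magic bytes of the downloaded file (placeholder path)
with open("/path/to/model.bin", "rb") as f:
    print(f.read(4))  # b"GGUF" means gguf; anything else is the old ggml-era format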
my colleague Erik Jacobs says hi
seems to work thank you!!!
got the same error
File ~/miniconda3/lib/python3.11/site-packages/llama_cpp/llama.py:2111, in Llama.n_ctx(self)
2109 def n_ctx(self) -> int:
2110 """Return the context window size."""
-> 2111 return self._ctx.n_ctx()
File ~/miniconda3/lib/python3.11/site-packages/llama_cpp/llama.py:428, in _LlamaContext.n_ctx(self)
427 def n_ctx(self) -> int:
--> 428 assert self.ctx is not None
429 return llama_cpp.llama_n_ctx(self.ctx)
AssertionError:
I have no idea anymore tbh, I would have to spin up llama-cpp and debug, but I don't really have time at the moment
I suggest getting it working without llama-index (loading and creating the model with llama-cpp directly), and then we can figure it out the differences from there
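roughly something like this, where the path is a placeholder for whatever gguf file you have locally:

from llama_cpp import Llama

# placeholder path -- point it at your local gguf file
llm = Llama(model_path="/path/to/llama-2-13b-chat.Q4_0.gguf", n_ctx=3900)
print(llm("Q: Name the planets in the solar system. A:", max_tokens=64))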
could that be related to jupyter?
I mean... maybe? Maybe try running in a .py script and see if that makes a difference
checking, in the meanwhile
another question
are the below versions compatible?
llama-index==0.9.0
llama_cpp_python==0.2.18
ok just to sanity check, running this myself now. Downloading the model file currently
in a fresh env I installed llama-index and llama-cpp-python, and this code ran fine (abbreviated for reading)
>>> from llama_index.llms.utils import resolve_llm
>>> llm = resolve_llm("local")
Downloading url https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf to path /tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf
total size (MB): 7365.83
7025it [03:46, 31.02it/s]
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf (version GGUF V2)
...
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Then
>>> llm.complete("Hello!")
llama_print_timings: load time = 9481.09 ms
llama_print_timings: sample time = 23.21 ms / 78 runs ( 0.30 ms per token, 3361.05 tokens per second)
llama_print_timings: prompt eval time = 9480.97 ms / 67 tokens ( 141.51 ms per token, 7.07 tokens per second)
llama_print_timings: eval time = 29942.15 ms / 77 runs ( 388.86 ms per token, 2.57 tokens per second)
llama_print_timings: total time = 39609.32 ms
CompletionResponse(text=" Hello! I'm here to assist you with any questions or tasks you may have. Please feel free to ask me anything, and I will do my best to provide a helpful and accurate response. I am programmed to be respectful, honest, and to follow all given instructions. Please go ahead and ask your question or provide the task you would like me to complete.", additional_kwargs={}, raw={'id': 'cmpl-90b687a9-13f8-4428-ac0f-2437c7cd173b', 'object': 'text_completion', 'created': 1700170879, 'model': '/tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf', 'choices': [{'text': " Hello! I'm here to assist you with any questions or tasks you may have. Please feel free to ask me anything, and I will do my best to provide a helpful and accurate response. I am programmed to be respectful, honest, and to follow all given instructions. Please go ahead and ask your question or provide the task you would like me to complete.", 'index': 0, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 67, 'completion_tokens': 77, 'total_tokens': 144}}, delta=None)
Maybe start over with a fresh venv
now i have other errors but i think i can handle it
thank you very much!!! @Logan M
Awesome, sounds good! (Well, not good, but better!)