
Updated 2 years ago

Llamacpp

Hi, I'm trying to get llama-index working with llamacpp to query documents entirely locally.
I have the code working for OpenAI, but when I pass llamacpp as the LLMPredictor, my CPU just ramps to 100% and it hangs for hours. Anyone have any idea how to proceed? M1 MBP 32GB; normal llama.cpp works great.
18 comments
No change in behavior :\
Super weird 🤔 can you turn debug logs on? Does anything helpful get printed?

Plain Text
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


Besides that, if it were me, I would use a debugger (pdb or PyCharm) and step into the llama-index code to see where the holdup is 🤔

I can try and find some time to run it myself, but it might take a day or two to find the time to set it up lol
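A minimal sketch of that pdb route, assuming the usual index.query(...) entry point; the index construction itself is only indicated in a comment:
Plain Text
import pdb

# build the index as elsewhere in this thread, e.g.
# index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# pause right before the call that hangs, then step ("s") into the
# llama-index / langchain code to see where it stalls
pdb.set_trace()
response = index.query("What does the document say?")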
Yeah, I'm in PyCharm. Nothing much is coming out of the increased logging level, but pausing it while it hung was a good idea; it's hanging on the return here in llama_cpp.py:
Plain Text
def llama_eval(
    ctx: llama_context_p,
    tokens,  # type: Array[llama_token]
    n_tokens: c_int,
    n_past: c_int,
    n_threads: c_int,
) -> c_int:
    return _lib.llama_eval(ctx, tokens, n_tokens, n_past, n_threads)
Hmmm yea I'm not totally familiar with llamacpp yet. My best guess is that it still has something to do with context length, but I'm not sure 🤔
Hmmmm I'll play around with it and let you know what I figure out! Thanks for thinking about this. I wasn't manually defining n_ctx, so I might play around there...
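A minimal sketch of tying n_ctx and llama-index's prompt sizing together, using the same older llama_index / langchain APIs that appear later in this thread; the numbers and model path are illustrative:
Plain Text
from langchain.llms import LlamaCpp
from llama_index import LLMPredictor, PromptHelper, ServiceContext

# keep llama-index's prompt budget inside llama.cpp's context window
# (n_ctx defaults to 512)
prompt_helper = PromptHelper(max_input_size=512, num_output=256, max_chunk_overlap=20)

llm_predictor = LLMPredictor(
    llm=LlamaCpp(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=512)
)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper
)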
btw how often do you update the pip package? I saw the github master branch had some changes pushed today but no update to the pypi package.
You can clone/install directly from github to get the latest changes 💪

pip install --upgrade git+https://github.com/jerryjliu/llama_index.git

(I think that's the right command lol)
I'm loading the model with a simple:
Plain Text
llm_predictor = LLMPredictor(llm = LlamaCpp(model_path="~/Code/llamacpp/llama.cpp/models/7B/ggml-model-q4_0.bin", n_threads=20, n_ctx=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

Should I instead be using the full custom class implementation you have?
Is llamaCpp from langchain? If so, then that should be enough 🤔
Yeah, it is the langchain loader
Dang lol yea I think this just needs a serious debugging session. I can download the weights and give this a spin tomorrow
I'm out of suggestions at the moment 😅 but looks easy enough to try
πŸ™ Thank you!
I'll keep playing and update this thread if I figure it out
Sounds good! I'll let you know once I try it out as well 💪
Just an update: same behavior across all llama ggml models and gpt4all ggml
Hmm very weird. I know one guy had gpt4all working, but they used the custom LLM approach

https://github.com/autratec/llm/blob/main/GPT4ALL_Indexing.ipynb
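That custom LLM approach amounts to subclassing langchain's LLM base class and passing an instance to LLMPredictor. A rough sketch with llama-cpp-python (the generation parameters and model path are illustrative, not taken from the notebook):
Plain Text
from typing import List, Optional

from langchain.llms.base import LLM
from llama_cpp import Llama
from llama_index import LLMPredictor

# load the ggml model once, outside the pydantic-based LLM class
model = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=512)

class LocalLlamaCpp(LLM):
    @property
    def _llm_type(self) -> str:
        return "local-llama-cpp"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # llama-cpp-python's Llama object is callable and returns an
        # OpenAI-style completion dict
        out = model(prompt, max_tokens=256, stop=stop or [])
        return out["choices"][0]["text"]

llm_predictor = LLMPredictor(llm=LocalLlamaCpp())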
Hi, I'm trying to use LlamaCPP too, and it works with a small test dataset (just one text file with one sentence!). When I load more data, I always get a llama_tokenize: too many tokens error

Here is more context:
Plain Text
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/llama_index/indices/base.py:100, in BaseGPTIndex.from_documents(cls, documents, docstore, service_context, **kwargs)
     96     docstore.set_document_hash(doc.get_doc_id(), doc.get_doc_hash())
     98 nodes = service_context.node_parser.get_nodes_from_documents(documents)
--> 100 return cls(
    101     nodes=nodes,
    102     docstore=docstore,
    103     service_context=service_context,
    104     **kwargs,
    105 )

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/llama_index/indices/tree/base.py:72, in GPTTreeIndex.__init__(self, nodes, index_struct, service_context, summary_template, insert_prompt, num_children, build_tree, use_async, **kwargs)
     70 self.build_tree = build_tree
...
    114 if int(n_tokens) < 0:
--> 115     raise RuntimeError(f'Failed to tokenize: text="{text}" n_tokens={n_tokens}')
    116 return list(tokens[:n_tokens])
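That RuntimeError comes from llama-cpp-python being asked to tokenize more text than fits in its token buffer (which is sized by n_ctx). A common workaround with this older llama_index API is to cap the chunk size when building the index; a rough sketch, with illustrative numbers and paths:
Plain Text
from langchain.llms import LlamaCpp
from llama_index import (
    GPTTreeIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
)

llm_predictor = LLMPredictor(
    llm=LlamaCpp(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=512)
)
prompt_helper = PromptHelper(max_input_size=512, num_output=256, max_chunk_overlap=20)

# chunk_size_limit keeps each node small enough to tokenize within n_ctx
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    chunk_size_limit=256,
)

documents = SimpleDirectoryReader("./data").load_data()
index = GPTTreeIndex.from_documents(documents, service_context=service_context)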