Llamacpp

Hi, I'm trying to get llama-index working with llama.cpp to query documents entirely locally.
I have the code working for OpenAI, but when I pass llama.cpp in as the LLMPredictor, my CPU just ramps to 100% and hangs for hours. Anyone have any idea how to proceed? M1 MBP, 32 GB RAM; plain llama.cpp works great.
No change in behavior :\
Super weird 🤔 can you turn debug logs on? Does anything helpful get printed?

Plain Text
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


Besides that, if it were me, I would use a debugger (pdb or PyCharm) and step into the llama-index code to see where the hold-up is 🤔

I can try to find some time to run it myself, but it might take a day or two to get it set up lol
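For example (the index name here is just a placeholder for whatever the failing script builds): drop a breakpoint right before the query and step into llama-index from there.

Plain Text
import pdb

# Placeholder setup: build the index exactly as in your failing script,
# then stop right before the call that hangs and step ("s") down through
# llama_index -> langchain -> llama_cpp to see where it stalls.
pdb.set_trace()
response = index.query("test question")
print(response)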
Yeah, I'm in PyCharm. Nothing really coming out of the increased logging level, but pausing it while it hung was a good idea: it is hanging on the return here in llama_cpp.py:
Plain Text
def llama_eval(
    ctx: llama_context_p,
    tokens,  # type: Array[llama_token]
    n_tokens: c_int,
    n_past: c_int,
    n_threads: c_int,
) -> c_int:
    return _lib.llama_eval(ctx, tokens, n_tokens, n_past, n_threads)
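One way to narrow that down is to call the binding directly, with no llama-index in the loop. A minimal sketch, assuming the llama-cpp-python high-level API and the same model file used with the LLMPredictor later in the thread:

Plain Text
import os
from llama_cpp import Llama

# Same quantized model file as in the LLMPredictor example further down.
llm = Llama(model_path=os.path.expanduser(
    "~/Code/llamacpp/llama.cpp/models/7B/ggml-model-q4_0.bin"))

# If this plain completion also pins the CPU and never returns, the hang is
# in the binding / model itself rather than in llama-index.
out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])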
Hmmm yeah, I'm not totally familiar with llama.cpp yet. My best guess is that it still has something to do with context length, but I'm not sure 🤔
Hmmmm I'll play around with it and let you know what I figure out! Thanks for thinking about this. I wasn't manually defining n_ctx, so I might play around there...
BTW, how often do you update the pip package? I saw the GitHub master branch had some changes pushed today but no update to the PyPI package.
You can clone/install directly from github to get the latest changes πŸ’ͺ

pip install --upgrade git+https://github.com/jerryjliu/llama_index.git

(I think that's the right command lol)
I'm loading the model with a simple:
Plain Text
llm_predictor = LLMPredictor(llm = LlamaCpp(model_path="~/Code/llamacpp/llama.cpp/models/7B/ggml-model-q4_0.bin", n_threads=20, n_ctx=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

Should I instead be using the full custom class implementation you have?
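For reference, the prompt_helper above would usually be built something like this (assuming the older PromptHelper(max_input_size, num_output, max_chunk_overlap) signature; the numbers are guesses sized to n_ctx=512):

Plain Text
from llama_index import PromptHelper

# Assumed values: keep max_input_size at or below the model's n_ctx (512 here),
# reserve some tokens for the completion, and allow a small chunk overlap.
prompt_helper = PromptHelper(max_input_size=512, num_output=128, max_chunk_overlap=20)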
Is LlamaCpp from langchain? If so, then that should be enough 🤔
Yeah, it is the langchain loader
Dang lol yea I think this just needs a serious debugging session. I can download the weights and give this a spin tomorrow
I'm out of suggestions at the moment πŸ˜… but looks easy enough to try
πŸ™ Thank you!
I'll keep playing and update this thread if I figure it out
Sounds good! I'll let you know once I try it out as well πŸ’ͺ
Just an update, same behavior across all llama ggml models and gpt4all ggml
Hmm very weird. I know one guy had gpt4all working, but they used the custom LLM approach

https://github.com/autratec/llm/blob/main/GPT4ALL_Indexing.ipynb
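The shape of that custom-LLM approach (a rough sketch of LangChain's custom LLM pattern, not the exact notebook code) is to wrap the local model in an LLM subclass and hand that to LLMPredictor:

Plain Text
from typing import List, Optional
from langchain.llms.base import LLM
from llama_cpp import Llama

class LocalLlamaLLM(LLM):
    """Sketch: wrap llama-cpp-python so llama-index can use it as a custom LLM."""
    model_path: str = "models/7B/ggml-model-q4_0.bin"  # placeholder path

    @property
    def _llm_type(self) -> str:
        return "local-llama-cpp"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Loading the model on every call is slow; a real version would cache it.
        llm = Llama(model_path=self.model_path)
        out = llm(prompt, max_tokens=256, stop=stop)
        return out["choices"][0]["text"]

# llm_predictor = LLMPredictor(llm=LocalLlamaLLM())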
Hi, I'm trying to use LlamaCPP too, and it works with a small test dataset (just one text file with one sentence!). When I load more data, I always get a llama_tokenize: too many tokens error.

Here is more context:
Plain Text
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/llama_index/indices/base.py:100, in BaseGPTIndex.from_documents(cls, documents, docstore, service_context, **kwargs)
     96     docstore.set_document_hash(doc.get_doc_id(), doc.get_doc_hash())
     98 nodes = service_context.node_parser.get_nodes_from_documents(documents)
--> 100 return cls(
    101     nodes=nodes,
    102     docstore=docstore,
    103     service_context=service_context,
    104     **kwargs,
    105 )

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/llama_index/indices/tree/base.py:72, in GPTTreeIndex.__init__(self, nodes, index_struct, service_context, summary_template, insert_prompt, num_children, build_tree, use_async, **kwargs)
     70 self.build_tree = build_tree
...
    114 if int(n_tokens) < 0:
--> 115     raise RuntimeError(f'Failed to tokenize: text="{text}" n_tokens={n_tokens}')
    116 return list(tokens[:n_tokens])
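One thing worth trying for that tokenize error (an assumption, not a confirmed fix): cap the chunk size when building the service context so no parsed node exceeds the model's 512-token window when it is tokenized.

Plain Text
# chunk_size_limit keeps each parsed node well under n_ctx=512;
# uses the same llm_predictor / prompt_helper as earlier in the thread.
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    chunk_size_limit=256,
)
index = GPTTreeIndex.from_documents(documents, service_context=service_context)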