Loading model

Oh ok! How do I do that with the model that llama-index downloads? (I can set a cache dir, etc.)
You said you are using some variant of gpt4all, right? So you are using a custom LLM class with it?

You can try printing the prompt just before running it through the model in the _call function

Then once you have the prompt, you can try running it through the model pipeline on its own to try and debug issues
Oh, I guess I should have said: how do I load the model pipeline outside of llama-index (on its own)?
How are you loading it now?
Plain Text
# (imports assumed for this snippet)
from llama_index import (GPTListIndex, LLMPredictor, LangchainEmbedding,
                         PromptHelper, ServiceContext, SimpleDirectoryReader)
from langchain.embeddings import HuggingFaceEmbeddings


def initialize_index():

    global index, stored_docs

    llm_predictor = LLMPredictor(llm=Gpt4AllLlm())
    embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                                   embed_model=embed_model,
                                                   prompt_helper=prompt_helper,
                                                   chunk_size_limit=chunk_size_limit)

    documents = SimpleDirectoryReader('../data/documents').load_data()
    index = GPTListIndex.from_documents(documents, service_context=service_context)
and then ...
Plain Text
# (imports assumed for this snippet)
from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM
from llama_index import QuestionAnswerPrompt
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

peft_model_id = "nomic-ai/gpt4all-lora"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, cache_dir="/Users/../PycharmProjects/jtcPoc/data/model")
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path, cache_dir="/Users/../PycharmProjects/jtcPoc/data/tokenizer")
gpt4all_model = PeftModel.from_pretrained(model, peft_model_id, cache_dir="/Users/../PycharmProjects/jtcPoc/data/model")

JTC_QA_PROMPT = (
    "Perform the following instructions: \n"
    .... blah blah .... 
    "Please return only the return_object in desiredObjectFormat JSON format.")

FULL_PROMPT = QuestionAnswerPrompt(JTC_QA_PROMPT)


class Gpt4AllLlm(LLM):
    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs) -> str:
        print('--- prompt was: ---- ')
        print(prompt)
        print('***** end prompt **** ')

        inputs = tokenizer(prompt, return_tensors="pt")
        input_ids = inputs["input_ids"]
        generation_config = GenerationConfig(
            temperature=0.1,
            top_p=0.95,
            repetition_penalty=1.2,
        )
        generation_output = gpt4all_model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            output_scores=True,
            max_new_tokens=num_output
        )
        response = tokenizer.decode(generation_output[0], skip_special_tokens=True).strip()
        return response[len(prompt):]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": "GPT4ALL"}

    @property
    def _llm_type(self) -> str:
        return "custom"
I'm guessing it's this part? model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
Yea so basically tokenize the prompt and pass it to the model, and then decode, but all in its own little test script

Just to ensure it works πŸ˜…
Just need to hardcode the prompt to test it with
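Something like this, as a minimal sketch; it reuses gpt4all_model, tokenizer, and num_output from the snippet above, and the hard-coded prompt is just a placeholder:
Plain Text
# standalone sanity check: same tokenize -> generate -> decode steps as _call
test_prompt = "Perform the following instructions: ..."  # hard-code any example prompt here

inputs = tokenizer(test_prompt, return_tensors="pt")
input_ids = inputs["input_ids"]

generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.2,
)
generation_output = gpt4all_model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    output_scores=True,
    max_new_tokens=num_output,
)

response = tokenizer.decode(generation_output[0], skip_special_tokens=True).strip()
print(response[len(test_prompt):])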
I guess you anticipated this error:

ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
I'm just passing a FULL_PROMPT string which has the

"{context_str} \n"
I guess I just have to fake that along with the query str?
Yea, you'll have to pass that as well

You could extract a real example from llama index by adding a debug print statement as I suggested earlier, or just make up your own example πŸ‘
Although make sure you cast it as a string or something (that's what the error is complaining about)
This is just to test that the inputs generated by llama index can be properly run through the model, so examples that are as real as possible are best
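For example, something like this (assuming the JTC_QA_PROMPT template's only braces are the {context_str} and {query_str} placeholders; the context and question here are made up):
Plain Text
# fill the template the same way llama_index would, with made-up values cast to str
fake_context = "JTC is a proof-of-concept project for ..."
fake_query = "What is JTC?"

test_prompt = JTC_QA_PROMPT.format(context_str=str(fake_context), query_str=str(fake_query))

# run it through the custom LLM's _call to see what comes back
print(Gpt4AllLlm()._call(test_prompt))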
I'm seeing Empty Response for more inputs than I expected
even going directly to the model.
I haven't tracked down whether it's a symptom of the token length or the difficulty of the question
it's hard to bifurcate
I'm trying, but to no avail yet
Interesting πŸ€” at least it's happening directly with the model, so that narrows down the scope of debugging.

You can check token lengths by printing the length of the input_ids
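For example, inside _call or the standalone test script, right after tokenizing:
Plain Text
# input_ids is a [1, seq_len] tensor when return_tensors="pt" is used
print("prompt token count:", input_ids.shape[1])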