Loading model

Oh ok! How do I do that with the model that llama-index downloads? (I can set a cache dir, etc.)
You said you are using some variant of gpt4all, right? So you are using a custom LLM class with it?

You can try printing the prompt just before running it through the model in the _call function

Then once you have the prompt, you can try running it through the model pipeline on its own to try and debug issues
Oh, I guess I should have said: how do I load the model pipeline outside of llama-index (on its own)?
How are you loading it now?
Plain Text
# (imports assumed for this snippet)
from llama_index import (GPTListIndex, LLMPredictor, LangchainEmbedding,
                         PromptHelper, ServiceContext, SimpleDirectoryReader)
from langchain.embeddings import HuggingFaceEmbeddings


def initialize_index():

    global index, stored_docs

    llm_predictor = LLMPredictor(llm=Gpt4AllLlm())
    embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                                   embed_model=embed_model,
                                                   prompt_helper=prompt_helper,
                                                   chunk_size_limit=chunk_size_limit)

    documents = SimpleDirectoryReader('../data/documents').load_data()
    index = GPTListIndex.from_documents(documents, service_context=service_context)
and then ...
Plain Text
# (imports assumed for this snippet)
from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM
from llama_index import QuestionAnswerPrompt
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

peft_model_id = "nomic-ai/gpt4all-lora"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, cache_dir="/Users/../PycharmProjects/jtcPoc/data/model")
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path, cache_dir="/Users/../PycharmProjects/jtcPoc/data/tokenizer")
gpt4all_model = PeftModel.from_pretrained(model, peft_model_id, cache_dir="/Users/../PycharmProjects/jtcPoc/data/model")

JTC_QA_PROMPT = (
    "Perform the following instructions: \n"
    .... blah blah .... 
    "Please return only the return_object in desiredObjectFormat JSON format.")

FULL_PROMPT = QuestionAnswerPrompt(JTC_QA_PROMPT)


class Gpt4AllLlm(LLM):
    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs) -> str:
        print('--- prompt was: ---- ')
        print(prompt)
        print('***** end prompt **** ')

        inputs = tokenizer(prompt, return_tensors="pt")
        input_ids = inputs["input_ids"]
        generation_config = GenerationConfig(
            temperature=0.1,
            top_p=0.95,
            repetition_penalty=1.2,
        )
        generation_output = gpt4all_model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            output_scores=True,
            max_new_tokens=num_output
        )
        response = tokenizer.decode(generation_output[0], skip_special_tokens=True).strip()
        return response[len(prompt):]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": "GPT4ALL"}

    @property
    def _llm_type(self) -> str:
        return "custom"
I'm guessing it's this part? model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
Yea so basically tokenize the prompt and pass it to the model, and then decode, but all in its own little test script

Just to ensure it works πŸ˜…
Just need to hardcode the prompt to test it with
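Something like this, as a minimal sketch; it reuses gpt4all_model, tokenizer, and num_output from the snippet above, and the hard-coded prompt is just a placeholder:
Plain Text
# standalone sanity check: same tokenize -> generate -> decode steps as _call
test_prompt = "Perform the following instructions: ..."  # hard-code any example prompt here

inputs = tokenizer(test_prompt, return_tensors="pt")
input_ids = inputs["input_ids"]

generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.2,
)
generation_output = gpt4all_model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    output_scores=True,
    max_new_tokens=num_output,
)

response = tokenizer.decode(generation_output[0], skip_special_tokens=True).strip()
print(response[len(test_prompt):])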
I guess you anticipated this error:

ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
I'm just passing a FULL_PROMPT string which has the

"{context_str} \n"
I guess I just have to fake that along with the query str?
Yea, you'll have to pass that as well

You could extract a real example from llama index by adding a debug print statement as I suggested earlier, or just make up your own example πŸ‘
Although make sure you cast it as a string or something (that's what the error is complaining about)
This is just to test that the inputs generated by llama index can be properly run through the model, so examples that are as real as possible are best
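For example, something like this (assuming the JTC_QA_PROMPT template's only braces are the {context_str} and {query_str} placeholders; the context and question here are made up):
Plain Text
# fill the template the same way llama_index would, with made-up values cast to str
fake_context = "JTC is a proof-of-concept project for ..."
fake_query = "What is JTC?"

test_prompt = JTC_QA_PROMPT.format(context_str=str(fake_context), query_str=str(fake_query))

# run it through the custom LLM's _call to see what comes back
print(Gpt4AllLlm()._call(test_prompt))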
I'm seeing Empty Response for more inputs than I expected
even going directly to the model.
I haven't tracked down whether it's a symptom of the token length or the difficulty of the question
it's hard to bifurcate
I'm trying, but to no avail yet
Interesting πŸ€” at least it's happening directly with the model, so that narrows down the scope of debugging.

You can check token lengths by printing the length of the input_ids
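For example, inside _call or the standalone test script, right after tokenizing:
Plain Text
# input_ids is a [1, seq_len] tensor when return_tensors="pt" is used
print("prompt token count:", input_ids.shape[1])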