The default model that loads for the

At a glance

The default model that loads for the huggingface embeddings in the docs page that i sent usually works well

For LLMs, vicuna seems to be good (but it's also non-commericial). I like camel for commercial models so far

67 comments

nnbulkz

Where do you find the Camel ones?

LLogan M

Camel is here https://huggingface.co/Writer/camel-5b-hf

nnbulkz

why this one over say, mname = "databricks/dolly-v2-12b"
?

LLogan M

Meh, I've tried dolly with llama index. This one is smaller and seems to give better responses

LLogan M

You can definitely try dolly though

nnbulkz

ok interesting

nnbulkz

in what ways was it better for you?

nnbulkz

ooo I like that it is instruction following trained

nnbulkz

that is what I am looking for ....

LLogan M

For the refine prompt in llama index, camel worked much better. This is a super difficult prompt for most models

Plus it uses less resources, and is faster because of that

nnbulkz

yeahhh

nnbulkz

I watched that video by Andrew Ng and Open AI and I got jaded

nnbulkz

about how easy it should be to string together instructinos

LLogan M

https://github.com/jerryjliu/llama_index/blob/b2e228d56dcd95ea57dd31c621772d258aca57e6/llama_index/prompts/default_prompts.py#L90

This prompt is like the ultimate LLM test hahaha

LLogan M

Even gpt-3.5 kinda stinks at it

nnbulkz

ah good to know

nnbulkz

I think the generality is not something I actually need - I have a very specific task that I bet I could just train on

nnbulkz

I just don't feel like collecting all the data and formatting it correctly

nnbulkz

how do I know whether to use GPTNeoXTokenizerFast

nnbulkz

or LlamandexTokenizer

nnbulkz

or whatever

LLogan M

Usually you can just use AutoTokenizer and pass the model name

Otherwise, the model card on huggingface should have an explicit demo

nnbulkz

is there a way from huggingface to see the default number of support input tokens?

nnbulkz

or max number?

LLogan M

Eh it's kind of convoluted. You usually have to look at the config json or read the model card

I wish there was an easier way though

LLogan M

Using the config will have something like max_possition_embeddings, or something like that

LLogan M

I can look at model in particular if you cant find it

nnbulkz

I thought it was a token limit thing but now

nnbulkz

I'm seeing something different

nnbulkz

Plain Text

                    raise ValueError("If `eos_token_id` is defined, make sure that `pad_token_id` is defined.")

nnbulkz

no idea how to fix that

LLogan M

That's.. interesting lol

What model is this?

nnbulkz

Camel

LLogan M

Can i see how you set it up? I never got that error when I tested a few days ago

nnbulkz

Plain Text

mname = "Writer/camel-5b-hf"
tokenizer = AutoTokenizer.from_pretrained(mname, cache_dir="../camel/tokenizer")
model = AutoModelForCausalLM.from_pretrained(mname, device_map="auto", cache_dir="../camel/model")
FULL_PROMPT = QuestionAnswerPrompt(JTC_QA_PROMPT2)


class CamelLLM(LLM):

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:

        print(prompt)
        generation_config = GenerationConfig(
            max_new_tokens=5000,
            temperature=0.1,
            repetition_penalty=1.0,
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

        with torch.inference_mode():
            tokens = model.generate(**inputs, generation_config=generation_config, output_scores=True, max_new_tokens=num_output)

        response = tokenizer.decode(tokens[0], skip_special_tokens=True).strip()
        return response[len(prompt):]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": model}

    @property
    def _llm_type(self) -> str:
        return "custom"

nnbulkz

I've been using the same template for a class and over writing/copy pasting

nnbulkz

so it could be a mismatch of parameters or something obscure

LLogan M

Very sus haha

One note, the max new tokens is too big. Try like 256 or 512

Do you know which line of code raises that error?

LLogan M

Also last note, llama index has native support for huggingface, i would trust it a bit more
https://gpt-index.readthedocs.io/en/latest/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.html

oh wow

hmm

I gte this ...

Plain Text

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

LLogan M

That's fine

nnbulkz

what does that mean?

LLogan M

Just that the model will talk until it predicts a special token (or it runs out of room)

nnbulkz

OK I think I just was mixing up two different custom classes

nnbulkz

Plain Text

        response = tokenizer.decode(generation_output[0], skip_special_tokens=True).strip()
        return response[len(prompt):]

nnbulkz

here what is the return response[len(prompt):] mean?

nnbulkz

becuase for Camel on Hugging face, it looks different

nnbulkz

Plain Text

model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
output_ids = model.generate(
    **model_inputs,
    max_length=256,
)
output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
clean_output = output_text.split("### Response:")[1].strip()

print(clean_output)

LLogan M

So the input is the prompt, and the output is the prompt PLUS newly generated words

We only want to return the new words

And correct, that's just another way of returning only the new words

nnbulkz

oh because Prompt has a length

nnbulkz

got it

LLogan M

Yea! And for camel, we know everything after that response string is new, so it splits on that to get the new stuff

nnbulkz

if you use that Prompt

nnbulkz

which I wasn't

nnbulkz

but that was easy enough to see