The default model that loads for the huggingface embeddings in the docs page that i sent usually works well

For LLMs, vicuna seems to be good (but it's also non-commercial). I like camel for commercial models so far
Where do you find the Camel ones?
why this one over, say, mname = "databricks/dolly-v2-12b"
?
Meh, I've tried dolly with llama index. This one is smaller and seems to give better responses
You can definitely try dolly though
ok interesting
in what ways was it better for you?
ooo I like that it is instruction-following trained
that is what I am looking for ....
For the refine prompt in llama index, camel worked much better. This is a super difficult prompt for most models

Plus it uses less resources, and is faster because of that
I watched that video by Andrew Ng and OpenAI and I got jaded
about how easy it should be to string together instructions
Even gpt-3.5 kinda stinks at it
ah good to know
I think the generality is not something I actually need - I have a very specific task that I bet I could just train on
I just don't feel like collecting all the data and formatting it correctly
how do I know whether to use GPTNeoXTokenizerFast
or LlamaTokenizer
Usually you can just use AutoTokenizer and pass the model name

Otherwise, the model card on huggingface should have an explicit demo
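For example, a quick sketch (assuming the camel model name from above):
Plain Text
from transformers import AutoTokenizer

# AutoTokenizer reads the model's config and picks the matching tokenizer class for you
tokenizer = AutoTokenizer.from_pretrained("Writer/camel-5b-hf")
print(type(tokenizer))  # shows which concrete tokenizer class was selected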
is there a way from huggingface to see the default number of supported input tokens?
or max number?
Eh it's kind of convoluted. You usually have to look at the config json or read the model card

I wish there was an easier way though
The config will have something like max_position_embeddings, or something like that
I can look at the model in particular if you can't find it
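Something like this sketch usually works (the exact attribute name is an assumption and varies by model family):
Plain Text
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Writer/camel-5b-hf")
# many causal LMs expose the context window here, but some architectures name it differently
print(getattr(config, "max_position_embeddings", None))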
I thought it was a token limit thing but now
I'm seeing something different
Plain Text
                    raise ValueError("If `eos_token_id` is defined, make sure that `pad_token_id` is defined.")
no idea how to fix that
That's.. interesting lol

What model is this?
Can i see how you set it up? I never got that error when I tested a few days ago
Plain Text
from typing import Any, List, Mapping, Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# (LLM base class, QuestionAnswerPrompt, and JTC_QA_PROMPT2 come from elsewhere in the script)

mname = "Writer/camel-5b-hf"
tokenizer = AutoTokenizer.from_pretrained(mname, cache_dir="../camel/tokenizer")
model = AutoModelForCausalLM.from_pretrained(mname, device_map="auto", cache_dir="../camel/model")
FULL_PROMPT = QuestionAnswerPrompt(JTC_QA_PROMPT2)


class CamelLLM(LLM):

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:

        print(prompt)
        generation_config = GenerationConfig(
            max_new_tokens=5000,
            temperature=0.1,
            repetition_penalty=1.0,
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

        with torch.inference_mode():
            tokens = model.generate(**inputs, generation_config=generation_config, output_scores=True, max_new_tokens=num_output)

        response = tokenizer.decode(tokens[0], skip_special_tokens=True).strip()
        return response[len(prompt):]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": mname}

    @property
    def _llm_type(self) -> str:
        return "custom"
I've been using the same template for a class and overwriting/copy-pasting
so it could be a mismatch of parameters or something obscure
Very sus haha

One note, the max new tokens is too big. Try like 256 or 512

Do you know which line of code raises that error?
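Roughly what I mean, reusing the names from your snippet (just a sketch):
Plain Text
generation_config = GenerationConfig(
    max_new_tokens=256,      # way smaller than 5000; the model's context window is the hard limit
    temperature=0.1,
    repetition_penalty=1.0,
)

with torch.inference_mode():
    # pass max_new_tokens in only one place so the two values can't disagree
    tokens = model.generate(**inputs, generation_config=generation_config)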
Also last note, llama index has native support for huggingface, I would trust it a bit more
https://gpt-index.readthedocs.io/en/latest/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.html
I get this ...
Plain Text
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
what does that mean?
Just that the model will talk until it predicts a special token (or it runs out of room)
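If you want to silence that warning you can set it explicitly (a sketch; pointing pad at eos is a common assumption for open-ended generation):
Plain Text
tokens = model.generate(
    **inputs,
    generation_config=generation_config,
    pad_token_id=tokenizer.eos_token_id,  # makes the implicit default explicit
)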
OK I think I just was mixing up two different custom classes
Plain Text
        response = tokenizer.decode(generation_output[0], skip_special_tokens=True).strip()
        return response[len(prompt):]
here what is the return response[len(prompt):] mean?
because for Camel on Hugging Face, it looks different
Plain Text
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
output_ids = model.generate(
    **model_inputs,
    max_length=256,
)
output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
clean_output = output_text.split("### Response:")[1].strip()

print(clean_output)
So the input is the prompt, and the output is the prompt PLUS newly generated words

We only want to return the new words

And correct, that's just another way of returning only the new words
oh because Prompt has a length
Yea! And for camel, we know everything after that response string is new, so it splits on that to get the new stuff
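Both snippets are doing the same trim, just differently (a toy sketch with made-up strings; the "### Response:" marker comes from camel's prompt format):
Plain Text
prompt = "### Instruction:\nSay hi.\n\n### Response:"
full_output = prompt + " Hello there!"

# option 1: slice off the prompt by character length
print(full_output[len(prompt):].strip())               # "Hello there!"

# option 2: split on the response marker the model was trained with
print(full_output.split("### Response:")[1].strip())   # "Hello there!"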
if you use that Prompt
which I wasn't
but that was easy enough to see
Yea. So for camel, it's trained for that specific prompt, so using it helps it work better πŸ’ͺ
Every model will be a little different, but yea lol
that makes sense but I would have no idea how to know that
Yea... it takes some experience haha
I've been working with huggingface stuff for a few years, definitely not intuitive