@Logan M for Groq, what would the optimal embed_model and tokenizer be?
you can use any embed model; the LLM and embed model are independent
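(For reference, one way to wire this up in LlamaIndex; a minimal sketch assuming the llama-index-llms-groq and llama-index-embeddings-huggingface integration packages are installed, with illustrative model names and key:)

    from llama_index.core import Settings
    from llama_index.llms.groq import Groq
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    # LLM served by Groq; the model name and API key are illustrative
    Settings.llm = Groq(model="llama2-70b-4096", api_key="YOUR_GROQ_API_KEY")

    # the embed model is independent of the LLM; any embed model works
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")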
hmm ... but just curious about the tokenizer, it doesn't seem to work as expected
Not sure what you mean?
for instance, with OpenAI it was something like tokenizer=tiktoken.encoding_for_model(model), which let me compute the token count with the same tokenizer the model actually uses ... but with Groq that doesn't seem to quite work
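(For context, the OpenAI-style setup being described looks roughly like this; TokenCountingHandler is LlamaIndex's token counter, and the model name is illustrative:)

    import tiktoken
    from llama_index.core import Settings
    from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

    # count tokens with the same tokenizer the OpenAI model uses
    token_counter = TokenCountingHandler(
        tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
    )
    Settings.callback_manager = CallbackManager([token_counter])

    # after running some queries: token_counter.total_llm_token_count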
Right, I have no idea what tokenizer Groq uses lol (or if they even expose one). Tiktoken will get you an approximate count, I suppose
basically I'm referring to the Llama2 model that's hosted on Groq
oh, it's Llama2. You can just set the tokenizer to something like tokenizer=AutoTokenizer.from_pretrained("<some llama2 model>").encode
just using any Llama2 tokenizer from Hugging Face
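(Putting that together, a minimal sketch assuming the transformers package is installed; "meta-llama/Llama-2-7b-hf" is one illustrative, gated checkpoint, and any Llama2 tokenizer gives the same counts:)

    from transformers import AutoTokenizer
    from llama_index.core import Settings
    from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

    # illustrative Llama2 checkpoint; requires access approval on Hugging Face
    llama2_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

    # use it globally for chunking and context-window accounting
    Settings.tokenizer = llama2_tokenizer.encode

    # and for token counting, so the counts match what Groq's Llama2 sees
    token_counter = TokenCountingHandler(tokenizer=llama2_tokenizer.encode)
    Settings.callback_manager = CallbackManager([token_counter])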