

When a local embedding model and a local language model are used through Ollama, tokenization is handled on the Ollama side; LlamaIndex's Ollama() wrapper does not require a tokenizer.

At a glance

The community members are discussing which tokenizer LlamaIndex uses when running a local embedding model and a local language model through Ollama. One community member suggests that, based on the linked code, tokenization is handled on the Ollama side rather than the LlamaIndex side. Another asks whether this means no tokenizer needs to be provided when using LlamaIndex's Ollama(), and a third believes that is the case.
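For context, here is a minimal sketch of the setup being discussed: a local embedding model plus an Ollama-hosted LLM, with no tokenizer supplied anywhere. The model names ("BAAI/bge-small-en-v1.5", "llama3") and the "./data" directory are illustrative placeholders, not taken from the thread.

```python
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Local embedding model (placeholder name; any HF embedding model works).
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Local LLM served by Ollama; note that no tokenizer is passed to the wrapper.
Settings.llm = Ollama(model="llama3", request_timeout=120.0)

# Build an index and query it; prompt text is sent to Ollama as-is,
# and the model's own tokenizer is applied server-side.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What does this document cover?"))
```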

Useful resources
A) What tokenizer does LlamaIndex use when I am supplying a local embedding model and a local language model (through Ollama), like in this code? B) And how do I supply a tokenizer for an LLM that I am pulling from my own Ollama repo?
[Attachment: image.png]
3 comments
The Ollama wrapper only passes the text, in the required format, to your hosted LLM. I believe the tokenizer part is handled on the Ollama side, not the llama-index side.

https://github.com/run-llama/llama_index/blob/fd1edffd20cbf21085886b96b91c9b837f80a915/llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py#L306
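To illustrate the point: the wrapper ultimately boils down to an HTTP call carrying plain text, roughly like the sketch below (the endpoint assumes a default local Ollama install, and "llama3" is a placeholder model tag). Token IDs never appear on the client side.

```python
import requests

# Roughly what the wrapper sends: plain text to the local Ollama server.
# Tokenization happens inside Ollama, using the tokenizer bundled with the model.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```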
So does that mean that whenever I am using LlamaIndex's Ollama(), there is no need to provide a tokenizer like in this screenshot?
[Attachment: image.png]
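The screenshot is not reproduced here, but question B) can be addressed with LlamaIndex's global token-counting hook: Settings.tokenizer accepts any callable that maps a string to a list of tokens, so it can be pointed at the tokenizer matching your own Ollama model. A sketch, with a placeholder model name:

```python
from llama_index.core import Settings
from transformers import AutoTokenizer

# Placeholder: use the HF repo that matches the model behind your Ollama tag.
Settings.tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2"
).encode
```

Note that this only affects LlamaIndex's client-side token counting (chunking, prompt-size bookkeeping); generation-time tokenization still happens inside Ollama.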