
Simon
Joined September 25, 2024
Hi all - I'm trying to make the simplest possible RAG pipeline calling a local model. If I simply use 'local' as the model name, the model query returns the expected results, but if I hard-code the model name to point at my local '/ai/Mistral-7B-v0.1' directory, I get:

/site-packages/transformers/tokenization_utils_base.py", line 2707, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'})

The code is:
import chromadb
from llama_index import ServiceContext, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import HuggingFaceLLM
from llama_index.vector_stores import ChromaVectorStore

local_model = '/ai/Mistral-7B-v0.1'
llm = HuggingFaceLLM(model_name=local_model)
embed_model = HuggingFaceEmbedding(model_name=local_model, tokenizer_name=local_model)

chroma_client = chromadb.PersistentClient()
chroma_collection = chroma_client.create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
documents = SimpleDirectoryReader("data").load_data()
VectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)
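For reference, the ValueError itself suggests the likely fix: Mistral's tokenizer ships without a pad token, so padding fails when the embedding model batches inputs. A minimal sketch of that fix (`ensure_pad_token` is a helper name I'm using here, not a llama_index or transformers API):

```python
def ensure_pad_token(tokenizer):
    """Give a tokenizer a pad token if it lacks one, as the ValueError suggests.

    Mistral's tokenizer has no pad_token by default, so any call that pads a
    batch raises. Reusing eos_token as the pad token is the usual workaround.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer

# Hedged usage sketch: load the tokenizer yourself, patch it, then pass the
# object in rather than tokenizer_name. HuggingFaceLLM accepts a `tokenizer`
# argument; check your llama_index version's signature for HuggingFaceEmbedding.
#
# from transformers import AutoTokenizer
# tokenizer = ensure_pad_token(AutoTokenizer.from_pretrained('/ai/Mistral-7B-v0.1'))
# llm = HuggingFaceLLM(model_name='/ai/Mistral-7B-v0.1', tokenizer=tokenizer)
```

Why 'local' works: the default local model presumably already defines a pad token, whereas the raw Mistral checkpoint does not.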
19 comments