Vicuna

When I try to use the vicuna-13b-1.1 model, the following problem occurs:
Attachment: image.png (screenshot of the error)
A lot of this depends on how you are setting up the model. It looks like the tokenizer is getting some kwargs it doesn't like. For reference, here's an example setup (imports added for completeness; they match the legacy llama_index API this snippet uses):
import torch
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import GPTVectorStoreIndex, LangchainEmbedding, PromptHelper, ServiceContext
from llama_index.llm_predictor import HuggingFaceLLMPredictor
from llama_index.prompts.prompts import SimpleInputPrompt

# Prompt template that wraps each query in an instruction-style prompt
query_wrapper_prompt = SimpleInputPrompt(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)

hf_predictor = HuggingFaceLLMPredictor(
    max_input_size=2048,
    max_new_tokens=512,
    temperature=0.25,
    do_sample=False,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="eachadea/vicuna-13b-1.1",
    model_name="eachadea/vicuna-13b-1.1",
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},
    # float16 reduces memory usage when running on CUDA
    model_kwargs={"torch_dtype": torch.float16},
)

prompt_helper = PromptHelper(max_input_size=2048, num_output=512, max_chunk_overlap=50)
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)
service_context = ServiceContext.from_defaults(
    chunk_size_limit=512,
    llm_predictor=hf_predictor,
    embed_model=embed_model,
    prompt_helper=prompt_helper,
)

# `documents` is assumed to be loaded elsewhere, e.g. with SimpleDirectoryReader
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
index.storage_context.persist(persist_dir="./storage")
query_engine = index.as_query_engine(streaming=True, similarity_top_k=4, service_context=service_context)
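For completeness, here's a rough sketch of how the streaming query engine built above could then be used; the question string is just a placeholder:

# Hypothetical usage of the query engine built above (the query text is made up)
response = query_engine.query("What does the document say about setup?")
response.print_response_stream()  # prints tokens as they stream from the model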
Ah, I see the issue actually. Hugging Face recently added this error when extra kwargs are passed to the model πŸ™„ even though those arguments come from the tokenizer...

Let me see if there's a way to avoid this without needing a PR lol
You could try downgrading transformers if possible in the meantime. I think 4.21.3 should work
This check they added is pretty silly
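If you want to confirm what you're on before downgrading, a minimal check (the pin shown is just the version suggested above):

# Print the installed transformers version; if it's newer than the suggested pin,
# downgrade with:  pip install "transformers==4.21.3"
import transformers
print(transformers.__version__)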