Updated 3 months ago

Custom LLM

Hi! I am using a custom LLM (gpt2 from HF) and getting this error on query:
Plain Text
 Asking to pad but the tokenizer does not have a padding token
I'm trying to find out where I need to pass the tokenizer. I have already set it on the prompt helper. Any ideas?
17 comments
Can you share how you are setting up the model?
Plain Text
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=model_name)
tokenizer.pad_token = tokenizer.eos_token
print(f"Tokenizer settings: {tokenizer}")

prompt_helper = PromptHelper(
    max_input_size=max_input_size,
    num_output=num_output,
    max_chunk_overlap=max_chunk_overlap,
    chunk_size_limit=200,
    tokenizer=tokenizer,
)


class CustomLLM(LLM):
    pipeline = pipeline(
        task="text-generation",
        model=model_name,
        device="cuda:0",
    )

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

def setup_index(service_context: ServiceContext | None) -> BaseGPTIndex:
    INDEX_PATH = "DATA_PATH"
    if os.path.exists(INDEX_PATH):
        return GPTSimpleVectorIndex.load_from_disk(save_path=INDEX_PATH, service_context=service_context)
    PandasCSVReader = download_loader("PandasCSVReader")
    loader = PandasCSVReader()
    documents: List[Document] = loader.load_data(
        file=Path('./data/articles.csv'))
    index: BaseGPTIndex = GPTSimpleVectorIndex.from_documents(
        documents=documents, service_context=service_context)
    index.save_to_disk(INDEX_PATH)
    return index

llm_predictor = LLMPredictor(llm=CustomLLM())
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_name), tokenizer=tokenizer)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    embed_model=embed_model,
)
index = setup_index(service_context)
One thing I see that could be a problem is this warning:
Plain Text
WARNING:sentence_transformers.SentenceTransformer:No sentence-transformers model found with name /home/ubuntu/.cache/torch/sentence_transformers/gpt2. Creating a new one with MEAN pooling
It seems like the tokenizer is not getting passed through to the HuggingFace embedding.
Yeah, gpt2 is a text-generation model, not an embedding model, so it shouldn't be passed as the embedding model name. The default one that loads when you don't provide a model name is probably fine.

Also maybe don't pass the tokenizer into the prompt helper
ah ok let me try. How can I find the list of models (huggingface) that work with embeddings?
thanks! trying it out
hmm, got some other weird behaviour with the defaults
Plain Text
No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2

followed by
Plain Text
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [8,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed
....
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Is that error during indexing or query?
oh, and don't pass the tokenizer to the huggingface embeddings
missed that one
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
should be all you need for embeddings
Yeah, that is during query. (That's with the tokenizer removed from the HF embeddings.)
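For context, a device-side assert of the form `srcIndex < srcSelectDimSize` commonly means that an index passed to an embedding lookup is out of range. With gpt2 that can happen if the prompt plus the generated tokens exceeds the model's 1024 positions, or if a token id falls outside the vocabulary. The following is only a rough sketch of a CPU-side sanity check, not something from this thread: `find_index_problems` is a hypothetical helper, and the two constants are the published values for the original gpt2 checkpoint.

```python
from typing import List, Sequence

# Assumed values for the original "gpt2" checkpoint.
GPT2_VOCAB_SIZE = 50257     # valid token ids are 0 .. 50256
GPT2_MAX_POSITIONS = 1024   # maximum total sequence length


def find_index_problems(
    token_ids: Sequence[int],
    num_new_tokens: int,
    vocab_size: int = GPT2_VOCAB_SIZE,
    max_positions: int = GPT2_MAX_POSITIONS,
) -> List[str]:
    """Hypothetical helper: list reasons an embedding lookup could go out of range."""
    problems: List[str] = []

    # Token ids index into the token embedding table.
    bad_ids = [t for t in token_ids if t < 0 or t >= vocab_size]
    if bad_ids:
        problems.append(f"{len(bad_ids)} token id(s) outside [0, {vocab_size})")

    # Positions index into the position embedding table.
    total_length = len(token_ids) + num_new_tokens
    if total_length > max_positions:
        problems.append(
            f"prompt + new tokens = {total_length} exceeds {max_positions} positions"
        )
    return problems
```

In practice `token_ids` would come from tokenizing the final prompt that gets sent to the model, and `num_new_tokens` would be the `num_output` value from the setup above. If the check reports a length problem, lowering `max_input_size` in the prompt helper is one thing to try. Running the same query on CPU also tends to raise a plain `IndexError` instead of the asynchronous CUDA assert.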
Hmm maybe try calling/testing the pipeline on its own to help debug the issue

Tbh though, gpt2 will not be very good... it's pretty old now, and not great at following instructions
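A minimal sketch of what testing the pipeline on its own could look like, assuming the `transformers` package is installed. The helper names (`strip_prompt`, `smoke_test`, `build_gpt2_generate`) are made up for illustration. Note that the tokenizer with the pad token set is passed to `pipeline(...)` directly here. In the setup posted above the pipeline was created from the model name only, so it would load its own tokenizer.

```python
from typing import Callable, List


def strip_prompt(prompt: str, generated: str) -> str:
    """Return only the newly generated text if the output echoes the prompt."""
    if generated.startswith(prompt):
        return generated[len(prompt):]
    return generated


def smoke_test(generate: Callable[[str], str], prompts: List[str]) -> List[str]:
    """Run each prompt through `generate` and collect the new text only."""
    return [strip_prompt(p, generate(p)) for p in prompts]


def build_gpt2_generate(max_new_tokens: int = 32, device: int = -1) -> Callable[[str], str]:
    """Build a generate function around a gpt2 pipeline (device=-1 means CPU)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

    generator = pipeline(
        task="text-generation",
        model="gpt2",
        tokenizer=tokenizer,  # hand the configured tokenizer to the pipeline
        device=device,
    )

    def generate(prompt: str) -> str:
        result = generator(
            prompt,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        return result[0]["generated_text"]

    return generate
```

Calling `smoke_test(build_gpt2_generate(), ["Hello, my name is"])` exercises the model without any index or embedding code involved. If that works on CPU but fails on `cuda:0`, or works for short prompts but fails for long ones, that narrows down where the query error comes from.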