Custom llm

Hi! I am using a custom LLM (gpt2 from HF) and getting this error on query:


Asking to pad but the tokenizer does not have a padding token

I'm trying to find out where I need to pass the tokenizer; I have already set it on the prompt helper. Any ideas?
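For context on the error: GPT-2's tokenizer ships without a pad token, so anything that batches inputs of unequal length has nothing to fill the shorter ones with. Setting `tokenizer.pad_token = tokenizer.eos_token` only changes that one tokenizer object. A `pipeline(...)` or embedding model created from the name `"gpt2"` loads its own tokenizer and won't see the change. The sketch below is a plain-Python illustration of the padding step, not the transformers implementation. The helper `pad_batch` is hypothetical and the token ids are arbitrary. 50256 is GPT-2's `<|endoftext|>` id, which is commonly reused as the pad id.

```python
from typing import List, Optional

GPT2_EOS_TOKEN_ID = 50256  # GPT-2's <|endoftext|> id, reused here as the pad id


def pad_batch(batch: List[List[int]], pad_token_id: Optional[int]) -> List[List[int]]:
    """Right-pad token-id sequences to the length of the longest one."""
    if pad_token_id is None:
        # Without a pad id there is nothing to fill the shorter sequences with
        raise ValueError("Asking to pad but the tokenizer does not have a padding token")
    width = max(len(seq) for seq in batch)
    return [seq + [pad_token_id] * (width - len(seq)) for seq in batch]


print(pad_batch([[11, 12], [11]], GPT2_EOS_TOKEN_ID))  # [[11, 12], [11, 50256]]
```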


Logan M

Can you share how you are setting up the model?

urjit

sure

urjit


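from pathlib import Path
from typing import Any, List, Mapping, Optional
import os

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms.base import LLM
from llama_index import (
    Document,
    GPTSimpleVectorIndex,
    LangchainEmbedding,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    download_loader,
)
from llama_index.indices.base import BaseGPTIndex
from transformers import AutoTokenizer, pipeline

# Prompt-helper settings: not shown in the original post, so these values are hypothetical
max_input_size = 1024  # gpt2's context window is 1024 tokens
num_output = 256
max_chunk_overlap = 20

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=model_name)
# gpt2 has no pad token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
print(f"Tokenizer settings: {tokenizer}")

prompt_helper = PromptHelper(
    max_input_size=max_input_size,
    num_output=num_output,
    max_chunk_overlap=max_chunk_overlap,
    chunk_size_limit=200,
    tokenizer=tokenizer,
)

class CustomLLM(LLM):
    # The pipeline loads its own gpt2 tokenizer; the pad_token set above does not apply to it
    pipeline = pipeline(
        task="text-generation",
        model=model_name,
        device="cuda:0",
    )

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # generated_text starts with the prompt; only return the newly generated text
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

def setup_index(service_context: Optional[ServiceContext]) -> BaseGPTIndex:
    INDEX_PATH = "DATA_PATH"
    # Reuse a previously saved index if one exists
    if os.path.exists(INDEX_PATH):
        return GPTSimpleVectorIndex.load_from_disk(save_path=INDEX_PATH, service_context=service_context)
    PandasCSVReader = download_loader("PandasCSVReader")
    loader = PandasCSVReader()
    documents: List[Document] = loader.load_data(
        file=Path('./data/articles.csv'))
    index: BaseGPTIndex = GPTSimpleVectorIndex.from_documents(
        documents=documents, service_context=service_context)
    index.save_to_disk(INDEX_PATH)
    return index

llm_predictor = LLMPredictor(llm=CustomLLM())
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_name), tokenizer=tokenizer)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model)
index = setup_index(service_context)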

urjit

One thing I see that could be a problem is:


WARNING:sentence_transformers.SentenceTransformer:No sentence-transformers model found with name /home/ubuntu/.cache/torch/sentence_transformers/gpt2. Creating a new one with MEAN pooling
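What the warning describes: sentence-transformers found no ready-made sentence-embedding model called gpt2. It wrapped the plain gpt2 model and averages (mean-pools) its per-token vectors into one vector per text. Batching texts of different lengths for that step needs padding, which is one plausible source of the padding error. gpt2 was also never trained to produce sentence embeddings. Below is a minimal pure-Python sketch of masked mean pooling. It shows the idea behind MEAN pooling, not the library's code, and the function name is hypothetical.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask is 1, skipping padding positions."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [x / count for x in total]


# Two real tokens plus one padding position (mask 0), which is ignored
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0]))  # [2.0, 3.0]
```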

urjit

It seems like there is no way for the tokenizer to get passed through to the Hugging Face embedding.

Logan M

Yeah, gpt2 isn't an embedding model, so it shouldn't be used for embeddings. The default model it loads when you don't provide a model name is probably fine.

Also, maybe don't pass the tokenizer into the prompt helper.
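A guess at why passing the tokenizer to the prompt helper can misbehave. It assumes the prompt helper counts tokens as `len(tokenizer(text))` and therefore expects a callable that returns a list of tokens; the thread doesn't confirm this. A Hugging Face tokenizer object is callable, but it returns a dict-like `BatchEncoding` with keys such as `input_ids` and `attention_mask`, so its length is the number of keys, not tokens. The `FakeHFTokenizer` class below is a hypothetical offline stand-in with the same call shape. With the real object, `tokenizer.encode` (which returns a list of ids) would be the analogous callable.

```python
from typing import Callable, Dict, List, Sized


class FakeHFTokenizer:
    """Hypothetical stand-in shaped like a Hugging Face tokenizer:
    calling it returns a dict of fields, while .encode returns a list of ids."""

    def encode(self, text: str) -> List[int]:
        # Toy tokenization: one id per whitespace-separated word
        return list(range(len(text.split())))

    def __call__(self, text: str) -> Dict[str, List[int]]:
        ids = self.encode(text)
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}


def count_tokens(tokenize: Callable[[str], Sized], text: str) -> int:
    # Assumed counting rule: the length of whatever the callable returns
    return len(tokenize(text))


tok = FakeHFTokenizer()
text = "one two three four five"
print(count_tokens(tok, text))         # 2: counts the dict's keys, not tokens
print(count_tokens(tok.encode, text))  # 5
```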

urjit

Ah OK, let me try. How can I find the list of Hugging Face models that work for embeddings?

Logan M

By default, it will load this model: https://huggingface.co/sentence-transformers/all-mpnet-base-v2

Probably anything in the same category will work too: https://huggingface.co/models?pipeline_tag=sentence-similarity
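Models in the sentence-similarity category are trained so that texts with similar meaning map to nearby vectors. A vector index then ranks stored chunks by how close their embeddings are to the query's embedding. The sketch below shows that ranking step using cosine similarity, a common choice; the thread doesn't say which metric the index uses. The helper names and the toy 2-D vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], docs: List[Sequence[float]], k: int = 1) -> List[Tuple[int, float]]:
    """Return (doc index, score) pairs for the k most similar documents."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Doc 1 points almost the same way as the query, doc 0 is orthogonal to it
print(top_k([1.0, 0.0], [[0.0, 1.0], [2.0, 0.1]], k=1)[0][0])  # 1
```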

urjit

Thanks! Trying it out.

urjit

Hmm, got some other weird behaviour with the defaults:


No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2

followed by


../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [8,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed
....
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
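The thread doesn't pin down the cause. One common trigger for this assertion (`srcIndex < srcSelectDimSize` in an index-select kernel) is an embedding lookup with an out-of-range index. With gpt2 this typically means the prompt plus `max_new_tokens` runs past its 1024 position embeddings. The sketch below is a pre-flight check under that assumption. It assumes you already have the prompt's token count, for example from `len(tokenizer.encode(prompt))`, and the helper names are hypothetical.

```python
GPT2_CONTEXT_WINDOW = 1024  # gpt2 has 1024 learned position embeddings


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 window: int = GPT2_CONTEXT_WINDOW) -> bool:
    """Conservative check: treat every generated token as needing a position,
    so prompt plus new tokens must not exceed the window."""
    return prompt_tokens + max_new_tokens <= window


def prompt_budget(max_new_tokens: int, window: int = GPT2_CONTEXT_WINDOW) -> int:
    """Largest prompt, in tokens, that still passes fits_context."""
    return window - max_new_tokens


print(fits_context(900, 256))  # False: 1156 > 1024
print(prompt_budget(256))      # 768
```

In the post's setup, this would suggest keeping `max_input_size` at or below 1024 for gpt2 so that the prompt helper leaves room for `num_output` new tokens.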

Logan M

Is that error during indexing or query?

Logan M

Oh, and don't pass the tokenizer to the Hugging Face embeddings.

Logan M

missed that one

Logan M

embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

Logan M

should be all you need for embeddings

urjit

Yeah, that's during query. (That's with the tokenizer removed from the HF embeddings.)

Logan M

Hmm, maybe try calling/testing the pipeline on its own to help debug the issue.

Tbh though, gpt2 will not be very good... it's pretty old now, and not great at following instructions.
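To follow the suggestion of testing the pipeline on its own, call the generation function directly with prompts of growing length and see where it first breaks. This narrows down a size-related failure. In the sketch below, `generate` is any callable that takes a prompt, and the helper name `first_failing_length` is hypothetical. Word count is only a rough proxy for token count. The sketch also sets `CUDA_LAUNCH_BLOCKING=1`, as the error message suggests, which only takes effect if it is set before CUDA is initialised.

```python
import os
from typing import Callable, Iterable, Optional

# Report CUDA kernel errors synchronously; only effective before CUDA is initialised
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def first_failing_length(generate: Callable[[str], object],
                         lengths: Iterable[int],
                         word: str = "hello") -> Optional[int]:
    """Call `generate` on prompts of `n` repeated words for each n in `lengths`
    and return the first n that raises, or None if every call succeeds."""
    for n in lengths:
        prompt = " ".join([word] * n)
        try:
            generate(prompt)
        except Exception:  # a CUDA RuntimeError in the real case
            return n
    return None


# Example wiring (needs transformers and a GPU; not run here):
# from transformers import pipeline
# pipe = pipeline("text-generation", model="gpt2", device="cuda:0")
# print(first_failing_length(lambda p: pipe(p, max_new_tokens=256), [256, 512, 768, 1024]))
```

A device-side assert leaves the CUDA context unusable, so the helper stops at the first failure. Restart the process before trying again.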
