Hi, I am using an HF model and created a pipeline, but I am unable to write the service context code, since it accepts an LLM model as input but not a pipeline. Can someone tell me how to set up a service context when using HF LLM pipelines?
I created a text-generation pipeline and tried passing it in like ServiceContext(llm=pipeline, ...). It failed with a few params missing: text_inputs and system_prompt.
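For reference, ServiceContext(llm=...) expects a LlamaIndex LLM object rather than a raw transformers pipeline, which is presumably where the stray text_inputs / system_prompt params come from. One workaround is to wrap the pipeline in a CustomLLM subclass and hand that to the service context instead. A minimal sketch, assuming llama-index 0.9-style imports (newer releases moved these under llama_index.core and replaced ServiceContext with Settings) and gpt2 as a stand-in model:
```python
from typing import Any

from transformers import pipeline
from llama_index import ServiceContext
from llama_index.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.llms.base import llm_completion_callback

# Module-level pipeline, so the pydantic-based CustomLLM doesn't need it as a field
pipe = pipeline("text-generation", model="gpt2")


class PipelineLLM(CustomLLM):
    """Wraps a transformers text-generation pipeline as a LlamaIndex LLM."""

    context_window: int = 1024  # GPT-2's context size
    num_output: int = 256
    model_name: str = "gpt2"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # return_full_text=False strips the prompt from the pipeline output
        out = pipe(prompt, max_new_tokens=self.num_output, return_full_text=False)
        return CompletionResponse(text=out[0]["generated_text"])

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Naive streaming: yield the whole completion as a single chunk
        yield self.complete(prompt, **kwargs)


# embed_model="local" avoids falling back to the default OpenAI embedding
service_context = ServiceContext.from_defaults(llm=PipelineLLM(), embed_model="local")
```
Keeping the pipeline at module level sidesteps CustomLLM's pydantic field validation; the wrapper only needs complete, stream_complete, and metadata to plug into the rest of LlamaIndex.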
The reason I am using a pipeline is that it gives flexibility to change parameters that are fixed as defaults in the LLM model, e.g. eos_token, padding_side, etc.
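Those overrides can also be baked into the pipeline itself at construction time, so the wrapper above stays untouched. A sketch of that kind of setup, again with gpt2 as a placeholder:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Decoder-only models want left padding for batched generation
tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

model = AutoModelForCausalLM.from_pretrained("gpt2")

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,  # silences the pad_token_id warning
)
```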
Error I got with the model: No chat template is defined for this tokenizer - using the default template for the GPT2TokenizerFast class. If the default is not appropriate for your model, please set tokenizer.chat_template to an appropriate template.
Setting pad_token_id to eos_token_id:50256 for open-end generation. A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
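Both warnings can be silenced on the tokenizer before building the pipeline. The chat-template one only matters if you call the pipeline with chat-style messages; since GPT-2 ships without a template, you would have to set one yourself. A sketch, with a deliberately minimal (hypothetical) Jinja template - match it to the prompt format your model actually expects:
```python
# Decoder-only models should pad on the left; this plus a pad token
# silences the two generation warnings above
tokenizer.padding_side = "left"
tokenizer.pad_token_id = tokenizer.eos_token_id

# Minimal role/content Jinja template; purely illustrative
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
    "assistant:"
)
```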
@Logan M is there a design reason why HF pipelines aren't supported? Are there plans for that? I am hoping to contribute Optimum Intel support for better CPU performance, so it would be good to know.