Llamacpp

Just ran into the same issue. No idea why. 😒
@Elesbueno maybe try setting the context window a little lower? 3700? The token counting isn't always perfect
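If you're constructing the LLM yourself, that would look roughly like this (a rough sketch; the model path is a placeholder and the import assumes a pre-0.10 llama_index):

from llama_index.llms import LlamaCPP

llm = LlamaCPP(
    model_path="./openbuddy-llama2-13b.gguf",  # placeholder path
    context_window=3700,  # a bit below the model's 4096-token limit to leave headroom
    max_new_tokens=256,
    temperature=0.1,
)
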
@Logan M @Elesbueno In case anyone still cares, it appears that this llama2 13B version of OpenBuddy is just broken. If you ask it to write a poem about anything, it will not respond, even if you use llama_cpp directly. (I can't get it to respond to anything on its Hugging Face space at the moment, but the other versions work fine.) Oddly, it will respond to the planets question if you use llama_cpp directly. It just doesn't seem to like how llama_index passes it the prompt, period, even though it looks like it should be fine. 😕 tl;dr: Don't use this model.
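For reference, the direct llama_cpp call I'm comparing against is roughly this (a minimal sketch; the model path is a placeholder):

from llama_cpp import Llama

llm = Llama(model_path="./openbuddy-llama2-13b.gguf")  # placeholder path
out = llm(
    "Q: Name the planets in the solar system. A: ",
    max_tokens=300,
    stop=["Q:", "\n"],
    echo=True,
)
print(out["choices"][0]["text"])
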
Hmm, weird. Many models have a specific input format

No idea what OpenBuddy is lol, but for normal llama2-chat we have utility functions that transform prompts to use the proper [INST] and [/INST] tokens, for example
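For reference, those helpers live in llama_index.llms.llama_utils; a minimal sketch, assuming the pre-0.10 import paths:

from llama_index.llms import ChatMessage
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

# completion_to_prompt wraps a bare prompt in the llama2-chat [INST] ... [/INST] template
print(completion_to_prompt("Name the planets in the solar system."))

# messages_to_prompt does the same for a list of chat messages
print(messages_to_prompt([ChatMessage(role="user", content="Write a poem about the moon.")]))

Both can be passed to LlamaCPP via the messages_to_prompt= and completion_to_prompt= arguments so every call gets the chat template applied.
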
Just to see what would happen, I modified the complete method like so:
@llm_completion_callback()
def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
    self.generate_kwargs.update({"stream": False})
    is_formatted = kwargs.pop("formatted", False)
    max_tokens = kwargs.pop("max_tokens", 64)
    stop = kwargs.pop("stop", "\n")
    echo = kwargs.pop("echo", True)
    if not is_formatted:
        # original behaviour: run the prompt through llama_index's formatting,
        # then call the model with the stored generate_kwargs
        prompt = self.completion_to_prompt(prompt)
        response = self._model(prompt=prompt, **self.generate_kwargs)
    else:
        # bypass the formatting and call llama_cpp the same way the
        # plain llama_cpp "Q: ... A:" example does
        response = self._model(
            "Q: " + prompt + " A: ",
            max_tokens=max_tokens,
            stop=stop,
            echo=echo,
        )
    return CompletionResponse(text=response["choices"][0]["text"], raw=response)
So I could bypass the formatting and use it just like it would be called in the direct llama_cpp example:
response = llm.complete("Name the planets in the solar system.", formatted=True, max_tokens=300, stop=["Q:", "\n"], echo=True)
And that does work. But don't ask it to write a poem. ¯\_(ツ)_/¯