Hi, need feedback on the following
I'm looking into using an open-source LLM in place of OpenAI for generating responses. The GPU size constraint for the LLM is 24GB.
I've tried the following:
  • Camel 5B
  • Stable LM 3B
  • dolly-v2-3B
What do you guys suggest?
Feedback highly appreciated! Thanks!
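For context, here's roughly what I have in mind: pointing LlamaIndex at a local HuggingFace model instead of OpenAI. This is just a sketch assuming the HuggingFaceLLM wrapper and ServiceContext (exact import paths and argument names may differ by version), with dolly-v2-3b standing in for whichever model ends up being the pick:

```python
# Sketch: swap OpenAI for a local HuggingFace model in LlamaIndex.
# Assumes the HuggingFaceLLM wrapper; import paths / argument names may
# differ between llama_index versions, so treat this as illustrative only.
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="databricks/dolly-v2-3b",   # one of the candidates above
    tokenizer_name="databricks/dolly-v2-3b",
    context_window=2048,
    max_new_tokens=256,
    device_map="auto",                      # a 3B model fits easily in 24GB
)

# Use a local embedding model too, so nothing goes through OpenAI.
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs, service_context=service_context)
print(index.as_query_engine().query("What does the document say about X?"))
```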
8 comments
Is 7B too big for your GPU? (I don't actually know, haha)
This is highly dependent on your use case. There are a few truly general-purpose models (Nous-Hermes is my favorite), but many are especially suited to certain tasks. Also, with that much GPU memory you should be able to use quantized 30B models. Anything from Wizard to Orca to Chronos, etc. You might even be able to squeeze in Falcon.
Nah, if they use, say, a q4 GGML quant they can fit a 33B model, maybe even a 40B (e.g., Falcon). That would be with llama.cpp.
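Something like this is all it takes with llama-cpp-python (a sketch; the model file name is a placeholder for whatever q4 GGML quant you download, and the layer count needs tuning for your card):

```python
# Sketch: run a q4-quantized GGML model on GPU via llama-cpp-python.
# The model path is a placeholder -- substitute the 33B q4 quant you download.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/wizardlm-33b.ggmlv3.q4_K_M.bin",  # placeholder path
    n_ctx=2048,        # context window
    n_gpu_layers=60,   # offload as many layers as fit in 24GB VRAM; tune this
)

out = llm("Q: What is retrieval-augmented generation? A:", max_tokens=256, stop=["Q:"])
print(out["choices"][0]["text"])
```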
Does speed become a problem at that point with GGML/llama.cpp? Llama-Index will constantly push the LLM to the edge of its context size, and I know bigger inputs mean slower responses.

I haven't played around enough with open-source models to know for sure
Definitely something to keep an eye on, and again, use cases are key. The new SuperHOT models support 8K context, but there are still a lot of issues being shaken out. You can definitely get solid 4K context with e.g. Nous-Hermes, and that covers a decent amount of ground, especially with sound prompt engineering.
Speed itself is decent. I can get upwards of 15 tok/sec on an Nvidia 3060. People using exllama are getting double that.
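If you want to sanity-check throughput yourself, a quick-and-dirty timing loop (reusing the llama-cpp-python `llm` from the sketch above; numbers vary a lot with quant, context length, and GPU) looks like:

```python
# Rough tokens/sec measurement; `llm` is the llama_cpp.Llama instance above.
import time

prompt = "Summarize the following in one sentence: ..."  # placeholder prompt
start = time.time()
out = llm(prompt, max_tokens=200)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```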
Whoa! You might be interested in this thread. I blinked and people are already talking about their experiences with 16K-context locally-hosted models (using the RoPE technique) 🤯
https://www.reddit.com/r/LocalLLaMA/comments/14t4lbc/highlight_on_some_interesting_8k_and_16k_models/
Airoboros, BTW, is a very good model for strictly following provided context, so it would probably be a nice fit with Llama-Index.
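For the extended-context models, llama.cpp exposes RoPE scaling, and llama-cpp-python passes it through. Roughly like this (a sketch; the parameter name and the model path are assumptions that may differ by version, and SuperHOT-style 8K models expect a specific scaling factor):

```python
# Sketch: extended context via RoPE scaling in llama-cpp-python.
# SuperHOT 8K quants are typically run with a linear scaling factor of 0.25;
# check the model card for the value the quant actually expects.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/airoboros-33b-superhot-8k.ggmlv3.q4_K_M.bin",  # placeholder
    n_ctx=8192,            # extended context window
    n_gpu_layers=60,       # tune for your VRAM
    rope_freq_scale=0.25,  # RoPE linear scaling; assumed parameter name
)
```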