Hi, need feedback on the following
I'm looking into using an open-source LLM in place of OpenAI for generating responses. The GPU size constraint for the LLM is 24GB.
I've tried the following:
  • Camel 5B
  • Stable LM 3B
  • dolly-v2-3B
What do you guys suggest?
Feedback highly appreciated! Thanks!
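For context, here's roughly what I have in mind: pointing LlamaIndex at a local HuggingFace model instead of OpenAI. This is just a sketch assuming the HuggingFaceLLM wrapper and ServiceContext (exact import paths and argument names may differ by version), with dolly-v2-3b standing in for whichever model ends up being the pick:

```python
# Sketch: swap OpenAI for a local HuggingFace model in LlamaIndex.
# Assumes the HuggingFaceLLM wrapper; import paths / argument names may
# differ between llama_index versions, so treat this as illustrative only.
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="databricks/dolly-v2-3b",   # one of the candidates above
    tokenizer_name="databricks/dolly-v2-3b",
    context_window=2048,
    max_new_tokens=256,
    device_map="auto",                      # a 3B model fits easily in 24GB
)

# Use a local embedding model too, so nothing goes through OpenAI.
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs, service_context=service_context)
print(index.as_query_engine().query("What does the document say about X?"))
```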
8 comments
Is 7B too big for your GPU? (I don't actually know, haha)
This is highly dependent on your use case. There are a few truly general-purpose models (Nous-Hermes is my favorite), but many are especially suited to certain tasks. Also, with that much GPU memory you should be able to use quantized 30B models. Anything from Wizard to Orca to Chronos, etc. You might even be able to squeeze in Falcon.
Nah, if they use, say, a q4 GGML quant they can fit a 33B model, maybe even a 40B (e.g., Falcon). That would be with llama.cpp.
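Something like this is all it takes with llama-cpp-python (a sketch; the model file name is a placeholder for whatever q4 GGML quant you download, and the layer count needs tuning for your card):

```python
# Sketch: run a q4-quantized GGML model on GPU via llama-cpp-python.
# The model path is a placeholder -- substitute the 33B q4 quant you download.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/wizardlm-33b.ggmlv3.q4_K_M.bin",  # placeholder path
    n_ctx=2048,        # context window
    n_gpu_layers=60,   # offload as many layers as fit in 24GB VRAM; tune this
)

out = llm("Q: What is retrieval-augmented generation? A:", max_tokens=256, stop=["Q:"])
print(out["choices"][0]["text"])
```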
Does speed become a problem at that point with GGML/llama.cpp? Llama-Index will constantly push the LLM to the edge of its context size, and I know bigger inputs mean slower responses.

I haven't played around enough with open-source models to know for sure
Definitely something to keep an eye on, and again, use cases are key. The new SuperHOT models support 8K context, but there are still a lot of issues being shaken out. You can definitely get solid 4K context with e.g. Nous-Hermes, and that covers a decent amount of ground, especially with sound prompt engineering.
Speed itself is decent. I can get upwards of 15 tok/sec on an Nvidia 3060. People using exllama are getting double that.
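If you want to sanity-check throughput yourself, a quick-and-dirty timing loop (reusing the llama-cpp-python `llm` from the sketch above; numbers vary a lot with quant, context length, and GPU) looks like:

```python
# Rough tokens/sec measurement; `llm` is the llama_cpp.Llama instance above.
import time

prompt = "Summarize the following in one sentence: ..."  # placeholder prompt
start = time.time()
out = llm(prompt, max_tokens=200)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```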
Whoa! You might be interested in this thread. I blinked and people are already talking about their experiences with 16K-context locally-hosted models (using the RoPE technique) 🤯
https://www.reddit.com/r/LocalLLaMA/comments/14t4lbc/highlight_on_some_interesting_8k_and_16k_models/
Airoboros, BTW, is a very good model for strictly following provided context, so it would probably be a nice fit with Llama-Index.
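For the extended-context models, llama.cpp exposes RoPE scaling, and llama-cpp-python passes it through. Roughly like this (a sketch; the parameter name and the model path are assumptions that may differ by version, and SuperHOT-style 8K models expect a specific scaling factor):

```python
# Sketch: extended context via RoPE scaling in llama-cpp-python.
# SuperHOT 8K quants are typically run with a linear scaling factor of 0.25;
# check the model card for the value the quant actually expects.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/airoboros-33b-superhot-8k.ggmlv3.q4_K_M.bin",  # placeholder
    n_ctx=8192,            # extended context window
    n_gpu_layers=60,       # tune for your VRAM
    rope_freq_scale=0.25,  # RoPE linear scaling; assumed parameter name
)
```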