Updated last year

Is anyone aware of an SLM good enough

Is anyone aware of an SLM good enough for a Q&A RAG use case? I have been using Llama 2 7B (4-bit quantized) so far, but it still consumes a lot of computing resources. The goal is to find something that can run entirely on CPU. I recently discovered TinyLlama/TinyLlama-1.1B-Chat-v1.0 but would like to know my other options. Thank you!

5 comments

Logan M

TinyLlama might be a good choice, but I'm hesitant to recommend a model that small for general RAG. I haven't tried it yet, though!

zephyr-3b is not bad; you can see a demo here:
https://colab.research.google.com/drive/1USBIOs4yUkjOcxTKBr7onjlzATE-974T?usp=sharing

Keep in mind that these smaller models should probably not be used for more complex reasoning (routing, structured outputs, agents).
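To put the model sizes mentioned in this thread in perspective, here is a rough back-of-envelope sketch of how much RAM the weights alone need at 16-bit and 4-bit precision. The helper `weight_memory_gb` is a hypothetical name used only for illustration, and the parameter counts are nominal figures read off the model names, not measured values. The estimate ignores the KV cache, activations, and quantization overhead, so real usage will be somewhat higher.

```python
# Rough weight-memory estimate: parameters * bits per weight / 8 bytes.
# Assumption: weights only; KV cache, activations, and quantization
# metadata (scales, zero points) are NOT included.

def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Nominal parameter counts taken from the model names (assumed, not measured).
models = {
    "Llama 2 7B": 7e9,
    "zephyr-3b": 3e9,
    "TinyLlama 1.1B": 1.1e9,
}

for name, n_params in models.items():
    fp16 = weight_memory_gb(n_params, 16)
    int4 = weight_memory_gb(n_params, 4)
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{int4:.2f} GB at 4-bit")
```

Memory is only part of the picture: CPU generation speed also tends to drop as the parameter count grows, so treat these numbers as a lower bound on RAM, not as a speed prediction.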

Anurag Agrawal

Thanks @Logan M! That makes sense.

betermagne

Hmm, so these models can be run on Google Colab? Rather than downloading the whole model, I was wondering whether there is a cheap or free inference hosting service for small language models like these.

betermagne

Although it doesn't seem to work on Colab right now.

Ramin

I think that was a transient error in the environment. It went away for me today.
