Is anyone aware of an SLM good enough for a Q&A RAG use case?

Is anyone aware of an SLM good enough for a Q&A RAG use case? I have been using Llama 2 7B (4-bit quantized) so far, but it still consumes a lot of computing resources. The goal is to find something that can run entirely on CPU. I recently discovered TinyLlama/TinyLlama-1.1B-Chat-v1.0 but would like to know my other options. Thank you!
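
For reference, a minimal sketch of what fully-on-CPU inference with TinyLlama could look like, using llama-cpp-python over a GGUF quantization (the file name below is a placeholder for whatever community quant is downloaded, not a tested config):

```python
# Minimal sketch of CPU-only inference with llama-cpp-python over a GGUF
# quantization of TinyLlama. The model_path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,    # room for retrieved chunks plus the question
    n_threads=4,   # CPU threads; tune to your machine
)

# RAG-style prompt: retrieved context first, then the question.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context: <retrieved chunks here>\n\n"
    "Question: <user question here>\n"
    "Answer:"
)

out = llm(prompt, max_tokens=256, stop=["\n\n"])
print(out["choices"][0]["text"])
```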
TinyLlama might be a good choice, but I hesitate to recommend a model that small for general RAG. I haven't tried it yet, though!

zephyr-3b doesn't do badly; you can see a demo here:
https://colab.research.google.com/drive/1USBIOs4yUkjOcxTKBr7onjlzATE-974T?usp=sharing
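
For anyone who wants to try it outside the notebook, here is a rough sketch of loading it on CPU with transformers. The model id `stabilityai/stablelm-zephyr-3b` is my assumption for the zephyr-3b mentioned above, and note that unquantized 3B weights will be noticeably heavier on CPU than a quantized GGUF build:

```python
# Rough sketch of running zephyr-3b on CPU via transformers.
# "stabilityai/stablelm-zephyr-3b" is an assumed model id.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="stabilityai/stablelm-zephyr-3b",  # assumed HF id
    device=-1,               # -1 = run on CPU
    trust_remote_code=True,  # older transformers releases need this for StableLM
)

messages = [{
    "role": "user",
    "content": "Answer from this context: <chunks>\nQuestion: <q>",
}]
# Format the message with the model's own chat template.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(pipe(prompt, max_new_tokens=128)[0]["generated_text"])
```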

Keep in mind that these smaller models should probably not be used for more complex reasoning (routing, structured outputs, agents).
Thanks @Logan M! That makes sense.
Hmm, so these models can be run on Google Colab? I was wondering: instead of downloading the whole model, is there cheap or free inference hosting for small language models like these?
Although it doesn't seem to work on Colab right now.
(Attachment: image.png)
I think that was a transient error in the environment. It went away for me today.
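
On the hosting question above: one cheap/free option is calling the Hugging Face Inference API through huggingface_hub, so nothing gets downloaded locally. A minimal sketch; whether a given small model is actually served on the free tier, and what its rate limits are, are assumptions worth verifying:

```python
# Sketch of remote inference via the Hugging Face Inference API,
# so no model weights need to be downloaded locally.
from huggingface_hub import InferenceClient

client = InferenceClient(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

answer = client.text_generation(
    "Answer the question using only the context below.\n\n"
    "Context: <retrieved chunks here>\n"
    "Question: <user question here>\n"
    "Answer:",
    max_new_tokens=128,
)
print(answer)
```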