TinyLlama might be a good choice, but I'd hesitate to recommend a model that small for general RAG. I haven't tried it yet though!
zephyr-3b doesn't do badly; you can see a demo here:
https://colab.research.google.com/drive/1USBIOs4yUkjOcxTKBr7onjlzATE-974T?usp=sharing
Keep in mind that these smaller models should probably not be used for more complex reasoning (routing, structured outputs, agents).