Yeah, open-source LLMs still have a lot of catching up to do, it seems.
With only 16GB, if you can't get StableLM to generate anything better, you could look into using Llama 2 or similar with llama.cpp. If you install llama.cpp with GPU support, it's decently fast and easy to use.
LangChain also has a llama.cpp integration that you can use with LlamaIndex.
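If you go that route, here's a rough sketch of LangChain's LlamaCpp integration, just as an example. The model path and parameter values are placeholders, so swap in whatever fits your setup, and it assumes llama-cpp-python was installed with GPU support:
from langchain.llms import LlamaCpp

# load a local model through llama.cpp (path and params are placeholders)
lc_llm = LlamaCpp(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=32,   # offload layers to the GPU; needs a GPU-enabled llama.cpp build
    n_ctx=4096,        # context window size
    temperature=0.1,
)

print(lc_llm("Q: What is the capital of France? A:"))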
You can use any LLM from LangChain, as long as you wrap it with our wrapper:
from llama_index.llms import LangChainLLM
from llama_index import ServiceContext, set_global_service_context
llm = LangChainLLM(<langchain llm>)
ctx = ServiceContext.from_defaults(llm=llm)
set_global_service_context(ctx)
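Once the global service context is set, any index you build picks up that LLM automatically. A quick sketch of what that looks like (the "./data" folder and the query are just placeholders):
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# documents are loaded and indexed using the globally configured service context
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# queries now run through the wrapped LangChain LLM
query_engine = index.as_query_engine()
response = query_engine.query("Summarize these documents.")
print(response)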