It's pretty tough; there's no way to know with 100% certainty every time.
You could try tweaking the prompt templates, but you likely won't see much difference.
You could also try using an external library to evaluate the responses, for example ragas:
https://github.com/explodinggradients/ragas
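
To give a rough feel for what "evaluating responses" can mean, here is a minimal sketch. It is not the ragas API. It assumes you have the retrieved context alongside each answer, and the `support_score` helper is a made-up name for a crude word-overlap check. A real evaluation library does something far more careful, so treat this only as an illustration of the idea.

```python
import re


def support_score(answer: str, context: str) -> float:
    """Fraction of distinct answer words that also appear in the context.

    Hypothetical helper for illustration only; a crude proxy for how well
    an answer is backed by the context, not a real faithfulness metric.
    """
    def tokenize(text: str) -> set:
        # Lowercase and keep only alphanumeric runs as words.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    answer_words = tokenize(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & tokenize(context)) / len(answer_words)


context = "The Eiffel Tower is in Paris and was completed in 1889."
grounded = support_score("The Eiffel Tower was completed in 1889.", context)
ungrounded = support_score("The tower opened in London in 1925.", context)
print(grounded, ungrounded)
```

A low score only flags an answer for a closer look. It can't prove the answer is wrong, which is the same "no 100%" caveat as above.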