We are using GPT-4 to build a chatbot that answers product-related questions on e-commerce websites. No matter how much prompt engineering we do, some GPT-4 responses are superb while others leave room for improvement. We would love to use RLHF or a similar technique so GPT-4 can learn from human feedback on the quality of its responses.
Additional info: we are already using fine-tuned models. Ideally, we would like to turn this human feedback into a continuous learning loop.
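To make the idea concrete, here is a minimal sketch of the kind of feedback loop we have in mind: log each exchange with a human rating (e.g. thumbs up/down from the chatbot UI), then periodically filter the well-rated exchanges into chat-format JSONL for the next supervised fine-tuning round. The field names and ratings here are hypothetical, just to illustrate the pipeline.

```python
import json

# Hypothetical feedback log: each record pairs a chat exchange with a
# human rating captured in the chatbot UI (1 = thumbs up, -1 = thumbs down).
feedback_log = [
    {"question": "Does this jacket run small?",
     "answer": "Yes, reviewers suggest sizing up by one.",
     "rating": 1},
    {"question": "Is the battery replaceable?",
     "answer": "I'm not sure.",
     "rating": -1},
]

def to_finetune_examples(log, min_rating=1):
    """Keep only positively rated exchanges and convert them to
    chat-format records suitable for a supervised fine-tuning JSONL file."""
    examples = []
    for rec in log:
        if rec["rating"] >= min_rating:
            examples.append({"messages": [
                {"role": "user", "content": rec["question"]},
                {"role": "assistant", "content": rec["answer"]},
            ]})
    return examples

examples = to_finetune_examples(feedback_log)
# One JSON object per line, the shape expected by chat fine-tuning uploads.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl)
```

Running this on a schedule (daily or weekly) and kicking off a new fine-tuning job on the accumulated positives would approximate "continuous" learning without full RLHF, which is not currently something we can run against a hosted GPT-4 model ourselves.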