Stick to the retrieval step first and figure out why this is happening: e.g. maybe there's just too much data, maybe the chunk size isn't right, etc. Then start experimenting with other methods, like a different indexing strategy, hierarchical retrieval, maximal marginal relevance, etc. (rough sketch below).
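To make that concrete, here's a rough sketch of playing with chunk size and switching the retriever to MMR in LlamaIndex. This is just an illustration, assuming a recent llama-index version and a local `data/` folder of documents; the exact MMR kwargs can differ depending on which vector store you use, so treat the parameters as placeholders:

```python
# Sketch: experimenting with chunk size and MMR retrieval in LlamaIndex.
# Assumes llama-index >= 0.10 style imports and a "data/" folder of docs;
# adjust paths, chunk sizes and thresholds to your own setup.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("data").load_data()

# Try different chunk sizes -- too-large chunks often bury the relevant passage.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)

index = VectorStoreIndex(nodes)

# Plain top-k similarity retriever as a baseline.
baseline_retriever = index.as_retriever(similarity_top_k=5)

# MMR retriever: trades off relevance vs. diversity of the returned chunks.
mmr_retriever = index.as_retriever(
    vector_store_query_mode="mmr",
    similarity_top_k=5,
    vector_store_kwargs={"mmr_threshold": 0.5},  # closer to 1 = favor relevance
)

for retriever in (baseline_retriever, mmr_retriever):
    results = retriever.retrieve("your test question here")
    for node_with_score in results:
        print(node_with_score.score, node_with_score.node.get_content()[:80])
```

Eyeballing the retrieved chunks for a handful of questions you know the answer to is usually enough to tell whether the problem is retrieval or generation.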
Once you have the retrieval solved (and I really recommend using the evaluation framework developed by LlamaIndex to generate questions per chunk, etc.), I would then check whether GPT is hallucinating or not (second sketch below).
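And here's roughly what that evaluation loop looks like with LlamaIndex's helpers: generate questions per chunk, score the retriever on them, and only then check whether the answers are grounded in the retrieved context. Again just a sketch, assuming recent llama-index plus the OpenAI LLM integration (`llama-index-llms-openai`); the model name, paths and question are placeholders:

```python
# Sketch: chunk-level eval set + hallucination check with LlamaIndex evaluators.
import asyncio

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.evaluation import (
    FaithfulnessEvaluator,
    RetrieverEvaluator,
    generate_question_context_pairs,
)
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # whatever model you're actually using

documents = SimpleDirectoryReader("data").load_data()
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)
retriever = index.as_retriever(similarity_top_k=5)

# 1) Generate questions per chunk and score the retriever on them.
qa_dataset = generate_question_context_pairs(nodes, llm=llm, num_questions_per_chunk=2)
retriever_evaluator = RetrieverEvaluator.from_metric_names(
    ["hit_rate", "mrr"], retriever=retriever
)
eval_results = asyncio.run(retriever_evaluator.aevaluate_dataset(qa_dataset))
for r in eval_results[:5]:
    print(r.metric_vals_dict)

# 2) Only once retrieval looks good: check whether the answer is actually
#    grounded in the retrieved context, i.e. whether the LLM is hallucinating.
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("your test question here")
faithfulness = FaithfulnessEvaluator(llm=llm)
eval_result = faithfulness.evaluate_response(response=response)
print("grounded in context:", eval_result.passing)
```

If hit rate / MRR are low, keep iterating on retrieval; if they're fine but faithfulness fails, the problem is on the generation side.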