Hey, I have an application that lets users upload PDFs of financial reports (analyst briefs on specific stocks), which are then automatically turned into GPTSimpleVectorIndex embeddings.
How can I improve the results? Currently, when I query the index, the responses tend to be padded with filler such as disclaimers and warnings.
Is there a way to filter this out using the GPT Index library, or has anyone tried another approach, such as fine-tuning, for this purpose?
If you can preprocess the text to remove unrelated info before creating your index, that might help. Otherwise, you might need to look into some prompt engineering. You can customize the prompts in LlamaIndex pretty easily:
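from gpt_index.prompts.prompts import QuestionAnswerPrompt

# The template string must contain the {context_str} and {query_str} variables
qa = """The prompt should include {query_str} and {context_str} variables"""
qa_template = QuestionAnswerPrompt(qa)
# `index` is the index you already built from your documents
response = index.query("query", text_qa_template=qa_template)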
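
For the preprocessing route, here is a rough sketch of what stripping disclaimers before indexing could look like. It assumes the PDF text has already been extracted to a plain string with blank lines between paragraphs; the `strip_boilerplate` helper and the pattern list are made up for illustration and are not part of GPT Index, so you would need to tune them to the reports you actually get.

```python
import re

# Hypothetical marker phrases; adjust these to the boilerplate in your own reports.
BOILERPLATE_PATTERNS = [
    r"\bdisclaimer\b",
    r"\bnot (?:constitute )?investment advice\b",
    r"\bpast performance\b",
    r"\bfor informational purposes only\b",
    r"\ball rights reserved\b",
]
_BOILERPLATE_RE = re.compile("|".join(BOILERPLATE_PATTERNS), re.IGNORECASE)


def strip_boilerplate(text: str) -> str:
    """Drop every paragraph that matches at least one boilerplate pattern."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text)
    kept = [
        p.strip()
        for p in paragraphs
        if p.strip() and not _BOILERPLATE_RE.search(p)
    ]
    return "\n\n".join(kept)
```

You would run this on each report's text and build the index from the cleaned text instead of the raw one. Keyword matching like this is crude and can drop a useful paragraph that happens to mention one of the phrases, so it's worth spot-checking what gets removed.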