Updated 2 years ago

Hi, I'd like to process the documents

Hi, I'd like to process the documents (lemmatization and stopword removal) before creating the index. At query time, I want to use the processed text to find related sources, but use the original question when sending to gpt-3.5. These are my steps:

  1. Process documents and generate vector store index
  2. Process the question
  3. Find top k related sources using processed question
  4. Send original question and processed sources context to GPT to answer the question
Is it possible to do this in llama index? (Maybe using the QuestionAnswerPrompt template?)
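The four steps above can be sketched in plain Python, independent of llama index. This is only an illustration under stated assumptions: `preprocess`, `build_index`, `retrieve`, and `build_prompt` are hypothetical helpers, the documents are made up, lowercasing plus stopword removal stands in for real lemmatization, and bag-of-words cosine similarity stands in for real embeddings. The sketch keeps the original source text for the context; the processed text could be stored and returned instead.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real one would be larger).
STOPWORDS = {"what", "is", "the", "of", "a", "an", "and", "in", "to", "for"}


def preprocess(text):
    """Lowercase, tokenize, and drop stopwords (stand-in for lemmatization)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents):
    """Step 1: store each original document next to its processed vector."""
    return [(doc, Counter(preprocess(doc))) for doc in documents]


def retrieve(index, question, top_k=2):
    """Steps 2 and 3: process the question, then rank documents against it."""
    query_vec = Counter(preprocess(question))
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(original_question, sources):
    """Step 4: the prompt carries the original, unprocessed question."""
    context = "\n".join(sources)
    return f"Context:\n{context}\n\nQuestion: {original_question}\n"


# Made-up documents, for illustration only.
documents = [
    "Apple's operating margin improved year over year.",
    "The orchard harvested apples and pears in autumn.",
    "Microsoft revenue grew in the cloud segment.",
]
index = build_index(documents)

q1 = "what is the operating margin of Apple?"
q2 = "what is the operating margin of apple?"
sources_1 = retrieve(index, q1, top_k=1)
sources_2 = retrieve(index, q2, top_k=1)
prompt = build_prompt(q1, sources_1)
```

Because both questions are normalized before retrieval, they map to the same query vector and so retrieve the same sources, while the prompt still contains the question exactly as the user typed it.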
6 comments
Hmmm this would be pretty difficult to do currently

Is the goal here to reduce token usage? You could try the optimizer https://github.com/jerryjliu/llama_index/blob/main/examples/optimizer/OptimizerDemo.ipynb
It's not. It's for finding more related sources.
Take these two questions: "what is the operating margin of Apple?" and "what is the operating margin of apple?". I noticed that different source documents are retrieved for each, and one of the questions cannot be answered from its sources. In this case I'd like to preprocess both the index and the question when finding the sources.
Hmm, I think the only way to do what you describe currently is to construct all your node objects before inserting them into the index. Then, for each node, generate an embedding from the processed text (while keeping the node text itself unchanged) and assign that embedding to the node.

If the nodes have an embedding set before being used in an index, it shouldn't be overwritten
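A rough sketch of that idea, without using llama index itself: the `Node` dataclass, `normalize`, `toy_embed`, and `ensure_embedding` below are hypothetical stand-ins, not the library's API. The embedding is a toy hashed bag-of-words vector, and `ensure_embedding` only imitates the "don't overwrite an existing embedding" behaviour described above.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node: original text plus an optional precomputed embedding."""
    text: str
    embedding: Optional[List[float]] = None


def normalize(text):
    """Stand-in for real preprocessing: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def toy_embed(text, dim=16):
    """Toy embedding: hash each token into one of `dim` buckets and count."""
    vec = [0.0] * dim
    for token in text.split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def make_node(original_text):
    """Embed the processed text, but keep the original text on the node."""
    return Node(text=original_text, embedding=toy_embed(normalize(original_text)))


def ensure_embedding(node):
    """Imitate an index that embeds a node only when no embedding is set."""
    if node.embedding is None:
        node.embedding = toy_embed(node.text)
    return node


node = make_node("Apple Operating Margin")
before = list(node.embedding)
ensure_embedding(node)  # leaves the precomputed embedding untouched
```

The node still exposes the original text for the answer context, while similarity search would run against the embedding of the processed text.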
I now process all document text before creating the index.
What I want to do is use the processed question when finding related documents, but send the original question to GPT.
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}" # use original context_str. I guess maybe allowing pass a function to the template can solve this 
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
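One way to picture the "pass a function to the template" idea is a small wrapper around `str.format`. This is a sketch only: `TEXT_QA_TMPL`, `format_qa_prompt`, and the `context_fn` parameter are hypothetical names for illustration, not part of llama index.

```python
# Same wording as the template above, kept local so the sketch is self-contained.
TEXT_QA_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)


def format_qa_prompt(query_str, context_str, context_fn=None):
    """Fill the template, optionally transforming the context first.

    `context_fn` is a hypothetical hook, e.g. one that maps processed
    source text back to the original text. The question is inserted as-is.
    """
    if context_fn is not None:
        context_str = context_fn(context_str)
    return TEXT_QA_TMPL.format(context_str=context_str, query_str=query_str)


plain = format_qa_prompt("What is X?", "some context")
transformed = format_qa_prompt("What is X?", "some context", context_fn=str.upper)
```

Here `str.upper` is just a placeholder transform; in practice the hook would be whatever function recovers or reshapes the context text.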