Hi, I'd like to process the documents

Hi, I'd like to process the documents (lemmatization and removing stopwords) before creating the index. And when querying, I want to use the processed text to find related sources but use the original question when sending to gpt-3.5. These are my steps:

  1. Process documents and generate vector store index
  2. Process the question
  3. Find top k related sources using processed question
  4. Send original question and processed sources context to GPT to answer the question
Is it possible to do this in LlamaIndex? (maybe using the QuestionAnswerPrompt template?)
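The four steps above can be sketched framework-agnostically. This is a toy illustration, not LlamaIndex code: the stopword list, `preprocess`, and token-overlap scoring are stand-ins for real lemmatization and vector retrieval.

```python
# Minimal sketch of the four steps: index stores (original, processed)
# pairs, retrieval matches on processed text, and the prompt is built
# from the ORIGINAL question plus the ORIGINAL document text.

STOPWORDS = {"the", "is", "of", "what", "a", "an", "and"}

def preprocess(text: str) -> str:
    """Lowercase and drop stopwords (stands in for lemmatization too)."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return " ".join(tokens)

# Step 1: process documents, but keep the original text alongside.
documents = [
    "Apple reported an operating margin of 30% in 2022.",
    "The weather in Cupertino is sunny today.",
]
index = [(doc, preprocess(doc)) for doc in documents]

def top_k(question: str, k: int = 1) -> list[str]:
    # Steps 2-3: process the question and score it against the
    # processed documents, but return the original text as context.
    q_tokens = set(preprocess(question).split())
    scored = sorted(index, key=lambda p: -len(q_tokens & set(p[1].split())))
    return [orig for orig, _ in scored[:k]]

# Step 4: build the prompt from the original question + retrieved context.
question = "What is the operating margin of Apple?"
context = "\n".join(top_k(question))
prompt = f"Context:\n{context}\n\nAnswer the question: {question}"
```

In a real setup the overlap score would be replaced by embedding similarity; the key point is only that the processed text drives retrieval while the original text reaches the model.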
Hmmm this would be pretty difficult to do currently

Is the goal here to reduce token usage? You could try the optimizer https://github.com/jerryjliu/llama_index/blob/main/examples/optimizer/OptimizerDemo.ipynb
It's not. It's for finding more related sources.
For these two questions, "what is the operating margin of Apple?" and "what is the operating margin of apple?", I noticed that different related source documents are used, and one of the questions cannot be answered based on its sources. In this case I'd like to preprocess both the index and the question when finding the sources.
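A quick illustration of why the two questions retrieve different sources: they differ only in casing, so a simple normalization step (lowercasing here; a real pipeline would also lemmatize and strip stopwords) maps both to the same string before retrieval.

```python
# Toy normalization: both question variants become identical,
# so retrieval over processed text treats them the same.
def normalize(q: str) -> str:
    return " ".join(q.lower().split())

q1 = "what is the operating margin of Apple?"
q2 = "what is the operating margin of apple?"
same = normalize(q1) == normalize(q2)
```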
Hmm, I think the only way to do this currently is to construct all your node objects before inserting them into the index. Then, for each node, generate an embedding based on the processed text (while keeping the node text itself the same) and assign that embedding to the node.

If the nodes have an embedding set before being used in an index, it shouldn't be overwritten
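The suggestion above can be sketched without depending on any particular framework. This is a toy model of the idea, not LlamaIndex's node API: `Node`, `embed`, and `preprocess` here are illustrative stand-ins (a real setup would use LlamaIndex's node objects and an actual embedding model).

```python
from dataclasses import dataclass

@dataclass
class Node:
    text: str               # original text, sent to the LLM unchanged
    embedding: list[float]  # computed from processed text, used for retrieval

def embed(text: str, dim: int = 8) -> list[float]:
    # Deterministic toy "embedding" via token hashing; a real setup
    # would call an embedding model here.
    vec = [0.0] * dim
    for tok in text.split():
        vec[hash(tok) % dim] += 1.0
    return vec

def preprocess(text: str) -> str:
    return text.lower()  # stands in for lemmatization + stopword removal

def make_node(raw: str) -> Node:
    # The embedding is set up front from the processed text, so the
    # index should use it as-is rather than re-embedding the raw text.
    return Node(text=raw, embedding=embed(preprocess(raw)))

node = make_node("Apple's Operating Margin")
```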
I now process all document text before creating the index.
What I want to do is use the processed question when finding related documents, but send the original question to GPT.
Python
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}" # use original context_str; maybe allowing a function to be passed to the template could solve this
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)