Torben (joined September 25, 2024):
Which components can I best use to build a RAG chat that works for short inquiries after the first answer was given?
I am currently using ContextChatEngine, which works fine for the first interaction, but a follow-up question is often short and more related to the previous interaction than to the content in the vector store.

Hopefully an example makes it clear:
  1. Vectorstore Content [".... Granny Smith Apples ... ", " ... Green Frog...",...]
  2. Question: What colors can apples have?
  3. Answer by ContextChatEngine: Apples can be red, green (like Granny Smith), or yellow.
  4. Question: Give me some more green varieties?
  5. Answer by ContextChatEngine: .... Green Frog ... <= The short follow-up is most similar to the "Green Frog" chunk, so the answer would be something related to frogs instead of apples.
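A common fix for this is the "condense question" pattern: before retrieval, an LLM rewrites the short follow-up into a standalone question using the chat history, so the retriever searches for "green apple varieties" rather than just "green". LlamaIndex ships engines for this (e.g. CondenseQuestionChatEngine / CondensePlusContextChatEngine); the sketch below shows the idea in plain Python, with `llm` standing in for any completion call and the prompt wording being illustrative, not the library's exact template.

```python
# "Condense question" pattern sketch: rewrite a short follow-up into a
# standalone question before hitting the vector store. In LlamaIndex the
# built-in equivalent is CondensePlusContextChatEngine; this is a toy
# version with a placeholder `llm` callable and an illustrative prompt.

CONDENSE_PROMPT = """\
Given the conversation below and a follow-up question, rewrite the
follow-up as a single standalone question that contains all the context
needed to search a document store.

Conversation:
{history}

Follow-up question: {question}
Standalone question:"""


def build_condense_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Format the chat history and the follow-up into the condense prompt."""
    lines = [f"{role}: {text}" for role, text in history]
    return CONDENSE_PROMPT.format(history="\n".join(lines), question=question)


def condensed_query(llm, history: list[tuple[str, str]], question: str) -> str:
    """Rewrite the follow-up via the LLM; first turns pass through unchanged."""
    if not history:  # nothing to condense on the first interaction
        return question
    return llm(build_condense_prompt(history, question))
```

With the apples example, the rewritten query would be something like "What are some more green apple varieties?", which retrieves the Granny Smith chunk instead of the Green Frog one.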
4 comments
Hi! I want to parse longer articles from multiple sources like mail, PDF files, etc., which sometimes exceed the token limit, so the output gets truncated.

Details: The articles are surrounded by noise like page information, headers, footers, etc. The noise is not easily removed with e.g. BeautifulSoup or similar tools, because the sources are all different.
Status: For articles that fit into the token limit it already works fine, and only the resulting article with an SEO-friendly headline is produced. This is what should happen.
Issue: Longer texts... I do not want a summary that only picks some chunks; I want the whole article back without a single word changed.

Question: Are there options for a multi-prompt setup where e.g. each prompt takes some chunks and the outputs are then recombined? As far as I understand, https://python.langchain.com/docs/use_cases/question_answering/how_to/analyze_document just summarises, which is not what I want. Does somebody have a better idea of how to approach this?
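One way to do this is a map step without the summarising reduce step: split the noisy text into chunks, run the same "extract the article text verbatim" prompt on each chunk, and concatenate the outputs in order. A minimal sketch under those assumptions, where `extract_article` is a placeholder for a single LLM call and the splitter is a plain character splitter (in LangChain you might instead use a text splitter plus one chain call per chunk):

```python
# Map-style multi-prompt sketch: per-chunk extraction, then ordered
# recombination. `extract_article` is a hypothetical callable wrapping one
# LLM call with a "return the article text unchanged, drop the noise"
# prompt; the splitter is a stand-in for a real text splitter.

def split_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size chars, with optional overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def map_extract(text: str, extract_article, chunk_size: int = 3000) -> str:
    """Run the extraction prompt on every chunk and recombine the results."""
    pieces = [extract_article(chunk) for chunk in split_text(text, chunk_size)]
    # Chunks that contained only noise (header/footer pages) come back empty.
    return "".join(p for p in pieces if p.strip())
```

Because each chunk is only cleaned, not rewritten, concatenating the outputs in order reproduces the article verbatim; an overlap between chunks can help the model recognise headers or footers cut at a chunk boundary, at the cost of deduplicating the overlap afterwards.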
1 comment
What is the best retriever for input that sometimes requires exact matches (where similarity search fails)?
The most basic and naive idea I had was to let the LLM generate keywords and search for those. The problem would probably be that keywords get generated that have a high similarity but exist in many documents.
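For exact matches, the usual answer is lexical retrieval such as BM25 (often combined with vector search in a hybrid retriever) rather than LLM-generated keywords. BM25's IDF term directly addresses the concern above: a keyword that exists in many documents gets a low weight, while a rare exact token dominates the ranking. A toy sketch of the scoring, assuming whitespace tokenisation (real systems would use e.g. the rank_bm25 package or Elasticsearch):

```python
# Toy BM25 (Okapi) scorer: rare exact terms get high IDF weight, terms
# that occur in many documents get almost none. Whitespace tokenisation
# only; a real retriever would normalise and stem tokens.
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for the given query."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # rare term -> large IDF; term in most docs -> IDF near zero
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

An exact identifier like an error code then reliably ranks its source document first, which is exactly the case where pure embedding similarity fails.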
1 comment