Find answers from the community

sl33p
Offline, last seen 3 months ago
Joined September 25, 2024
Is there a way to tell the ContextChatEngine to use only the documents/nodes of the current index? The problem right now is that if I use an index `example`, and `example` has no info about cars, the RAG will still always reply.
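
One approach (a minimal sketch, assuming a built `index` and a recent `llama_index.core`; the cutoff value is illustrative): combine a restrictive system prompt with a similarity cutoff, so weakly related nodes are dropped instead of letting the model fall back on its own knowledge.

```python
# A minimal sketch, assuming a built `index`; the cutoff value is illustrative.
from llama_index.core.postprocessor import SimilarityPostprocessor

chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt=(
        "Answer ONLY from the provided context. If the context does not "
        "contain the answer, say that you don't know."
    ),
    # drop weakly related nodes instead of padding the context with them
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
```

With the cutoff in place, an off-topic question (e.g. about cars) retrieves nothing above the threshold, and the system prompt tells the model to say so rather than answer from its training data.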
6 comments
sl33p

```
Querying with: Come posso arrivare alla fiera?
Oct 11 08:32:40 messe-rag-chatbot app/web.1 **********
Oct 11 08:32:40 messe-rag-chatbot app/web.1 Trace: chat
Oct 11 08:32:40 messe-rag-chatbot app/web.1     |_CBEventType.TEMPLATING ->  4.4e-05 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1     |_CBEventType.LLM ->  2.075113 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1     |_CBEventType.QUERY ->  2.032007 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1       |_CBEventType.RETRIEVE ->  2.027069 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1       |_CBEventType.SYNTHESIZE ->  0.004779 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1         |_CBEventType.TEMPLATING ->  3.3e-05 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1         |_CBEventType.LLM ->  0.0 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1     |_CBEventType.LLM ->  0.0 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1 **********
```
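
For reference, a trace like the one above can be produced with the `LlamaDebugHandler` callback; a minimal sketch (the poster's actual wiring isn't shown, so this is an assumption):

```python
# A minimal sketch of producing a trace like the one above; the poster's
# actual setup isn't shown, so the wiring here is an assumption.
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

llama_debug = LlamaDebugHandler(print_trace_on_end=True)  # prints the event tree after each trace
Settings.callback_manager = CallbackManager([llama_debug])

# ... build the index / chat engine and chat as usual ...

# aggregate timing for one event type, e.g. total time spent in LLM calls
print(llama_debug.get_event_time_info(CBEventType.LLM))
```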
18 comments
Hi guys, I have an architecture question. I've successfully built an example that reads from a Mongo collection and lets the user chat with that data. Is it best practice to rebuild the index for every chat (to pick up the updated cases), or should I store the index somewhere and update it on a schedule (every N minutes, say)? And if I update it in real time instead, how much more expensive is that compared to just saving it?
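
A sketch of the persist-and-update pattern, where `documents` and `new_document` are placeholders for data loaded from the Mongo collection: build once, persist to disk, reload on startup, and insert new cases incrementally.

```python
# A sketch of the persist-and-update pattern; `documents` and `new_document`
# are placeholders for data loaded from the Mongo collection.
from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage

PERSIST_DIR = "./storage"

# build once, then persist to disk
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir=PERSIST_DIR)

# on app startup: load instead of rebuilding (no embedding calls)
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)

# when a new case lands in Mongo, insert just that document:
# only its chunks get embedded, far cheaper than a full rebuild
index.insert(new_document)
```

Rebuilding per chat re-embeds the whole collection every time; persisting plus incremental `insert()` pays the embedding cost only once per new document, so real-time updates are usually the cheaper option.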
5 comments
sl33p

LLM time

Hi guys, I've finished my first company chatbot on custom docs, and it works very well (very nice lib, guys!). I have one question about performance. I'm attaching the current code that builds the query engine and the chat engine. I use a RecursiveRetriever because the information lives in different data sources (yes, it's pretty ugly, but I can't change the data sources and use something else like the SubQuestionQueryEngine), and after that I pick the first 2 results from each index and use those results in the CondenseQuestionChatEngine. It all works well, but is there any way to reduce latency? I've tried different chunk sizes, limiting the context, etc., and streaming; the problem seems to be the first LLM call of the CondenseQuestionChatEngine, which is pretty slow (2-3 seconds), so I tried the other engines, but they gave me lower-quality results. Any hint is appreciated 🙂
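
One lever worth trying (a sketch, not the poster's attached code): give the condense step its own smaller, faster model, since `CondenseQuestionChatEngine.from_defaults` accepts an `llm`, while the heavier model stays on the query engine. The model name and `query_engine` are assumptions.

```python
# A sketch, not the poster's attached code: use a smaller, faster model just
# for the condense step; the model name and `query_engine` are assumptions.
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.llms.openai import OpenAI

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,  # the existing RecursiveRetriever-based engine
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0),  # fast model for question rewriting only
    verbose=True,
)
response = chat_engine.chat("Come posso arrivare alla fiera?")  # "How can I get to the fair?"
```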
13 comments
Hi guys, one question: I'm building a composed graph query engine, but it takes a long time to respond (20 seconds). Is it possible to reduce that time? What are the best practices? I'll provide the code:
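
A generic sketch of the usual latency levers, assuming an existing `index` (this is not the poster's code): retrieve fewer nodes, compact the synthesis into fewer LLM calls, and stream.

```python
# A generic sketch of common latency levers, assuming an existing `index`;
# this is not the poster's code. Parameter values are illustrative.
query_engine = index.as_query_engine(
    similarity_top_k=2,        # retrieve fewer nodes -> smaller prompts
    response_mode="compact",   # pack chunks into as few LLM calls as possible
    streaming=True,            # first tokens arrive before the full answer
)
```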
24 comments
Here is my current test code. I want to limit the GPT responses to only my docs; for example, if I ask "what is Vesuvio?" it gives me a correct response, but that information was not in my docs.
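
A common fix (a sketch, assuming an existing `index`; the refusal wording is an assumption): override the QA prompt so the model must refuse when the retrieved context lacks the answer.

```python
# A sketch, assuming an existing `index`; the refusal wording is an assumption.
from llama_index.core import PromptTemplate

qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using ONLY the context above. If the context does not "
    "contain the answer, reply that you don't know.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_engine = index.as_query_engine(text_qa_template=qa_prompt)
```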
16 comments
All the docs in the folder are in Italian, but sometimes I get English responses.
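
A sketch of the usual remedy, assuming the English default prompt templates are what pull the model into English; the prompt wording is an assumption.

```python
# A sketch, assuming an existing `index`; the English default prompt templates
# are the likely cause, so pin the output language in a system prompt.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    # "Always answer in Italian, whatever the language of the question."
    system_prompt="Rispondi sempre in italiano, qualunque sia la lingua della domanda.",
)
```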
2 comments