Find answers from the community

sl33p
Offline, last seen 3 months ago
Joined September 25, 2024
Is there a way to tell the ContextChatEngine to use only the documents/nodes of the current index? The problem right now is that if I use an index `example`, and `example` has no info about cars, the RAG will still always reply.
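
One approach (a minimal sketch, assuming a built `index` and a recent `llama_index.core`; the cutoff value is illustrative): combine a restrictive system prompt with a similarity cutoff, so weakly related nodes are dropped instead of letting the model fall back on its own knowledge.

```python
# A minimal sketch, assuming a built `index`; the cutoff value is illustrative.
from llama_index.core.postprocessor import SimilarityPostprocessor

chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt=(
        "Answer ONLY from the provided context. If the context does not "
        "contain the answer, say that you don't know."
    ),
    # drop weakly related nodes instead of padding the context with them
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
```

With the cutoff in place, an off-topic question (e.g. about cars) retrieves nothing above the threshold, and the system prompt tells the model to say so rather than answer from its training data.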
6 comments
sl33p

```
Querying with: Come posso arrivare alla fiera?
Oct 11 08:32:40 messe-rag-chatbot app/web.1 **********
Oct 11 08:32:40 messe-rag-chatbot app/web.1 Trace: chat
Oct 11 08:32:40 messe-rag-chatbot app/web.1     |_CBEventType.TEMPLATING ->  4.4e-05 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1     |_CBEventType.LLM ->  2.075113 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1     |_CBEventType.QUERY ->  2.032007 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1       |_CBEventType.RETRIEVE ->  2.027069 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1       |_CBEventType.SYNTHESIZE ->  0.004779 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1         |_CBEventType.TEMPLATING ->  3.3e-05 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1         |_CBEventType.LLM ->  0.0 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1     |_CBEventType.LLM ->  0.0 seconds
Oct 11 08:32:40 messe-rag-chatbot app/web.1 **********
```
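
For reference, a trace like the one above can be produced with the `LlamaDebugHandler` callback; a minimal sketch (the poster's actual wiring isn't shown, so this is an assumption):

```python
# A minimal sketch of producing a trace like the one above; the poster's
# actual setup isn't shown, so the wiring here is an assumption.
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

llama_debug = LlamaDebugHandler(print_trace_on_end=True)  # prints the event tree after each trace
Settings.callback_manager = CallbackManager([llama_debug])

# ... build the index / chat engine and chat as usual ...

# aggregate timing for one event type, e.g. total time spent in LLM calls
print(llama_debug.get_event_time_info(CBEventType.LLM))
```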
18 comments
Hi guys, I have an architecture question. I've successfully built an example that reads from a Mongo collection and lets the user chat with that data. Is it best practice to rebuild the index for every chat (to pick up the updated cases), or should I store the index somewhere and update it on a schedule (every N minutes, say)? And if I update it in real time instead, how much more expensive is that compared to just saving it?
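
A sketch of the persist-and-update pattern, where `documents` and `new_document` are placeholders for data loaded from the Mongo collection: build once, persist to disk, reload on startup, and insert new cases incrementally.

```python
# A sketch of the persist-and-update pattern; `documents` and `new_document`
# are placeholders for data loaded from the Mongo collection.
from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage

PERSIST_DIR = "./storage"

# build once, then persist to disk
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir=PERSIST_DIR)

# on app startup: load instead of rebuilding (no embedding calls)
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)

# when a new case lands in Mongo, insert just that document:
# only its chunks get embedded, far cheaper than a full rebuild
index.insert(new_document)
```

Rebuilding per chat re-embeds the whole collection every time; persisting plus incremental `insert()` pays the embedding cost only once per new document, so real-time updates are usually the cheaper option.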
5 comments
sl33p

LLM time

Hi guys, I've finished my first company chatbot on custom docs, and it works very well (very nice lib, guys!). I have one question about performance. I'm attaching the current code that builds the query engine and the chat engine. I use a RecursiveRetriever because the information lives in different data sources (yes, it's pretty ugly, but I can't change the data sources and use something else like the SubQuestionQueryEngine), and after that I pick the first 2 results from each index and use those results in the CondenseQuestionChatEngine. It all works well, but is there any way to reduce latency? I've tried different chunk sizes, limiting the context, etc., and streaming; the problem seems to be the first LLM call of the CondenseQuestionChatEngine, which is pretty slow (2-3 seconds), so I tried the other engines, but they gave me lower-quality results. Any hint is appreciated 🙂
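
One lever worth trying (a sketch, not the poster's attached code): give the condense step its own smaller, faster model, since `CondenseQuestionChatEngine.from_defaults` accepts an `llm`, while the heavier model stays on the query engine. The model name and `query_engine` are assumptions.

```python
# A sketch, not the poster's attached code: use a smaller, faster model just
# for the condense step; the model name and `query_engine` are assumptions.
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.llms.openai import OpenAI

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,  # the existing RecursiveRetriever-based engine
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0),  # fast model for question rewriting only
    verbose=True,
)
response = chat_engine.chat("Come posso arrivare alla fiera?")  # "How can I get to the fair?"
```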
13 comments
Hi guys, one question: I'm building a composed graph query engine, but it takes a long time to respond (20 seconds). Is it possible to reduce that time? What are the best practices? I'll provide the code:
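
A generic sketch of the usual latency levers, assuming an existing `index` (this is not the poster's code): retrieve fewer nodes, compact the synthesis into fewer LLM calls, and stream.

```python
# A generic sketch of common latency levers, assuming an existing `index`;
# this is not the poster's code. Parameter values are illustrative.
query_engine = index.as_query_engine(
    similarity_top_k=2,        # retrieve fewer nodes -> smaller prompts
    response_mode="compact",   # pack chunks into as few LLM calls as possible
    streaming=True,            # first tokens arrive before the full answer
)
```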
24 comments
Here is my current test code. I want to limit the GPT responses to only my docs; for example, if I ask "what is Vesuvio?" it gives me a correct response, but that information was not in my docs.
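
A common fix (a sketch, assuming an existing `index`; the refusal wording is an assumption): override the QA prompt so the model must refuse when the retrieved context lacks the answer.

```python
# A sketch, assuming an existing `index`; the refusal wording is an assumption.
from llama_index.core import PromptTemplate

qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using ONLY the context above. If the context does not "
    "contain the answer, reply that you don't know.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_engine = index.as_query_engine(text_qa_template=qa_prompt)
```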
16 comments
All the docs in the folder are in Italian, but sometimes I get English responses.
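
A sketch of the usual remedy, assuming the English default prompt templates are what pull the model into English; the prompt wording is an assumption.

```python
# A sketch, assuming an existing `index`; the English default prompt templates
# are the likely cause, so pin the output language in a system prompt.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    # "Always answer in Italian, whatever the language of the question."
    system_prompt="Rispondi sempre in italiano, qualunque sia la lingua della domanda.",
)
```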
2 comments