OmriNach
Joined September 25, 2024
Has anyone figured out how to trigger RAG retrieval inside a reasoning model's <think> steps? Similar to the Search-o1 paper here: https://search-o1.github.io

In my opinion, this would be the best way to use DeepSeek to optimize RAG, since newly retrieved knowledge can push the reasoning chain in a different direction.

I tried using the system prompt to make the DeepSeek Llama 8B and Qwen 32B distills output <search> tokens, but they do not like to follow instructions!
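A minimal sketch of the control loop this would need, assuming an LLM wrapper whose `complete()` accepts a stop list and any retriever object with a `retrieve()` method returning nodes with `.text` (both are placeholders here, not a specific LlamaIndex API): stop decoding whenever the model closes a `<search>` tag, run retrieval on the query inside it, splice the results back into the partial reasoning, and resume.

```python
import re

# Matches the query text the model has emitted after an opening <search> tag.
SEARCH_RE = re.compile(r"<search>(.*)", re.DOTALL)

def reason_with_retrieval(llm, retriever, question, max_rounds=5):
    # Ask the model to think in <think> tags and request evidence via <search> tags.
    transcript = (
        "Think step by step inside <think>...</think>. Whenever you need external "
        "knowledge, write <search>your query</search> and stop.\n"
        f"Question: {question}\n<think>"
    )
    for _ in range(max_rounds):
        # Halt decoding at </search> so retrieval can run mid-reasoning.
        chunk = llm.complete(transcript, stop=["</search>"]).text
        transcript += chunk
        match = SEARCH_RE.search(chunk)
        if match is None:
            break  # no retrieval requested; the reasoning chain finished on its own
        query = match.group(1).strip()
        nodes = retriever.retrieve(query)
        evidence = "\n".join(n.text for n in nodes[:3])
        # Splice retrieved knowledge back in so it can redirect the next steps.
        transcript += f"</search>\n<result>\n{evidence}\n</result>\n"
    return transcript
```

Whether the distills actually honor the stop-and-search convention is exactly the open question; the loop above only shows where the retrieval call would sit.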
5 comments
I transitioned from the OpenAI LLM to an AzureOpenAI GPT-4o deployment, but I can't get the model to produce more than 1000 tokens. I have not set max_tokens and confirmed it's None in Settings.llm. Not sure what setting I'm missing here. Has anyone experienced the same?
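One thing worth trying (a sketch, not a confirmed fix): pass max_tokens explicitly when constructing the Azure LLM, since the cap may be coming from a service-side default completion limit rather than anything in Settings. The deployment name, endpoint, and api_version below are placeholders.

```python
from llama_index.core import Settings
from llama_index.llms.azure_openai import AzureOpenAI

Settings.llm = AzureOpenAI(
    engine="my-gpt4o-deployment",      # your Azure deployment name (placeholder)
    model="gpt-4o",
    azure_endpoint="https://<resource>.openai.azure.com/",
    api_key="...",                     # or rely on AZURE_OPENAI_API_KEY
    api_version="2024-02-15-preview",  # placeholder; use your deployment's version
    max_tokens=4096,                   # request a longer completion explicitly
)
```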
5 comments
I have a create-llama backend running on EC2 with a load balancer, auto scaling group, etc. It works perfectly fine serving my llama front-end chat website on most Wi-Fi networks, but on certain corporate Wi-Fi networks I get ERR_CERT_AUTHORITY_INVALID when trying to reach the backend. Does anyone know a solution to this? Our SSL certificate is provided by AWS as well and gets an A grade from SSL Labs.
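Not an answer, but a quick way to check one common culprit: locked-down corporate networks often run a TLS-inspecting proxy that re-signs traffic with an internal CA, which produces exactly this error even though the AWS-issued certificate is fine. A small diagnostic sketch (the hostname is a placeholder) to compare the certificate the client actually receives on and off the corporate network:

```python
import hashlib
import socket
import ssl

def served_cert_fingerprint(host: str, port: int = 443) -> str:
    # Fetch whatever certificate the server (or an intercepting proxy) presents,
    # even if it would fail verification, and fingerprint it for comparison.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()

print(served_cert_fingerprint("api.example.com"))  # placeholder backend hostname
```

If the fingerprint differs on the corporate Wi-Fi, the proxy is rewriting the certificate and the fix is on the corporate side (trusting their root CA on the client), not in the AWS setup.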
3 comments
With the arrival of the new Workflows architecture, it becomes possible to make the powerful CoA (chain-of-abstraction) approach even better by executing it in steps rather than in one shot: it can then revise the execution plan as data comes in and even progressively refine an answer until a validator triggers a stop event. @Logan M, have you thought about making a notebook on how to implement a CoA RAG pipeline with the new Workflows?
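A minimal sketch of how such a stepwise loop could be wired with Workflows (the event names, the ctx.get/ctx.set state calls, and the plan/validate prompts are assumptions against a recent llama-index-core, not an official CoA recipe): plan an abstract reasoning chain, retrieve to fill it in, draft an answer, and loop through a validator that either asks for another pass or stops.

```python
from llama_index.core.workflow import (
    Context, Event, StartEvent, StopEvent, Workflow, step,
)

class PlanEvent(Event):
    plan: str

class DraftEvent(Event):
    draft: str

class CoARagWorkflow(Workflow):
    def __init__(self, llm, retriever, max_passes: int = 3, **kwargs):
        super().__init__(**kwargs)
        self.llm = llm
        self.retriever = retriever
        self.max_passes = max_passes

    @step
    async def plan(self, ctx: Context, ev: StartEvent) -> PlanEvent:
        # Chain-of-abstraction: sketch the reasoning with retrieval placeholders.
        await ctx.set("question", ev.query)
        await ctx.set("passes", 0)
        plan = (await self.llm.acomplete(
            f"Write an abstract reasoning plan with [SEARCH: ...] slots for: {ev.query}"
        )).text
        return PlanEvent(plan=plan)

    @step
    async def retrieve_and_draft(self, ctx: Context, ev: PlanEvent) -> DraftEvent:
        question = await ctx.get("question")
        nodes = self.retriever.retrieve(ev.plan)
        evidence = "\n".join(n.text for n in nodes[:5])
        draft = (await self.llm.acomplete(
            f"Plan:\n{ev.plan}\n\nEvidence:\n{evidence}\n\nAnswer the question: {question}"
        )).text
        return DraftEvent(draft=draft)

    @step
    async def validate(self, ctx: Context, ev: DraftEvent) -> PlanEvent | StopEvent:
        passes = await ctx.get("passes") + 1
        await ctx.set("passes", passes)
        verdict = (await self.llm.acomplete(
            f"Does this answer fully address the question? Reply OK or REVISE.\n{ev.draft}"
        )).text
        if "OK" in verdict or passes >= self.max_passes:
            return StopEvent(result=ev.draft)  # validator triggers the stop event
        # Otherwise revise the plan with what we learned and go around again.
        return PlanEvent(plan=f"Revise the plan given this draft:\n{ev.draft}")
```

Running it would look roughly like `await CoARagWorkflow(llm=Settings.llm, retriever=index.as_retriever(), timeout=120).run(query="...")`, with the validator step deciding when to emit the StopEvent.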
18 comments