Does anyone have good reading material on using LlamaIndex + llama.cpp for RAG on low-spec devices?

Does anyone have good reading material on using LlamaIndex with llama.cpp (i.e. a local LLM) on low-spec devices for RAG? I'm looking for ways to optimize my requests as much as possible before sending them to the LLM, since in my case the LLM is currently the bottleneck. Mostly looking into RAG features.
9 comments
sadly the LLM is always the bottleneck πŸ˜…

Best things to do are:
a) reduce the size of inputs to the LLM
b) reduce the number of LLM calls
Both are covered in the sketch below.
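A minimal sketch of both, assuming a recent llama-index with the llama.cpp integration (llama-index-llms-llama-cpp) installed; the model path, data directory, and chunk size are placeholders:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP

# Local LLM via llama.cpp (path is a placeholder).
Settings.llm = LlamaCPP(model_path="./models/model.gguf")
# For a fully on-device setup you'd also point Settings.embed_model
# at a local embedding model.
Settings.chunk_size = 512  # smaller chunks -> smaller prompts (a)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(
    similarity_top_k=2,       # (a) fewer chunks stuffed into the prompt
    response_mode="compact",  # (b) pack chunks into as few LLM calls as possible
)
print(query_engine.query("What does the setup guide say?"))
```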
Yeah, I'm experimenting with similarity and trying to reduce the top-k: use the most-referenced file and then do a retrieve from the vector store on only that file (haven't got the last part right yet). I'll post my findings here, though; I think there are many like me out there.
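For the "most-referenced file" part, a rough sketch (untested; reuses the index from the sketch above, and ref_doc_id, which retrieved nodes carry by default):

```python
from collections import Counter

# Over-retrieve a bit, then tally which source document the hits point at.
retriever = index.as_retriever(similarity_top_k=10)
hits = retriever.retrieve("my question")

counts = Counter(hit.node.ref_doc_id for hit in hits)
best_doc_id, _ = counts.most_common(1)[0]
print("most-referenced doc:", best_doc_id)
# best_doc_id can then drive the exact-match filter discussed further down.
```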
Any thoughts on implementing an inverted index like Lucene's as an index option? I don't know how that compares to the vectors, though.
We have stuff like BM25 or ColBERT, which would be quite a bit more lightweight.
Yeah, using BM25 scoring right now.
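In llama-index that's the BM25Retriever (separate llama-index-retrievers-bm25 package); a sketch, assuming the documents from the earlier snippet. The retrieval step itself needs no embedding model or LLM:

```python
from llama_index.core.node_parser import SentenceSplitter
from llama_index.retrievers.bm25 import BM25Retriever

# Split documents into nodes, then build a keyword (BM25) retriever over them.
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)
bm25 = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=5)

for hit in bm25.retrieve("error codes"):
    print(hit.score, hit.node.ref_doc_id)
```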
Any idea how to query a vector index for nodes only from a specific doc id?
Found it. I believe I can use filters with an exact match on doc id; will try that. Is filtering done before or after retrieval?
filtering is typically done before retrieval
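A sketch of that filter, assuming the llama_index.core API. Note the filter matches node metadata, so the key depends on what your ingestion stamped onto the nodes (SimpleDirectoryReader adds file_name/file_path by default; a plain doc_id key only works if you set it yourself):

```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Restrict the vector retriever to a single source document.
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="manual.pdf")]
)
retriever = index.as_retriever(similarity_top_k=3, filters=filters)
nodes = retriever.retrieve("my question")
```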