Does anyone have good reading material on using LlamaIndex + llama.cpp for RAG on low-spec devices?

Does anyone have good reading material on using LlamaIndex with llama.cpp (i.e. a local LLM) on low-spec devices for RAG? I'm looking for ways to optimize my requests as much as possible before sending them to the LLM, since in my case the LLM is currently the bottleneck. Mostly looking into RAG features.
9 comments
sadly the LLM is always the bottleneck πŸ˜…

Best things to do are:
a) reduce the size of inputs to the LLM
b) reduce the number of LLM calls
Both are covered in the sketch below.
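A minimal sketch of both, assuming a recent llama-index with the llama.cpp integration (llama-index-llms-llama-cpp) installed; the model path, data directory, and chunk size are placeholders:

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP

# Local LLM via llama.cpp (path is a placeholder).
Settings.llm = LlamaCPP(model_path="./models/model.gguf")
# For a fully on-device setup you'd also point Settings.embed_model
# at a local embedding model.
Settings.chunk_size = 512  # smaller chunks -> smaller prompts (a)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(
    similarity_top_k=2,       # (a) fewer chunks stuffed into the prompt
    response_mode="compact",  # (b) pack chunks into as few LLM calls as possible
)
print(query_engine.query("What does the setup guide say?"))
```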
Yeah, I'm experimenting with similarity and trying to reduce the top-k: use the most-referenced file and then do a retrieve from the vector store on only that file (haven't got the last part right yet). I'll post my findings here, though; I think there are many like me out there.
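For the "most-referenced file" part, a rough sketch (untested; reuses the index from the sketch above, and ref_doc_id, which retrieved nodes carry by default):

```python
from collections import Counter

# Over-retrieve a bit, then tally which source document the hits point at.
retriever = index.as_retriever(similarity_top_k=10)
hits = retriever.retrieve("my question")

counts = Counter(hit.node.ref_doc_id for hit in hits)
best_doc_id, _ = counts.most_common(1)[0]
print("most-referenced doc:", best_doc_id)
# best_doc_id can then drive the exact-match filter discussed further down.
```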
Any thoughts on implementing an inverted index like Lucene's as an index option? I don't know how that compares to the vectors, though.
We have stuff like BM25 or ColBERT, which would be quite a bit more lightweight.
Yeah, using BM25 scoring right now.
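In llama-index that's the BM25Retriever (separate llama-index-retrievers-bm25 package); a sketch, assuming the documents from the earlier snippet. The retrieval step itself needs no embedding model or LLM:

```python
from llama_index.core.node_parser import SentenceSplitter
from llama_index.retrievers.bm25 import BM25Retriever

# Split documents into nodes, then build a keyword (BM25) retriever over them.
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)
bm25 = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=5)

for hit in bm25.retrieve("error codes"):
    print(hit.score, hit.node.ref_doc_id)
```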
Any idea how to query a vector index for nodes only from a specific doc id?
Found it. I believe I can use filters with an exact match on doc id; will try that. Is filtering done before or after retrieval?
filtering is typically done before retrieval
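A sketch of that filter, assuming the llama_index.core API. Note the filter matches node metadata, so the key depends on what your ingestion stamped onto the nodes (SimpleDirectoryReader adds file_name/file_path by default; a plain doc_id key only works if you set it yourself):

```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Restrict the vector retriever to a single source document.
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="manual.pdf")]
)
retriever = index.as_retriever(similarity_top_k=3, filters=filters)
nodes = retriever.retrieve("my question")
```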