Okay, so I have my custom retriever to query my indexes and return Nodes. I want to combine it with a chat memory and use it as a chat engine, how would I go about doing that? I don't want to use a response synthesizer since I want to keep my LLM calls to a minimum. Ideas?
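A minimal sketch of one way this could look, assuming LlamaIndex's ContextChatEngine (which makes a single LLM call per chat turn) and ChatMemoryBuffer; MyCustomRetriever is a placeholder for the custom retriever described above:

```python
from llama_index.chat_engine import ContextChatEngine
from llama_index.memory import ChatMemoryBuffer

# Placeholder: the custom retriever that queries your indexes and returns Nodes.
retriever = MyCustomRetriever()

# Bounded chat history that gets injected into each prompt.
memory = ChatMemoryBuffer.from_defaults(token_limit=2000)

# ContextChatEngine stuffs the retrieved nodes into the system prompt,
# so each turn costs one LLM call (no separate synthesizer passes).
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    memory=memory,
)

response = chat_engine.chat("What does the document say about X?")
print(response)
```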
Any ideas on how to normalize BM25 scores against vector scores without using the Reciprocal Rerank Fusion Retriever? Since I collect my nodes myself...
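One common approach when you already hold both node lists yourself is plain min-max normalization of each list before merging; a minimal sketch (merge_results, alpha, and the weighting scheme are illustrative, not a LlamaIndex API):

```python
from llama_index.schema import NodeWithScore

def min_max_normalize(nodes):
    """Rescale the scores in a list of NodeWithScore to the 0-1 range."""
    scores = [n.score or 0.0 for n in nodes]
    lo, hi = min(scores), max(scores)
    for n in nodes:
        n.score = 0.0 if hi == lo else ((n.score or 0.0) - lo) / (hi - lo)
    return nodes

def merge_results(bm25_nodes, vector_nodes, alpha=0.5):
    """Fuse BM25 and vector results after normalization.
    alpha weights the vector score against the BM25 score."""
    bm25_nodes = min_max_normalize(bm25_nodes)
    vector_nodes = min_max_normalize(vector_nodes)
    combined = {}  # node_id -> NodeWithScore holding the fused score
    for n in bm25_nodes:
        combined[n.node.node_id] = NodeWithScore(node=n.node, score=(1 - alpha) * n.score)
    for n in vector_nodes:
        if n.node.node_id in combined:
            combined[n.node.node_id].score += alpha * n.score
        else:
            combined[n.node.node_id] = NodeWithScore(node=n.node, score=alpha * n.score)
    return sorted(combined.values(), key=lambda n: n.score, reverse=True)
```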
Does anyone have good reading material on using LlamaIndex with llama.cpp (i.e. a local LLM) on low-spec devices for RAG? I'm looking for things to do to optimize my requests as much as possible before sending them to the LLM, since in my case the LLM is currently the bottleneck. Mostly looking into RAG features.
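Not reading material, but one knob worth knowing about for trimming what actually reaches the LLM: a node postprocessor with a similarity cutoff plus a compact response mode. A minimal sketch, assuming an existing index; the cutoff value is only an illustration:

```python
from llama_index.postprocessor import SimilarityPostprocessor

# Retrieve a few candidates but drop low-scoring nodes before they ever
# reach the LLM, so the prompt stays small on a low-spec device.
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
    response_mode="compact",  # pack context into as few LLM calls as possible
)
response = query_engine.query("my question")
```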
I have a vector DB created with an ingestion pipeline. After the DB has been populated, can I augment it with more metadata, for example from TitleExtractor, without rebuilding the whole index?
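One direction this could take, sketched under the assumption that the already-ingested nodes can still be fetched back (from the docstore, a vector store query, or the original parse step) and that re-inserting updated nodes is acceptable; vector stores generally won't patch metadata rows in place:

```python
from llama_index.extractors import TitleExtractor

# TitleExtractor calls the LLM to produce a title from the first few nodes of a document.
extractor = TitleExtractor(nodes=5)

# `existing_nodes` is a placeholder for however you fetch the already-ingested nodes back.
metadata_list = extractor.extract(existing_nodes)
for node, meta in zip(existing_nodes, metadata_list):
    node.metadata.update(meta)

# Re-insert the enriched nodes into the existing index (`index` is a placeholder);
# the old entries would need to be deleted separately to avoid duplicates.
index.insert_nodes(existing_nodes)
```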
When I have created a ChromaDB with vector indexes, how can I get the filename metadata so I can compare it against incoming docs, to prevent indexing the same document several times? Anyone?
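A minimal sketch of pulling the stored metadata straight from the underlying Chroma collection, assuming the documents were loaded with something like SimpleDirectoryReader (which stamps a file_name key); the path, collection name, and incoming_docs are placeholders:

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")   # placeholder path
collection = client.get_collection("my_collection")       # placeholder name

# Fetch only the metadata of everything already in the collection.
existing = collection.get(include=["metadatas"])
known_files = {m.get("file_name") for m in existing["metadatas"] if m}

# incoming_docs: the new Document objects about to be ingested.
new_docs = [d for d in incoming_docs if d.metadata.get("file_name") not in known_files]
```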
I have Python llama.cpp running in a container exposing the API, how can I connect to it with LlamaIndex? When I import (from llama_index.llms import LlamaCPP) it wants to run llama.cpp on my local host, but I want to connect to another host.
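LlamaCPP loads the model in-process, so for a container exposing llama-cpp-python's OpenAI-compatible server one option is pointing an OpenAI-style client at it instead; a minimal sketch, where the host, port, and model name are placeholders for the actual container:

```python
from llama_index.llms import OpenAILike

llm = OpenAILike(
    api_base="http://llamacpp-host:8000/v1",  # the container's OpenAI-compatible endpoint
    api_key="not-needed",                      # the llama.cpp server usually ignores the key
    model="local-model",                       # placeholder model name
    is_chat_model=True,
)

print(llm.complete("Hello from a remote llama.cpp server"))
```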
I have a bunch of documents in a vector store, and when I do retrieval from the store the score is the same for all chunks received from each document: 5 chunks from docA have the exact same score, then 5 chunks from docB have the exact same score. As I understand it, the embedding is at document level. But then how can I know which of the 5 chunks from docA really has the best match score? I could do reranking... but let's say I do top_k 200 and I have 3 documents with 100 chunks each: docA returns 100 chunks with the same score and docB returns 100 chunks with the same score, but in reality 1 chunk of the 100 chunks from docC is the "right" chunk... I don't understand.
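In case it helps reproduce this, a quick diagnostic that prints the score and source file of every retrieved chunk; if chunks from the same file always print identical scores, the embeddings were likely computed per document rather than per chunk at ingestion time (`index` is a placeholder for the existing vector index):

```python
retriever = index.as_retriever(similarity_top_k=10)
for node_with_score in retriever.retrieve("my test question"):
    print(
        round(node_with_score.score, 4),
        node_with_score.node.metadata.get("file_name"),
        node_with_score.node.node_id,
    )
```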
Hmm, I'm working on a project where I'm very limited in GPU/CPU resources, and I want to limit my LLM calls as much as possible, focusing more on retrieval. How can I pass my retrievals to the LLM without using the response_synthesizer, which calls the LLM several times depending on the retrieved results?
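A minimal sketch of skipping the synthesizer entirely: retrieve, concatenate, and make exactly one completion call (`retriever`, `llm`, and the prompt template are placeholders for whatever is already set up):

```python
nodes = retriever.retrieve("my question")

# Concatenate the retrieved chunks into one context block.
context = "\n\n".join(n.node.get_content() for n in nodes)

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    "Question: my question\nAnswer:"
)

# One LLM call total, regardless of how many nodes were retrieved.
response = llm.complete(prompt)
print(response.text)
```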
I've read about the Redis ingestion pipeline and the Redis index + docstore. I would really like to combine them, in other words build the summary and keyword index in the ingestion pipeline. Is that possible?
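The pipeline itself only produces nodes, but one way to get close is to feed the nodes it returns into the extra indexes, with the same Redis docstore shared between them; a minimal sketch, assuming RedisDocumentStore and an already-defined list of transformations and documents (host/port/namespace are placeholders):

```python
from llama_index import StorageContext, SummaryIndex
from llama_index.indices.keyword_table import SimpleKeywordTableIndex
from llama_index.ingestion import IngestionPipeline
from llama_index.storage.docstore import RedisDocumentStore

docstore = RedisDocumentStore.from_host_and_port(host="localhost", port=6379, namespace="docs")

# The pipeline handles parsing/dedup and writes nodes into the Redis docstore.
pipeline = IngestionPipeline(transformations=transformations, docstore=docstore)
nodes = pipeline.run(documents=documents)

# Build the additional indexes from the same nodes, backed by the same docstore.
storage_context = StorageContext.from_defaults(docstore=docstore)
summary_index = SummaryIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)
```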
I'm trying a lot of things now and I was wondering which kinds of indexes I can store in PG. I saw the documentation for the vector DB, but what about the docstore, summary indexes and so forth?
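For the vector part, a minimal sketch with PGVectorStore; the connection details and `documents` are placeholders, and whether the docstore/index store can also live in Postgres depends on the LlamaIndex version, so that part is left out here:

```python
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import PGVectorStore

# Connection parameters are placeholders for your own Postgres instance.
vector_store = PGVectorStore.from_params(
    database="mydb",
    host="localhost",
    port="5432",
    user="postgres",
    password="secret",
    table_name="llamaindex_vectors",
    embed_dim=1536,  # must match the embedding model's dimension
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```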