Are there use cases where a decomposable graph makes more sense than a sub-question query engine? I feel like the sub-question engine can handle everything the graph can.
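For reference, this is the kind of sub-question setup I mean (a minimal sketch assuming recent `llama_index.core` imports; the tool names and index variables are placeholders):

```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# One tool per underlying index; the engine decomposes a question
# into sub-questions and routes each one to the matching tool.
tools = [
    QueryEngineTool(
        query_engine=filings_index.as_query_engine(),  # placeholder index
        metadata=ToolMetadata(name="filings", description="10-K filing sections"),
    ),
    QueryEngineTool(
        query_engine=news_index.as_query_engine(),  # placeholder index
        metadata=ToolMetadata(name="news", description="Recent news articles"),
    ),
]
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = engine.query("Compare the stated risk factors with recent news coverage.")
```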
Maybe this has to do with node post-processing. Is there a dynamic way to set similarity_top_k so I always use the maximum number of nodes that fit inside the context window?
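Something like this is what I'm picturing — a sketch, not a built-in setting as far as I know; the context window, reserved budget, and average node size are all my own assumptions:

```python
# All three constants are assumptions for illustration.
CONTEXT_WINDOW = 4096        # model context size in tokens
RESERVED_TOKENS = 1024       # prompt template + room for the answer
AVG_TOKENS_PER_NODE = 300    # rough average retrieved-chunk size

# Largest top_k whose retrieved nodes should still fit in the window.
top_k = max(1, (CONTEXT_WINDOW - RESERVED_TOKENS) // AVG_TOKENS_PER_NODE)

query_engine = index.as_query_engine(similarity_top_k=top_k)  # `index` is a placeholder
```

I realize the compact/refine response modes already deal with overflow by repacking text, but I'd rather cap retrieval up front.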
Does LlamaIndex offer any smart chunking algorithms? For example, instead of a fixed-length cutoff, can I split by paragraphs or contextual topics?
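Something along these lines is what I'm after, if the splitter supports it (assuming recent `llama_index.core` imports; the sizes are arbitrary):

```python
from llama_index.core.node_parser import SentenceSplitter

# Split on sentence/paragraph boundaries instead of a hard-length cutoff.
splitter = SentenceSplitter(
    chunk_size=512,              # target chunk size in tokens (arbitrary)
    chunk_overlap=64,            # arbitrary
    paragraph_separator="\n\n",  # prefer breaking between paragraphs
)
nodes = splitter.get_nodes_from_documents(documents)  # `documents` is a placeholder
```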
Curious if anyone has tried using llama.cpp with LlamaIndex to easily access quantized models on a CPU-only setup. Will using CustomLLM do the trick?
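From what I can tell there's a built-in LlamaCPP wrapper around llama-cpp-python, so CustomLLM might not even be necessary. This is roughly what I'd try (the path and kwargs are placeholders/assumptions):

```python
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="/path/to/quantized-model.gguf",  # placeholder path
    temperature=0.1,
    max_new_tokens=256,
    context_window=4096,
    model_kwargs={"n_threads": 16},  # CPU threads; assumption for a CPU-only box
)
print(llm.complete("Hello, world").text)
```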
High-level question on structuring nodes/docs. So far, I've broken down one 10-K document, setting each section as a node along with that section's metadata. But that covers only one year and one company. How should I think about the structure if I want multiple years and documents? I'd like to start simple, but maybe this requires composability.
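To make it concrete, here's what I'm picturing before reaching for composability — a single index with per-section metadata, filtered at query time (a sketch; the metadata keys and values are my own):

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Tag each section node with company, fiscal year, and section name,
# then keep one index and filter at query time rather than building
# one index per document.
node = TextNode(
    text="Item 1A. Risk Factors ...",
    metadata={"company": "UBER", "year": 2021, "section": "risk_factors"},
)
index = VectorStoreIndex([node])  # plus the rest of the section nodes

filters = MetadataFilters(filters=[ExactMatchFilter(key="year", value=2021)])
query_engine = index.as_query_engine(filters=filters)
```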
Anyone else having issues loading the 70B Llama 2 model with LlamaCPP? I was successful with the 7B and 13B models, but I'm getting a vague error for 70B. (See attached image)
My cluster is CPU-only but has up to 96 workers and 768 GB of RAM.
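For what it's worth, one thing I've seen mentioned (not sure it's my error) is that Llama 2 70B uses grouped-query attention, which the 7B/13B models don't, and older llama.cpp builds required passing that explicitly, e.g.:

```python
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="/path/to/llama-2-70b.q4_0.bin",  # placeholder path
    context_window=4096,
    # n_gqa=8 was required for 70B on GGML-era llama.cpp builds;
    # n_threads is an assumption for this cluster.
    model_kwargs={"n_gqa": 8, "n_threads": 96},
)
```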