Digital Rally
Offline, last seen 3 months ago
Joined September 25, 2024
A new RAG benchmark: https://huggingface.co/datasets/google/frames-benchmark
Has anyone ever heard of oracle retrieval?
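For context (this is not from the dataset card): "oracle retrieval" usually means scoring the generator with the gold documents handed to it directly, as an upper bound on what better retrieval could buy. A rough sketch of that comparison; `retriever`, `llm_answer`, `score`, and the dataset field names are all hypothetical, not part of FRAMES:
Python
# Sketch of an "oracle retrieval" baseline: compare normal RAG (retrieved
# context) against feeding the ground-truth documents directly.
# `retriever`, `llm_answer`, `score`, and the field names are assumptions.

def answer_with_context(llm_answer, question: str, docs: list[str]) -> str:
    context = "\n\n".join(docs)
    return llm_answer(f"Context:\n{context}\n\nQuestion: {question}")

def evaluate(dataset, retriever, llm_answer, score):
    rag_scores, oracle_scores = [], []
    for row in dataset:
        retrieved = retriever(row["question"])   # your real retriever
        gold = row["gold_documents"]             # assumed ground-truth field
        rag_scores.append(
            score(answer_with_context(llm_answer, row["question"], retrieved), row["answer"])
        )
        oracle_scores.append(
            score(answer_with_context(llm_answer, row["question"], gold), row["answer"])
        )
    # The gap between the two averages is roughly the headroom left for retrieval.
    return sum(rag_scores) / len(rag_scores), sum(oracle_scores) / len(oracle_scores)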
4 comments
Just FYI, there are a few None values in the documentation examples:
1 comment
@Logan M I'm currently trying to switch from llama.cpp to Ollama, but the same model gives me different responses. The output from llama.cpp is correct and in the right language; the output from Ollama is wrong and sometimes in the wrong language. I have also talked to the Ollama community, but we have no solution so far. Maybe it has to do with the implementation in LlamaIndex?

I have already compared all the settings I could find.
I can provide you with whatever information you need.

From my viewpoint, we could greatly improve the quality of Ollama support if we could find out what is different.
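For whoever picks this up, here is a minimal sketch of pinning the obvious knobs (temperature, context window, max tokens) on both LlamaIndex wrappers so the outputs are comparable. The model tag, GGUF path, and prompt are placeholders, and the keyword names are my reading of the current integrations:
Python
# Sketch: drive the same model through both LlamaIndex wrappers with the
# sampling settings pinned, so differences point at the backend rather than
# at defaults. Model tag and path are placeholders.
from llama_index.llms.ollama import Ollama
from llama_index.llms.llama_cpp import LlamaCPP

prompt = "Antworte auf Deutsch: Was ist Retrieval-Augmented Generation?"

ollama_llm = Ollama(
    model="llama3:8b-instruct-q4_K_M",   # placeholder tag
    temperature=0.0,
    context_window=8192,
    request_timeout=120.0,
)

llamacpp_llm = LlamaCPP(
    model_path="/models/llama3-8b-instruct-q4_K_M.gguf",  # placeholder path
    temperature=0.0,
    context_window=8192,
    max_new_tokens=512,
    model_kwargs={"n_gpu_layers": -1},
)

print(ollama_llm.complete(prompt))
print(llamacpp_llm.complete(prompt))
If the outputs still diverge with everything pinned, the difference is more likely in the backend's prompt template or tokenizer handling than in LlamaIndex itself.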
6 comments
I get an error when I start my LlamaIndex app via FastAPI and uvicorn and make parallel requests to its endpoint.
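Without seeing the traceback, the usual pattern for parallel requests is to build the index and query engine once at startup and use the async query path inside an async endpoint. A sketch, with the persist directory and endpoint shape as placeholders:
Python
# Sketch: one shared query engine, async endpoint, async query call.
# The persist directory is a placeholder for whatever the app actually uses.
from fastapi import FastAPI
from llama_index.core import StorageContext, load_index_from_storage

app = FastAPI()

# Build/load heavyweight objects once, not per request.
storage = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage)
query_engine = index.as_query_engine()

@app.get("/query")
async def query(q: str):
    # aquery() keeps the event loop free while the LLM call is in flight.
    response = await query_engine.aquery(q)
    return {"answer": str(response)}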
10 comments
Am I blind, or is there no way to specify the grammar path (for JSON) when using LlamaCPP?
Plain Text
class Llama:
    def __init__(
        n_gpu_layers: int = 0,
        split_mode: int = llama_cpp.LLAMA_SPLIT_MODE_LAYER,
        main_gpu: int = 0,
        tensor_split: Optional[List[float]] = None,
        vocab_only: bool = False,
        use_mmap: bool = True,
        use_mlock: bool = False,
        kv_overrides: Optional[Dict[str, Union[bool, int, float, str]]] = None,
        # Context Params
        seed: int = llama_cpp.LLAMA_DEFAULT_SEED,
        n_ctx: int = 512,
        n_batch: int = 512,
        n_threads: Optional[int] = None,
        n_threads_batch: Optional[int] = None,
        rope_scaling_type: Optional[int] = llama_cpp.LLAMA_ROPE_SCALING_TYPE_UNSPECIFIED,
        pooling_type: int = llama_cpp.LLAMA_POOLING_TYPE_UNSPECIFIED,
        rope_freq_base: float = 0.0,
        rope_freq_scale: float = 0.0,
        yarn_ext_factor: float = -1.0,
        yarn_attn_factor: float = 1.0,
        yarn_beta_fast: float = 32.0,
        yarn_beta_slow: float = 1.0,
        yarn_orig_ctx: int = 0,
        logits_all: bool = False,
        embedding: bool = False,
        offload_kqv: bool = True,
        flash_attn: bool = False,
        # Sampling Params
        last_n_tokens_size: int = 64,
        # LoRA Params
        lora_base: Optional[str] = None,
        lora_scale: float = 1.0,
        lora_path: Optional[str] = None,
        # Backend Params
        numa: Union[bool, int] = False,
        # Chat Format Params
        chat_format: Optional[str] = None,
        chat_handler: Optional[llama_chat_format.LlamaChatCompletionHandler] = None,
        # Speculative Decoding
        draft_model: Optional[LlamaDraftModel] = None,
        # Tokenizer Override
        tokenizer: Optional[BaseLlamaTokenizer] = None,
    ):
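As far as I can tell there is no grammar_path argument on the wrapper itself, but llama-cpp-python's completion call accepts a LlamaGrammar object, and LlamaIndex's LlamaCPP appears to forward generate_kwargs to that call, so something like this sketch should work (the .gbnf and model paths are placeholders):
Python
# Sketch: build the grammar with llama-cpp-python and pass it through
# generate_kwargs, which the LlamaIndex wrapper forwards to the model call.
# Paths are placeholders.
from llama_cpp import LlamaGrammar
from llama_index.llms.llama_cpp import LlamaCPP

grammar = LlamaGrammar.from_file("./grammars/json.gbnf")  # placeholder path

llm = LlamaCPP(
    model_path="/models/model.gguf",          # placeholder
    temperature=0.0,
    generate_kwargs={"grammar": grammar},     # forwarded to the llama_cpp call
)

print(llm.complete("Return a JSON object describing a cat."))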
6 comments
Hello, I'm following this notebook: https://github.com/run-llama/llamacloud-demo/blob/main/examples/advanced_rag/corrective_rag_workflow.ipynb
I want to run it fully locally. What is the local equivalent of Tavily?
Plain Text
# If any document is found irrelevant, transform the query string for better search results.
if "no" in relevancy_results:
    prompt = DEFAULT_TRANSFORM_QUERY_TEMPLATE.format(query_str=query_str)
    result = self.llm.complete(prompt)
    transformed_query_str = result.text

    # Conduct a search with the transformed query string and collect the results.
    search_results = self.tavily_tool.search(
        transformed_query_str, max_results=5
    )
    search_text = "\n".join([result.text for result in search_results])
else:
    search_text = ""
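One way to make it fully local is to drop the web search branch and retrieve from a local index instead, keeping the same search_text string the workflow expects. A sketch, with the ./data directory and top-k as placeholders:
Python
# Sketch: replace the Tavily web search branch with retrieval over a local
# vector index, keeping the same `search_text` string shape.
# Directory and top-k are placeholders.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
fallback_index = VectorStoreIndex.from_documents(documents)
fallback_retriever = fallback_index.as_retriever(similarity_top_k=5)

def local_search(query_str: str) -> str:
    nodes = fallback_retriever.retrieve(query_str)
    return "\n".join(node.get_content() for node in nodes)

# In the workflow, instead of self.tavily_tool.search(...):
# search_text = local_search(transformed_query_str)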
3 comments
Any idea when the new gemma2 will be working?
Plain Text
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'

Can't wait to test it out 😍
4 comments