Digital Rally
Offline, last seen 3 months ago
Joined September 25, 2024
A new RAG benchmark: https://huggingface.co/datasets/google/frames-benchmark
Has anyone ever heard of oracle retrieval?
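For context (this is not from the dataset card): "oracle retrieval" usually means scoring the generator with the gold documents handed to it directly, as an upper bound on what better retrieval could buy. A rough sketch of that comparison; `retriever`, `llm_answer`, `score`, and the dataset field names are all hypothetical, not part of FRAMES:
Python
# Sketch of an "oracle retrieval" baseline: compare normal RAG (retrieved
# context) against feeding the ground-truth documents directly.
# `retriever`, `llm_answer`, `score`, and the field names are assumptions.

def answer_with_context(llm_answer, question: str, docs: list[str]) -> str:
    context = "\n\n".join(docs)
    return llm_answer(f"Context:\n{context}\n\nQuestion: {question}")

def evaluate(dataset, retriever, llm_answer, score):
    rag_scores, oracle_scores = [], []
    for row in dataset:
        retrieved = retriever(row["question"])   # your real retriever
        gold = row["gold_documents"]             # assumed ground-truth field
        rag_scores.append(
            score(answer_with_context(llm_answer, row["question"], retrieved), row["answer"])
        )
        oracle_scores.append(
            score(answer_with_context(llm_answer, row["question"], gold), row["answer"])
        )
    # The gap between the two averages is roughly the headroom left for retrieval.
    return sum(rag_scores) / len(rag_scores), sum(oracle_scores) / len(oracle_scores)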
4 comments
Just FYI, there are a few None values in the documentation examples:
1 comment
@Logan M I'm currently trying to switch from llama.cpp to Ollama, but the same model gives me different responses. The output from llama.cpp is correct and in the right language; the output from Ollama is wrong and sometimes in the wrong language. I have also talked to the Ollama community, but we have no solution so far. Maybe it has to do with the implementation in LlamaIndex?

I have already compared all the settings I could find.
I can provide you with whatever information you need.

From my viewpoint, we could greatly improve the quality of Ollama support if we could find out what is different.
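For whoever picks this up, here is a minimal sketch of pinning the obvious knobs (temperature, context window, max tokens) on both LlamaIndex wrappers so the outputs are comparable. The model tag, GGUF path, and prompt are placeholders, and the keyword names are my reading of the current integrations:
Python
# Sketch: drive the same model through both LlamaIndex wrappers with the
# sampling settings pinned, so differences point at the backend rather than
# at defaults. Model tag and path are placeholders.
from llama_index.llms.ollama import Ollama
from llama_index.llms.llama_cpp import LlamaCPP

prompt = "Antworte auf Deutsch: Was ist Retrieval-Augmented Generation?"

ollama_llm = Ollama(
    model="llama3:8b-instruct-q4_K_M",   # placeholder tag
    temperature=0.0,
    context_window=8192,
    request_timeout=120.0,
)

llamacpp_llm = LlamaCPP(
    model_path="/models/llama3-8b-instruct-q4_K_M.gguf",  # placeholder path
    temperature=0.0,
    context_window=8192,
    max_new_tokens=512,
    model_kwargs={"n_gpu_layers": -1},
)

print(ollama_llm.complete(prompt))
print(llamacpp_llm.complete(prompt))
If the outputs still diverge with everything pinned, the difference is more likely in the backend's prompt template or tokenizer handling than in LlamaIndex itself.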
6 comments
I get an error when I start my LlamaIndex app via FastAPI and uvicorn and make parallel requests to its endpoint.
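Without seeing the traceback, the usual pattern for parallel requests is to build the index and query engine once at startup and use the async query path inside an async endpoint. A sketch, with the persist directory and endpoint shape as placeholders:
Python
# Sketch: one shared query engine, async endpoint, async query call.
# The persist directory is a placeholder for whatever the app actually uses.
from fastapi import FastAPI
from llama_index.core import StorageContext, load_index_from_storage

app = FastAPI()

# Build/load heavyweight objects once, not per request.
storage = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage)
query_engine = index.as_query_engine()

@app.get("/query")
async def query(q: str):
    # aquery() keeps the event loop free while the LLM call is in flight.
    response = await query_engine.aquery(q)
    return {"answer": str(response)}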
10 comments
Am I blind, or is there no way to specify the grammar path (for JSON) when using LlamaCPP?
Plain Text
class Llama:
    def __init__(
        n_gpu_layers: int = 0,
        split_mode: int = llama_cpp.LLAMA_SPLIT_MODE_LAYER,
        main_gpu: int = 0,
        tensor_split: Optional[List[float]] = None,
        vocab_only: bool = False,
        use_mmap: bool = True,
        use_mlock: bool = False,
        kv_overrides: Optional[Dict[str, Union[bool, int, float, str]]] = None,
        # Context Params
        seed: int = llama_cpp.LLAMA_DEFAULT_SEED,
        n_ctx: int = 512,
        n_batch: int = 512,
        n_threads: Optional[int] = None,
        n_threads_batch: Optional[int] = None,
        rope_scaling_type: Optional[int] = llama_cpp.LLAMA_ROPE_SCALING_TYPE_UNSPECIFIED,
        pooling_type: int = llama_cpp.LLAMA_POOLING_TYPE_UNSPECIFIED,
        rope_freq_base: float = 0.0,
        rope_freq_scale: float = 0.0,
        yarn_ext_factor: float = -1.0,
        yarn_attn_factor: float = 1.0,
        yarn_beta_fast: float = 32.0,
        yarn_beta_slow: float = 1.0,
        yarn_orig_ctx: int = 0,
        logits_all: bool = False,
        embedding: bool = False,
        offload_kqv: bool = True,
        flash_attn: bool = False,
        # Sampling Params
        last_n_tokens_size: int = 64,
        # LoRA Params
        lora_base: Optional[str] = None,
        lora_scale: float = 1.0,
        lora_path: Optional[str] = None,
        # Backend Params
        numa: Union[bool, int] = False,
        # Chat Format Params
        chat_format: Optional[str] = None,
        chat_handler: Optional[llama_chat_format.LlamaChatCompletionHandler] = None,
        # Speculative Decoding
        draft_model: Optional[LlamaDraftModel] = None,
        # Tokenizer Override
        tokenizer: Optional[BaseLlamaTokenizer] = None,
    ):
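As far as I can tell there is no grammar_path argument on the wrapper itself, but llama-cpp-python's completion call accepts a LlamaGrammar object, and LlamaIndex's LlamaCPP appears to forward generate_kwargs to that call, so something like this sketch should work (the .gbnf and model paths are placeholders):
Python
# Sketch: build the grammar with llama-cpp-python and pass it through
# generate_kwargs, which the LlamaIndex wrapper forwards to the model call.
# Paths are placeholders.
from llama_cpp import LlamaGrammar
from llama_index.llms.llama_cpp import LlamaCPP

grammar = LlamaGrammar.from_file("./grammars/json.gbnf")  # placeholder path

llm = LlamaCPP(
    model_path="/models/model.gguf",          # placeholder
    temperature=0.0,
    generate_kwargs={"grammar": grammar},     # forwarded to the llama_cpp call
)

print(llm.complete("Return a JSON object describing a cat."))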
6 comments
Hello, I'm following this notebook: https://github.com/run-llama/llamacloud-demo/blob/main/examples/advanced_rag/corrective_rag_workflow.ipynb
I want to run it fully locally. What is the local equivalent of Tavily?
Plain Text
# If any document is found irrelevant, transform the query string for better search results.
if "no" in relevancy_results:
    prompt = DEFAULT_TRANSFORM_QUERY_TEMPLATE.format(query_str=query_str)
    result = self.llm.complete(prompt)
    transformed_query_str = result.text

    # Conduct a search with the transformed query string and collect the results.
    search_results = self.tavily_tool.search(
        transformed_query_str, max_results=5
    )
    search_text = "\n".join([result.text for result in search_results])
else:
    search_text = ""
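One way to make it fully local is to drop the web search branch and retrieve from a local index instead, keeping the same search_text string the workflow expects. A sketch, with the ./data directory and top-k as placeholders:
Python
# Sketch: replace the Tavily web search branch with retrieval over a local
# vector index, keeping the same `search_text` string shape.
# Directory and top-k are placeholders.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
fallback_index = VectorStoreIndex.from_documents(documents)
fallback_retriever = fallback_index.as_retriever(similarity_top_k=5)

def local_search(query_str: str) -> str:
    nodes = fallback_retriever.retrieve(query_str)
    return "\n".join(node.get_content() for node in nodes)

# In the workflow, instead of self.tavily_tool.search(...):
# search_text = local_search(transformed_query_str)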
3 comments
Any idea when the new gemma2 will be working?
Plain Text
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'

Can't wait to test it out 😍
4 comments