```python
# https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b
llm = HuggingFaceLLM(
    model_name="stabilityai/stablelm-2-zephyr-1_6b",
    tokenizer_name="stabilityai/stablelm-2-zephyr-1_6b",
    query_wrapper_prompt=PromptTemplate(
        "<|system|>\n\n<|user|>\n{query_str}\n<|assistant|>\n"
    ),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"trust_remote_code": True},
    #tokenizer_kwargs={"max_length": 2048},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95, "do_sample": True},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
    # uncomment this if using CUDA to reduce memory usage
    #model_kwargs={"torch_dtype": torch.float16}
)
```
Even though `trust_remote_code` is set to `True`, I still get the question `Do you wish to run the custom code? [y/N]`:
```
.....
model.safetensors: 100% 3.29G/3.29G [00:39<00:00, 111MB/s]
generation_config.json: 100% 121/121 [00:00<00:00, 7.17kB/s]
tokenizer_config.json: 100% 825/825 [00:00<00:00, 37.4kB/s]
The repository for stabilityai/stablelm-2-zephyr-1_6b contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/stabilityai/stablelm-2-zephyr-1_6b.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
tokenization_arcade100k.py: 100% 9.89k/9.89k [00:00<00:00, 463kB/s]
....
```
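One possible explanation: `trust_remote_code` was only passed through `model_kwargs`, so it never reaches the tokenizer, and the prompt appears right before `tokenization_arcade100k.py` is downloaded. A minimal sketch, assuming `HuggingFaceLLM` forwards `tokenizer_kwargs` to `AutoTokenizer.from_pretrained`:

```python
# Sketch (assumption): pass trust_remote_code to the tokenizer as well,
# since model_kwargs only reaches the model loading step.
llm = HuggingFaceLLM(
    model_name="stabilityai/stablelm-2-zephyr-1_6b",
    tokenizer_name="stabilityai/stablelm-2-zephyr-1_6b",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True},  # may silence the [y/N] prompt
    # ... other arguments as above ...
)
```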
I also get a `LangChainDeprecationWarning`:

```
/home/dev/.local/lib/python3.10/site-packages/langchain/chat_models/__init__.py:31: LangChainDeprecationWarning: Importing chat models from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead: `from langchain_community.chat_models import ChatAnyscale`. To install langchain-community run `pip install -U langchain-community`.
  warnings.warn(
/home/dev/.local/lib/python3.10/site-packages/langchain/chat_models/__init__.py:31: LangChainDeprecationWarning: Importing chat models from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead: `from langchain_community.chat_models import ChatOpenAI`.
```
I have installed `langchain-community`, but I mainly get this warning message when running `llm = LlamaCPP(...)`.
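If the warning is only noise (the deprecated import happens inside llama_index's langchain bridge, not in user code), one option is to filter it with the standard `warnings` module. A sketch that matches on the message text, so no private langchain class has to be imported:

```python
import warnings

# Sketch: silence the LangChainDeprecationWarning emitted by langchain's
# chat_models re-exports; matching on the message avoids importing the
# warning class from an internal langchain module.
warnings.filterwarnings(
    "ignore",
    message=".*Importing chat models from langchain is deprecated.*",
)
```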
Using an agent with tools:

```python
llm = OpenAI(model="gpt-3.5-turbo-0613")
agent = OpenAIAgent.from_tools([weather_tool], llm=llm, verbose=True)
response = agent.chat(
    "What's the weather like in San Francisco, Tokyo, and Paris?"
)
```
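If the goal is to run a similar tool-using agent on a local LLM (an assumption on my part), `OpenAIAgent` depends on OpenAI function calling, so `ReActAgent` is the usual alternative. A minimal sketch reusing the `weather_tool` and a local `llm` such as the `LlamaCPP` instance defined elsewhere in this post:

```python
from llama_index.agent import ReActAgent

# Sketch: ReActAgent drives tool use through prompting instead of OpenAI
# function calling, so it also works with a local LLM.
agent = ReActAgent.from_tools([weather_tool], llm=llm, verbose=True)
response = agent.chat(
    "What's the weather like in San Francisco, Tokyo, and Paris?"
)
```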
The response from `query_engine.query(QUERY)` is truncated.

```python
# define prompt viewing function
def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown("<br><br>"))

prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)
```
**Prompt Key:** response_synthesizer:text_qa_template
**Text:**

```
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:
```

**Prompt Key:** response_synthesizer:refine_template
**Text:**

```
The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query.
If the context isn't useful, return the original answer.
Refined Answer:
```
The refine template uses the variables `context_msg` & `query_str`.
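If one of these prompts needs to be customized (for example to ask for shorter answers that fit within `max_new_tokens`, or to answer in French), `query_engine.update_prompts` accepts a dict keyed by the same prompt keys shown above. A sketch, where the replacement template text is a hypothetical example:

```python
from llama_index.prompts import PromptTemplate

# Sketch: override the QA template under its prompt key
# (response_synthesizer:text_qa_template).
custom_qa_tmpl = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query in French.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": custom_qa_tmpl}
)
```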
The following `download_loader` call fails with `ValueError: Loader class name not found in library`:

```python
from llama_index import download_loader

TrafilaturaWebReader = download_loader("TrafilaturaWebReader")

loader = TrafilaturaWebReader()
documents = loader.load_data(urls=['https://google.com'])
```

```
/usr/local/lib/python3.10/dist-packages/llama_index/readers/download.py in download_loader(loader_class, loader_hub_url, refresh_cache, use_gpt_index_import, custom_path)
    138     library = json.loads(library_raw_content)
    139     if loader_class not in library:
--> 140         raise ValueError("Loader class name not found in library")
    141
    142     loader_id = library[loader_class]["id"]

ValueError: Loader class name not found in library
```
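Since the error comes from the loader-hub `library.json` not containing the class name, one thing worth trying (a sketch, based on the `refresh_cache` parameter visible in the traceback signature) is forcing a refresh of the cached library:

```python
from llama_index import download_loader

# Sketch: re-fetch the loader-hub library.json instead of using a
# possibly stale local cache.
TrafilaturaWebReader = download_loader("TrafilaturaWebReader", refresh_cache=True)
```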
```python
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize", use_async=True)

doc_summary_index = DocumentSummaryIndex.from_documents(
    [data_document],
    service_context=service_context,
    response_synthesizer=response_synthesizer,
    show_progress=True,
)
doc_summary_index.storage_context.persist("index_summary")
```
The summary returned by `doc_summary_index.get_document_summary(DOC_ID)` is in English, not in French, even though `[data_document]` contains text in French.
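To get the summaries themselves in French, one option is to pass a French summary query when building the index. A sketch, where the `summary_query` argument and the French wording are assumptions:

```python
# Sketch (assumption): DocumentSummaryIndex accepts a summary_query string
# used when generating each per-document summary.
doc_summary_index = DocumentSummaryIndex.from_documents(
    [data_document],
    service_context=service_context,
    response_synthesizer=response_synthesizer,
    summary_query="Décris en français le contenu du document suivant.",
    show_progress=True,
)
```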
Building a `VectorStoreIndex` with the French embedding model Solon:

```python
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
Settings.chunk_size = 512
Settings.chunk_overlap = 64

# https://huggingface.co/OrdalieTech/Solon-embeddings-large-0.1
embed_model_name = "OrdalieTech/Solon-embeddings-large-0.1"
embed_model = HuggingFaceEmbedding(model_name=embed_model_name)
Settings.embed_model = embed_model

# .....

vector_store_index = VectorStoreIndex.from_documents(documents=documents, show_progress=True)
```
This gives a CUDA out of memory error, since the previous model is still present in the GPU VRAM. With `langchain` it's possible to set the device to CPU for the embedding model:

```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={"device": "cpu"},  # Use CPU for embedding
)
```
Is there an equivalent when using `from llama_index.embeddings.huggingface import HuggingFaceEmbedding`?
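A minimal sketch, assuming the installed version of `HuggingFaceEmbedding` exposes a `device` argument:

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Sketch (assumption): force the embedding model onto the CPU so it does
# not compete with the LLM for GPU VRAM.
embed_model = HuggingFaceEmbedding(
    model_name="OrdalieTech/Solon-embeddings-large-0.1",
    device="cpu",
)
```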
```python
from llama_index import set_global_handler

# general usage
set_global_handler("<handler_name>", **kwargs)
```
How can the global handler be removed?
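There does not seem to be a dedicated API for this; one possible approach, assuming `set_global_handler` simply stores the handler in `llama_index.global_handler` (an assumption about the library internals):

```python
import llama_index

# Sketch (assumption): set_global_handler assigns llama_index.global_handler,
# so resetting it to None should disable the handler again.
llama_index.global_handler = None
```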
`data_generator.generate_questions_from_nodes()` generates questions with the following prompt:
```
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge.
generate only questions based on the below query.
{query_str}
```
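To change this prompt, for example to generate the questions in French, `DatasetGenerator.from_documents` seems to accept a custom question-generation query. A sketch, where the `question_gen_query` argument and the French wording are assumptions:

```python
from llama_index.evaluation import DatasetGenerator

# Sketch (assumption): override the default question-generation query,
# here asking for questions in French.
data_generator = DatasetGenerator.from_documents(
    documents,
    service_context=service_context,
    question_gen_query=(
        "À partir du contexte fourni, génère uniquement des questions en français."
    ),
)
questions = data_generator.generate_questions_from_nodes()
```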
```
# https://github.com/abetlen/llama-cpp-python
# GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 --force-reinstall --upgrade --no-cache-dir --verbose

# https://github.com/run-llama/llama_index
!pip install llama-index
```
```python
import logging
import sys

from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt
from llama_index.llms import LlamaCPP

# Change INFO to DEBUG if you want more extensive logging
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

# https://gpt-index.readthedocs.io/en/stable/examples/llm/llama_2_llama_cpp.html
llm = LlamaCPP(
    model_url="https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf",
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    #model_path="mistral-7b-v0.1.Q4_K_M.gguf",
    temperature=0.0,
    max_new_tokens=1024,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    # note, this sets n_ctx in the model_kwargs below, so you don't need to pass it there
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 1},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
```
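To make the indexes and query engines above actually use this local LLM, it has to be registered in a service context (or in `Settings`, depending on the llama_index version). A minimal sketch using the pre-0.10 `ServiceContext` API together with the callback manager and embedding model defined earlier:

```python
from llama_index import ServiceContext

# Sketch: wire the local LlamaCPP model, the embedding model and the debug
# callback manager into the service_context used when building indexes.
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    callback_manager=callback_manager,
)
```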