Find answers from the community

Trying to run this code for structured output:

from llama_index.core.llms import ChatMessage

sllm = llm.as_structured_llm(output_cls=Content)
input_msg = ChatMessage.from_str(f"Generate a poem from the content: {content}")
response = sllm.chat(input_msg)

and getting this error:

Input should be a valid dictionary or instance of ChatMessage

I checked input_msg and it is a ChatMessage object.
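A likely fix, sketched on the assumption that sllm.chat expects a list of chat messages rather than a single one (iterating over a lone pydantic ChatMessage yields field tuples, which would produce exactly this validation error):

response = sllm.chat([input_msg])  # wrap the single message in a list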
3 comments
Hi, what would be the best way to assign a system prompt to a ReAct agent when I instantiate it? Thank you!
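A minimal sketch: ReActAgent.from_tools accepts a context string that gets injected into the agent's system prompt (the tools list and instruction text here are placeholders):

Plain Text
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    tools=tools,  # your own tool list
    llm=llm,
    context="You are a helpful assistant that always answers in haiku.",
)
# Alternatively, the full system prompt can be swapped after instantiation via
# agent.update_prompts({"agent_worker:system_prompt": my_prompt_template}).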
1 comment
Do we have support for the new NVLM from NVIDIA?
7 comments
What are the ways I can let an LLM run the code it generates? For example, AutoGen has LocalCommandLineCodeExecutor, DockerCommandLineCodeExecutor, and JupyterCodeExecutor:
https://microsoft.github.io/autogen/docs/tutorial/code-executors/
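One option, sketched on the assumption that the llama-index-tools-code-interpreter package provides a CodeInterpreterToolSpec (it runs the generated Python locally, so there is no sandboxed Docker or Jupyter variant like AutoGen's):

Plain Text
from llama_index.core.agent import ReActAgent
from llama_index.tools.code_interpreter import CodeInterpreterToolSpec

# expose a "write and execute Python" tool to the agent
code_tools = CodeInterpreterToolSpec().to_tool_list()
agent = ReActAgent.from_tools(code_tools, llm=llm, verbose=True)
response = agent.chat("Write and run Python code that prints the first 10 primes.")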
1 comment
So I'm trying to run a simple call to the OpenAI API inside a workflow; however, the query always returns the same result even though I want it to be random.

llm = OpenAI(model="gpt-4o-mini", temperature=0.8)
response = await llm.acomplete("Pick a random word. ONLY RETURN THE WORD")

How can I turn off this behavior and get a new word each time this runs?
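One thing to try, sketched on the assumption that the determinism comes from the API side rather than from LlamaIndex (seed is a standard OpenAI API parameter, and additional_kwargs is forwarded to the API call):

Plain Text
import random

from llama_index.llms.openai import OpenAI

# a fresh seed per instantiation nudges the API away from repeating itself;
# raising the temperature further also widens the sampling distribution
llm = OpenAI(
    model="gpt-4o-mini",
    temperature=1.0,
    additional_kwargs={"seed": random.randint(0, 2**31 - 1)},
)

Prompt phrasing matters too: very short prompts like this one tend to collapse onto a single high-probability answer regardless of temperature.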
3 comments
Hi guys, I am trying to use the new Command R 4-bit model with LlamaIndex. My machine runs the model just fine using transformers code from HF, but when I wrap it in LlamaIndex it gives an OOM error.
This is my LlamaIndex code:
Plain Text
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import Settings
from llama_index.core import PromptTemplate
import torch

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>{query_str}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")

llm = HuggingFaceLLM(
    context_window=16384,
    max_new_tokens=4096,
    generate_kwargs={"temperature": 0.7, "do_sample": True},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="CohereForAI/c4ai-command-r-v01-4bit",
    model_name="CohereForAI/c4ai-command-r-v01-4bit",
    device_map="auto",
    # tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

Settings.llm = llm
Settings.chunk_size = 1024
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("Could you summarize the given context in 3 paragraphs? Return your response which covers the key points of the text and does not miss anything important, please."))

The error message:
Plain Text
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`.
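Two knobs that may help, sketched on the assumption that HuggingFaceLLM forwards model_kwargs verbatim to AutoModelForCausalLM.from_pretrained (the offload flag name is taken straight from the error text and is not verified against this model):

Plain Text
llm = HuggingFaceLLM(
    context_window=8192,   # a smaller window also reduces activation memory
    max_new_tokens=1024,
    tokenizer_name="CohereForAI/c4ai-command-r-v01-4bit",
    model_name="CohereForAI/c4ai-command-r-v01-4bit",
    device_map="auto",
    model_kwargs={
        # let the fp32 modules live on CPU instead of raising ValueError
        "load_in_8bit_fp32_cpu_offload": True,
    },
)

If the raw transformers script really fits on the same GPU, also check that nothing else (a previously loaded model, the embedding model used by VectorStoreIndex) is holding VRAM when the LlamaIndex version loads.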
12 comments
So I want to ask for advice on two related topics:

  1. If I have a corpus of many documents embedded in a vector store, how can I dynamically select a subset of them (by metadata, for example) and perform retrieval only on that subset for answer generation? (See the sketch below.)
  2. I want LLaMa to be able to say I DO NOT KNOW when the retrieved context cannot answer the question. From what I have seen, this behavior is not yet stable.

Thank you so much!
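For topic 1, a minimal sketch using LlamaIndex metadata filters (the key and value are made-up examples):

Plain Text
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# retrieve only from nodes whose metadata matches the filter
filters = MetadataFilters(filters=[ExactMatchFilter(key="year", value="2024")])
query_engine = index.as_query_engine(filters=filters, similarity_top_k=5)

For topic 2, the usual mitigation is to customize the text-QA prompt so the model is told to answer strictly from the provided context and to reply "I do not know" otherwise; it helps, but as you observed it is not fully reliable.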
1 comment
Hi guys, do you know if LlamaIndex has an equivalent to Langchain's PandasAgent or CSVAgent? I have a big CSV file with both numerical and free-text data in it and want to chat with it. An example question would be something like: 'Tell me about all research trends in cancer therapies published in the last month.' In other words, the agent would need to do some numerical reasoning (filtering research by date) and then text synthesis (looking at the filtered studies and writing a summary).
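The closest built-in counterpart I know of is PandasQueryEngine; a minimal sketch, assuming the llama-index-experimental package is installed (the CSV path and question are hypothetical):

Plain Text
import pandas as pd

from llama_index.experimental.query_engine import PandasQueryEngine

df = pd.read_csv("research.csv")  # hypothetical file
query_engine = PandasQueryEngine(df=df, verbose=True)
print(query_engine.query("How many studies were published in the last month?"))

It translates the question into pandas code and executes it, which covers the numerical-filtering half; the free-text summary half would still need an LLM pass (e.g. a retriever or summarizer) over the filtered rows.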
1 comment