Hi all, I'm trying to use the new Command R 4-bit model (CohereForAI/c4ai-command-r-v01-4bit) with LlamaIndex. The model runs fine on my machine using plain Transformers code from HF, but when I wrap it in LlamaIndex it fails with what looks like an out-of-memory issue.
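For reference, the plain Transformers code I mean is roughly this (a sketch along the lines of the HF model card example, not my exact script; the chat-template and generate calls are standard HF API):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-v01-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# the checkpoint is already quantized with bitsandbytes, so no extra quantization config here
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

gen_tokens = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.3)
print(tokenizer.decode(gen_tokens[0]))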
Here is my LlamaIndex code:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import Settings
from llama_index.core import PromptTemplate
import torch
# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>{query_str}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")

llm = HuggingFaceLLM(
    context_window=16384,
    max_new_tokens=4096,
    generate_kwargs={"temperature": 0.7, "do_sample": True},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="CohereForAI/c4ai-command-r-v01-4bit",
    model_name="CohereForAI/c4ai-command-r-v01-4bit",
    device_map="auto",
    # tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16},
)
Settings.llm = llm
Settings.chunk_size = 1024
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("Could you summarize the given context in 3 paragraphs? Return your response which covers the key points of the text and does not miss anything important, please."))
And this is the error message I get:
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the
quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules
in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to
`from_pretrained`.
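From reading that, I think I need to pass the fp32 CPU offload option and possibly a custom device_map through to from_pretrained. Below is a minimal sketch of how I'd try to forward that via model_kwargs, assuming HuggingFaceLLM hands model_kwargs straight to AutoModelForCausalLM.from_pretrained. Note the flag I can find in BitsAndBytesConfig is llm_int8_enable_fp32_cpu_offload rather than the load_in_8bit_fp32_cpu_offload the error names, and I'm not sure any of this is valid for an already-quantized 4-bit checkpoint. Is this the right approach, or do I simply need to free up more GPU RAM (e.g. by putting the embedding model on CPU)?

import torch
from transformers import BitsAndBytesConfig

# Sketch only: allow modules that don't fit on the GPU to be kept on CPU in fp32.
# The flag names are my assumption from the BitsAndBytesConfig docs; I haven't
# verified this combination against this checkpoint.
offload_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True,
)

llm = HuggingFaceLLM(
    context_window=16384,
    max_new_tokens=4096,
    generate_kwargs={"temperature": 0.7, "do_sample": True},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="CohereForAI/c4ai-command-r-v01-4bit",
    model_name="CohereForAI/c4ai-command-r-v01-4bit",
    # the error suggests a custom device_map may be needed instead of "auto";
    # keeping "auto" here since I don't know the right per-layer map for this model
    device_map="auto",
    model_kwargs={"quantization_config": offload_config},
)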