Hi guys, I haven't been using LlamaIndex for a while, but all of a sudden my basic script for launching a self-hosted LLM stopped working. Here's the code:
import logging

import torch
# imports assume the post-0.10 llama-index package layout (llama-index-llms-huggingface installed)
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM


def run_llm_old(system_prompt, model_name='meta-llama/Llama-2-13b-chat-hf',
                quantization_config=None):
    # wrap every query in the Llama-2 chat [INST]/<<SYS>> template
    query_wrapper_prompt = PromptTemplate(
        "[INST]<<SYS>>\n" + system_prompt + "<</SYS>>\n\n{query_str}[/INST] "
    )
    # use the caller's quantization config if given, otherwise load in fp16
    if quantization_config:
        model_kwargs = {"quantization_config": quantization_config}
    else:
        model_kwargs = {"torch_dtype": torch.float16}
    logging.info(f"Configuring model {model_name}")
    llm = HuggingFaceLLM(
        model_name=model_name,
        tokenizer_name=model_name,
        context_window=8192,
        max_new_tokens=1024,
        model_kwargs=model_kwargs,
        query_wrapper_prompt=query_wrapper_prompt,
        generate_kwargs={"temperature": 0.35, "top_k": 40, "top_p": 0.9, "do_sample": True},
        device_map="auto",
    )
    logging.info(f"LLM {model_name} configured successfully")
    return llm
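For context, this is roughly how I invoke it; the 4-bit BitsAndBytesConfig below is just an illustration, not necessarily the exact settings I run with:

from transformers import BitsAndBytesConfig

# illustrative 4-bit config; any config transformers accepts goes down the same code path
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

llm = run_llm_old("You are a helpful assistant.", quantization_config=bnb_config)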
This function was working flawlessly a few months ago. Now, with
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
enabled, the log only gets this far:
2024-06-21 10:14:51,099 - INFO - Configuring model meta-llama/Llama-2-13b-chat-hf
2024-06-21 10:14:51,101 - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
2024-06-21 10:14:51,398 - DEBUG - https://huggingface.co:443 "HEAD /meta-llama/Llama-2-13b-chat-hf/resolve/main/config.json HTTP/1.1" 200 0
and then gets stuck without loading the model shards. If I delete the whole model directory, the shards are downloaded again, but it still hangs after:
model-00003-of-00003.safetensors: 100%|[...]|
Downloading shards: 100%|[...]|
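To narrow things down, my next step would be to load the same checkpoint directly with transformers (bypassing LlamaIndex entirely) and see whether the shard loading itself hangs; a minimal sketch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# same checkpoint, dtype and device mapping as in run_llm_old, just without LlamaIndex
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

Has anyone run into this, or any idea what changed?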