Hi guys, I haven't been using LlamaIndex for a while, but all of a sudden my basic script for launching a self-hosted LLM stopped working. Here's the thing:
Plain Text
# Imports assume llama-index >= 0.10; adjust the paths for older versions.
import logging

import torch
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM


def run_llm_old(system_prompt, model_name='meta-llama/Llama-2-13b-chat-hf',
                quantization_config=None):
    # Wrap every query in the Llama-2 chat prompt format.
    query_wrapper_prompt = PromptTemplate(
        "[INST]<<SYS>>\n" + system_prompt + "<</SYS>>\n\n{query_str}[/INST] "
    )
    # Quantize if a config is given; otherwise load the weights in fp16.
    if quantization_config:
        model_kwargs = {"quantization_config": quantization_config}
    else:
        model_kwargs = {"torch_dtype": torch.float16}
    logging.info(f"Configuring model {model_name}")
    llm = HuggingFaceLLM(
        model_name=model_name,
        tokenizer_name=model_name,
        context_window=8192,  # note: Llama-2 models natively support 4096 tokens
        max_new_tokens=1024,
        model_kwargs=model_kwargs,
        query_wrapper_prompt=query_wrapper_prompt,
        generate_kwargs={"temperature": 0.35, "top_k": 40, "top_p": 0.9, "do_sample": True},
        device_map="auto",
    )
    logging.info(f"LLM {model_name} configured successfully")
    return llm

This function was working flawlessly a few months ago. Now, using logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s'),
it only gets to this point:
Plain Text
2024-06-21 10:14:51,099 - INFO - Configuring model meta-llama/Llama-2-13b-chat-hf
2024-06-21 10:14:51,101 - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
2024-06-21 10:14:51,398 - DEBUG - https://huggingface.co:443 "HEAD /meta-llama/Llama-2-13b-chat-hf/resolve/main/config.json HTTP/1.1" 200 0

and hangs without loading the model shards. When I remove the whole model directory, the shards get downloaded again, but it's stuck after:
Plain Text
model-00003-of-00003.safetensors: 100%|[...]|
Downloading shards: 100%|[...]|
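
One way to narrow this down (a sketch, not a confirmed fix) is to take LlamaIndex out of the loop and load the same checkpoint with plain transformers. The local_files_only flag skips the HEAD request to huggingface.co, assuming the shards are already cached:
Plain Text
# Isolation test: load the same checkpoint without LlamaIndex.
# If this hangs too, the issue is in transformers/accelerate, not HuggingFaceLLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,   # same dtype as run_llm_old's non-quantized path
    device_map="auto",           # same device placement as run_llm_old
    local_files_only=True,       # skip the huggingface.co HEAD request
)
print("model loaded")

If the plain transformers load succeeds, the hang is more likely in the LlamaIndex wrapper or a version mismatch between the installed packages.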

Hi guys!
Maybe a bit of a stupid question, but do I need to use the same embedding model for indexing and retrieval? Or would it be possible to index data using some large model like Alibaba-NLP/gte-Qwen2-7B-instruct and then use BAAI/bge-large-en-v1.5 for creating the query_engine from the vector index?
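
For context on why this matters: embeddings from different models live in different vector spaces (and typically even have different dimensionalities), so a query embedded with one model can't be meaningfully matched against vectors produced by another. A minimal sketch of keeping the two aligned, using the same ServiceContext-style API seen elsewhere in this thread:
Plain Text
# Sketch: the same embed_model must be used at indexing time and at query time,
# because vectors from different embedding models are not comparable.
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

service_context = ServiceContext.from_defaults(
    llm=llm,  # e.g. the HuggingFaceLLM returned by run_llm_old above
    embed_model="local:BAAI/bge-large-en-v1.5",
)
documents = SimpleDirectoryReader("./data").load_data()  # "./data" is a placeholder path
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
# The index remembers its service_context, so queries embed with the same model:
query_engine = index.as_query_engine()
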
Hi! I have an issue loading data from a previously used Milvus instance.
I've loaded the data using:
Plain Text
# Imports assume a pre-0.10 llama-index, where ServiceContext is still current.
from llama_index import ServiceContext, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import MilvusVectorStore

documents = SimpleDirectoryReader("/home/eouser/lynx/example_data").load_data()
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:WhereIsAI/UAE-Large-V1")
vector_store = MilvusVectorStore(uri='http://<IP>:19530', dim=1024)  # 1024 = UAE-Large-V1 embedding dim
storage_context = StorageContext.from_defaults(vector_store=vector_store)
vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)

And it works just fine. Now I want to load the previously stored data by replacing
Plain Text
vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)

with
Plain Text
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store,
                                                  service_context=service_context,
                                                  storage_context=storage_context)

The issue is that I'm getting either:
Plain Text
AttributeError: 'Response' object has no attribute 'print_response_stream'

when trying to stream the response, or
Plain Text
Empty Response

when trying to print the response. What would be the proper way to do this?
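
On the first error: print_response_stream() only exists on a StreamingResponse, which a query engine returns only when it is created with streaming=True; a plain Response has no such attribute. An Empty Response usually means retrieval found no matching nodes, which most often happens when the embed_model in the service_context differs from the one used at indexing time. A sketch along those lines (the query string is just a placeholder):
Plain Text
# Rebuild the index view over the existing Milvus collection; the service_context
# must carry the SAME embed_model used at indexing time ("local:WhereIsAI/UAE-Large-V1"),
# otherwise retrieval finds nothing and you get "Empty Response".
vector_index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    service_context=service_context,
)
# streaming=True makes query() return a StreamingResponse, which is the only
# response type that has print_response_stream().
query_engine = vector_index.as_query_engine(streaming=True)
response = query_engine.query("What is in the example data?")
response.print_response_stream()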