
Emre
Hello, I have been following this tutorial: https://gpt-index.readthedocs.io/en/latest/examples/llm/llama_2_llama_cpp.html. I have a problem: the query function takes an extremely long time (8–10 minutes). I know this is a common problem with llama.cpp, but llama.cpp works reasonably well for plain prompting and answering; the trouble starts with QA, indexing, embedding, and so on. I can share my code if needed. Any help is appreciated.
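For reference, here is a minimal sketch of the setup that tutorial describes, with the two knobs that usually dominate query latency: offloading model layers to the GPU via `n_gpu_layers` in `model_kwargs` (0 means CPU-only inference, which is the common cause of multi-minute queries), and using a small local embedding model so indexing and query embedding do not go through the LLM. The model path, layer count, and data directory below are illustrative assumptions, not the poster's actual configuration.

from llama_index import VectorStoreIndex, ServiceContext, SimpleDirectoryReader
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    model_path="./llama-2-13b-chat.Q4_0.gguf",  # assumed local path; the tutorial fetches a GGUF via model_url
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": 35},  # 0 = CPU only; raising this is the usual speed fix
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

# A small local embedding model keeps indexing and retrieval off the LLM entirely
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
)
documents = SimpleDirectoryReader("./data").load_data()  # assumed data directory
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()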
41 comments
Hello, I am working on a document QA project. The catch is that I need an LLM and an embedding model with multilingual support, so I am using a Hugging Face embedding model and LLM, but my outputs are complete nonsense. Can you help me?
import torch
from llama_index import VectorStoreIndex, ServiceContext, PromptHelper
from llama_index.llms import HuggingFaceLLM
from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings

# Embedding model, loaded through LangChain's wrapper
embed_model = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Local Hugging Face LLM, quantized to 8-bit
llm = HuggingFaceLLM(
    tokenizer_name="bigscience/bloomz-1b7",
    model_name="bigscience/bloomz-1b7",
    model_kwargs={"load_in_8bit": True, "torch_dtype": torch.float16},
    generate_kwargs={
        "do_sample": True,
        "top_k": 4,
        "penalty_alpha": 0.6,
    },
)

prompt_helper = PromptHelper(context_window=512, chunk_size_limit=200, num_output=100)
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=embed_model, prompt_helper=prompt_helper
)
# `nodes` is built from the source documents elsewhere in the script
index = VectorStoreIndex(nodes=nodes, service_context=service_context)
query_engine = index.as_query_engine()
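For completeness, a usage sketch of the snippet above; the query string is an illustrative placeholder, not from the original post.

# Illustrative query against the engine built above
response = query_engine.query("What is the document about?")
print(response)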
6 comments