Find answers from the community

codydh
The example of using the Replicate API with LLaMA 2 uses completion_to_prompt and messages_to_prompt, but the examples using HuggingFaceLLM() seem to use system_prompt and query_wrapper_prompt. How do I migrate from the former to the latter correctly?
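A minimal migration sketch, assuming the HuggingFaceLLM API of llama_index releases from this era; the Llama-2 prompt strings below are illustrative, not canonical:
Python
# The two Replicate callables collapse into two declarative arguments:
# the fixed system portion of messages_to_prompt becomes system_prompt,
# and completion_to_prompt's per-query wrapping becomes
# query_wrapper_prompt, with {query_str} where the user text goes.
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt

system_prompt = "You are a helpful assistant."  # illustrative
query_wrapper_prompt = SimpleInputPrompt("[INST] {query_str} [/INST]")

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
)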
1 comment
I'm following along with the Custom LLM guide here: https://gpt-index.readthedocs.io/en/latest/examples/customization/llms/SimpleIndexDemo-Huggingface_stablelm.html
I've replicated the code locally, but I keep getting the error:
"ValueError: The current device_map had weights offloaded to the disk. Please provide an offload_folder for them. Alternatively, make sure you have safetensors installed if the model you are using offers the weights in this format."

I'm having trouble troubleshooting this in this context; any advice?
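One hedged workaround, assuming the wrapper forwards model_kwargs to transformers' from_pretrained (true for HuggingFaceLLM in releases of this era): give accelerate a real folder to offload weights into.
Python
# The error means accelerate decided to offload weights to disk but was
# given no destination. offload_folder is any writable path; the model
# names mirror the StableLM guide and are otherwise illustrative.
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    model_kwargs={"offload_folder": "./offload"},
)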
3 comments
Hello! I'm wondering if there's any way to use LlamaIndex with the Hugging Face Inference API, for example to use falcon-7b-instruct?
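There's no first-class Inference API integration at this point, but a hedged sketch via the langchain bridge llama_index supports: wrap HuggingFaceHub in an LLMPredictor, and supply a local embedding model, since the Inference API only serves generation.
Python
# Remote LLM over the Hugging Face Inference API, local embeddings.
# Class names track llama_index ~0.6/0.7; the token is a placeholder.
from langchain.llms import HuggingFaceHub
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import (
    GPTVectorStoreIndex, LLMPredictor, LangchainEmbedding,
    ServiceContext, SimpleDirectoryReader,
)

llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    huggingfacehub_api_token="hf_...",  # your token here
)
service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=llm),
    embed_model=LangchainEmbedding(HuggingFaceEmbeddings()),  # runs locally
)
documents = SimpleDirectoryReader("data").load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)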
1 comment
codydh · Custom llms

(For more context, I have figured out how to load GPT4All using from langchain.llms import GPT4All and chat with it directly, but it seems index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context) is specifically for OpenAI?)
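It isn't OpenAI-specific: from_documents uses whatever LLM the ServiceContext carries. A hedged sketch of the wiring (model path illustrative); note that embeddings still default to OpenAI unless embed_model is also overridden, as in the falcon sketch above.
Python
# GPT4All behind llama_index via the langchain LLMPredictor bridge.
from langchain.llms import GPT4All
from llama_index import LLMPredictor, ServiceContext

llm_predictor = LLMPredictor(llm=GPT4All(model="./models/ggml-gpt4all-j.bin"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
# then: GPTVectorStoreIndex.from_documents(documents, service_context=service_context)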
12 comments
Is there a good way to find a list or suggestions of non-OpenAI models that would work best for something like LlamaIndex? The only resource I've found is https://chat.lmsys.org/?leaderboard but I'm not sure if there are better/more specific resources?
2 comments
I'm attempting to use:
Python
llm = HuggingFaceLLM(
    ...
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    ...
)

But I keep getting an error: "ValueError: Need either a state_dict or a save_folder containing offloaded weights.". I've tried specifying an empty save_folder right in the HuggingFaceLLM() call, but that's an unexpected keyword, and I've also tried adding it to generate_kwargs={} and tokenizer_kwargs={} without success. I suspect it's not just looking for a blank folder, either. Any ideas?
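A hedged way around it, assuming HuggingFaceLLM accepts pre-built model and tokenizer objects (it does in releases of this era): load the model yourself so accelerate gets a real offload destination, instead of trying to smuggle save_folder through generate_kwargs or tokenizer_kwargs.
Python
# Load with an explicit offload_folder, then hand the objects over.
# The folder path is illustrative; it just needs to be writable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.llms import HuggingFaceLLM

name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",
    offload_folder="./offload",
    torch_dtype=torch.float16,
)
llm = HuggingFaceLLM(model=model, tokenizer=tokenizer)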
13 comments
Is there yet an example of running LLaMA-2 (-7B-chat) through an interface like llama.cpp, as opposed to the Replicate API or the local Hugging Face interface (which seems slow)?
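llama_index does ship a llama.cpp wrapper in later releases, with helpers for the Llama-2 chat format; a hedged sketch (GGML model path illustrative, requires llama-cpp-python):
Python
# LlamaCPP runs a local GGML model; the llama_utils helpers apply the
# Llama-2 [INST]/<<SYS>> chat format.
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": 1},  # >0 offloads layers to GPU
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)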
48 comments
Sorry, I've got one more question today. Using the example where llm = HuggingFaceLLM(), where do I set trust_remote_code?
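A hedged answer: trust_remote_code isn't a top-level HuggingFaceLLM argument; it travels to the two from_pretrained calls through model_kwargs and tokenizer_kwargs (model names illustrative):
Python
# trust_remote_code is forwarded to both from_pretrained calls.
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="mosaicml/mpt-7b-instruct",
    tokenizer_name="mosaicml/mpt-7b-instruct",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True},
)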
2 comments
Are there any good examples of using llama_index with a model on the Hugging Face Inference API? I know I'll load the model using llm = HuggingFaceHub(...), but (a) it seems I still need a local embedding model, and (b) even when I use one, I get "Empty Response" in an app where llm = GPT4All(...) works well.
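On (a): yes, the Inference API only serves generation, so embeddings still need a local model (see the falcon sketch above). On (b), a hedged sanity check: "Empty Response" can come either from retrieval finding nothing or from the remote LLM returning an empty string, so call the LLM directly first.
Python
# If this prints nothing useful, the problem is the remote model or its
# kwargs, not the index. Parameter values are illustrative.
from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    model_kwargs={"temperature": 0.1, "max_new_tokens": 256},
)
print(llm("Q: What is a vector index?\nA:"))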
2 comments
codydh · Huggingface

Does anyone have experience using Llama Index with Dolly v2? I think the best way is to use HuggingFacePipeline.from_model_id() alongside HuggingFaceEmbeddings() and pass that as the ServiceContext to a GPTVectorStoreIndex.from_documents().as_query_engine(), but I'm getting a few lines of sensible response followed by a bunch of repetition and nonsense. Not sure if I just need to tweak parameters and response length, or if I'm producing Frankenstein's Monster here.
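The wiring is sound; repetition and nonsense after a good opening is usually a generation-parameter problem rather than a Frankenstein one. A hedged sketch with illustrative starting values (where generation kwargs land varies by langchain version; newer releases take a separate pipeline_kwargs):
Python
# Dolly v2 via langchain's pipeline wrapper, with kwargs that typically
# curb runaway repetition. Values are starting points, not tuned.
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="databricks/dolly-v2-3b",
    task="text-generation",
    model_kwargs={
        "max_length": 512,
        "do_sample": True,
        "temperature": 0.7,
        "repetition_penalty": 1.2,
    },
)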
3 comments
codydh · Chat

Another day, another question! I've got a pretty cool gradio interface set up with my own index and a QuestionAnswerPrompt against the index.as_query_engine. However, it seems that each question is distinct (i.e., there's no continuity from message to message). Is there a way to begin with the prompt and then ask follow-up questions in that context, à la ChatGPT?
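A hedged sketch: llama_index exposes a chat engine over the same index that keeps message history; in condense_question mode each follow-up is rewritten into a standalone query against the index (assumes `index` is the one behind the gradio app):
Python
# Chat with history over an existing index.
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)

print(chat_engine.chat("What does the document say about pricing?"))  # illustrative
print(chat_engine.chat("And how did that change over time?"))  # follow-up keeps context
chat_engine.reset()  # start a fresh conversation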
3 comments
OK, it seems the Starter Tutorial doesn't work for v0.6.0? ImportError: cannot import name 'GPTSimpleVectorIndex'
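For reference, v0.6 renamed the class: GPTSimpleVectorIndex became GPTVectorStoreIndex, and queries moved behind a query engine. An updated starter sketch:
Python
# v0.6-style starter tutorial.
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = GPTVectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What did the author do growing up?"))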
3 comments