Some of the examples I've seen use `completion_to_prompt` and `messages_to_prompt`, but those using a `HuggingFaceLLM()` seem to use a `system_prompt` and `query_wrapper_prompt`. How do I migrate from the former to the latter correctly?
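For context, this is roughly what I think the `HuggingFaceLLM()` version should look like; I'm not sure whether `query_wrapper_prompt` takes a plain string or a `PromptTemplate` in the current release, and the Llama-2 prompt format below is just my guess:

```python
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate

# Sketch of the "new style" call, assuming a Llama-2-chat model.
llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=4096,
    max_new_tokens=256,
    # system_prompt seems to replace what messages_to_prompt used to prepend
    system_prompt="You are a helpful assistant that answers using the indexed documents.",
    # query_wrapper_prompt seems to replace what completion_to_prompt wrapped around the query
    query_wrapper_prompt=PromptTemplate("[INST] {query_str} [/INST]"),
    device_map="auto",
)
```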
When I try to load the model, I get: "`device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format."
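My assumption is that `model_kwargs` is forwarded to `from_pretrained`, so this is where I've been trying to put an `offload_folder` (the path is just a placeholder):

```python
from llama_index.llms import HuggingFaceLLM

# Assuming model_kwargs is passed through to AutoModelForCausalLM.from_pretrained,
# this should give accelerate a directory to spill offloaded weights into.
llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    model_kwargs={"offload_folder": "./offload"},  # hypothetical local directory
)
```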
(I can do `from langchain.llms import GPT4All` and chat with it directly, but it seems `index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)` is specifically for OpenAI?)
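What I'm hoping is possible is something along these lines; I'm not certain `LangChainLLM` and `embed_model="local"` are the right hooks in the version I'm running, so treat this as a sketch:

```python
from langchain.llms import GPT4All
from llama_index import GPTVectorStoreIndex, ServiceContext, SimpleDirectoryReader
from llama_index.llms import LangChainLLM

documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data folder

# Wrap the LangChain GPT4All model so llama_index uses it instead of OpenAI.
llm = LangChainLLM(llm=GPT4All(model="./models/gpt4all.bin"))  # hypothetical model path
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
```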
`llm = HuggingFaceLLM( ... tokenizer_name="meta-llama/Llama-2-7b-chat-hf", model_name="meta-llama/Llama-2-7b-chat-hf", ...)` fails with an error asking for a "`state_dict` or a `save_folder` containing offloaded weights." I've tried specifying an empty `save_folder` right in the `HuggingFaceLLM()` call, but that's an unexpected keyword, and I've also tried adding it to `generate_kwargs={}` and `tokenizer_kwargs={}` without success. I suspect it's not just looking for a blank folder, either. Any ideas?
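One thing I haven't tried yet is loading the model and tokenizer with transformers directly (where `offload_folder` is definitely a valid argument) and handing the objects to `HuggingFaceLLM`; I believe it accepts `model=` and `tokenizer=`, but I haven't confirmed that:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.llms import HuggingFaceLLM

model_name = "meta-llama/Llama-2-7b-chat-hf"

# Load with transformers directly so the offload settings are unambiguous.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    offload_folder="./offload",   # directory accelerate can write offloaded weights to
    offload_state_dict=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Assuming HuggingFaceLLM accepts pre-built model/tokenizer objects.
llm = HuggingFaceLLM(model=model, tokenizer=tokenizer, max_new_tokens=256)
```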
With `llm = HuggingFaceLLM()`, where do I set `trust_remote_code`?
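My guess is it belongs in the kwargs dicts that get forwarded to the respective `from_pretrained` calls, i.e. something like this, though I'm not sure both dicts are actually passed through:

```python
from llama_index.llms import HuggingFaceLLM

# Assuming model_kwargs / tokenizer_kwargs reach the from_pretrained calls.
llm = HuggingFaceLLM(
    model_name="mosaicml/mpt-7b-chat",      # example of a model that needs trust_remote_code
    tokenizer_name="mosaicml/mpt-7b-chat",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True},
)
```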
I've tried `llm = HuggingFaceHub(...)`, but (a) I seem to still need a local embedding model? and (b) even when I use a local embedding model, I get "Empty Response" in an app where using `llm = GPT4All(...)` works well.
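For reference, my setup is roughly the following (the repo id and embedding model are placeholders); I'm also not sure whether a LangChain LLM can be passed straight to `ServiceContext` or needs to be wrapped first:

```python
from langchain.llms import HuggingFaceHub
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import GPTVectorStoreIndex, LangchainEmbedding, ServiceContext, SimpleDirectoryReader

# Hosted LLM plus a local embedding model.
llm = HuggingFaceHub(
    repo_id="google/flan-t5-xxl",                         # placeholder hosted model
    model_kwargs={"temperature": 0.1, "max_length": 256},
)
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data folder
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
```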
I've been using `HuggingFacePipeline.from_model_id()` alongside `HuggingFaceEmbeddings()`, passing that as the `ServiceContext` to a `GPTVectorStoreIndex.from_documents().as_query_engine()`, but I'm getting a few lines of sensible responding followed by a bunch of repetition and nonsense. Not sure if I just need to tweak parameters and response length, or if I'm producing Frankenstein's Monster here.
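In case it's just generation settings, this is roughly what I'm constructing; I'm unsure whether options like `repetition_penalty` belong in `model_kwargs` or `pipeline_kwargs` in my LangChain version:

```python
from langchain.llms import HuggingFacePipeline

# Sketch: cap the output length and penalize repetition, in case the rambling
# is just default generation settings rather than a broken setup.
llm = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-2-7b-chat-hf",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 256,
        "do_sample": True,
        "temperature": 0.7,
        "repetition_penalty": 1.15,
    },
)
```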
I'm querying with `index.as_query_engine`. However, it seems that each question is distinct (e.g., there's no continuity from message to message). Is there a way to begin with the prompt, and then ask follow-up questions in this context? A la ChatGPT?
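I did notice `as_chat_engine()` in the docs; is something like this the intended way to keep conversation history around the index? (Sketch only; the mode name may be wrong.)

```python
# A chat engine keeps message history, unlike a one-shot query engine.
chat_engine = index.as_chat_engine(chat_mode="condense_question")

print(chat_engine.chat("Summarize the report."))
print(chat_engine.chat("What were the main risks it mentioned?"))  # follow-up reuses prior context
```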
I'm getting `ImportError: cannot import name 'GPTSimpleVectorIndex'`.
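My guess is this is just the rename in newer llama_index releases, and that the following matches what the old `GPTSimpleVectorIndex` code was doing, but I'd like to confirm it's the intended replacement:

```python
# GPTSimpleVectorIndex was removed in newer llama_index versions;
# GPTVectorStoreIndex (later just VectorStoreIndex) appears to be the replacement.
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data folder
index = GPTVectorStoreIndex.from_documents(documents)
```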